Eric Lee / smarc-fsl-linux-kernel

06 Feb, 2020

1 commit

3ea87219a cgroup: Prevent double killing of css when enabling threaded cgroup ... Browse Code »

commit 3bc0bb36fa30e95ca829e9cf480e1ef7f7638333 upstream.

The test_cgcore_no_internal_process_constraint_on_threads selftest when
running with subsystem controlling noise triggers two warnings:

> [ 597.443115] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3131 cgroup_apply_control_enable+0xe0/0x3f0
> [ 597.443413] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3177 cgroup_apply_control_disable+0xa6/0x160

Both stem from a call to cgroup_type_write. The first warning was also
triggered by syzkaller.

When we're switching cgroup to threaded mode shortly after a subsystem
was disabled on it, we can see the respective subsystem css dying there.

The warning in cgroup_apply_control_enable is harmless in this case
since we're not adding new subsys anyway.
The warning in cgroup_apply_control_disable indicates an attempt to kill
css of recently disabled subsystem repeatedly.

The commit prevents these situations by making cgroup_type_write wait
for all dying csses to go away before re-applying subtree controls.
When at it, the locations of WARN_ON_ONCE calls are moved so that
warning is triggered only when we are about to misuse the dying css.

Reported-by: syzbot+5493b2a54d31d6aea629@syzkaller.appspotmail.com
Reported-by: Christian Brauner
Signed-off-by: Michal Koutný
Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Michal Koutný
2020-02-06 05:22:42 +0800

31 Dec, 2019

1 commit

20832ebf9 cgroup: freezer: don't change task and cgroups status unnecessarily ... Browse Code »

[ Upstream commit 742e8cd3e1ba6f19cad6d912f8d469df5557d0fd ]

It's not necessary to adjust the task state and revisit the state
of source and destination cgroups if the cgroups are not in freeze
state and the task itself is not frozen.

And in this scenario, it wakes up the task who's not supposed to be
ready to run.

Don't do the unnecessary task state adjustment can help stop waking
up the task without a reason.

Signed-off-by: Honglei Wang
Acked-by: Roman Gushchin
Signed-off-by: Tejun Heo
Signed-off-by: Sasha Levin

Honglei Wang
2019-12-31 23:45:06 +0800

18 Dec, 2019

1 commit

2539f282e cgroup: pids: use atomic64_t for pids->limit ... Browse Code »

commit a713af394cf382a30dd28a1015cbe572f1b9ca75 upstream.

Because pids->limit can be changed concurrently (but we don't want to
take a lock because it would be needlessly expensive), use atomic64_ts
instead.

Fixes: commit 49b786ea146f ("cgroup: implement the PIDs subsystem")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Aleksa Sarai
Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Aleksa Sarai
2019-12-18 02:56:15 +0800

16 Nov, 2019

1 commit

b4c0800e4 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc vfs fixes from Al Viro:
"Assorted fixes all over the place; some of that is -stable fodder,
some regressions from the last window"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either
ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable
ecryptfs: fix unlink and rmdir in face of underlying fs modifications
audit_get_nd(): don't unlock parent too early
exportfs_decode_fh(): negative pinned may become positive without the parent locked
cgroup: don't put ERR_PTR() into fc->root
autofs: fix a leak in autofs_expire_indirect()
aio: Fix io_pgetevents() struct __compat_aio_sigset layout
fs/namespace.c: fix use-after-free of mount in mnt_warn_timestamp_expiry()

Linus Torvalds
2019-11-16 00:44:08 +0800

11 Nov, 2019

1 commit

630faf81b cgroup: don't put ERR_PTR() into fc->root ... Browse Code »

the caller of ->get_tree() expects NULL left there on error...

Reported-by: Thibaut Sautereau
Signed-off-by: Al Viro

Al Viro
2019-11-11 00:53:27 +0800

29 Oct, 2019

1 commit

cd1cb3350 sched/topology: Don't try to build empty sched domains ... Browse Code »

Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
cpuset code feeding empty cpumasks to the sched domain rebuild machinery.

This leads to the following splat:

Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23
Hardware name: ARM Juno development board (r0) (DT)
Workqueue: events cpuset_hotplug_workfn
pstate: 60000005 (nZCv daif -PAN -UAO)
pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
lr : build_sched_domains (kernel/sched/topology.c:1966)
Call trace:
build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
partition_sched_domains_locked (kernel/sched/topology.c:2250)
rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019)
rebuild_sched_domains (kernel/cgroup/cpuset.c:1032)
cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2))
process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274)
worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416)
kthread (kernel/kthread.c:255)
ret_from_fork (arch/arm64/kernel/entry.S:1167)
Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853)

The faulty line in question is:

cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));

and we're not checking the return value against nr_cpu_ids (we shouldn't
have to!), which leads to the above.

Prevent generate_sched_domains() from returning empty cpumasks, and add
some assertion in build_sched_domains() to scream bloody murder if it
happens again.

The above splat was obtained on my Juno r0 with the following reproducer:

$ cgcreate -g cpuset:asym
$ cgset -r cpuset.cpus=0-3 asym
$ cgset -r cpuset.mems=0 asym
$ cgset -r cpuset.cpu_exclusive=1 asym

$ cgcreate -g cpuset:smp
$ cgset -r cpuset.cpus=4-5 smp
$ cgset -r cpuset.mems=0 smp
$ cgset -r cpuset.cpu_exclusive=1 smp

$ cgset -r cpuset.sched_load_balance=0 .

$ echo 0 > /sys/devices/system/cpu/cpu4/online
$ echo 0 > /sys/devices/system/cpu/cpu5/online

Signed-off-by: Valentin Schneider
Signed-off-by: Peter Zijlstra (Intel)
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: hannes@cmpxchg.org
Cc: lizefan@huawei.com
Cc: morten.rasmussen@arm.com
Cc: qperret@google.com
Cc: tj@kernel.org
Cc: vincent.guittot@linaro.org
Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
Link: https://lkml.kernel.org/r/20191023153745.19515-2-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar

Valentin Schneider
2019-10-29 16:58:45 +0800

18 Sep, 2019

1 commit

3ee8d6c59 Merge branch 'for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup updates from Tejun Heo:
"Three minor cleanup patches"

* 'for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Use kvmalloc in cgroups-v1
cgroup: minor tweak for logic to get cgroup css
cgroup: Replace a seq_printf() call by seq_puts() in cgroup_print_ss_mask()

Linus Torvalds
2019-09-18 06:57:22 +0800

17 Sep, 2019

1 commit

7e67a8599 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler updates from Ingo Molnar:

- MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.

As perf and the scheduler is getting bigger and more complex,
document the status quo of current responsibilities and interests,
and spread the review pain^H^H^H^H fun via an increase in the Cc:
linecount generated by scripts/get_maintainer.pl. :-)

- Add another series of patches that brings the -rt (PREEMPT_RT) tree
closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
into a new CONFIG_PREEMPTION category that will allow the eventual
introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
to go though.

- Extend the CPU cgroup controller with uclamp.min and uclamp.max to
allow the finer shaping of CPU bandwidth usage.

- Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).

- Improve the behavior of high CPU count, high thread count
applications running under cpu.cfs_quota_us constraints.

- Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.

- Improve CPU isolation housekeeping CPU allocation NUMA locality.

- Fix deadline scheduler bandwidth calculations and logic when cpusets
rebuilds the topology, or when it gets deadline-throttled while it's
being offlined.

- Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
setscheduler() system calls without creating global serialization.
Add new synchronization between cpuset topology-changing events and
the deadline acceptance tests in setscheduler(), which were broken
before.

- Rework the active_mm state machine to be less confusing and more
optimal.

- Rework (simplify) the pick_next_task() slowpath.

- Improve load-balancing on AMD EPYC systems.

- ... and misc cleanups, smaller fixes and improvements - please see
the Git log for more details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
sched/psi: Correct overly pessimistic size calculation
sched/fair: Speed-up energy-aware wake-ups
sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
sched/uclamp: Update CPU's refcount on TG's clamp changes
sched/uclamp: Use TG's clamps to restrict TASK's clamps
sched/uclamp: Propagate system defaults to the root group
sched/uclamp: Propagate parent clamps
sched/uclamp: Extend CPU's cgroup controller
sched/topology: Improve load balancing on AMD EPYC systems
arch, ia64: Make NUMA select SMP
sched, perf: MAINTAINERS update, add submaintainers and reviewers
sched/fair: Use rq_lock/unlock in online_fair_sched_group
cpufreq: schedutil: fix equation in comment
sched: Rework pick_next_task() slow-path
sched: Allow put_prev_task() to drop rq->lock
sched/fair: Expose newidle_balance()
sched: Add task_struct pointer to sched_class::set_curr_task
sched: Rework CPU hotplug task selection
sched/{rt,deadline}: Fix set_next_task vs pick_next_task
sched: Fix kerneldoc comment for ia64_set_curr_task
...

Linus Torvalds
2019-09-17 08:25:49 +0800

13 Sep, 2019

1 commit

97a613698 cgroup: freezer: fix frozen state inheritance ... Browse Code »

If a new child cgroup is created in the frozen cgroup hierarchy
(one or more of ancestor cgroups is frozen), the CGRP_FREEZE cgroup
flag should be set. Otherwise if a process will be attached to the
child cgroup, it won't become frozen.

The problem can be reproduced with the test_cgfreezer_mkdir test.

This is the output before this patch:
~/test_freezer
ok 1 test_cgfreezer_simple
ok 2 test_cgfreezer_tree
ok 3 test_cgfreezer_forkbomb
Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen
not ok 4 test_cgfreezer_mkdir
ok 5 test_cgfreezer_rmdir
ok 6 test_cgfreezer_migrate
ok 7 test_cgfreezer_ptrace
ok 8 test_cgfreezer_stopped
ok 9 test_cgfreezer_ptraced
ok 10 test_cgfreezer_vfork

And with this patch:
~/test_freezer
ok 1 test_cgfreezer_simple
ok 2 test_cgfreezer_tree
ok 3 test_cgfreezer_forkbomb
ok 4 test_cgfreezer_mkdir
ok 5 test_cgfreezer_rmdir
ok 6 test_cgfreezer_migrate
ok 7 test_cgfreezer_ptrace
ok 8 test_cgfreezer_stopped
ok 9 test_cgfreezer_ptraced
ok 10 test_cgfreezer_vfork

Reported-by: Mark Crossen
Signed-off-by: Roman Gushchin
Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer")
Cc: Tejun Heo
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Tejun Heo

Roman Gushchin
2019-09-13 05:04:45 +0800

08 Aug, 2019

1 commit

653a23ca7 Use kvmalloc in cgroups-v1 ... Browse Code »

Instead of using its own logic for k-/vmalloc rely on
kvmalloc which is actually doing quite the same.

Signed-off-by: Marc Koderer
Signed-off-by: Tejun Heo

Marc Koderer
2019-08-08 02:37:58 +0800

25 Jul, 2019

4 commits

710da3c8e sched/core: Prevent race condition between cpuset and __sched_setscheduler() ... Browse Code »

No synchronisation mechanism exists between the cpuset subsystem and
calls to function __sched_setscheduler(). As such, it is possible that
new root domains are created on the cpuset side while a deadline
acceptance test is carried out in __sched_setscheduler(), leading to a
potential oversell of CPU bandwidth.

Grab cpuset_rwsem read lock from core scheduler, so to prevent
situations such as the one described above from happening.

The only exception is normalize_rt_tasks() which needs to work under
tasklist_lock and can't therefore grab cpuset_rwsem. We are fine with
this, as this function is only called by sysrq and, if that gets
triggered, DEADLINE guarantees are already gone out of the window
anyway.

Tested-by: Dietmar Eggemann
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: mathieu.poirier@linaro.org
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-9-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar

Juri Lelli
2019-07-25 21:55:04 +0800
d74b27d63 cgroup/cpuset: Change cpuset_rwsem and hotplug lock order ... Browse Code »

cpuset_rwsem is going to be acquired from sched_setscheduler() with a
following patch. There are however paths (e.g., spawn_ksoftirqd) in
which sched_scheduler() is eventually called while holding hotplug lock;
this creates a dependecy between hotplug lock (to be always acquired
first) and cpuset_rwsem (to be always acquired after hotplug lock).

Fix paths which currently take the two locks in the wrong order (after
a following patch is applied).

Tested-by: Dietmar Eggemann
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: mathieu.poirier@linaro.org
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-7-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar

Juri Lelli
2019-07-25 21:55:03 +0800
1243dc518 cgroup/cpuset: Convert cpuset_mutex to percpu_rwsem ... Browse Code »

Holding cpuset_mutex means that cpusets are stable (only the holder can
make changes) and this is required for fixing a synchronization issue
between cpusets and scheduler core. However, grabbing cpuset_mutex from
setscheduler() hotpath (as implemented in a later patch) is a no-go, as
it would create a bottleneck for tasks concurrently calling
setscheduler().

Convert cpuset_mutex to be a percpu_rwsem (cpuset_rwsem), so that
setscheduler() will then be able to read lock it and avoid concurrency
issues.

Tested-by: Dietmar Eggemann
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: mathieu.poirier@linaro.org
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-6-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar

Juri Lelli
2019-07-25 21:55:02 +0800
f9a25f776 cpusets: Rebuild root domain deadline accounting information ... Browse Code »

When the topology of root domains is modified by CPUset or CPUhotplug
operations information about the current deadline bandwidth held in the
root domain is lost.

This patch addresses the issue by recalculating the lost deadline
bandwidth information by circling through the deadline tasks held in
CPUsets and adding their current load to the root domain they are
associated with.

Tested-by: Dietmar Eggemann
Signed-off-by: Mathieu Poirier
Signed-off-by: Juri Lelli
[ Various additional modifications. ]
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-4-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar

Mathieu Poirier
2019-07-25 21:55:01 +0800

24 Jul, 2019

2 commits

a581563f1 cgroup: minor tweak for logic to get cgroup css ... Browse Code »

We could only handle the case that css exists
and css_try_get_online() fails.

Signed-off-by: Peng Wang
Signed-off-by: Tejun Heo

Peng Wang
2019-07-24 06:46:33 +0800
85db00233 cgroup: Replace a seq_printf() call by seq_puts() in cgroup_print_ss_mask() ... Browse Code »

A string which did not contain a data format specification should be put
into a sequence. Thus use the corresponding function “seq_puts”.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring
Signed-off-by: Tejun Heo

Markus Elfring
2019-07-24 06:44:00 +0800

20 Jul, 2019

1 commit

933a90bf4 Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs mount updates from Al Viro:
"The first part of mount updates.

Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
mnt_init(): call shmem_init() unconditionally
constify ksys_mount() string arguments
don't bother with registering rootfs
init_rootfs(): don't bother with init_ramfs_fs()
vfs: Convert smackfs to use the new mount API
vfs: Convert selinuxfs to use the new mount API
vfs: Convert securityfs to use the new mount API
vfs: Convert apparmorfs to use the new mount API
vfs: Convert openpromfs to use the new mount API
vfs: Convert xenfs to use the new mount API
vfs: Convert gadgetfs to use the new mount API
vfs: Convert oprofilefs to use the new mount API
vfs: Convert ibmasmfs to use the new mount API
vfs: Convert qib_fs/ipathfs to use the new mount API
vfs: Convert efivarfs to use the new mount API
vfs: Convert configfs to use the new mount API
vfs: Convert binfmt_misc to use the new mount API
convenience helper: get_tree_single()
convenience helper get_tree_nodev()
vfs: Kill sget_userns()
...

Linus Torvalds
2019-07-20 01:42:02 +0800

15 Jul, 2019

1 commit

da82c92f1 docs: cgroup-v1: add it to the admin-guide book ... Browse Code »

Those files belong to the admin guide, so add them.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2019-07-15 22:03:02 +0800

12 Jul, 2019

1 commit

237f83dfb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:
"Some highlights from this development cycle:

1) Big refactoring of ipv6 route and neigh handling to support
nexthop objects configurable as units from userspace. From David
Ahern.

2) Convert explored_states in BPF verifier into a hash table,
significantly decreased state held for programs with bpf2bpf
calls, from Alexei Starovoitov.

3) Implement bpf_send_signal() helper, from Yonghong Song.

4) Various classifier enhancements to mvpp2 driver, from Maxime
Chevallier.

5) Add aRFS support to hns3 driver, from Jian Shen.

6) Fix use after free in inet frags by allocating fqdirs dynamically
and reworking how rhashtable dismantle occurs, from Eric Dumazet.

7) Add act_ctinfo packet classifier action, from Kevin
Darbyshire-Bryant.

8) Add TFO key backup infrastructure, from Jason Baron.

9) Remove several old and unused ISDN drivers, from Arnd Bergmann.

10) Add devlink notifications for flash update status to mlxsw driver,
from Jiri Pirko.

11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski.

12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes.

13) Various enhancements to ipv6 flow label handling, from Eric
Dumazet and Willem de Bruijn.

14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van
der Merwe, and others.

15) Various improvements to axienet driver including converting it to
phylink, from Robert Hancock.

16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean.

17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana
Radulescu.

18) Add devlink health reporting to mlx5, from Moshe Shemesh.

19) Convert stmmac over to phylink, from Jose Abreu.

20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from
Shalom Toledo.

21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera.

22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel.

23) Track spill/fill of constants in BPF verifier, from Alexei
Starovoitov.

24) Support bounded loops in BPF, from Alexei Starovoitov.

25) Various page_pool API fixes and improvements, from Jesper Dangaard
Brouer.

26) Just like ipv4, support ref-countless ipv6 route handling. From
Wei Wang.

27) Support VLAN offloading in aquantia driver, from Igor Russkikh.

28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy.

29) Add flower GRE encap/decap support to nfp driver, from Pieter
Jansen van Vuuren.

30) Protect against stack overflow when using act_mirred, from John
Hurley.

31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen.

32) Use page_pool API in netsec driver, Ilias Apalodimas.

33) Add Google gve network driver, from Catherine Sullivan.

34) More indirect call avoidance, from Paolo Abeni.

35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan.

36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek.

37) Add MPLS manipulation actions to TC, from John Hurley.

38) Add sending a packet to connection tracking from TC actions, and
then allow flower classifier matching on conntrack state. From
Paul Blakey.

39) Netfilter hw offload support, from Pablo Neira Ayuso"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits)
net/mlx5e: Return in default case statement in tx_post_resync_params
mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync().
net: dsa: add support for BRIDGE_MROUTER attribute
pkt_sched: Include const.h
net: netsec: remove static declaration for netsec_set_tx_de()
net: netsec: remove superfluous if statement
netfilter: nf_tables: add hardware offload support
net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload
net: flow_offload: add flow_block_cb_is_busy() and use it
net: sched: remove tcf block API
drivers: net: use flow block API
net: sched: use flow block API
net: flow_offload: add flow_block_cb_{priv, incref, decref}()
net: flow_offload: add list handling functions
net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
net: flow_offload: add flow_block_cb_setup_simple()
net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC
...

Linus Torvalds
2019-07-12 01:55:49 +0800

10 Jul, 2019

1 commit

3b99107f0 Merge tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block updates from Jens Axboe:
"This is the main block updates for 5.3. Nothing earth shattering or
major in here, just fixes, additions, and improvements all over the
map. This contains:

- Series of documentation fixes (Bart)

- Optimization of the blk-mq ctx get/put (Bart)

- null_blk removal race condition fix (Bob)

- req/bio_op() cleanups (Chaitanya)

- Series cleaning up the segment accounting, and request/bio mapping
(Christoph)

- Series cleaning up the page getting/putting for bios (Christoph)

- block cgroup cleanups and moving it to where it is used (Christoph)

- block cgroup fixes (Tejun)

- Series of fixes and improvements to bcache, most notably a write
deadlock fix (Coly)

- blk-iolatency STS_AGAIN and accounting fixes (Dennis)

- Series of improvements and fixes to BFQ (Douglas, Paolo)

- debugfs_create() return value check removal for drbd (Greg)

- Use struct_size(), where appropriate (Gustavo)

- Two lighnvm fixes (Heiner, Geert)

- MD fixes, including a read balance and corruption fix (Guoqing,
Marcos, Xiao, Yufen)

- block opal shadow mbr additions (Jonas, Revanth)

- sbitmap compare-and-exhange improvemnts (Pavel)

- Fix for potential bio->bi_size overflow (Ming)

- NVMe pull requests:
- improved PCIe suspent support (Keith Busch)
- error injection support for the admin queue (Akinobu Mita)
- Fibre Channel discovery improvements (James Smart)
- tracing improvements including nvmetc tracing support (Minwoo Im)
- misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya
Kulkarni)"

- Various little fixes and improvements to drivers and core"

* tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits)
blk-iolatency: fix STS_AGAIN handling
block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES
blk-mq: simplify blk_mq_make_request()
blk-mq: remove blk_mq_put_ctx()
sbitmap: Replace cmpxchg with xchg
block: fix .bi_size overflow
block: sed-opal: check size of shadow mbr
block: sed-opal: ioctl for writing to shadow mbr
block: sed-opal: add ioctl for done-mark of shadow mbr
block: never take page references for ITER_BVEC
direct-io: use bio_release_pages in dio_bio_complete
block_dev: use bio_release_pages in bio_unmap_user
block_dev: use bio_release_pages in blkdev_bio_end_io
iomap: use bio_release_pages in iomap_dio_bio_end_io
block: use bio_release_pages in bio_map_user_iov
block: use bio_release_pages in bio_unmap_user
block: optionally mark pages dirty in bio_release_pages
block: move the BIO_NO_PAGE_REF check into bio_release_pages
block: skd_main.c: Remove call to memset after dma_alloc_coherent
block: mtip32xx: Remove call to memset after dma_alloc_coherent
...

Linus Torvalds
2019-07-10 01:45:06 +0800

09 Jul, 2019

2 commits

92c1d6522 Merge branch 'for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup updates from Tejun Heo:
"Documentation updates and the addition of cgroup_parse_float() which
will be used by new controllers including blk-iocost"

* 'for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1: convert docs to ReST and rename to *.rst
cgroup: Move cgroup_parse_float() implementation out of CONFIG_SYSFS
cgroup: add cgroup_parse_float()

Linus Torvalds
2019-07-09 12:35:12 +0800
dad1c12ed Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler updates from Ingo Molnar:

- Remove the unused per rq load array and all its infrastructure, by
Dietmar Eggemann.

- Add utilization clamping support by Patrick Bellasi. This is a
refinement of the energy aware scheduling framework with support for
boosting of interactive and capping of background workloads: to make
sure critical GUI threads get maximum frequency ASAP, and to make
sure background processing doesn't unnecessarily move to cpufreq
governor to higher frequencies and less energy efficient CPU modes.

- Add the bare minimum of tracepoints required for LISA EAS regression
testing, by Qais Yousef - which allows automated testing of various
power management features, including energy aware scheduling.

- Restructure the former tsk_nr_cpus_allowed() facility that the -rt
kernel used to modify the scheduler's CPU affinity logic such as
migrate_disable() - introduce the task->cpus_ptr value instead of
taking the address of &task->cpus_allowed directly - by Sebastian
Andrzej Siewior.

- Misc optimizations, fixes, cleanups and small enhancements - see the
Git log for details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
sched/uclamp: Add uclamp support to energy_compute()
sched/uclamp: Add uclamp_util_with()
sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
sched/uclamp: Set default clamps for RT tasks
sched/uclamp: Reset uclamp values on RESET_ON_FORK
sched/uclamp: Extend sched_setattr() to support utilization clamping
sched/core: Allow sched_setattr() to use the current policy
sched/uclamp: Add system default clamps
sched/uclamp: Enforce last task's UCLAMP_MAX
sched/uclamp: Add bucket local max tracking
sched/uclamp: Add CPU's clamp buckets refcounting
sched/fair: Rename weighted_cpuload() to cpu_runnable_load()
sched/debug: Export the newly added tracepoints
sched/debug: Add sched_overutilized tracepoint
sched/debug: Add new tracepoint to track PELT at se level
sched/debug: Add new tracepoints to track PELT at rq level
sched/debug: Add a new sched_trace_*() helper functions
sched/autogroup: Make autogroup_path() always available
sched/wait: Deduplicate code with do-while
sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
...

Linus Torvalds
2019-07-09 07:39:53 +0800

01 Jul, 2019

1 commit

5be1f9d82 Merge tag 'v5.2-rc6' into for-5.3/block ... Browse Code »

Merge 5.2-rc6 into for-5.3/block, so we get the same page merge leak
fix. Otherwise we end up having conflicts with future patches between
for-5.3/block and master that touch this area. In particular, it makes
the bio_full() fix hard to backport to stable.

* tag 'v5.2-rc6': (482 commits)
Linux 5.2-rc6
Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
Bluetooth: Fix regression with minimum encryption key size alignment
tcp: refine memory limit test in tcp_fragment()
x86/vdso: Prevent segfaults due to hoisted vclock reads
SUNRPC: Fix a credential refcount leak
Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
net :sunrpc :clnt :Fix xps refcount imbalance on the error path
NFS4: Only set creation opendata if O_CREAT
ARM: 8867/1: vdso: pass --be8 to linker if necessary
KVM: nVMX: reorganize initial steps of vmx_set_nested_state
KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
habanalabs: use u64_to_user_ptr() for reading user pointers
nfsd: replace Jeff by Chuck as nfsd co-maintainer
inet: clear num_timeout reqsk_alloc()
PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
net: mvpp2: debugfs: Add pmap to fs dump
ipv6: Default fib6_type to RTN_UNICAST when not set
net: hns3: Fix inconsistent indenting
net/af_iucv: always register net_device notifier
...

Jens Axboe
2019-07-01 22:16:08 +0800

29 Jun, 2019

1 commit

83086d654 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmc… ... Browse Code »

…k/linux-rcu into core/rcu

Pull rcu/next + tools/memory-model changes from Paul E. McKenney:

- RCU flavor consolidation cleanups and optmizations
- Documentation updates
- Miscellaneous fixes
- SRCU updates
- RCU-sync flavor consolidation
- Torture-test updates
- Linux-kernel memory-consistency-model updates, most notably the addition of plain C-language accesses

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2019-06-29 01:46:47 +0800

25 Jun, 2019

1 commit

d2abae71e Merge tag 'v5.2-rc6' into sched/core, to refresh the branch ... Browse Code »

Signed-off-by: Ingo Molnar

Ingo Molnar
2019-06-25 01:19:53 +0800

22 Jun, 2019

1 commit

92ad6325c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Minor SPDX change conflict.

Signed-off-by: David S. Miller

David S. Miller
2019-06-22 20:59:24 +0800

21 Jun, 2019

1 commit

474a28003 cgroup: export css_next_descendant_pre for bfq ... Browse Code »

The bfq schedule now uses css_next_descendant_pre directly after
the stats functionality depending on it has been from the core
blk-cgroup code to bfq. Export the symbol so that bfq can still
be build modular.

Fixes: d6258980daf2 ("bfq-iosched: move bfq_stat_recursive_sum into the only caller")
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-06-21 16:48:34 +0800

19 Jun, 2019

1 commit

f85d20865 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 451 ... Browse Code »

Based on 1 normalized pattern(s):

this file is subject to the terms and conditions of version 2 of the
gnu general public license see the file copying in the main
directory of the linux distribution for more details

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 5 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Enrico Weigelt
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081200.872755311@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-06-19 23:09:08 +0800

18 Jun, 2019

1 commit

13091aa30 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Honestly all the conflicts were simple overlapping changes,
nothing really interesting to report.

Signed-off-by: David S. Miller

David S. Miller
2019-06-18 11:20:36 +0800

17 Jun, 2019

1 commit

23da766ab Merge tag 'v5.2-rc5' into sched/core, to pick up fixes ... Browse Code »

Signed-off-by: Ingo Molnar

Ingo Molnar
2019-06-17 18:12:27 +0800

15 Jun, 2019

3 commits

0011572c8 Merge branch 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup fixes from Tejun Heo:
"This has an unusually high density of tricky fixes:

- task_get_css() could deadlock when it races against a dying cgroup.

- cgroup.procs didn't list thread group leaders with live threads.

This could mislead readers to think that a cgroup is empty when
it's not. Fixed by making PROCS iterator include dead tasks. I made
a couple mistakes making this change and this pull request contains
a couple follow-up patches.

- When cpusets run out of online cpus, it updates cpusmasks of member
tasks in bizarre ways. Joel improved the behavior significantly"

* 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: restore sanity to cpuset_cpus_allowed_fallback()
cgroup: Fix css_task_iter_advance_css_set() cset skip condition
cgroup: css_task_iter_skip()'d iterators must be advanced before accessed
cgroup: Include dying leaders with live threads in PROCS iterations
cgroup: Implement css_task_iter_skip()
cgroup: Call cgroup_release() before __exit_signal()
docs cgroups: add another example size for hugetlb
cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()

Linus Torvalds
2019-06-15 11:46:14 +0800
99c8b231a docs: cgroup-v1: convert docs to ReST and rename to *.rst ... Browse Code »

Convert the cgroup-v1 files to ReST format, in order to
allow a later addition to the admin-guide.

The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab
Acked-by: Tejun Heo
Signed-off-by: Tejun Heo

Mauro Carvalho Chehab
2019-06-15 04:29:54 +0800
38cf3a687 cgroup: Move cgroup_parse_float() implementation out of CONFIG_SYSFS ... Browse Code »

a5e112e6424a ("cgroup: add cgroup_parse_float()") accidentally added
cgroup_parse_float() inside CONFIG_SYSFS block. Move it outside so
that it doesn't cause failures on !CONFIG_SYSFS builds.

Signed-off-by: Tejun Heo
Fixes: a5e112e6424a ("cgroup: add cgroup_parse_float()")

Tejun Heo
2019-06-15 01:14:44 +0800

13 Jun, 2019

1 commit

d477f8c20 cpuset: restore sanity to cpuset_cpus_allowed_fallback() ... Browse Code »

In the case that a process is constrained by taskset(1) (i.e.
sched_setaffinity(2)) to a subset of available cpus, and all of those are
subsequently offlined, the scheduler will set tsk->cpus_allowed to
the current value of task_cs(tsk)->effective_cpus.

This is done via a call to do_set_cpus_allowed() in the context of
cpuset_cpus_allowed_fallback() made by the scheduler when this case is
detected. This is the only call made to cpuset_cpus_allowed_fallback()
in the latest mainline kernel.

However, this is not sane behavior.

I will demonstrate this on a system running the latest upstream kernel
with the following initial configuration:

# grep -i cpu /proc/$$/status
Cpus_allowed: ffffffff,fffffff
Cpus_allowed_list: 0-63

(Where cpus 32-63 are provided via smt.)

If we limit our current shell process to cpu2 only and then offline it
and reonline it:

# taskset -p 4 $$
pid 2272's current affinity mask: ffffffffffffffff
pid 2272's new affinity mask: 4

# echo off > /sys/devices/system/cpu/cpu2/online
# dmesg | tail -3
[ 2195.866089] process 2272 (bash) no longer affine to cpu2
[ 2195.872700] IRQ 114: no longer affine to CPU2
[ 2195.879128] smpboot: CPU 2 is now offline

# echo on > /sys/devices/system/cpu/cpu2/online
# dmesg | tail -1
[ 2617.043572] smpboot: Booting Node 0 Processor 2 APIC 0x4

We see that our current process now has an affinity mask containing
every cpu available on the system _except_ the one we originally
constrained it to:

# grep -i cpu /proc/$$/status
Cpus_allowed: ffffffff,fffffffb
Cpus_allowed_list: 0-1,3-63

This is not sane behavior, as the scheduler can now not only place the
process on previously forbidden cpus, it can't even schedule it on
the cpu it was originally constrained to!

Other cases result in even more exotic affinity masks. Take for instance
a process with an affinity mask containing only cpus provided by smt at
the moment that smt is toggled, in a configuration such as the following:

# taskset -p f000000000 $$
# grep -i cpu /proc/$$/status
Cpus_allowed: 000000f0,00000000
Cpus_allowed_list: 36-39

A double toggle of smt results in the following behavior:

# echo off > /sys/devices/system/cpu/smt/control
# echo on > /sys/devices/system/cpu/smt/control
# grep -i cpus /proc/$$/status
Cpus_allowed: ffffff00,ffffffff
Cpus_allowed_list: 0-31,40-63

This is even less sane than the previous case, as the new affinity mask
excludes all smt-provided cpus with ids less than those that were
previously in the affinity mask, as well as those that were actually in
the mask.

With this patch applied, both of these cases end in the following state:

# grep -i cpu /proc/$$/status
Cpus_allowed: ffffffff,ffffffff
Cpus_allowed_list: 0-63

The original policy is discarded. Though not ideal, it is the simplest way
to restore sanity to this fallback case without reinventing the cpuset
wheel that rolls down the kernel just fine in cgroup v2. A user who wishes
for the previous affinity mask to be restored in this fallback case can use
that mechanism instead.

This patch modifies scheduler behavior by instead resetting the mask to
task_cs(tsk)->cpus_allowed by default, and cpu_possible mask in legacy
mode. I tested the cases above on both modes.

Note that the scheduler uses this fallback mechanism if and only if
_every_ other valid avenue has been traveled, and it is the last resort
before calling BUG().

Suggested-by: Waiman Long
Suggested-by: Phil Auld
Signed-off-by: Joel Savitz
Acked-by: Phil Auld
Acked-by: Waiman Long
Acked-by: Peter Zijlstra (Intel)
Signed-off-by: Tejun Heo

Joel Savitz
2019-06-13 02:00:08 +0800

11 Jun, 2019

2 commits

11dc8b401 Merge branch 'for-5.2-fixes' into for-5.3 Browse Code »

Tejun Heo
2019-06-11 00:12:39 +0800
c596687a0 cgroup: Fix css_task_iter_advance_css_set() cset skip condition ... Browse Code »

While adding handling for dying task group leaders c03cd7738a83
("cgroup: Include dying leaders with live threads in PROCS
iterations") added an inverted cset skip condition to
css_task_iter_advance_css_set(). It should skip cset if it's
completely empty but was incorrectly testing for the inverse condition
for the dying_tasks list. Fix it.

Signed-off-by: Tejun Heo
Fixes: c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations")
Reported-by: syzbot+d4bba5ccd4f9a2a68681@syzkaller.appspotmail.com

Tejun Heo
2019-06-11 00:08:27 +0800

10 Jun, 2019

1 commit

cf8929885 cgroup/bfq: revert bfq.weight symlink change ... Browse Code »

There's some discussion on how to do this the best, and Tejun prefers
that BFQ just create the file itself instead of having cgroups support
a symlink feature.

Hence revert commit 54b7b868e826 and 19e9da9e86c4 for 5.2, and this
can be done properly for 5.3.

Signed-off-by: Jens Axboe

Jens Axboe
2019-06-10 17:35:41 +0800

08 Jun, 2019

1 commit

a6cdeeb16 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Some ISDN files that got removed in net-next had some changes
done in mainline, take the removals.

Signed-off-by: David S. Miller

David S. Miller
2019-06-08 02:00:14 +0800

07 Jun, 2019

1 commit

54b7b868e cgroup: let a symlink too be created with a cftype file ... Browse Code »

This commit enables a cftype to have a symlink (of any name) that
points to the file associated with the cftype.

Signed-off-by: Angelo Ruocco
Signed-off-by: Paolo Valente
Signed-off-by: Jens Axboe

Angelo Ruocco
2019-06-07 15:29:39 +0800

06 Jun, 2019

1 commit

cee0c33c5 cgroup: css_task_iter_skip()'d iterators must be advanced before accessed ... Browse Code »

b636fd38dc40 ("cgroup: Implement css_task_iter_skip()") introduced
css_task_iter_skip() which is used to fix task iterations skipping
dying threadgroup leaders with live threads. Skipping is implemented
as a subportion of full advancing but css_task_iter_next() forgot to
fully advance a skipped iterator before determining the next task to
visit causing it to return invalid task pointers.

Fix it by making css_task_iter_next() fully advance the iterator if it
has been skipped since the previous iteration.

Signed-off-by: Tejun Heo
Reported-by: syzbot
Link: http://lkml.kernel.org/r/00000000000097025d058a7fd785@google.com
Fixes: b636fd38dc40 ("cgroup: Implement css_task_iter_skip()")

Tejun Heo
2019-06-06 00:54:34 +0800