11 Dec, 2019
8 commits
-
Remove references to unused functions, standardize language, update to
reflect new functionality, migrate to rst format, and fix all kernel-doc
warnings.Fixes: 815613da6a67 ("kernel/padata.c: removed unused code")
Signed-off-by: Daniel Jordan
Cc: Eric Biggers
Cc: Herbert Xu
Cc: Jonathan Corbet
Cc: Steffen Klassert
Cc: linux-crypto@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Jordan
Signed-off-by: Herbert Xu -
reorder_objects is unused since the rework of padata's flushing, so
remove it.Signed-off-by: Daniel Jordan
Cc: Eric Biggers
Cc: Herbert Xu
Cc: Steffen Klassert
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu -
Since commit 63d3578892dc ("crypto: pcrypt - remove padata cpumask
notifier") this feature is unused, so get rid of it.Signed-off-by: Daniel Jordan
Cc: Eric Biggers
Cc: Herbert Xu
Cc: Jonathan Corbet
Cc: Steffen Klassert
Cc: linux-crypto@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu -
lockdep complains when padata's paths to update cpumasks via CPU hotplug
and sysfs are both taken:# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo ff > /sys/kernel/pcrypt/pencrypt/parallel_cpumask======================================================
WARNING: possible circular locking dependency detected
5.4.0-rc8-padata-cpuhp-v3+ #1 Not tainted
------------------------------------------------------
bash/205 is trying to acquire lock:
ffffffff8286bcd0 (cpu_hotplug_lock.rw_sem){++++}, at: padata_set_cpumask+0x2b/0x120but task is already holding lock:
ffff8880001abfa0 (&pinst->lock){+.+.}, at: padata_set_cpumask+0x26/0x120which lock already depends on the new lock.
padata doesn't take cpu_hotplug_lock and pinst->lock in a consistent
order. Which should be first? CPU hotplug calls into padata with
cpu_hotplug_lock already held, so it should have priority.Fixes: 6751fb3c0e0c ("padata: Use get_online_cpus/put_online_cpus")
Signed-off-by: Daniel Jordan
Cc: Eric Biggers
Cc: Herbert Xu
Cc: Steffen Klassert
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu -
Configuring an instance's parallel mask without any online CPUs...
echo 2 > /sys/kernel/pcrypt/pencrypt/parallel_cpumask
echo 0 > /sys/devices/system/cpu/cpu1/online...makes tcrypt mode=215 crash like this:
divide error: 0000 [#1] SMP PTI
CPU: 4 PID: 283 Comm: modprobe Not tainted 5.4.0-rc8-padata-doc-v2+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191013_105130-anatol 04/01/2014
RIP: 0010:padata_do_parallel+0x114/0x300
Call Trace:
pcrypt_aead_encrypt+0xc0/0xd0 [pcrypt]
crypto_aead_encrypt+0x1f/0x30
do_mult_aead_op+0x4e/0xdf [tcrypt]
test_mb_aead_speed.constprop.0.cold+0x226/0x564 [tcrypt]
do_test+0x28c2/0x4d49 [tcrypt]
tcrypt_mod_init+0x55/0x1000 [tcrypt]
...cpumask_weight() in padata_cpu_hash() returns 0 because the mask has no
CPUs. The problem is __padata_remove_cpu() checks for valid masks too
early and so doesn't mark the instance PADATA_INVALID as expected, which
would have made padata_do_parallel() return error before doing the
division.Fix by introducing a second padata CPU hotplug state before
CPUHP_BRINGUP_CPU so that __padata_remove_cpu() sees the online mask
without @cpu. No need for the second argument to padata_replace() since
@cpu is now already missing from the online mask.Fixes: 33e54450683c ("padata: Handle empty padata cpumasks")
Signed-off-by: Daniel Jordan
Cc: Eric Biggers
Cc: Herbert Xu
Cc: Sebastian Andrzej Siewior
Cc: Steffen Klassert
Cc: Thomas Gleixner
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu -
If the pcrypt template is used multiple times in an algorithm, then a
deadlock occurs because all pcrypt instances share the same
padata_instance, which completes requests in the order submitted. That
is, the inner pcrypt request waits for the outer pcrypt request while
the outer request is already waiting for the inner.This patch fixes this by allocating a set of queues for each pcrypt
instance instead of using two global queues. In order to maintain
the existing user-space interface, the pinst structure remains global
so any sysfs modifications will apply to every pcrypt instance.Note that when an update occurs we have to allocate memory for
every pcrypt instance. Should one of the allocations fail we
will abort the update without rolling back changes already made.The new per-instance data structure is called padata_shell and is
essentially a wrapper around parallel_data.Reproducer:
#include
#include
#includeint main()
{
struct sockaddr_alg addr = {
.salg_type = "aead",
.salg_name = "pcrypt(pcrypt(rfc4106-gcm-aesni))"
};
int algfd, reqfd;
char buf[32] = { 0 };algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(algfd, (void *)&addr, sizeof(addr));
setsockopt(algfd, SOL_ALG, ALG_SET_KEY, buf, 20);
reqfd = accept(algfd, 0, 0);
write(reqfd, buf, 32);
read(reqfd, buf, 16);
}Reported-by: syzbot+56c7151cad94eec37c521f0e47d2eee53f9361c4@syzkaller.appspotmail.com
Fixes: 5068c7a883d1 ("crypto: pcrypt - Add pcrypt crypto parallelization wrapper")
Signed-off-by: Herbert Xu
Tested-by: Eric Biggers
Signed-off-by: Herbert Xu -
The function padata_remove_cpu was supposed to have been removed
along with padata_add_cpu but somehow it remained behind. Let's
kill it now as it doesn't even have a prototype anymore.Fixes: 815613da6a67 ("kernel/padata.c: removed unused code")
Signed-off-by: Herbert Xu
Reviewed-by: Daniel Jordan
Signed-off-by: Herbert Xu -
The function padata_flush_queues is fundamentally broken because
it cannot force padata users to complete the request that is
underway. IOW padata has to passively wait for the completion
of any outstanding work.As it stands flushing is used in two places. Its use in padata_stop
is simply unnecessary because nothing depends on the queues to
be flushed afterwards.The other use in padata_replace is more substantial as we depend
on it to free the old pd structure. This patch instead uses the
pd->refcnt to dynamically free the pd structure once all requests
are complete.Fixes: 2b73b07ab8a4 ("padata: Flush the padata queues actively")
Cc:
Signed-off-by: Herbert Xu
Reviewed-by: Daniel Jordan
Signed-off-by: Herbert Xu
09 Dec, 2019
1 commit
-
Pull networking fixes from David Miller:
1) More jumbo frame fixes in r8169, from Heiner Kallweit.
2) Fix bpf build in minimal configuration, from Alexei Starovoitov.
3) Use after free in slcan driver, from Jouni Hogander.
4) Flower classifier port ranges don't work properly in the HW offload
case, from Yoshiki Komachi.5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin.
6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk.
7) Fix flow dissection in dsa TX path, from Alexander Lobakin.
8) Stale syncookie timestampe fixes from Guillaume Nault.
[ Did an evil merge to silence a warning introduced by this pull - Linus ]
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
r8169: fix rtl_hw_jumbo_disable for RTL8168evl
net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
r8169: add missing RX enabling for WoL on RTL8125
vhost/vsock: accept only packets with the right dst_cid
net: phy: dp83867: fix hfs boot in rgmii mode
net: ethernet: ti: cpsw: fix extra rx interrupt
inet: protect against too small mtu values.
gre: refetch erspan header from skb->data after pskb_may_pull()
pppoe: remove redundant BUG_ON() check in pppoe_pernet
tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
tcp: tighten acceptance of ACKs not matching a child socket
tcp: fix rejected syncookies due to stale timestamps
lpc_eth: kernel BUG on remove
tcp: md5: fix potential overestimation of TCP option space
net: sched: allow indirect blocks to bind to clsact in TC
net: core: rename indirect block ingress cb function
net-sysfs: Call dev_hold always in netdev_queue_add_kobject
net: dsa: fix flow dissection on Tx path
net/tls: Fix return values to avoid ENOTSUPP
net: avoid an indirect call in ____sys_recvmsg()
...
06 Dec, 2019
3 commits
-
Pull modules updates from Jessica Yu:
"Summary of modules changes for the 5.5 merge window:- Refactor include/linux/export.h and remove code duplication between
EXPORT_SYMBOL and EXPORT_SYMBOL_NS to make it more readable.The most notable change is that no namespace is represented by an
empty string "" rather than NULL.- Fix a module load/unload race where waiter(s) trying to load the
same module weren't being woken up when a module finally goes away"* tag 'modules-for-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
kernel/module.c: wakeup processes in module_wq on module unload
moduleparam: fix parameter description mismatch
export: avoid code duplication in include/linux/export.h -
Pull thermal management updates from Zhang Rui:
- Fix a deadlock regression in thermal core framework, which was
introduced in 5.3 (Wei Wang)- Initialize thermal control framework earlier to enable thermal
mitigation during boot (Amit Kucheria)- Convert the Intelligent Power Allocator (IPA) thermal governor to
follow the generic PM_EM instead of its own Energy Model (Quentin
Perret)- Introduce a new Amlogic soc thermal driver (Guillaume La Roque)
- Add interrupt support for tsens thermal driver (Amit Kucheria)
- Add support for MSM8956/8976 in tsens thermal driver
(AngeloGioacchino Del Regno)- Add support for r8a774b1 in rcar thermal driver (Biju Das)
- Add support for Thermal Monitor Unit v2 in qoriq thermal driver
(Yuantian Tang)- Some other fixes/cleanups on thermal core framework and soc thermal
drivers (Colin Ian King, Daniel Lezcano, Hsin-Yi Wang, Tian Tao)* 'thermal/next' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (32 commits)
thermal: Fix deadlock in thermal thermal_zone_device_check
thermal: cpu_cooling: Migrate to using the EM framework
thermal: cpu_cooling: Make the power-related code depend on IPA
PM / EM: Declare EM data types unconditionally
arm64: defconfig: Enable CONFIG_ENERGY_MODEL
drivers: thermal: tsens: fix potential integer overflow on multiply
thermal: cpu_cooling: Reorder the header file
thermal: cpu_cooling: Remove pointless dependency on CONFIG_OF
thermal: no need to set .owner when using module_platform_driver
thermal: qcom: tsens-v1: Fix kfree of a non-pointer value
cpufreq: qcom-hw: Move driver initialization earlier
clk: qcom: Initialize clock drivers earlier
cpufreq: Initialize cpufreq-dt driver earlier
cpufreq: Initialize the governors in core_initcall
thermal: Initialize thermal subsystem earlier
thermal: Remove netlink support
dt: thermal: tsens: Document compatible for MSM8976/56
thermal: qcom: tsens-v1: Add support for MSM8956 and MSM8976
MAINTAINERS: add entry for Amlogic Thermal driver
thermal: amlogic: Add thermal driver to support G12 SoCs
... -
Merge more updates from Andrew Morton:
"Most of the rest of MM and various other things. Some Kconfig rework
still awaits merges of dependent trees from linux-next.Subsystems affected by this patch series: mm/hotfixes, mm/memcg,
mm/vmstat, mm/thp, procfs, sysctl, misc, notifiers, core-kernel,
bitops, lib, checkpatch, epoll, binfmt, init, rapidio, uaccess, kcov,
ubsan, ipc, bitmap, mm/pagemap"* akpm: (86 commits)
mm: remove __ARCH_HAS_4LEVEL_HACK and include/asm-generic/4level-fixup.h
um: add support for folded p4d page tables
um: remove unused pxx_offset_proc() and addr_pte() functions
sparc32: use pgtable-nopud instead of 4level-fixup
parisc/hugetlb: use pgtable-nopXd instead of 4level-fixup
parisc: use pgtable-nopXd instead of 4level-fixup
nds32: use pgtable-nopmd instead of 4level-fixup
microblaze: use pgtable-nopmd instead of 4level-fixup
m68k: mm: use pgtable-nopXd instead of 4level-fixup
m68k: nommu: use pgtable-nopud instead of 4level-fixup
c6x: use pgtable-nopud instead of 4level-fixup
arm: nommu: use pgtable-nopud instead of 4level-fixup
alpha: use pgtable-nopud instead of 4level-fixup
gpio: pca953x: tighten up indentation
gpio: pca953x: convert to use bitmap API
gpio: pca953x: use input from regs structure in pca953x_irq_pending()
gpio: pca953x: remove redundant variable and check in IRQ handler
lib/bitmap: introduce bitmap_replace() helper
lib/test_bitmap: fix comment about this file
lib/test_bitmap: move exp1 and exp2 upper for others to use
...
05 Dec, 2019
10 commits
-
For jited bpf program, if the subprogram count is 1, i.e.,
there is no callees in the program, prog->aux->func will be NULL
and prog->bpf_func points to image address of the program.If there is more than one subprogram, prog->aux->func is populated,
and subprogram 0 can be accessed through either prog->bpf_func or
prog->aux->func[0]. Other subprograms should be accessed through
prog->aux->func[subprog_id].This patch fixed a bug in check_attach_btf_id(), where
prog->aux->func[subprog_id] is used to access any subprogram which
caused a segfault like below:
[79162.619208] BUG: kernel NULL pointer dereference, address:
0000000000000000
......
[79162.634255] Call Trace:
[79162.634974] ? _cond_resched+0x15/0x30
[79162.635686] ? kmem_cache_alloc_trace+0x162/0x220
[79162.636398] ? selinux_bpf_prog_alloc+0x1f/0x60
[79162.637111] bpf_prog_load+0x3de/0x690
[79162.637809] __do_sys_bpf+0x105/0x1740
[79162.638488] do_syscall_64+0x5b/0x180
[79162.639147] entry_SYSCALL_64_after_hwframe+0x44/0xa9
......Fixes: 5b92a28aae4d ("bpf: Support attaching tracing BPF program to other BPF programs")
Reported-by: Eelco Chaudron
Signed-off-by: Yonghong Song
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20191205010606.177774-1-yhs@fb.com -
Patch series " kcov: collect coverage from usb and vhost", v3.
This patchset extends kcov to allow collecting coverage from backgound
kernel threads. This extension requires custom annotations for each of
the places where coverage collection is desired. This patchset
implements this for hub events in the USB subsystem and for vhost
workers. See the first patch description for details about the kcov
extension. The other two patches apply this kcov extension to USB and
vhost.Examples of other subsystems that might potentially benefit from this
when custom annotations are added (the list is based on
process_one_work() callers for bugs recently reported by syzbot):1. fs: writeback wb_workfn() worker,
2. net: addrconf_dad_work()/addrconf_verify_work() workers,
3. net: neigh_periodic_work() worker,
4. net/p9: p9_write_work()/p9_read_work() workers,
5. block: blk_mq_run_work_fn() worker.These patches have been used to enable coverage-guided USB fuzzing with
syzkaller for the last few years, see the details here:https://github.com/google/syzkaller/blob/master/docs/linux/external_fuzzing_usb.md
This patchset has been pushed to the public Linux kernel Gerrit
instance:https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/1524
This patch (of 3):
Add background thread coverage collection ability to kcov.
With KCOV_ENABLE coverage is collected only for syscalls that are issued
from the current process. With KCOV_REMOTE_ENABLE it's possible to
collect coverage for arbitrary parts of the kernel code, provided that
those parts are annotated with kcov_remote_start()/kcov_remote_stop().This allows to collect coverage from two types of kernel background
threads: the global ones, that are spawned during kernel boot in a
limited number of instances (e.g. one USB hub_event() worker thread is
spawned per USB HCD); and the local ones, that are spawned when a user
interacts with some kernel interface (e.g. vhost workers).To enable collecting coverage from a global background thread, a unique
global handle must be assigned and passed to the corresponding
kcov_remote_start() call. Then a userspace process can pass a list of
such handles to the KCOV_REMOTE_ENABLE ioctl in the handles array field
of the kcov_remote_arg struct. This will attach the used kcov device to
the code sections, that are referenced by those handles.Since there might be many local background threads spawned from
different userspace processes, we can't use a single global handle per
annotation. Instead, the userspace process passes a non-zero handle
through the common_handle field of the kcov_remote_arg struct. This
common handle gets saved to the kcov_handle field in the current
task_struct and needs to be passed to the newly spawned threads via
custom annotations. Those threads should in turn be annotated with
kcov_remote_start()/kcov_remote_stop().Internally kcov stores handles as u64 integers. The top byte of a
handle is used to denote the id of a subsystem that this handle belongs
to, and the lower 4 bytes are used to denote the id of a thread instance
within that subsystem. A reserved value 0 is used as a subsystem id for
common handles as they don't belong to a particular subsystem. The
bytes 4-7 are currently reserved and must be zero. In the future the
number of bytes used for the subsystem or handle ids might be increased.When a particular userspace process collects coverage by via a common
handle, kcov will collect coverage for each code section that is
annotated to use the common handle obtained as kcov_handle from the
current task_struct. However non common handles allow to collect
coverage selectively from different subsystems.Link: http://lkml.kernel.org/r/e90e315426a384207edbec1d6aa89e43008e4caf.1572366574.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov
Cc: Dmitry Vyukov
Cc: Greg Kroah-Hartman
Cc: Alan Stern
Cc: "Michael S. Tsirkin"
Cc: Jason Wang
Cc: Arnd Bergmann
Cc: Steven Rostedt
Cc: David Windsor
Cc: Elena Reshetova
Cc: Anders Roxell
Cc: Alexander Potapenko
Cc: Marco Elver
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Follow the kernel conventions, rename addr_in_gen_pool to
gen_pool_has_addr.[sjhuang@iluvatar.ai: fix Documentation/ too]
Link: http://lkml.kernel.org/r/20181229015914.5573-1-sjhuang@iluvatar.ai
Link: http://lkml.kernel.org/r/20181228083950.20398-1-sjhuang@iluvatar.ai
Signed-off-by: Huang Shijie
Reviewed-by: Andrew Morton
Cc: Russell King
Cc: Arnd Bergmann
Cc: Greg Kroah-Hartman
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Initialization is not guaranteed to zero padding bytes so use an
explicit memset instead to avoid leaking any kernel content in any
possible padding bytes.Link: http://lkml.kernel.org/r/dfa331c00881d61c8ee51577a082d8bebd61805c.camel@perches.com
Signed-off-by: Joe Perches
Cc: Dan Carpenter
Cc: Julia Lawall
Cc: Thomas Gleixner
Cc: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When building with clang + -Wtautological-pointer-compare, these
instances pop up:kernel/profile.c:339:6: warning: comparison of array 'prof_cpu_mask' not equal to a null pointer is always true [-Wtautological-pointer-compare]
if (prof_cpu_mask != NULL)
^~~~~~~~~~~~~ ~~~~
kernel/profile.c:376:6: warning: comparison of array 'prof_cpu_mask' not equal to a null pointer is always true [-Wtautological-pointer-compare]
if (prof_cpu_mask != NULL)
^~~~~~~~~~~~~ ~~~~
kernel/profile.c:406:26: warning: comparison of array 'prof_cpu_mask' not equal to a null pointer is always true [-Wtautological-pointer-compare]
if (!user_mode(regs) && prof_cpu_mask != NULL &&
^~~~~~~~~~~~~ ~~~~
3 warnings generated.This can be addressed with the cpumask_available helper, introduced in
commit f7e30f01a9e2 ("cpumask: Add helper cpumask_available()") to fix
warnings like this while keeping the code the same.Link: https://github.com/ClangBuiltLinux/linux/issues/747
Link: http://lkml.kernel.org/r/20191022191957.9554-1-natechancellor@gmail.com
Signed-off-by: Nathan Chancellor
Reviewed-by: Andrew Morton
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
blocking_notifier_chain_cond_register() does not consider system_booting
state, which is the only difference between this function and
blocking_notifier_cain_register(). This can be a bug and is a piece of
duplicate code.Delete blocking_notifier_chain_cond_register()
Link: http://lkml.kernel.org/r/1568861888-34045-4-git-send-email-nixiaoming@huawei.com
Signed-off-by: Xiaoming Ni
Reviewed-by: Andrew Morton
Cc: Alan Stern
Cc: Alexey Dobriyan
Cc: Andy Lutomirski
Cc: Anna Schumaker
Cc: Arjan van de Ven
Cc: Chuck Lever
Cc: David S. Miller
Cc: Ingo Molnar
Cc: J. Bruce Fields
Cc: Jeff Layton
Cc: Nadia Derbey
Cc: "Paul E. McKenney"
Cc: Sam Protsenko
Cc: Thomas Gleixner
Cc: Trond Myklebust
Cc: Vasily Averin
Cc: Viresh Kumar
Cc: YueHaibing
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The only difference between notifier_chain_cond_register() and
notifier_chain_register() is the lack of warning hints for duplicate
registrations. Use notifier_chain_register() instead of
notifier_chain_cond_register() to avoid duplicate codeLink: http://lkml.kernel.org/r/1568861888-34045-3-git-send-email-nixiaoming@huawei.com
Signed-off-by: Xiaoming Ni
Reviewed-by: Andrew Morton
Cc: Alan Stern
Cc: Alexey Dobriyan
Cc: Andy Lutomirski
Cc: Anna Schumaker
Cc: Arjan van de Ven
Cc: Chuck Lever
Cc: David S. Miller
Cc: Ingo Molnar
Cc: J. Bruce Fields
Cc: Jeff Layton
Cc: Nadia Derbey
Cc: "Paul E. McKenney"
Cc: Sam Protsenko
Cc: Thomas Gleixner
Cc: Trond Myklebust
Cc: Vasily Averin
Cc: Viresh Kumar
Cc: YueHaibing
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Registering the same notifier to a hook repeatedly can cause the hook
list to form a ring or lose other members of the list.case1: An infinite loop in notifier_chain_register() can cause soft lockup
atomic_notifier_chain_register(&test_notifier_list, &test1);
atomic_notifier_chain_register(&test_notifier_list, &test1);
atomic_notifier_chain_register(&test_notifier_list, &test2);case2: An infinite loop in notifier_chain_register() can cause soft lockup
atomic_notifier_chain_register(&test_notifier_list, &test1);
atomic_notifier_chain_register(&test_notifier_list, &test1);
atomic_notifier_call_chain(&test_notifier_list, 0, NULL);case3: lose other hook test2
atomic_notifier_chain_register(&test_notifier_list, &test1);
atomic_notifier_chain_register(&test_notifier_list, &test2);
atomic_notifier_chain_register(&test_notifier_list, &test1);case4: Unregister returns 0, but the hook is still in the linked list,
and it is not really registered. If you call
notifier_call_chain after ko is unloaded, it will trigger oops.If the system is configured with softlockup_panic and the same hook is
repeatedly registered on the panic_notifier_list, it will cause a loop
panic.Add a check in notifier_chain_register(), intercepting duplicate
registrations to avoid infinite loopsLink: http://lkml.kernel.org/r/1568861888-34045-2-git-send-email-nixiaoming@huawei.com
Signed-off-by: Xiaoming Ni
Reviewed-by: Vasily Averin
Reviewed-by: Andrew Morton
Cc: Alexey Dobriyan
Cc: Anna Schumaker
Cc: Arjan van de Ven
Cc: J. Bruce Fields
Cc: Chuck Lever
Cc: David S. Miller
Cc: Jeff Layton
Cc: Andy Lutomirski
Cc: Ingo Molnar
Cc: Nadia Derbey
Cc: "Paul E. McKenney"
Cc: Sam Protsenko
Cc: Alan Stern
Cc: Thomas Gleixner
Cc: Trond Myklebust
Cc: Viresh Kumar
Cc: Xiaoming Ni
Cc: YueHaibing
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Pull more tracing updates from Steven Rostedt:
"Two fixes and one patch that was missed:Fixes:
- Missing __print_hex_dump undef for processing new function in trace
events- Stop WARN_ON messages when lockdown disables tracing on boot up
Enhancement:
- Debug option to inject trace events from userspace (for rasdaemon)"
The enhancement has its own config option and is non invasive. It's been
discussed for sever months and should have been added to my original
push, but I never pulled it into my queue.* tag 'trace-v5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Do not create directories if lockdown is in affect
tracing: Introduce trace event injection
tracing: Fix __print_hex_dump scope -
Pull additional power management updates from Rafael Wysocki:
"These fix an ACPI EC driver bug exposed by the recent rework of the
suspend-to-idle code flow, reintroduce frequency constraints into
device PM QoS (in preparation for adding QoS support to devfreq), drop
a redundant field from struct cpuidle_state and clean up Kconfig in
some places.Specifics:
- Avoid a race condition in the ACPI EC driver that may cause systems
to be unable to leave suspend-to-idle (Rafael Wysocki)- Drop the "disabled" field, which is redundant, from struct
cpuidle_state (Rafael Wysocki)- Reintroduce device PM QoS frequency constraints (temporarily
introduced and than dropped during the 5.4 cycle) in preparation
for adding QoS support to devfreq (Leonard Crestez)- Clean up indentation (in multiple places) and the cpuidle drivers
help text in Kconfig (Krzysztof Kozlowski, Randy Dunlap)"* tag 'pm-5.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: s2idle: Rework ACPI events synchronization
ACPI: EC: Rework flushing of pending work
PM / devfreq: Add missing locking while setting suspend_freq
PM / QoS: Restore DEV_PM_QOS_MIN/MAX_FREQUENCY
PM / QoS: Reorder pm_qos/freq_qos/dev_pm_qos structs
PM / QoS: Initial kunit test
PM / QoS: Redefine FREQ_QOS_MAX_DEFAULT_VALUE to S32_MAX
power: avs: Fix Kconfig indentation
cpufreq: Fix Kconfig indentation
cpuidle: minor Kconfig help text fixes
cpuidle: Drop disabled field from struct cpuidle_state
cpuidle: Fix Kconfig indentation
04 Dec, 2019
3 commits
-
If lockdown is disabling tracing on boot up, it prevents the tracing files
from even bering created. But when that happens, there's several places that
will give a warning that the files were not created as that is usually a
sign of a bug.Add in strategic locations where a check is made to see if tracing is
disabled by lockdown, and if it is, do not go further, and fail silently
(but print that tracing is disabled by lockdown, without doing a WARN_ON()).Cc: Matthew Garrett
Fixes: 17911ff38aa5 ("tracing: Add locked_down checks to the open calls of files created for tracefs")
Signed-off-by: Steven Rostedt (VMware) -
Pull timer updates from Ingo Molnar:
"The main changes in the timer code in this cycle were:- Clockevent updates:
- timer-of framework cleanups. (Geert Uytterhoeven)
- Use timer-of for the renesas-ostm and the device name to prevent
name collision in case of multiple timers. (Geert Uytterhoeven)- Check if there is an error after calling of_clk_get in asm9260
(Chuhong Yuan)- ABI fix: Zero out high order bits of nanoseconds on compat
syscalls. This got broken a year ago, with apparently no side
effects so far.Since the kernel would use random data otherwise I don't think we'd
have other options but to fix the bug, even if there was a side
effect to applications (Dmitry Safonov)- Optimize ns_to_timespec64() on 32-bit systems: move away from
div_s64_rem() which can be slow, to div_u64_rem() which is faster
(Arnd Bergmann)- Annotate KCSAN-reported false positive data races in
hrtimer_is_queued() users by moving timer->state handling over to
the READ_ONCE()/WRITE_ONCE() APIs. This documents these accesses
(Eric Dumazet)- Misc cleanups and small fixes"
[ I undid the "ABI fix" and updated the comments instead. The reason
there were apparently no side effects is that the fix was a no-op.The updated comment is to say _why_ it was a no-op. - Linus ]
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Zero the upper 32-bits in __kernel_timespec on 32-bit
time: Rename tsk->real_start_time to ->start_boottime
hrtimer: Remove the comment about not used HRTIMER_SOFTIRQ
time: Fix spelling mistake in comment
time: Optimize ns_to_timespec64()
hrtimer: Annotate lockless access to timer->state
clocksource/drivers/asm9260: Add a check for of_clk_get
clocksource/drivers/renesas-ostm: Use unique device name instead of ostm
clocksource/drivers/renesas-ostm: Convert to timer_of
clocksource/drivers/timer-of: Use unique device name instead of timer
clocksource/drivers/timer-of: Convert last full_name to %pOF -
Pull irq updates from Ingo Molnar:
"Most of the IRQ subsystem changes in this cycle were irq-chip driver
updates:- Qualcomm PDC wakeup interrupt support
- Layerscape external IRQ support
- Broadcom bcm7038 PM and wakeup support
- Ingenic driver cleanup and modernization
- GICv3 ITS preparation for GICv4.1 updates
- GICv4 fixes
There's also the series from Frederic Weisbecker that fixes memory
ordering bugs for the irq-work logic, whose primary fix is to turn
work->irq_work.flags into an atomic variable and then convert the
complex (and buggy) atomic_cmpxchg() loop in irq_work_claim() into a
much simpler atomic_fetch_or() call.There are also various smaller cleanups"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
pinctrl/sdm845: Add PDC wakeup interrupt map for GPIOs
pinctrl/msm: Setup GPIO chip in hierarchy
irqchip/qcom-pdc: Add irqchip set/get state calls
irqchip/qcom-pdc: Add irqdomain for wakeup capable GPIOs
irqchip/qcom-pdc: Do not toggle IRQ_ENABLE during mask/unmask
irqchip/qcom-pdc: Update max PDC interrupts
of/irq: Document properties for wakeup interrupt parent
genirq: Introduce irq_chip_get/set_parent_state calls
irqdomain: Add bus token DOMAIN_BUS_WAKEUP
genirq: Fix function documentation of __irq_alloc_descs()
irq_work: Fix IRQ_WORK_BUSY bit clearing
irqchip/ti-sci-inta: Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...))
irq_work: Slightly simplify IRQ_WORK_PENDING clearing
irq_work: Fix irq_work_claim() memory ordering
irq_work: Convert flags to atomic_t
irqchip: Ingenic: Add process for more than one irq at the same time.
irqchip: ingenic: Alloc generic chips from IRQ domain
irqchip: ingenic: Get virq number from IRQ domain
irqchip: ingenic: Error out if IRQ domain creation failed
irqchip: ingenic: Drop redundant irq_suspend / irq_resume functions
...
03 Dec, 2019
3 commits
-
Pull Kbuild updates from Masahiro Yamada:
- remove unneeded asm headers from hexagon, ia64
- add 'dir-pkg' target, which works like 'tar-pkg' but skips archiving
- add 'helpnewconfig' target, which shows help for new CONFIG options
- support 'make nsdeps' for external modules
- make rebuilds faster by deleting $(wildcard $^) checks
- remove compile tests for kernel-space headers
- refactor modpost to simplify modversion handling
- make single target builds faster
- optimize and clean up scripts/kallsyms.c
- refactor various Makefiles and scripts
* tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (59 commits)
MAINTAINERS: update Kbuild/Kconfig maintainer's email address
scripts/kallsyms: remove redundant initializers
scripts/kallsyms: put check_symbol_range() calls close together
scripts/kallsyms: make check_symbol_range() void function
scripts/kallsyms: move ignored symbol types to is_ignored_symbol()
scripts/kallsyms: move more patterns to the ignored_prefixes array
scripts/kallsyms: skip ignored symbols very early
scripts/kallsyms: add const qualifiers where possible
scripts/kallsyms: make find_token() return (unsigned char *)
scripts/kallsyms: replace prefix_underscores_count() with strspn()
scripts/kallsyms: add sym_name() to mitigate cast ugliness
scripts/kallsyms: remove unneeded length check for prefix matching
scripts/kallsyms: remove redundant is_arm_mapping_symbol()
scripts/kallsyms: set relative_base more effectively
scripts/kallsyms: shrink table before sorting it
scripts/kallsyms: fix definitely-lost memory leak
scripts/kallsyms: remove unneeded #ifndef ARRAY_SIZE
kbuild: make single target builds even faster
modpost: respect the previous export when 'exported twice' is warned
modpost: do not set ->preloaded for symbols from Module.symvers
... -
Daniel Borkmann says:
====================
pull-request: bpf 2019-12-02The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 6 day(s) which contain
a total of 10 files changed, 60 insertions(+), 51 deletions(-).The main changes are:
1) Fix vmlinux BTF generation for binutils pre v2.25, from Stanislav Fomichev.
2) Fix libbpf global variable relocation to take symbol's st_value offset
into account, from Andrii Nakryiko.3) Fix libbpf build on powerpc where check_abi target fails due to different
readelf output format, from Aurelien Jarno.4) Don't set BPF insns RO for the case when they are JITed in order to avoid
fragmenting the direct map, from Daniel Borkmann.5) Fix static checker warning in btf_distill_func_proto() as well as a build
error due to empty enum when BPF is compiled out, from Alexei Starovoitov.6) Fix up generation of bpf_helper_defs.h for perf, from Arnaldo Carvalho de Melo.
====================Signed-off-by: David S. Miller
-
We have been trying to use rasdaemon to monitor hardware errors like
correctable memory errors. rasdaemon uses trace events to monitor
various hardware errors. In order to test it, we have to inject some
hardware errors, unfortunately not all of them provide error
injections. MCE does provide a way to inject MCE errors, but errors
like PCI error and devlink error don't, it is not easy to add error
injection to each of them. Instead, it is relatively easier to just
allow users to inject trace events in a generic way so that all trace
events can be injected.This patch introduces trace event injection, where a new 'inject' is
added to each tracepoint directory. Users could write into this file
with key=value pairs to specify the value of each fields of the trace
event, all unspecified fields are set to zero values by default.For example, for the net/net_dev_queue tracepoint, we can inject:
INJECT=/sys/kernel/debug/tracing/events/net/net_dev_queue/inject
echo "" > $INJECT
echo "name='test'" > $INJECT
echo "name='test' len=1024" > $INJECT
cat /sys/kernel/debug/tracing/trace
...
-614 [000] .... 36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
-614 [001] .... 136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
-614 [001] .N.. 208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024Triggers could be triggered as usual too:
echo "stacktrace if len == 1025" > /sys/kernel/debug/tracing/events/net/net_dev_queue/trigger
echo "len=1025" > $INJECT
cat /sys/kernel/debug/tracing/trace
...
bash-614 [000] .... 36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
bash-614 [001] .... 136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
bash-614 [001] .N.. 208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
bash-614 [001] .N.1 284.236349:
=> event_inject_write
=> vfs_write
=> ksys_write
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframeThe only thing that can't be injected is string pointers as they
require constant string pointers, this can't be done at run time.Link: http://lkml.kernel.org/r/20191130045218.18979-1-xiyou.wangcong@gmail.com
Cc: Ingo Molnar
Signed-off-by: Cong Wang
Signed-off-by: Steven Rostedt (VMware)
02 Dec, 2019
5 commits
-
Merge updates from Andrew Morton:
"Incoming:- a small number of updates to scripts/, ocfs2 and fs/buffer.c
- most of MM
I still have quite a lot of material (mostly not MM) staged after
linux-next due to -next dependencies. I'll send those across next week
as the preprequisites get merged up"* emailed patches from Andrew Morton : (135 commits)
mm/page_io.c: annotate refault stalls from swap_readpage
mm/Kconfig: fix trivial help text punctuation
mm/Kconfig: fix indentation
mm/memory_hotplug.c: remove __online_page_set_limits()
mm: fix typos in comments when calling __SetPageUptodate()
mm: fix struct member name in function comments
mm/shmem.c: cast the type of unmap_start to u64
mm: shmem: use proper gfp flags for shmem_writepage()
mm/shmem.c: make array 'values' static const, makes object smaller
userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
userfaultfd: wrap the common dst_vma check into an inlined function
userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
userfaultfd: use vma_pagesize for all huge page size calculation
mm/madvise.c: use PAGE_ALIGN[ED] for range checking
mm/madvise.c: replace with page_size() in madvise_inject_error()
mm/mmap.c: make vma_merge() comment more easy to understand
mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
autonuma: reduce cache footprint when scanning page tables
autonuma: fix watermark checking in migrate_balanced_pgdat()
... -
Pull y2038 cleanups from Arnd Bergmann:
"y2038 syscall implementation cleanupsThis is a series of cleanups for the y2038 work, mostly intended for
namespace cleaning: the kernel defines the traditional time_t, timeval
and timespec types that often lead to y2038-unsafe code. Even though
the unsafe usage is mostly gone from the kernel, having the types and
associated functions around means that we can still grow new users,
and that we may be missing conversions to safe types that actually
matter.There are still a number of driver specific patches needed to get the
last users of these types removed, those have been submitted to the
respective maintainers"Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/
* tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits)
y2038: alarm: fix half-second cut-off
y2038: ipc: fix x32 ABI breakage
y2038: fix typo in powerpc vdso "LOPART"
y2038: allow disabling time32 system calls
y2038: itimer: change implementation to timespec64
y2038: move itimer reset into itimer.c
y2038: use compat_{get,set}_itimer on alpha
y2038: itimer: compat handling to itimer.c
y2038: time: avoid timespec usage in settimeofday()
y2038: timerfd: Use timespec64 internally
y2038: elfcore: Use __kernel_old_timeval for process times
y2038: make ns_to_compat_timeval use __kernel_old_timeval
y2038: socket: use __kernel_old_timespec instead of timespec
y2038: socket: remove timespec reference in timestamping
y2038: syscalls: change remaining timeval to __kernel_old_timeval
y2038: rusage: use __kernel_old_timeval
y2038: uapi: change __kernel_time_t to __kernel_old_time_t
y2038: stat: avoid 'time_t' in 'struct stat'
y2038: ipc: remove __kernel_time_t reference from headers
y2038: vdso: powerpc: avoid timespec references
... -
Pull sysctl system call removal from Eric Biederman:
"As far as I can tell we have reached the point where no one enables
the sysctl system call anymore. It still is enabled in a few
defconfigs but they are mostly the rarely used one and in asking
people about that it was more cut & paste enabled than anything else.This is single commit that just deletes code. Leaving just enough code
so that the deprecated sysctl warning continues to be printed. If my
analysis turns out to be wrong and someone actually cares it will be
easy to revert this commit and have the system call again.There was one new xtensa defconfig in linux-next that enabled the
system call this cycle and when asked about it the maintainer of the
code replied that it was not enabled on purpose. As of today's
linux-next tree that defconfig no longer enables the system call.What we saw in the review discussion was that if we go a step farther
than my patch and mess with uapi headers there are pieces of code that
won't compile, but nothing minds the system call actually disappearing
from the kernel"Link: https://lore.kernel.org/lkml/201910011140.EA0181F13@keescook/
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
sysctl: Remove the sysctl system call -
Currently, the drop_caches proc file and sysctl read back the last value
written, suggesting this is somehow a stateful setting instead of a
one-time command. Make it write-only, like e.g. compact_memory.While mitigating a VM problem at scale in our fleet, there was confusion
about whether writing to this file will permanently switch the kernel into
a non-caching mode. This influences the decision making in a tense
situation, where tens of people are trying to fix tens of thousands of
affected machines: Do we need a rollback strategy? What are the
performance implications of operating in a non-caching state for several
days? It also caused confusion when the kernel team said we may need to
write the file several times to make sure it's effective ("But it already
reads back 3?").Link: http://lkml.kernel.org/r/20191031221602.9375-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner
Acked-by: Chris Down
Acked-by: Vlastimil Babka
Acked-by: David Hildenbrand
Acked-by: Michal Hocko
Acked-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Supporting VMAP_STACK with KASAN_VMALLOC is straightforward:
- clear the shadow region of vmapped stacks when swapping them in
- tweak Kconfig to allow VMAP_STACK to be turned on with KASANLink: http://lkml.kernel.org/r/20191031093909.9228-4-dja@axtens.net
Signed-off-by: Daniel Axtens
Reviewed-by: Dmitry Vyukov
Reviewed-by: Andrey Ryabinin
Cc: Alexander Potapenko
Cc: Christophe Leroy
Cc: Mark Rutland
Cc: Vasily Gorbik
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Dec, 2019
7 commits
-
get_unmapped_area() returns an address or -errno on failure. Historically
we have checked for the failure by offset_in_page() which is correct but
quite hard to read. Newer code started using IS_ERR_VALUE which is much
easier to read. Convert remaining users of offset_in_page as well.[mhocko@suse.com: rewrite changelog]
[mhocko@kernel.org: fix mremap.c and uprobes.c sites also]
Link: http://lkml.kernel.org/r/20191012102512.28051-1-pugaowei@gmail.com
Signed-off-by: Gaowei Pu
Reviewed-by: Andrew Morton
Acked-by: Michal Hocko
Cc: Vlastimil Babka
Cc: Wei Yang
Cc: Konstantin Khlebnikov
Cc: Kirill A. Shutemov
Cc: "Jérôme Glisse"
Cc: Mike Kravetz
Cc: Rik van Riel
Cc: Qian Cai
Cc: Shakeel Butt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Pull seccomp updates from Kees Cook:
"Mostly this is implementing the new flag SECCOMP_USER_NOTIF_FLAG_CONTINUE,
but there are cleanups as well.- implement SECCOMP_USER_NOTIF_FLAG_CONTINUE (Christian Brauner)
- fixes to selftests (Christian Brauner)
- remove secure_computing() argument (Christian Brauner)"
* tag 'seccomp-v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: rework define for SECCOMP_USER_NOTIF_FLAG_CONTINUE
seccomp: fix SECCOMP_USER_NOTIF_FLAG_CONTINUE test
seccomp: simplify secure_computing()
seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE
seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
seccomp: avoid overflow in implicit constant conversion -
Pull audit updates from Paul Moore:
"Audit is back for v5.5, albeit with only two patches:- Allow for the auditing of suspicious O_CREAT usage via the new
AUDIT_ANOM_CREAT record.- Remove a redundant if-conditional check found during code analysis.
It's a minor change, but when the pull request is only two patches
long, you need filler in the pull request email"[ Heh on the pull request filler. I wish more people tried to write
better pull request messages, even if maybe it's not worth it for the
trivial cases ;^) - Linus ]* tag 'audit-pr-20191126' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: remove redundant condition check in kauditd_thread()
audit: Report suspicious O_CREAT usage -
Pull kgdb updates from Daniel Thompson:
"The major change here is the work from Douglas Anderson that reworks
the way kdb stack traces are handled on SMP systems. The effect is to
allow all CPUs to issue their stack trace which reduced the need for
architecture specific code to support stack tracing.Also included are general of clean ups from Doug and myself:
- Remove some unused variables or arguments.
- Tidy up the kdb escape handling code and fix a couple of odd corner
cases.- Better ignore escape characters that do not form part of an escape
sequence. This mostly benefits vi users since they are most likely
to press escape as a nervous habit but it won't harm anyone else"* tag 'kgdb-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kdb: Tweak escape handling for vi users
kdb: Improve handling of characters from different input sources
kdb: Remove special case logic from kdb_read()
kdb: Simplify code to fetch characters from console
kdb: Tidy up code to handle escape sequences
kdb: Avoid array subscript warnings on non-SMP builds
kdb: Fix stack crawling on 'running' CPUs that aren't the master
kdb: Fix "btc " crash if the CPU didn't round up
kdb: Remove unused "argcount" param from kdb_bt1(); make btaprompt bool
kgdb: Remove unused DCPU_SSTEP definition -
Pull parisc updates from Helge Deller:
"Just trivial small updates: An assembler register optimization in the
inlined networking checksum functions, a compiler warning fix and
don't unneccesary print a runtime warning on machines which wouldn't
be affected anyway"* 'parisc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Avoid spurious inequivalent alias kernel error messages
kexec: Fix pointer-to-int-cast warnings
parisc: Do not hardcode registers in checksum functions -
…ux/kernel/git/dhowells/linux-fs
Pull pipe rework from David Howells:
"This is my set of preparatory patches for building a general
notification queue on top of pipes. It makes a number of significant
changes:- It removes the nr_exclusive argument from __wake_up_sync_key() as
this is always 1. This prepares for the next step:- Adds wake_up_interruptible_sync_poll_locked() so that poll can be
woken up from a function that's holding the poll waitqueue
spinlock.- Change the pipe buffer ring to be managed in terms of unbounded
head and tail indices rather than bounded index and length. This
means that reading the pipe only needs to modify one index, not
two.- A selection of helper functions are provided to query the state of
the pipe buffer, plus a couple to apply updates to the pipe
indices.- The pipe ring is allowed to have kernel-reserved slots. This allows
many notification messages to be spliced in by the kernel without
allowing userspace to pin too many pages if it writes to the same
pipe.- Advance the head and tail indices inside the pipe waitqueue lock
and use wake_up_interruptible_sync_poll_locked() to poke poll
without having to take the lock twice.- Rearrange pipe_write() to preallocate the buffer it is going to
write into and then drop the spinlock. This allows kernel
notifications to then be added the ring whilst it is filling the
buffer it allocated. The read side is stalled because the pipe
mutex is still held.- Don't wake up readers on a pipe if there was already data in it
when we added more.- Don't wake up writers on a pipe if the ring wasn't full before we
removed a buffer"* tag 'notifications-pipe-prep-20191115' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
pipe: Remove sync on wake_ups
pipe: Increase the writer-wakeup threshold to reduce context-switch count
pipe: Check for ring full inside of the spinlock in pipe_write()
pipe: Remove redundant wakeup from pipe_write()
pipe: Rearrange sequence in pipe_write() to preallocate slot
pipe: Conditionalise wakeup in pipe_read()
pipe: Advance tail pointer inside of wait spinlock in pipe_read()
pipe: Allow pipes to have kernel-reserved slots
pipe: Use head and tail pointers for the ring, not cursor and length
Add wake_up_interruptible_sync_poll_locked()
Remove the nr_exclusive argument from __wake_up_sync_key()
pipe: Reduce #inclusion of pipe_fs_i.h -
Pull hmm updates from Jason Gunthorpe:
"This is another round of bug fixing and cleanup. This time the focus
is on the driver pattern to use mmu notifiers to monitor a VA range.
This code is lifted out of many drivers and hmm_mirror directly into
the mmu_notifier core and written using the best ideas from all the
driver implementations.This removes many bugs from the drivers and has a very pleasing
diffstat. More drivers can still be converted, but that is for another
cycle.- A shared branch with RDMA reworking the RDMA ODP implementation
- New mmu_interval_notifier API. This is focused on the use case of
monitoring a VA and simplifies the process for drivers- A common seq-count locking scheme built into the
mmu_interval_notifier API usable by drivers that call
get_user_pages() or hmm_range_fault() with the VA range- Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen
GntDev drivers to the new API. This deletes a lot of wonky driver
code.- Two improvements for hmm_range_fault(), from testing done by Ralph"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap
mm/hmm: make full use of walk_page_range()
xen/gntdev: use mmu_interval_notifier_insert
mm/hmm: remove hmm_mirror and related
drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirror
drm/amdgpu: Use mmu_interval_insert instead of hmm_mirror
drm/amdgpu: Call find_vma under mmap_sem
nouveau: use mmu_interval_notifier instead of hmm_mirror
nouveau: use mmu_notifier directly for invalidate_range_start
drm/radeon: use mmu_interval_notifier_insert
RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv
RDMA/odp: Use mmu_interval_notifier_insert()
mm/hmm: define the pre-processor related parts of hmm.h even if disabled
mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror
mm/mmu_notifier: add an interval tree notifier
mm/mmu_notifier: define the header pre-processor parts even if disabled
mm/hmm: allow snapshot of the special zero page