Eric Lee / smarc-fsl-linux-kernel

17 Mar, 2019

3 commits

a9dce6679 Merge tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux ... Browse Code »

Pull pidfd system call from Christian Brauner:
"This introduces the ability to use file descriptors from /proc//
as stable handles on struct pid. Even if a pid is recycled the handle
will not change. For a start these fds can be used to send signals to
the processes they refer to.

With the ability to use /proc/ fds as stable handles on struct
pid we can fix a long-standing issue where after a process has exited
its pid can be reused by another process. If a caller sends a signal
to a reused pid it will end up signaling the wrong process.

With this patchset we enable a variety of use cases. One obvious
example is that we can now safely delegate an important part of
process management - sending signals - to processes other than the
parent of a given process by sending file descriptors around via scm
rights and not fearing that the given process will have been recycled
in the meantime. It also allows for easy testing whether a given
process is still alive or not by sending signal 0 to a pidfd which is
quite handy.

There has been some interest in this feature e.g. from systems
management (systemd, glibc) and container managers. I have requested
and gotten comments from glibc to make sure that this syscall is
suitable for their needs as well. In the future I expect it to take on
most other pid-based signal syscalls. But such features are left for
the future once they are needed.

This has been sitting in linux-next for quite a while and has not
caused any issues. It comes with selftests which verify basic
functionality and also test that a recycled pid cannot be signaled via
a pidfd.

Jon has written about a prior version of this patchset. It should
cover the basic functionality since not a lot has changed since then:

https://lwn.net/Articles/773459/

The commit message for the syscall itself is extensively documenting
the syscall, including it's functionality and extensibility"

* tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests: add tests for pidfd_send_signal()
signal: add pidfd_send_signal() syscall

Linus Torvalds
2019-03-17 04:47:14 +0800
f67e3fb48 Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm ... Browse Code »

Pull device-dax updates from Dan Williams:
"New device-dax infrastructure to allow persistent memory and other
"reserved" / performance differentiated memories, to be assigned to
the core-mm as "System RAM".

Some users want to use persistent memory as additional volatile
memory. They are willing to cope with potential performance
differences, for example between DRAM and 3D Xpoint, and want to use
typical Linux memory management apis rather than a userspace memory
allocator layered over an mmap() of a dax file. The administration
model is to decide how much Persistent Memory (pmem) to use as System
RAM, create a device-dax-mode namespace of that size, and then assign
it to the core-mm. The rationale for device-dax is that it is a
generic memory-mapping driver that can be layered over any "special
purpose" memory, not just pmem. On subsequent boots udev rules can be
used to restore the memory assignment.

One implication of using pmem as RAM is that mlock() no longer keeps
data off persistent media. For this reason it is recommended to enable
NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
at rest. We considered making this recommendation an actively enforced
requirement, but in the end decided to leave it as a distribution /
administrator policy to allow for emulation and test environments that
lack security capable NVDIMMs.

Summary:

- Replace the /sys/class/dax device model with /sys/bus/dax, and
include a compat driver so distributions can opt-in to the new ABI.

- Allow for an alternative driver for the device-dax address-range

- Introduce the 'kmem' driver to hotplug / assign a device-dax
address-range to the core-mm.

- Arrange for the device-dax target-node to be onlined so that the
newly added memory range can be uniquely referenced by numa apis"

NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
we currently have special - and very annoying rules in the kernel about
accessing PMEM only with the "MC safe" accessors, because machine checks
inside the regular repeat string copy functions can be fatal in some
(not described) circumstances.

And apparently the PMEM modules can cause that a lot more than regular
RAM. The argument is that this happens because PMEM doesn't necessarily
get scrubbed at boot like RAM does, but that is planned to be added for
the user space tooling.

Quoting Dan from another email:
"The exposure can be reduced in the volatile-RAM case by scanning for
and clearing errors before it is onlined as RAM. The userspace tooling
for that can be in place before v5.1-final. There's also runtime
notifications of errors via acpi_nfit_uc_error_notify() from
background scrubbers on the DIMM devices. With that mechanism the
kernel could proactively clear newly discovered poison in the volatile
case, but that would be additional development more suitable for v5.2.

I understand the concern, and the need to highlight this issue by
tapping the brakes on feature development, but I don't see PMEM as RAM
making the situation worse when the exposure is also there via DAX in
the PMEM case. Volatile-RAM is arguably a safer use case since it's
possible to repair pages where the persistent case needs active
application coordination"

* tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: "Hotplug" persistent memory for use like normal RAM
mm/resource: Let walk_system_ram_range() search child resources
mm/memory-hotplug: Allow memory resources to be children
mm/resource: Move HMM pr_debug() deeper into resource code
mm/resource: Return real error codes from walk failures
device-dax: Add a 'modalias' attribute to DAX 'bus' devices
device-dax: Add a 'target_node' attribute
device-dax: Auto-bind device after successful new_id
acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
device-dax: Add /sys/class/dax backwards compatibility
device-dax: Add support for a dax override driver
device-dax: Move resource pinning+mapping into the common driver
device-dax: Introduce bus + driver model
device-dax: Start defining a dax bus model
device-dax: Remove multi-resource infrastructure
device-dax: Kill dax_region base
device-dax: Kill dax_region ida

Linus Torvalds
2019-03-17 04:05:32 +0800
11efae350 Merge tag 'for-5.1/block-post-20190315' of git://git.kernel.dk/linux-block ... Browse Code »

Pull more block layer changes from Jens Axboe:
"This is a collection of both stragglers, and fixes that came in after
I finalized the initial pull. This contains:

- An MD pull request from Song, with a few minor fixes

- Set of NVMe patches via Christoph

- Pull request from Konrad, with a few fixes for xen/blkback

- pblk fix IO calculation fix (Javier)

- Segment calculation fix for pass-through (Ming)

- Fallthrough annotation for blkcg (Mathieu)"

* tag 'for-5.1/block-post-20190315' of git://git.kernel.dk/linux-block: (25 commits)
blkcg: annotate implicit fall through
nvme-tcp: support C2HData with SUCCESS flag
nvmet: ignore EOPNOTSUPP for discard
nvme: add proper write zeroes setup for the multipath device
nvme: add proper discard setup for the multipath device
nvme: remove nvme_ns_config_oncs
nvme: disable Write Zeroes for qemu controllers
nvmet-fc: bring Disconnect into compliance with FC-NVME spec
nvmet-fc: fix issues with targetport assoc_list list walking
nvme-fc: reject reconnect if io queue count is reduced to zero
nvme-fc: fix numa_node when dev is null
nvme-fc: use nr_phys_segments to determine existence of sgl
nvme-loop: init nvmet_ctrl fatal_err_work when allocate
nvme: update comment to make the code easier to read
nvme: put ns_head ref if namespace fails allocation
nvme-trace: fix cdw10 buffer overrun
nvme: don't warn on block content change effects
nvme: add get-feature to admin cmds tracer
md: Fix failed allocation of md_register_thread
It's wrong to add len to sector_nr in raid10 reshape twice
...

Linus Torvalds
2019-03-17 03:36:39 +0800

16 Mar, 2019

2 commits

aa2e3ac64 Merge tag 'trace-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull tracing fixes and cleanups from Steven Rostedt:
"This contains a series of last minute clean ups, small fixes and error
checks"

* tag 'trace-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/probe: Verify alloc_trace_*probe() result
tracing/probe: Check event/group naming rule at parsing
tracing/probe: Check the size of argument name and body
tracing/probe: Check event name length correctly
tracing/probe: Check maxactive error cases
tracing: kdb: Fix ftdump to not sleep
trace/probes: Remove kernel doc style from non kernel doc comment
tracing/probes: Make reserved_field_names static

Linus Torvalds
2019-03-16 05:47:02 +0800
2b9c272cf Merge tag 'fbdev-v5.1' of git://github.com/bzolnier/linux ... Browse Code »

Pull fbdev updates from Bartlomiej Zolnierkiewicz:
"Just a couple of small fixes and cleanups:

- fix memory access if logo is bigger than the screen (Manfred
Schlaegl)

- silence fbcon logo on 'quiet' boots (Prarit Bhargava)

- use kvmalloc() for scrollback buffer in fbcon (Konstantin Khorenko)

- misc fixes (Colin Ian King, YueHaibing, Matteo Croce, Mathieu
Malaterre, Anders Roxell, Arnd Bergmann)

- misc cleanups (Rob Herring, Lubomir Rintel, Greg Kroah-Hartman,
Jani Nikula, Michal Vokáč)"

* tag 'fbdev-v5.1' of git://github.com/bzolnier/linux:
fbdev: mbx: fix a misspelled variable name
fbdev: omap2: fix warnings in dss core
video: fbdev: Fix potential NULL pointer dereference
fbcon: Silence fbcon logo on 'quiet' boots
printk: Export console_printk
ARM: dts: imx28-cfa10036: Fix the reset gpio signal polarity
video: ssd1307fb: Do not hard code active-low reset sequence
dt-bindings: display: ssd1307fb: Remove reset-active-low from examples
fbdev: fbmem: fix memory access if logo is bigger than the screen
video/fbdev: refactor video= cmdline parsing
fbdev: mbx: fix up debugfs file creation
fbdev: omap2: no need to check return value of debugfs_create functions
video: fbdev: geode: remove ifdef OLPC noise
video: offb: annotate implicit fall throughs
omapfb: fix typo
fbdev: Use of_node_name_eq for node name comparisons
fbcon: use kvmalloc() for scrollback buffer
fbdev: chipsfb: remove set but not used variable 'size'
fbdev/via: fix spelling mistake "Expandsion" -> "Expansion"

Linus Torvalds
2019-03-16 05:22:59 +0800

15 Mar, 2019

5 commits

a039480e9 tracing/probe: Verify alloc_trace_*probe() result ... Browse Code »

Since alloc_trace_*probe() returns -EINVAL only if !event && !group,
it should not happen in trace_*probe_create(). If we catch that case
there is a bug. So use WARN_ON_ONCE() instead of pr_info().

Link: http://lkml.kernel.org/r/155253785078.14922.16902223633734601469.stgit@devnote2

Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)

Masami Hiramatsu
2019-03-15 07:54:21 +0800
5b7a96220 tracing/probe: Check event/group naming rule at parsing ... Browse Code »

Check event and group naming rule at parsing it instead
of allocating probes.

Link: http://lkml.kernel.org/r/155253784064.14922.2336893061156236237.stgit@devnote2

Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)

Masami Hiramatsu
2019-03-15 07:54:11 +0800
b4443c17a tracing/probe: Check the size of argument name and body ... Browse Code »

Check the size of argument name and expression is not 0
and smaller than maximum length.

Link: http://lkml.kernel.org/r/155253783029.14922.12650939303827581096.stgit@devnote2

Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)

Masami Hiramatsu
2019-03-15 07:53:57 +0800
dec65d79f tracing/probe: Check event name length correctly ... Browse Code »

Ensure given name of event is not too long when parsing it,
and fix to update event name offset correctly when the group
name is given. For example, this makes probe event to check
the "p:foo/" error case correctly.

Link: http://lkml.kernel.org/r/155253782046.14922.14724124823730168629.stgit@devnote2

Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)

Masami Hiramatsu
2019-03-15 07:53:47 +0800
287c038c0 tracing/probe: Check maxactive error cases ... Browse Code »

Check maxactive on kprobe error case, because maxactive
is only for kretprobe, not for kprobe. Also, maxactive
should not be 0, it should be at least 1.

Link: http://lkml.kernel.org/r/155253780952.14922.15784129810238750331.stgit@devnote2

Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)

Masami Hiramatsu
2019-03-15 07:53:29 +0800

14 Mar, 2019

1 commit

f6d85f04e blkcg: annotate implicit fall through ... Browse Code »

There is a plan to build the kernel with -Wimplicit-fallthrough and
this place in the code produced a warning (W=1).

This commit remove the following warning:

kernel/trace/blktrace.c:725:9: warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Mathieu Malaterre
Signed-off-by: Jens Axboe

Mathieu Malaterre
2019-03-14 04:31:12 +0800

13 Mar, 2019

9 commits

31b265b3b tracing: kdb: Fix ftdump to not sleep ... Browse Code »

As reported back in 2016-11 [1], the "ftdump" kdb command triggers a
BUG for "sleeping function called from invalid context".

kdb's "ftdump" command wants to call ring_buffer_read_prepare() in
atomic context. A very simple solution for this is to add allocation
flags to ring_buffer_read_prepare() so kdb can call it without
triggering the allocation error. This patch does that.

Note that in the original email thread about this, it was suggested
that perhaps the solution for kdb was to either preallocate the buffer
ahead of time or create our own iterator. I'm hoping that this
alternative of adding allocation flags to ring_buffer_read_prepare()
can be considered since it means I don't need to duplicate more of the
core trace code into "trace_kdb.c" (for either creating my own
iterator or re-preparing a ring allocator whose memory was already
allocated).

NOTE: another option for kdb is to actually figure out how to make it
reuse the existing ftrace_dump() function and totally eliminate the
duplication. This sounds very appealing and actually works (the "sr
z" command can be seen to properly dump the ftrace buffer). The
downside here is that ftrace_dump() fully consumes the trace buffer.
Unless that is changed I'd rather not use it because it means "ftdump
| grep xyz" won't be very useful to search the ftrace buffer since it
will throw away the whole trace on the first grep. A future patch to
dump only the last few lines of the buffer will also be hard to
implement.

[1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com

Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.org

Reported-by: Brian Norris
Signed-off-by: Douglas Anderson
Signed-off-by: Steven Rostedt (VMware)

Douglas Anderson
2019-03-13 21:46:10 +0800
7b47a9e7c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs mount infrastructure updates from Al Viro:
"The rest of core infrastructure; no new syscalls in that pile, but the
old parts are switched to new infrastructure. At that point
conversions of individual filesystems can happen independently; some
are done here (afs, cgroup, procfs, etc.), there's also a large series
outside of that pile dealing with NFS (quite a bit of option-parsing
stuff is getting used there - it's one of the most convoluted
filesystems in terms of mount-related logics), but NFS bits are the
next cycle fodder.

It got seriously simplified since the last cycle; documentation is
probably the weakest bit at the moment - I considered dropping the
commit introducing Documentation/filesystems/mount_api.txt (cutting
the size increase by quarter ;-), but decided that it would be better
to fix it up after -rc1 instead.

That pile allows to do followup work in independent branches, which
should make life much easier for the next cycle. fs/super.c size
increase is unpleasant; there's a followup series that allows to
shrink it considerably, but I decided to leave that until the next
cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
afs: Use fs_context to pass parameters over automount
afs: Add fs_context support
vfs: Add some logging to the core users of the fs_context log
vfs: Implement logging through fs_context
vfs: Provide documentation for new mount API
vfs: Remove kern_mount_data()
hugetlbfs: Convert to fs_context
cpuset: Use fs_context
kernfs, sysfs, cgroup, intel_rdt: Support fs_context
cgroup: store a reference to cgroup_ns into cgroup_fs_context
cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
cgroup_do_mount(): massage calling conventions
cgroup: stash cgroup_root reference into cgroup_fs_context
cgroup2: switch to option-by-option parsing
cgroup1: switch to option-by-option parsing
cgroup: take options parsing into ->parse_monolithic()
cgroup: fold cgroup1_mount() into cgroup1_get_tree()
cgroup: start switching to fs_context
ipc: Convert mqueue fs to fs_context
proc: Add fs_context support to procfs
...

Linus Torvalds
2019-03-13 05:08:19 +0800
5f739e4a4 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc vfs updates from Al Viro:
"Assorted fixes (really no common topic here)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Make __vfs_write() static
vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
pipe: stop using ->can_merge
splice: don't merge into linked buffers
fs: move generic stat response attr handling to vfs_getattr_nosec
orangefs: don't reinitialize result_mask in ->getattr
fs/devpts: always delete dcache dentry-s in dput()

Linus Torvalds
2019-03-13 04:27:20 +0800
a667cb7a9 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge misc updates from Andrew Morton:

- a few misc things

- the rest of MM

- remove flex_arrays, replace with new simple radix-tree implementation

* emailed patches from Andrew Morton : (38 commits)
Drop flex_arrays
sctp: convert to genradix
proc: commit to genradix
generic radix trees
selinux: convert to kvmalloc
md: convert to kvmalloc
openvswitch: convert to kvmalloc
of: fix kmemleak crash caused by imbalance in early memory reservation
mm: memblock: update comments and kernel-doc
memblock: split checks whether a region should be skipped to a helper function
memblock: remove memblock_{set,clear}_region_flags
memblock: drop memblock_alloc_*_nopanic() variants
memblock: memblock_alloc_try_nid: don't panic
treewide: add checks for the return value of memblock_alloc*()
swiotlb: add checks for the return value of memblock_alloc*()
init/main: add checks for the return value of memblock_alloc*()
mm/percpu: add checks for the return value of memblock_alloc*()
sparc: add checks for the return value of memblock_alloc*()
ia64: add checks for the return value of memblock_alloc*()
arch: don't memset(0) memory returned by memblock_alloc()
...

Linus Torvalds
2019-03-13 01:39:53 +0800
26fb3dae0 memblock: drop memblock_alloc_*_nopanic() variants ... Browse Code »

As all the memblock allocation functions return NULL in case of error
rather than panic(), the duplicates with _nopanic suffix can be removed.

Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport
Acked-by: Greg Kroah-Hartman
Reviewed-by: Petr Mladek [printk]
Cc: Catalin Marinas
Cc: Christophe Leroy
Cc: Christoph Hellwig
Cc: "David S. Miller"
Cc: Dennis Zhou
Cc: Geert Uytterhoeven
Cc: Greentime Hu
Cc: Guan Xuetao
Cc: Guo Ren
Cc: Guo Ren [c-sky]
Cc: Heiko Carstens
Cc: Juergen Gross [Xen]
Cc: Mark Salter
Cc: Matt Turner
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Paul Burton
Cc: Richard Weinberger
Cc: Rich Felker
Cc: Rob Herring
Cc: Rob Herring
Cc: Russell King
Cc: Stafford Horne
Cc: Tony Luck
Cc: Vineet Gupta
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Rapoport
2019-03-13 01:04:02 +0800
8a7f97b90 treewide: add checks for the return value of memblock_alloc*() ... Browse Code »

Add check for the return value of memblock_alloc*() functions and call
panic() in case of error. The panic message repeats the one used by
panicing memblock allocators with adjustment of parameters to include
only relevant ones.

The replacement was mostly automated with semantic patches like the one
below with manual massaging of format strings.

@@
expression ptr, size, align;
@@
ptr = memblock_alloc(size, align);
+ if (!ptr)
+ panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align);

[anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type]
Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org
[rppt@linux.ibm.com: fix format strings for panics after memblock_alloc]
Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com
[rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails]
Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx
[akpm@linux-foundation.org: fix xtensa printk warning]
Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport
Signed-off-by: Anders Roxell
Reviewed-by: Guo Ren [c-sky]
Acked-by: Paul Burton [MIPS]
Acked-by: Heiko Carstens [s390]
Reviewed-by: Juergen Gross [Xen]
Reviewed-by: Geert Uytterhoeven [m68k]
Acked-by: Max Filippov [xtensa]
Cc: Catalin Marinas
Cc: Christophe Leroy
Cc: Christoph Hellwig
Cc: "David S. Miller"
Cc: Dennis Zhou
Cc: Greentime Hu
Cc: Greg Kroah-Hartman
Cc: Guan Xuetao
Cc: Guo Ren
Cc: Mark Salter
Cc: Matt Turner
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Petr Mladek
Cc: Richard Weinberger
Cc: Rich Felker
Cc: Rob Herring
Cc: Rob Herring
Cc: Russell King
Cc: Stafford Horne
Cc: Tony Luck
Cc: Vineet Gupta
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Rapoport
2019-03-13 01:04:02 +0800
a0bf842e8 swiotlb: add checks for the return value of memblock_alloc*() ... Browse Code »

Add panic() calls if memblock_alloc() returns NULL.

The panic() format duplicates the one used by memblock itself and in
order to avoid explosion with long parameters list replace open coded
allocation size calculations with a local variable.

Link: http://lkml.kernel.org/r/1548057848-15136-19-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport
Cc: Catalin Marinas
Cc: Christophe Leroy
Cc: Christoph Hellwig
Cc: "David S. Miller"
Cc: Dennis Zhou
Cc: Geert Uytterhoeven
Cc: Greentime Hu
Cc: Greg Kroah-Hartman
Cc: Guan Xuetao
Cc: Guo Ren
Cc: Guo Ren [c-sky]
Cc: Heiko Carstens
Cc: Juergen Gross [Xen]
Cc: Mark Salter
Cc: Matt Turner
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Paul Burton
Cc: Petr Mladek
Cc: Richard Weinberger
Cc: Rich Felker
Cc: Rob Herring
Cc: Rob Herring
Cc: Russell King
Cc: Stafford Horne
Cc: Tony Luck
Cc: Vineet Gupta
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Rapoport
2019-03-13 01:04:02 +0800
2bc4fc60f kernel/sysctl.c: define minmax conv functions in terms of non-minmax versions ... Browse Code »

do_proc_do[u]intvec_minmax_conv() had included open-coded versions of
do_proc_do[u]intvec_conv(); the duplication led to buggy inconsistencies
(missing range checks). To reduce the likelihood of such problems in the
future, we can instead refactor both to be defined in terms of their
non-bounded counterparts (plus the added check).

Link: http://lkml.kernel.org/r/20190207165138.5oud57vq4ozwb4kh@hatter.bewilderbeest.net
Signed-off-by: Zev Weiss
Cc: Brendan Higgins
Cc: Iurii Zaikin
Cc: Kees Cook
Cc: Luis Chamberlain
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zev Weiss
2019-03-13 01:04:00 +0800
8cf7630b2 kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv ... Browse Code »

This bug has apparently existed since the introduction of this function
in the pre-git era (4500e91754d3 in Thomas Gleixner's history.git,
"[NET]: Add proc_dointvec_userhz_jiffies, use it for proper handling of
neighbour sysctls.").

As a minimal fix we can simply duplicate the corresponding check in
do_proc_dointvec_conv().

Link: http://lkml.kernel.org/r/20190207123426.9202-3-zev@bewilderbeest.net
Signed-off-by: Zev Weiss
Cc: Brendan Higgins
Cc: Iurii Zaikin
Cc: Kees Cook
Cc: Luis Chamberlain
Cc: [2.6.2+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zev Weiss
2019-03-13 01:04:00 +0800

12 Mar, 2019

3 commits

cede666e2 trace/probes: Remove kernel doc style from non kernel doc comment ... Browse Code »

CC kernel/trace/trace_kprobe.o
kernel/trace/trace_kprobe.c:41: warning: cannot understand function prototype: 'struct trace_kprobe '

The real problem is that a comment looked like kerneldoc when it shouldn't be...

Link: http://lkml.kernel.org/r/2812.1552381112@turing-police

Signed-off-by: Valdis Kletnieks
Signed-off-by: Steven Rostedt (VMware)

Valdis Klētnieks
2019-03-12 23:23:52 +0800
084162520 tracing/probes: Make reserved_field_names static ... Browse Code »

sparse complains:
CHECK kernel/trace/trace_probe.c
kernel/trace/trace_probe.c:16:12: warning: symbol 'reserved_field_names' was not declared. Should it be static?

Yes, it should be static.

Link: http://lkml.kernel.org/r/2478.1552380778@turing-police

Signed-off-by: Valdis Kletnieks
Signed-off-by: Steven Rostedt (VMware)

Valdis Klētnieks
2019-03-12 22:59:51 +0800
6cdfa54cd Merge tag 'trace-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull tracing updates from Steven Rostedt:
"The biggest change for this release is in the histogram code:

- Add "onchange(var)" histogram handler that executes a action when
$var changes.

- Add new "snapshot()" action for histogram handlers, that causes a
snapshot of the ring buffer when triggered. ie.
onchange(var).snapshot() will trigger a snapshot if var changes.

- Add alternative for "trace()" action. Currently, to trigger a
synthetic event, the name of that event is used as the handler
name, which is inconsistent with the other actions.
onchange(var).synthetic(param) where it can now be
onchange(var).trace(synthetic, param). The older method will still
be allowed, as long as the synthetic events do not overlap with
other handler names.

- The histogram documentation at testcases were updated for the new
changes.

Outside of the histogram code, we have:

- Added a quicker way to enable set_ftrace_filter files, that will
make it much quicker to bisect tracing a function that shouldn't be
traced and crashes the kernel. (You can echo in numbers to
set_ftrace_filter, and it will select the corresponding function
that is in available_filter_functions).

- Some better displaying of the tracing data (and more information
was added).

The rest are small fixes and more clean ups to the code"

* tag 'trace-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (37 commits)
tracing: Use strncpy instead of memcpy when copying comm in trace.c
tracing: Use strncpy instead of memcpy when copying comm for hist triggers
tracing: Use strncpy instead of memcpy for string keys in hist triggers
tracing: Use str_has_prefix() in synth_event_create()
x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ftrace_call_replace()
tracing/perf: Use strndup_user() instead of buggy open-coded version
doc: trace: Fix documentation for uprobe_profile
tracing: Fix spelling mistake: "analagous" -> "analogous"
tracing: Comment why cond_snapshot is checked outside of max_lock protection
tracing: Add hist trigger action 'expected fail' test case
tracing: Add alternative synthetic event trace action test case
tracing: Add hist trigger onchange() handler test case
tracing: Add hist trigger snapshot() action test case
tracing: Add SPDX license GPL-2.0 license identifier to inter-event testcases
tracing: Add alternative synthetic event trace action syntax
tracing: Add hist trigger onchange() handler Documentation
tracing: Add hist trigger onchange() handler
tracing: Add hist trigger snapshot() action Documentation
tracing: Add hist trigger snapshot() action
tracing: Add conditional snapshot
...

Linus Torvalds
2019-03-12 08:01:32 +0800

11 Mar, 2019

7 commits

8f49a658b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:
"First batch of fixes in the new merge window:

1) Double dst_cache free in act_tunnel_key, from Wenxu.

2) Avoid NULL deref in IN_DEV_MFORWARD() by failing early in the
ip_route_input_rcu() path, from Paolo Abeni.

3) Fix appletalk compile regression, from Arnd Bergmann.

4) If SLAB objects reach the TCP sendpage method we are in serious
trouble, so put a debugging check there. From Vasily Averin.

5) Memory leak in hsr layer, from Mao Wenan.

6) Only test GSO type on GSO packets, from Willem de Bruijn.

7) Fix crash in xsk_diag_put_umem(), from Eric Dumazet.

8) Fix VNIC mailbox length in nfp, from Dirk van der Merwe.

9) Fix race in ipv4 route exception handling, from Xin Long.

10) Missing DMA memory barrier in hns3 driver, from Jian Shen.

11) Use after free in __tcf_chain_put(), from Vlad Buslov.

12) Handle inet_csk_reqsk_queue_add() failures, from Guillaume Nault.

13) Return value correction when ip_mc_may_pull() fails, from Eric
Dumazet.

14) Use after free in x25_device_event(), also from Eric"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
gro_cells: make sure device is up in gro_cells_receive()
vxlan: test dev->flags & IFF_UP before calling gro_cells_receive()
net/x25: fix use-after-free in x25_device_event()
isdn: mISDNinfineon: fix potential NULL pointer dereference
net: hns3: fix to stop multiple HNS reset due to the AER changes
ip: fix ip_mc_may_pull() return value
net: keep refcount warning in reqsk_free()
net: stmmac: Avoid one more sometimes uninitialized Clang warning
net: dsa: mv88e6xxx: Set correct interface mode for CPU/DSA ports
rxrpc: Fix client call queueing, waiting for channel
tcp: handle inet_csk_reqsk_queue_add() failures
net: ethernet: sun: Zero initialize class in default case in niu_add_ethtool_tcam_entry
8139too : Add support for U.S. Robotics USR997901A 10/100 Cardbus NIC
fou, fou6: avoid uninit-value in gue_err() and gue6_err()
net: sched: fix potential use-after-free in __tcf_chain_put()
vhost: silence an unused-variable warning
vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock
connector: fix unsafe usage of ->real_parent
vxlan: do not need BH again in vxlan_cleanup()
net: hns3: add dma_rmb() for rx description
...

Linus Torvalds
2019-03-11 23:54:01 +0800
ffd602eb4 Merge tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild ... Browse Code »

Pull Kbuild updates from Masahiro Yamada:

- do not generate unneeded top-level built-in.a

- let git ignore O= directory entirely

- optimize scripts/kallsyms slightly

- exclude DWARF info from *.s regardless of config options

- fix GCC toolchain search path for Clang to prepare ld.lld support

- do not generate modules.order when CONFIG_MODULES is disabled

- simplify single target rules and remove VPATH for external module
build

- allow to add optional flags to dpkg-buildpackage when building
deb-pkg

- move some compiler option tests from Makefile to Kconfig

- various Makefile cleanups

* tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
kbuild: remove scripts/basic/% build target
kbuild: use -Werror=implicit-... instead of -Werror-implicit-...
kbuild: clean up scripts/gcc-version.sh
kbuild: remove cc-version macro
kbuild: update comment block of scripts/clang-version.sh
kbuild: remove commented-out INITRD_COMPRESS
kbuild: move -gsplit-dwarf, -gdwarf-4 option tests to Kconfig
kbuild: [bin]deb-pkg: add DPKG_FLAGS variable
kbuild: move ".config not found!" message from Kconfig to Makefile
kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
kbuild: simplify single target rules
kbuild: remove empty rules for makefiles
kbuild: make -r/-R effective in top Makefile for old Make versions
kbuild: move tools_silent to a more relevant place
kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
kbuild: refactor cc-cross-prefix implementation
kbuild: hardcode genksyms path and remove GENKSYMS variable
scripts/gdb: refactor rules for symlink creation
kbuild: create symlink to vmlinux-gdb.py in scripts_gdb target
scripts/gdb: do not descend into scripts/gdb from scripts
...

Linus Torvalds
2019-03-11 08:48:21 +0800
12ad143e1 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf updates from Thomas Gleixner:
"Perf updates and fixes:

Kernel:
- Handle events which have the bpf_event attribute set as side band
events as they carry information about BPF programs.
- Add missing switch-case fall-through comments

Libraries:
- Fix leaks and double frees in error code paths.
- Prevent buffer overflows in libtraceevent

Tools:
- Improvements in handling Intel BT/PTS
- Add BTF ELF markers to perf trace BPF programs to improve output
- Support --time, --cpu, --pid and --tid filters for perf diff
- Calculate the column width in perf annotate as the hardcoded 6
characters for the instruction are not sufficient
- Small fixes all over the place"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
perf/core: Mark expected switch fall-through
perf/x86/intel/uncore: Fix client IMC events return huge result
perf/ring_buffer: Use high order allocations for AUX buffers optimistically
perf data: Force perf_data__open|close zero data->file.path
perf session: Fix double free in perf_data__close
perf evsel: Probe for precise_ip with simple attr
perf tools: Read and store caps/max_precise in perf_pmu
perf hist: Fix memory leak of srcline
perf hist: Add error path into hist_entry__init
perf c2c: Fix c2c report for empty numa node
perf script python: Add Python3 support to intel-pt-events.py
perf script python: Add Python3 support to event_analyzing_sample.py
perf script python: add Python3 support to check-perf-trace.py
perf script python: Add Python3 support to futex-contention.py
perf script python: Remove mixed indentation
perf diff: Support --pid/--tid filter options
perf diff: Support --cpu filter option
perf diff: Support --time filter option
perf thread: Generalize function to copy from thread addr space from intel-bts code
perf annotate: Calculate the max instruction name, align column to that
...

Linus Torvalds
2019-03-11 06:22:03 +0800
9e55f87c0 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull locking fixes from Thomas Gleixner:
"A few fixes for lockdep:

- initialize lockdep internal RCU head after initializing RCU

- prevent use after free in a alloc_workqueue() error handling path

- plug a memory leak in the workqueue core which fails to free a
dynamically allocated lock name.

- make Clang happy"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
workqueue, lockdep: Fix a memory leak in wq->lock_name
workqueue, lockdep: Fix an alloc_workqueue() error path
locking/lockdep: Only call init_rcu_head() after RCU has been initialized
locking/lockdep: Avoid a Clang warning

Linus Torvalds
2019-03-11 04:48:14 +0800
077d3dafe Merge branch 'core-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull watchdog core update from Thomas Gleixner:
"A single commit adding a command line parameter which allows to set
the watchdog threshold on the kernel command-line, so kernels with
massive debug facilities enabled won't trigger the watchdog during
early boot and before the threshold can be changed via sysctl"

* 'core-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog/core: Add watchdog_thresh command line parameter

Linus Torvalds
2019-03-11 04:46:08 +0800
45ba8d5d0 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost ... Browse Code »

Pull virtio updates from Michael Tsirkin:
"Several fixes, most notably fix for virtio on swiotlb systems"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: silence an unused-variable warning
virtio: hint if callbacks surprisingly might sleep
virtio-ccw: wire up ->bus_name callback
s390/virtio: handle find on invalid queue gracefully
virtio-ccw: diag 500 may return a negative cookie
virtio_balloon: remove the unnecessary 0-initialization
virtio-balloon: improve update_balloon_size_func
virtio-blk: Consider virtio_max_dma_size() for maximum segment size
virtio: Introduce virtio_max_dma_size()
dma: Introduce dma_max_mapping_size()
swiotlb: Add is_swiotlb_active() function
swiotlb: Introduce swiotlb_max_mapping_size()

Linus Torvalds
2019-03-11 03:47:57 +0800
b7a7d1c1e Merge tag 'dma-mapping-5.1' of git://git.infradead.org/users/hch/dma-mapping ... Browse Code »

Pull DMA mapping updates from Christoph Hellwig:

- add debugfs support for dumping dma-debug information (Corentin
Labbe)

- Kconfig cleanups (Andy Shevchenko and me)

- debugfs cleanups (Greg Kroah-Hartman)

- improve dma_map_resource and use it in the media code

- arch_setup_dma_ops / arch_teardown_dma_ops cleanups

- various small cleanups and improvements for the per-device coherent
allocator

- make the DMA mask an upper bound and don't fail "too large" dma mask
in the remaning two architectures - this will allow big driver
cleanups in the following merge windows

* tag 'dma-mapping-5.1' of git://git.infradead.org/users/hch/dma-mapping: (21 commits)
Documentation/DMA-API-HOWTO: update dma_mask sections
sparc64/pci_sun4v: allow large DMA masks
sparc64/iommu: allow large DMA masks
sparc64: refactor the ali DMA quirk
ccio: allow large DMA masks
dma-mapping: remove the DMA_MEMORY_EXCLUSIVE flag
dma-mapping: remove dma_mark_declared_memory_occupied
dma-mapping: move CONFIG_DMA_CMA to kernel/dma/Kconfig
dma-mapping: improve selection of dma_declare_coherent availability
dma-mapping: remove an incorrect __iommem annotation
of: select OF_RESERVED_MEM automatically
device.h: dma_mem is only needed for HAVE_GENERIC_DMA_COHERENT
mfd/sm501: depend on HAS_DMA
dma-mapping: add a kconfig symbol for arch_teardown_dma_ops availability
dma-mapping: add a kconfig symbol for arch_setup_dma_ops availability
dma-mapping: move debug configuration options to kernel/dma
dma-debug: add dumping facility via debugfs
dma: debug: no need to check return value of debugfs_create functions
videobuf2: replace a layering violation with dma_map_resource
dma-mapping: don't BUG when calling dma_map_resource on RAM
...

Linus Torvalds
2019-03-11 02:54:48 +0800

10 Mar, 2019

3 commits

a50243b1d Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma ... Browse Code »

Pull rdma updates from Jason Gunthorpe:
"This has been a slightly more active cycle than normal with ongoing
core changes and quite a lot of collected driver updates.

- Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe

- A new data transfer mode for HFI1 giving higher performance

- Significant functional and bug fix update to the mlx5
On-Demand-Paging MR feature

- A chip hang reset recovery system for hns

- Change mm->pinned_vm to an atomic64

- Update bnxt_re to support a new 57500 chip

- A sane netlink 'rdma link add' method for creating rxe devices and
fixing the various unregistration race conditions in rxe's
unregister flow

- Allow lookup up objects by an ID over netlink

- Various reworking of the core to driver interface:
- drivers should not assume umem SGLs are in PAGE_SIZE chunks
- ucontext is accessed via udata not other means
- start to make the core code responsible for object memory
allocation
- drivers should convert struct device to struct ib_device via a
helper
- drivers have more tools to avoid use after unregister problems"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (280 commits)
net/mlx5: ODP support for XRC transport is not enabled by default in FW
IB/hfi1: Close race condition on user context disable and close
RDMA/umem: Revert broken 'off by one' fix
RDMA/umem: minor bug fix in error handling path
RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
cxgb4: kfree mhp after the debug print
IB/rdmavt: Fix concurrency panics in QP post_send and modify to error
IB/rdmavt: Fix loopback send with invalidate ordering
IB/iser: Fix dma_nents type definition
IB/mlx5: Set correct write permissions for implicit ODP MR
bnxt_re: Clean cq for kernel consumers only
RDMA/uverbs: Don't do double free of allocated PD
RDMA: Handle ucontext allocations by IB/core
RDMA/core: Fix a WARN() message
bnxt_re: fix the regression due to changes in alloc_pbl
IB/mlx4: Increase the timeout for CM cache
IB/core: Abort page fault handler silently during owning process exit
IB/mlx5: Validate correct PD before prefetch MR
IB/mlx5: Protect against prefetch of invalid MR
RDMA/uverbs: Store PR pointer before it is overwritten
...

Linus Torvalds
2019-03-10 07:53:03 +0800
c4703acd6 Merge tag 'printk-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk ... Browse Code »

Pull printk updates from Petr Mladek:

- Allow to sort mixed lines by an extra information about the caller

- Remove no longer used LOG_PREFIX.

- Some clean up and documentation update.

* tag 'printk-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
printk/docs: Add extra integer types to printk-formats
printk: Remove no longer used LOG_PREFIX.
lib/vsprintf: Remove %pCr remnant in comment
printk: Pass caller information to log_store().
printk: Add caller information to printk() output.

Linus Torvalds
2019-03-10 01:22:42 +0800
b339da480 Merge tag 'perf-core-for-mingo-5.1-20190307' of git://git.kernel.org/pub/scm/lin… ... Browse Code »

…ux/kernel/git/acme/linux into perf/urgent

Pull perf/core changes from Arnaldo Carvalho de Melo:

perf bpf:

Arnaldo Carvalho de Melo:

- Automatically add BTF ELF markers to 'perf trace' BPF programs, so that
tools such as 'bpftool map dump' can pretty print map keys and values.

perf c2c:

Jiri Olsa:

- Fix report for empty NUMA node.

perf diff:

Jin Yao:

- Support --time, --cpu, --pid and --tid filter options.

perf probe:

Arnaldo Carvalho de Melo:

- Clarify error message about not finding kernel modules debuginfo.

perf record:

Jiri Olsa:

- Fixup probing for max attr.precise_ip.

perf trace:

Arnaldo Carvalho de Melo:

- Add missing %s lost in the 'msg_flags' recvmmsg arg when adding prefix suppression logic.

perf annotate:

Arnaldo Carvalho de Melo:

- Calculate the max instruction name, align column to that, removing the
hardcoded max 6 chars and cope with instructions with names longer than that,
such as vpmovmskb, vpcmpeqb, etc.

kernel:

Song Liu:

- Consider events with attr.bpf_event set as side-band.

Gustavo A. R. Silva:

- Mark expected switch fall-through in perf_event_parse_addr_filter().

Libraries:

Jiri Olsa:

- Fix leaks and double frees on error paths.

libtraceevent:

Tony Jones:

- Fix buffer overflow in arg_eval().

python scripting:

Tony Jones:

- More python3 fixes.

Trivial:

Yang Wei:

- Remove needless extra semicolon in clang C++ glue code.

Intel PT/BTS:

Adrian Hunter:

- Improve auxtrace address filter error message when there is no DSO.

- Fix divide by zero when TSC is not available.

- Further improvements to the export to sqlite/posgresql python scripts
and to the GUI sqlviewer, exporting 'parent_id' so that we have enable
the creation of call trees.

Andi Kleen:

- Generalize function to copy from thread addr space from intel-bts code.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2019-03-10 00:00:17 +0800

09 Mar, 2019

7 commits

69a106c00 workqueue, lockdep: Fix a memory leak in wq->lock_name ... Browse Code »

The following commit:

669de8bda87b ("kernel/workqueue: Use dynamic lockdep keys for workqueues")

introduced a memory leak as wq_free_lockdep() calls kfree(wq->lock_name),
but wq_init_lockdep() does not point wq->lock_name to the newly allocated
slab object.

This can be reproduced by running LTP fallocate04 followed by oom01 tests:

unreferenced object 0xc0000005876384d8 (size 64):
comm "fallocate04", pid 26972, jiffies 4297139141 (age 40370.480s)
hex dump (first 32 bytes):
28 77 71 5f 63 6f 6d 70 6c 65 74 69 6f 6e 29 65 (wq_completion)e
78 74 34 2d 72 73 76 2d 63 6f 6e 76 65 72 73 69 xt4-rsv-conversi
backtrace:
[] kvasprintf+0x6c/0xe0
[] kasprintf+0x34/0x60
[] alloc_workqueue+0x1f8/0x6ac
[] ext4_fill_super+0x23d4/0x3c80 [ext4]
[] mount_bdev+0x25c/0x290
[] ext4_mount+0x28/0x50 [ext4]
[] legacy_get_tree+0x4c/0xb0
[] vfs_get_tree+0x6c/0x190
[] do_mount+0xb9c/0x1100
[] ksys_mount+0x158/0x180
[] sys_mount+0x20/0x30
[] system_call+0x5c/0x70

Signed-off-by: Qian Cai
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Bart Van Assche
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: catalin.marinas@arm.com
Cc: jiangshanlai@gmail.com
Cc: tj@kernel.org
Fixes: 669de8bda87b ("kernel/workqueue: Use dynamic lockdep keys for workqueues")
Link: https://lkml.kernel.org/r/20190307002731.47371-1-cai@lca.pw
Signed-off-by: Ingo Molnar

Qian Cai
2019-03-09 21:15:52 +0800
009bb421b workqueue, lockdep: Fix an alloc_workqueue() error path ... Browse Code »

This patch fixes a use-after-free and a memory leak in an alloc_workqueue()
error path.

Repoted by syzkaller and KASAN:

BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:197 [inline]
BUG: KASAN: use-after-free in lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023
Read of size 8 at addr ffff888090fc2698 by task syz-executor134/7858

CPU: 1 PID: 7858 Comm: syz-executor134 Not tainted 5.0.0-rc8-next-20190301 #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
__read_once_size include/linux/compiler.h:197 [inline]
lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023
wq_init_lockdep kernel/workqueue.c:3444 [inline]
alloc_workqueue+0x427/0xe70 kernel/workqueue.c:4263
ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732
misc_open+0x398/0x4c0 drivers/char/misc.c:141
chrdev_open+0x247/0x6b0 fs/char_dev.c:417
do_dentry_open+0x488/0x1160 fs/open.c:771
vfs_open+0xa0/0xd0 fs/open.c:880
do_last fs/namei.c:3416 [inline]
path_openat+0x10e9/0x46e0 fs/namei.c:3533
do_filp_open+0x1a1/0x280 fs/namei.c:3563
do_sys_open+0x3fe/0x5d0 fs/open.c:1063
__do_sys_openat fs/open.c:1090 [inline]
__se_sys_openat fs/open.c:1084 [inline]
__x64_sys_openat+0x9d/0x100 fs/open.c:1084
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Allocated by task 7789:
save_stack+0x45/0xd0 mm/kasan/common.c:75
set_track mm/kasan/common.c:87 [inline]
__kasan_kmalloc mm/kasan/common.c:497 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
kasan_kmalloc+0x9/0x10 mm/kasan/common.c:511
__do_kmalloc mm/slab.c:3726 [inline]
__kmalloc+0x15c/0x740 mm/slab.c:3735
kmalloc include/linux/slab.h:553 [inline]
kzalloc include/linux/slab.h:743 [inline]
alloc_workqueue+0x13c/0xe70 kernel/workqueue.c:4236
ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732
misc_open+0x398/0x4c0 drivers/char/misc.c:141
chrdev_open+0x247/0x6b0 fs/char_dev.c:417
do_dentry_open+0x488/0x1160 fs/open.c:771
vfs_open+0xa0/0xd0 fs/open.c:880
do_last fs/namei.c:3416 [inline]
path_openat+0x10e9/0x46e0 fs/namei.c:3533
do_filp_open+0x1a1/0x280 fs/namei.c:3563
do_sys_open+0x3fe/0x5d0 fs/open.c:1063
__do_sys_openat fs/open.c:1090 [inline]
__se_sys_openat fs/open.c:1084 [inline]
__x64_sys_openat+0x9d/0x100 fs/open.c:1084
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 7789:
save_stack+0x45/0xd0 mm/kasan/common.c:75
set_track mm/kasan/common.c:87 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
__cache_free mm/slab.c:3498 [inline]
kfree+0xcf/0x230 mm/slab.c:3821
alloc_workqueue+0xc3e/0xe70 kernel/workqueue.c:4295
ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732
misc_open+0x398/0x4c0 drivers/char/misc.c:141
chrdev_open+0x247/0x6b0 fs/char_dev.c:417
do_dentry_open+0x488/0x1160 fs/open.c:771
vfs_open+0xa0/0xd0 fs/open.c:880
do_last fs/namei.c:3416 [inline]
path_openat+0x10e9/0x46e0 fs/namei.c:3533
do_filp_open+0x1a1/0x280 fs/namei.c:3563
do_sys_open+0x3fe/0x5d0 fs/open.c:1063
__do_sys_openat fs/open.c:1090 [inline]
__se_sys_openat fs/open.c:1084 [inline]
__x64_sys_openat+0x9d/0x100 fs/open.c:1084
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888090fc2580
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 280 bytes inside of
512-byte region [ffff888090fc2580, ffff888090fc2780)

Reported-by: syzbot+17335689e239ce135d8b@syzkaller.appspotmail.com
Signed-off-by: Bart Van Assche
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Thomas Gleixner
Cc: Will Deacon
Fixes: 669de8bda87b ("kernel/workqueue: Use dynamic lockdep keys for workqueues")
Link: https://lkml.kernel.org/r/20190303220046.29448-1-bvanassche@acm.org
Signed-off-by: Ingo Molnar

Bart Van Assche
2019-03-09 21:15:52 +0800
0126574fc locking/lockdep: Only call init_rcu_head() after RCU has been initialized ... Browse Code »

init_data_structures_once() is called for the first time before RCU has
been initialized. Make sure that init_rcu_head() is called before the
RCU head is used and after RCU has been initialized.

Signed-off-by: Bart Van Assche
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: longman@redhat.com
Link: https://lkml.kernel.org/r/c20aa0f0-42ab-a884-d931-7d4ec2bf0cdc@acm.org
Signed-off-by: Ingo Molnar

Bart Van Assche
2019-03-09 21:15:51 +0800
3fe7522fb locking/lockdep: Avoid a Clang warning ... Browse Code »

Clang warns about a tentative array definition without a length:

kernel/locking/lockdep.c:845:12: error: tentative array definition assumed to have one element [-Werror]

There is no real reason to do this here, so just set the same length as
in the real definition later in the same file. It has to be hidden in
an #ifdef or annotated __maybe_unused though, to avoid the unused-variable
warning if CONFIG_PROVE_LOCKING is disabled.

Signed-off-by: Arnd Bergmann
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Arnaldo Carvalho de Melo
Cc: Bart Van Assche
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Jiri Olsa
Cc: Joel Fernandes (Google)
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Stephane Eranian
Cc: Steven Rostedt (VMware)
Cc: Tetsuo Handa
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: Waiman Long
Cc: Will Deacon
Link: https://lkml.kernel.org/r/20190307075222.3424524-1-arnd@arndb.de
Signed-off-by: Ingo Molnar

Arnd Bergmann
2019-03-09 21:15:51 +0800
43aa378b4 perf/core: Mark expected switch fall-through ... Browse Code »

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warning:

kernel/events/core.c: In function ‘perf_event_parse_addr_filter’:
kernel/events/core.c:9154:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
kernel = 1;
~~~~~~~^~~
kernel/events/core.c:9156:3: note: here
case IF_SRC_FILEADDR:
^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Andy Lutomirski
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Jiri Olsa
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Link: https://lkml.kernel.org/r/20190212205430.GA8446@embeddedor
Signed-off-by: Ingo Molnar

Gustavo A. R. Silva
2019-03-09 21:10:32 +0800
5768402fd perf/ring_buffer: Use high order allocations for AUX buffers optimistically ... Browse Code »

Currently, the AUX buffer allocator will use high-order allocations
for PMUs that don't support hardware scatter-gather chaining to ensure
large contiguous blocks of pages, and always use an array of single
pages otherwise.

There is, however, a tangible performance benefit in using larger chunks
of contiguous memory even in the latter case, that comes from not having
to fetch the next page's address at every page boundary. In particular,
a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime
penalty with a single multi-page output region in snapshot mode (no PMI)
than with multiple single-page output regions, from ~6% down to ~4%. For
the snapshot mode it does make a difference as it is intended to run over
long periods of time.

For this reason, change the allocation policy to always optimistically
start with the highest possible order when allocating pages for the AUX
buffer, desceding until the allocation succeeds or order zero allocation
fails.

Signed-off-by: Alexander Shishkin
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andy Lutomirski
Cc: Arnaldo Carvalho de Melo
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Link: https://lkml.kernel.org/r/20190215114727.62648-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar

Alexander Shishkin
2019-03-09 21:10:30 +0800
38e7571c0 Merge tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block ... Browse Code »

Pull io_uring IO interface from Jens Axboe:
"Second attempt at adding the io_uring interface.

Since the first one, we've added basic unit testing of the three
system calls, that resides in liburing like the other unit tests that
we have so far. It'll take a while to get full coverage of it, but
we're working towards it. I've also added two basic test programs to
tools/io_uring. One uses the raw interface and has support for all the
various features that io_uring supports outside of standard IO, like
fixed files, fixed IO buffers, and polled IO. The other uses the
liburing API, and is a simplified version of cp(1).

This adds support for a new IO interface, io_uring.

io_uring allows an application to communicate with the kernel through
two rings, the submission queue (SQ) and completion queue (CQ) ring.
This allows for very efficient handling of IOs, see the v5 posting for
some basic numbers:

https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/

Outside of just efficiency, the interface is also flexible and
extendable, and allows for future use cases like the upcoming NVMe
key-value store API, networked IO, and so on. It also supports async
buffered IO, something that we've always failed to support in the
kernel.

Outside of basic IO features, it supports async polled IO as well.
This particular feature has already been tested at Facebook months ago
for flash storage boxes, with 25-33% improvements. It makes polled IO
actually useful for real world use cases, where even basic flash sees
a nice win in terms of efficiency, latency, and performance. These
boxes were IOPS bound before, now they are not.

This series adds three new system calls. One for setting up an
io_uring instance (io_uring_setup(2)), one for submitting/completing
IO (io_uring_enter(2)), and one for aux functions like registrating
file sets, buffers, etc (io_uring_register(2)). Through the help of
Arnd, I've coordinated the syscall numbers so merge on that front
should be painless.

Jon did a writeup of the interface a while back, which (except for
minor details that have been tweaked) is still accurate. Find that
here:

https://lwn.net/Articles/776703/

Huge thanks to Al Viro for helping getting the reference cycle code
correct, and to Jann Horn for his extensive reviews focused on both
security and bugs in general.

There's a userspace library that provides basic functionality for
applications that don't need or want to care about how to fiddle with
the rings directly. It has helpers to allow applications to easily set
up an io_uring instance, and submit/complete IO through it without
knowing about the intricacies of the rings. It also includes man pages
(thanks to Jeff Moyer), and will continue to grow support helper
functions and features as time progresses. Find it here:

git://git.kernel.dk/liburing

Fio has full support for the raw interface, both in the form of an IO
engine (io_uring), but also with a small test application (t/io_uring)
that can exercise and benchmark the interface"

* tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block:
io_uring: add a few test tools
io_uring: allow workqueue item to handle multiple buffered requests
io_uring: add support for IORING_OP_POLL
io_uring: add io_kiocb ref count
io_uring: add submission polling
io_uring: add file set registration
net: split out functions related to registering inflight socket files
io_uring: add support for pre-mapped user IO buffers
block: implement bio helper to add iter bvec pages to bio
io_uring: batch io_kiocb allocation
io_uring: use fget/fput_many() for file references
fs: add fget_many() and fput_many()
io_uring: support for IO polling
io_uring: add fsync support
Add io_uring IO interface

Linus Torvalds
2019-03-09 06:48:40 +0800