17 Mar, 2019

3 commits

  • Pull pidfd system call from Christian Brauner:
    "This introduces the ability to use file descriptors from /proc/<pid>/
    as stable handles on struct pid. Even if a pid is recycled the handle
    will not change. For a start these fds can be used to send signals to
    the processes they refer to.

    With the ability to use /proc/<pid> fds as stable handles on struct
    pid we can fix a long-standing issue where after a process has exited
    its pid can be reused by another process. If a caller sends a signal
    to a reused pid it will end up signaling the wrong process.

    With this patchset we enable a variety of use cases. One obvious
    example is that we can now safely delegate an important part of
    process management - sending signals - to processes other than the
    parent of a given process by sending file descriptors around via scm
    rights and not fearing that the given process will have been recycled
    in the meantime. It also allows for easy testing whether a given
    process is still alive or not by sending signal 0 to a pidfd which is
    quite handy.

    There has been some interest in this feature e.g. from systems
    management (systemd, glibc) and container managers. I have requested
    and gotten comments from glibc to make sure that this syscall is
    suitable for their needs as well. In the future I expect it to take on
    most other pid-based signal syscalls. But such features are left for
    the future once they are needed.

    This has been sitting in linux-next for quite a while and has not
    caused any issues. It comes with selftests which verify basic
    functionality and also test that a recycled pid cannot be signaled via
    a pidfd.

    Jon has written about a prior version of this patchset. It should
    cover the basic functionality since not a lot has changed since then:

    https://lwn.net/Articles/773459/

    The commit message for the syscall itself extensively documents
    the syscall, including its functionality and extensibility"

    * tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    selftests: add tests for pidfd_send_signal()
    signal: add pidfd_send_signal() syscall

    Linus Torvalds
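
    The handle-based signalling described in this pull can be sketched from
    userspace roughly as follows. This is a minimal sketch, not taken from
    the patchset: the helper names (open_proc_pidfd, pidfd_signal,
    pidfd_is_alive) are hypothetical, and the fallback syscall number 424 is
    an assumption that holds on most architectures but should be checked
    against the local unistd.h. It opens /proc/<pid> as a directory fd and
    sends signal 0 through it to probe whether the process is still alive.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Older libc headers may not define the syscall number. 424 is an
 * assumption that matches most architectures; verify it for yours. */
#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424
#endif

/* Hypothetical helper: open /proc/<pid> as a directory fd to use as a
 * stable handle. Returns -1 with errno set on failure. */
int open_proc_pidfd(pid_t pid)
{
	char path[64];

	snprintf(path, sizeof(path), "/proc/%ld", (long)pid);
	return open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}

/* Hypothetical helper: send `sig` to the process the handle refers to.
 * No siginfo is passed and flags must be 0. */
int pidfd_signal(int pidfd, int sig)
{
	assert(pidfd >= 0);
	return (int)syscall(__NR_pidfd_send_signal, pidfd, sig, NULL, 0);
}

/* Signal 0 delivers nothing and only performs the existence and
 * permission checks. Returns 1 if alive, 0 if gone (ESRCH), and -1 if
 * the answer is unknown (for example ENOSYS on an older kernel). */
int pidfd_is_alive(int pidfd)
{
	if (pidfd_signal(pidfd, 0) == 0)
		return 1;
	return errno == ESRCH ? 0 : -1;
}
```

    Because the handle refers to the struct pid rather than the number, a
    caller that keeps the fd open (or receives it over a Unix socket) should
    get ESRCH instead of signalling an unrelated process once the original
    one has exited.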
     
  • Pull device-dax updates from Dan Williams:
    "New device-dax infrastructure to allow persistent memory and other
    "reserved" / performance differentiated memories, to be assigned to
    the core-mm as "System RAM".

    Some users want to use persistent memory as additional volatile
    memory. They are willing to cope with potential performance
    differences, for example between DRAM and 3D Xpoint, and want to use
    typical Linux memory management apis rather than a userspace memory
    allocator layered over an mmap() of a dax file. The administration
    model is to decide how much Persistent Memory (pmem) to use as System
    RAM, create a device-dax-mode namespace of that size, and then assign
    it to the core-mm. The rationale for device-dax is that it is a
    generic memory-mapping driver that can be layered over any "special
    purpose" memory, not just pmem. On subsequent boots udev rules can be
    used to restore the memory assignment.

    One implication of using pmem as RAM is that mlock() no longer keeps
    data off persistent media. For this reason it is recommended to enable
    NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
    at rest. We considered making this recommendation an actively enforced
    requirement, but in the end decided to leave it as a distribution /
    administrator policy to allow for emulation and test environments that
    lack security capable NVDIMMs.

    Summary:

    - Replace the /sys/class/dax device model with /sys/bus/dax, and
    include a compat driver so distributions can opt-in to the new ABI.

    - Allow for an alternative driver for the device-dax address-range

    - Introduce the 'kmem' driver to hotplug / assign a device-dax
    address-range to the core-mm.

    - Arrange for the device-dax target-node to be onlined so that the
    newly added memory range can be uniquely referenced by numa apis"

    NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
    we currently have special - and very annoying - rules in the kernel about
    accessing PMEM only with the "MC safe" accessors, because machine checks
    inside the regular repeat string copy functions can be fatal in some
    (not described) circumstances.

    And apparently the PMEM modules can cause that a lot more than regular
    RAM. The argument is that this happens because PMEM doesn't necessarily
    get scrubbed at boot like RAM does, but that is planned to be added for
    the user space tooling.

    Quoting Dan from another email:
    "The exposure can be reduced in the volatile-RAM case by scanning for
    and clearing errors before it is onlined as RAM. The userspace tooling
    for that can be in place before v5.1-final. There's also runtime
    notifications of errors via acpi_nfit_uc_error_notify() from
    background scrubbers on the DIMM devices. With that mechanism the
    kernel could proactively clear newly discovered poison in the volatile
    case, but that would be additional development more suitable for v5.2.

    I understand the concern, and the need to highlight this issue by
    tapping the brakes on feature development, but I don't see PMEM as RAM
    making the situation worse when the exposure is also there via DAX in
    the PMEM case. Volatile-RAM is arguably a safer use case since it's
    possible to repair pages where the persistent case needs active
    application coordination"

    * tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    device-dax: "Hotplug" persistent memory for use like normal RAM
    mm/resource: Let walk_system_ram_range() search child resources
    mm/memory-hotplug: Allow memory resources to be children
    mm/resource: Move HMM pr_debug() deeper into resource code
    mm/resource: Return real error codes from walk failures
    device-dax: Add a 'modalias' attribute to DAX 'bus' devices
    device-dax: Add a 'target_node' attribute
    device-dax: Auto-bind device after successful new_id
    acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
    device-dax: Add /sys/class/dax backwards compatibility
    device-dax: Add support for a dax override driver
    device-dax: Move resource pinning+mapping into the common driver
    device-dax: Introduce bus + driver model
    device-dax: Start defining a dax bus model
    device-dax: Remove multi-resource infrastructure
    device-dax: Kill dax_region base
    device-dax: Kill dax_region ida

    Linus Torvalds
     
  • Pull more block layer changes from Jens Axboe:
    "This is a collection of both stragglers, and fixes that came in after
    I finalized the initial pull. This contains:

    - An MD pull request from Song, with a few minor fixes

    - Set of NVMe patches via Christoph

    - Pull request from Konrad, with a few fixes for xen/blkback

    - pblk IO calculation fix (Javier)

    - Segment calculation fix for pass-through (Ming)

    - Fallthrough annotation for blkcg (Mathieu)"

    * tag 'for-5.1/block-post-20190315' of git://git.kernel.dk/linux-block: (25 commits)
    blkcg: annotate implicit fall through
    nvme-tcp: support C2HData with SUCCESS flag
    nvmet: ignore EOPNOTSUPP for discard
    nvme: add proper write zeroes setup for the multipath device
    nvme: add proper discard setup for the multipath device
    nvme: remove nvme_ns_config_oncs
    nvme: disable Write Zeroes for qemu controllers
    nvmet-fc: bring Disconnect into compliance with FC-NVME spec
    nvmet-fc: fix issues with targetport assoc_list list walking
    nvme-fc: reject reconnect if io queue count is reduced to zero
    nvme-fc: fix numa_node when dev is null
    nvme-fc: use nr_phys_segments to determine existence of sgl
    nvme-loop: init nvmet_ctrl fatal_err_work when allocate
    nvme: update comment to make the code easier to read
    nvme: put ns_head ref if namespace fails allocation
    nvme-trace: fix cdw10 buffer overrun
    nvme: don't warn on block content change effects
    nvme: add get-feature to admin cmds tracer
    md: Fix failed allocation of md_register_thread
    It's wrong to add len to sector_nr in raid10 reshape twice
    ...

    Linus Torvalds
     

16 Mar, 2019

2 commits

  • Pull tracing fixes and cleanups from Steven Rostedt:
    "This contains a series of last minute clean ups, small fixes and error
    checks"

    * tag 'trace-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing/probe: Verify alloc_trace_*probe() result
    tracing/probe: Check event/group naming rule at parsing
    tracing/probe: Check the size of argument name and body
    tracing/probe: Check event name length correctly
    tracing/probe: Check maxactive error cases
    tracing: kdb: Fix ftdump to not sleep
    trace/probes: Remove kernel doc style from non kernel doc comment
    tracing/probes: Make reserved_field_names static

    Linus Torvalds
     
  • Pull fbdev updates from Bartlomiej Zolnierkiewicz:
    "Just a couple of small fixes and cleanups:

    - fix memory access if logo is bigger than the screen (Manfred
    Schlaegl)

    - silence fbcon logo on 'quiet' boots (Prarit Bhargava)

    - use kvmalloc() for scrollback buffer in fbcon (Konstantin Khorenko)

    - misc fixes (Colin Ian King, YueHaibing, Matteo Croce, Mathieu
    Malaterre, Anders Roxell, Arnd Bergmann)

    - misc cleanups (Rob Herring, Lubomir Rintel, Greg Kroah-Hartman,
    Jani Nikula, Michal Vokáč)"

    * tag 'fbdev-v5.1' of git://github.com/bzolnier/linux:
    fbdev: mbx: fix a misspelled variable name
    fbdev: omap2: fix warnings in dss core
    video: fbdev: Fix potential NULL pointer dereference
    fbcon: Silence fbcon logo on 'quiet' boots
    printk: Export console_printk
    ARM: dts: imx28-cfa10036: Fix the reset gpio signal polarity
    video: ssd1307fb: Do not hard code active-low reset sequence
    dt-bindings: display: ssd1307fb: Remove reset-active-low from examples
    fbdev: fbmem: fix memory access if logo is bigger than the screen
    video/fbdev: refactor video= cmdline parsing
    fbdev: mbx: fix up debugfs file creation
    fbdev: omap2: no need to check return value of debugfs_create functions
    video: fbdev: geode: remove ifdef OLPC noise
    video: offb: annotate implicit fall throughs
    omapfb: fix typo
    fbdev: Use of_node_name_eq for node name comparisons
    fbcon: use kvmalloc() for scrollback buffer
    fbdev: chipsfb: remove set but not used variable 'size'
    fbdev/via: fix spelling mistake "Expandsion" -> "Expansion"

    Linus Torvalds
     

15 Mar, 2019

5 commits

  • Since alloc_trace_*probe() returns -EINVAL only if !event && !group,
    it should not happen in trace_*probe_create(). If we catch that case
    there is a bug. So use WARN_ON_ONCE() instead of pr_info().

    Link: http://lkml.kernel.org/r/155253785078.14922.16902223633734601469.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Check the event and group naming rules when parsing the command
    instead of when allocating probes.

    Link: http://lkml.kernel.org/r/155253784064.14922.2336893061156236237.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Check that the size of the argument name and expression is not 0
    and is smaller than the maximum length.

    Link: http://lkml.kernel.org/r/155253783029.14922.12650939303827581096.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Ensure the given event name is not too long when parsing it,
    and update the event name offset correctly when a group
    name is given. For example, this makes the probe event check
    the "p:foo/" error case correctly.

    Link: http://lkml.kernel.org/r/155253782046.14922.14724124823730168629.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
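
    The parse-time checks described in the entries above (naming rule,
    length limit, and the "p:foo/" case where a group is given without an
    event) can be sketched in userspace C as follows. This is an
    illustrative sketch under stated assumptions, not the kernel code:
    parse_event_name() and MAX_NAME_LEN are hypothetical names, the limit of
    64 bytes is assumed, and the error codes are chosen for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NAME_LEN 64 /* assumed limit, including the trailing NUL */

/* Naming rule used in this sketch: [A-Za-z_][A-Za-z0-9_]* */
bool is_good_name(const char *name)
{
	assert(name != NULL);
	if (!isalpha((unsigned char)*name) && *name != '_')
		return false;
	while (*++name != '\0') {
		if (!isalnum((unsigned char)*name) && *name != '_')
			return false;
	}
	return true;
}

/*
 * Split "[GROUP/]EVENT" into caller-provided buffers of MAX_NAME_LEN
 * bytes each. `group` is left untouched when no "GROUP/" prefix exists.
 * Returns 0 on success, -EINVAL for an empty name or one that breaks
 * the naming rule, and -E2BIG for a name that is too long.
 */
int parse_event_name(const char *spec, char *group, char *event)
{
	const char *slash;
	size_t len;

	assert(spec != NULL && group != NULL && event != NULL);

	slash = strchr(spec, '/');
	if (slash) {
		len = (size_t)(slash - spec);
		if (len == 0)
			return -EINVAL;
		if (len >= MAX_NAME_LEN)
			return -E2BIG;
		memcpy(group, spec, len);
		group[len] = '\0';
		if (!is_good_name(group))
			return -EINVAL;
		/* The event name starts after the slash, so its length
		 * must be measured from the updated offset. */
		spec = slash + 1;
	}

	len = strlen(spec);
	if (len == 0)
		return -EINVAL; /* "foo/": group given, event missing */
	if (len >= MAX_NAME_LEN)
		return -E2BIG;
	memcpy(event, spec, len + 1);
	return is_good_name(event) ? 0 : -EINVAL;
}
```

    Doing these checks while parsing means a malformed definition is
    rejected before any probe structure is allocated.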
     
  • Check for the error case of maxactive being given on a kprobe,
    because maxactive is only for kretprobe, not for kprobe. Also,
    maxactive should not be 0; it should be at least 1.

    Link: http://lkml.kernel.org/r/155253780952.14922.15784129810238750331.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
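
    The maxactive rule above amounts to two conditions, sketched here in
    userspace C. The struct and function names are hypothetical and exist
    only for illustration; the kernel's own parser is structured
    differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, already-tokenized view of a probe definition. */
struct probe_spec {
	bool is_return;     /* true for a kretprobe, false for a kprobe */
	bool has_maxactive; /* true if a maxactive value was written */
	long maxactive;
};

/* Returns 0 if the maxactive setting is acceptable, -EINVAL otherwise. */
int check_maxactive(const struct probe_spec *spec)
{
	assert(spec != NULL);
	if (!spec->has_maxactive)
		return 0;
	if (!spec->is_return)
		return -EINVAL; /* maxactive is meaningless for a kprobe */
	if (spec->maxactive < 1)
		return -EINVAL; /* must be at least 1 */
	return 0;
}
```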
     

14 Mar, 2019

1 commit

  • There is a plan to build the kernel with -Wimplicit-fallthrough and
    this place in the code produced a warning (W=1).

    This commit removes the following warning:

    kernel/trace/blktrace.c:725:9: warning: this statement may fall through [-Wimplicit-fallthrough=]

    Signed-off-by: Mathieu Malaterre
    Signed-off-by: Jens Axboe

    Mathieu Malaterre
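
    For readers unfamiliar with the warning, the sketch below shows the kind
    of annotation involved. It is a generic example, not the code from
    kernel/trace/blktrace.c: the enum and function are hypothetical. With
    GCC, a comment such as "fall through" placed directly before the next
    case label is recognized at the default -Wimplicit-fallthrough level and
    suppresses the warning for an intentional fall-through.

```c
#include <assert.h>

enum action { ACT_RESET, ACT_START, ACT_STOP };

/* Hypothetical state update: bit 0 of `state` means "running". */
int apply_action(enum action act, int state)
{
	assert(state >= 0); /* state is treated as a non-negative bit mask */

	switch (act) {
	case ACT_RESET:
		state = 0;
		/* fall through */
	case ACT_START:
		state |= 1;
		break;
	case ACT_STOP:
		state &= ~1;
		break;
	}
	return state;
}
```

    Without the comment, the transition from ACT_RESET into ACT_START would
    be reported as "this statement may fall through".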
     

13 Mar, 2019

9 commits

  • As reported back in 2016-11 [1], the "ftdump" kdb command triggers a
    BUG for "sleeping function called from invalid context".

    kdb's "ftdump" command wants to call ring_buffer_read_prepare() in
    atomic context. A very simple solution for this is to add allocation
    flags to ring_buffer_read_prepare() so kdb can call it without
    triggering the allocation error. This patch does that.

    Note that in the original email thread about this, it was suggested
    that perhaps the solution for kdb was to either preallocate the buffer
    ahead of time or create our own iterator. I'm hoping that this
    alternative of adding allocation flags to ring_buffer_read_prepare()
    can be considered since it means I don't need to duplicate more of the
    core trace code into "trace_kdb.c" (for either creating my own
    iterator or re-preparing a ring allocator whose memory was already
    allocated).

    NOTE: another option for kdb is to actually figure out how to make it
    reuse the existing ftrace_dump() function and totally eliminate the
    duplication. This sounds very appealing and actually works (the "sr
    z" command can be seen to properly dump the ftrace buffer). The
    downside here is that ftrace_dump() fully consumes the trace buffer.
    Unless that is changed I'd rather not use it because it means "ftdump
    | grep xyz" won't be very useful to search the ftrace buffer since it
    will throw away the whole trace on the first grep. A future patch to
    dump only the last few lines of the buffer will also be hard to
    implement.

    [1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com

    Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.org

    Reported-by: Brian Norris
    Signed-off-by: Douglas Anderson
    Signed-off-by: Steven Rostedt (VMware)

    Douglas Anderson
     
  • Pull vfs mount infrastructure updates from Al Viro:
    "The rest of core infrastructure; no new syscalls in that pile, but the
    old parts are switched to new infrastructure. At that point
    conversions of individual filesystems can happen independently; some
    are done here (afs, cgroup, procfs, etc.), there's also a large series
    outside of that pile dealing with NFS (quite a bit of option-parsing
    stuff is getting used there - it's one of the most convoluted
    filesystems in terms of mount-related logics), but NFS bits are the
    next cycle fodder.

    It got seriously simplified since the last cycle; documentation is
    probably the weakest bit at the moment - I considered dropping the
    commit introducing Documentation/filesystems/mount_api.txt (cutting
    the size increase by quarter ;-), but decided that it would be better
    to fix it up after -rc1 instead.

    That pile allows to do followup work in independent branches, which
    should make life much easier for the next cycle. fs/super.c size
    increase is unpleasant; there's a followup series that allows to
    shrink it considerably, but I decided to leave that until the next
    cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
    afs: Use fs_context to pass parameters over automount
    afs: Add fs_context support
    vfs: Add some logging to the core users of the fs_context log
    vfs: Implement logging through fs_context
    vfs: Provide documentation for new mount API
    vfs: Remove kern_mount_data()
    hugetlbfs: Convert to fs_context
    cpuset: Use fs_context
    kernfs, sysfs, cgroup, intel_rdt: Support fs_context
    cgroup: store a reference to cgroup_ns into cgroup_fs_context
    cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
    cgroup_do_mount(): massage calling conventions
    cgroup: stash cgroup_root reference into cgroup_fs_context
    cgroup2: switch to option-by-option parsing
    cgroup1: switch to option-by-option parsing
    cgroup: take options parsing into ->parse_monolithic()
    cgroup: fold cgroup1_mount() into cgroup1_get_tree()
    cgroup: start switching to fs_context
    ipc: Convert mqueue fs to fs_context
    proc: Add fs_context support to procfs
    ...

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:
    "Assorted fixes (really no common topic here)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Make __vfs_write() static
    vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
    pipe: stop using ->can_merge
    splice: don't merge into linked buffers
    fs: move generic stat response attr handling to vfs_getattr_nosec
    orangefs: don't reinitialize result_mask in ->getattr
    fs/devpts: always delete dcache dentry-s in dput()

    Linus Torvalds
     
  • Merge misc updates from Andrew Morton:

    - a few misc things

    - the rest of MM

    - remove flex_arrays, replace with new simple radix-tree implementation

    * emailed patches from Andrew Morton: (38 commits)
    Drop flex_arrays
    sctp: convert to genradix
    proc: commit to genradix
    generic radix trees
    selinux: convert to kvmalloc
    md: convert to kvmalloc
    openvswitch: convert to kvmalloc
    of: fix kmemleak crash caused by imbalance in early memory reservation
    mm: memblock: update comments and kernel-doc
    memblock: split checks whether a region should be skipped to a helper function
    memblock: remove memblock_{set,clear}_region_flags
    memblock: drop memblock_alloc_*_nopanic() variants
    memblock: memblock_alloc_try_nid: don't panic
    treewide: add checks for the return value of memblock_alloc*()
    swiotlb: add checks for the return value of memblock_alloc*()
    init/main: add checks for the return value of memblock_alloc*()
    mm/percpu: add checks for the return value of memblock_alloc*()
    sparc: add checks for the return value of memblock_alloc*()
    ia64: add checks for the return value of memblock_alloc*()
    arch: don't memset(0) memory returned by memblock_alloc()
    ...

    Linus Torvalds
     
  • As all the memblock allocation functions return NULL in case of error
    rather than panic(), the duplicates with _nopanic suffix can be removed.

    Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Greg Kroah-Hartman
    Reviewed-by: Petr Mladek [printk]
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Guo Ren [c-sky]
    Cc: Heiko Carstens
    Cc: Juergen Gross [Xen]
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Paul Burton
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Add checks for the return value of memblock_alloc*() functions and call
    panic() in case of error. The panic message repeats the one used by
    panicking memblock allocators with adjustment of parameters to include
    only relevant ones.

    The replacement was mostly automated with semantic patches like the one
    below with manual massaging of format strings.

    @@
    expression ptr, size, align;
    @@
    ptr = memblock_alloc(size, align);
    + if (!ptr)
    +     panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align);

    [anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type]
    Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org
    [rppt@linux.ibm.com: fix format strings for panics after memblock_alloc]
    Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com
    [rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails]
    Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx
    [akpm@linux-foundation.org: fix xtensa printk warning]
    Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Anders Roxell
    Reviewed-by: Guo Ren [c-sky]
    Acked-by: Paul Burton [MIPS]
    Acked-by: Heiko Carstens [s390]
    Reviewed-by: Juergen Gross [Xen]
    Reviewed-by: Geert Uytterhoeven [m68k]
    Acked-by: Max Filippov [xtensa]
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Petr Mladek
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Add panic() calls if memblock_alloc() returns NULL.

    The panic() format duplicates the one used by memblock itself. To keep
    the parameter lists from growing too long, open-coded allocation size
    calculations are replaced with a local variable.

    Link: http://lkml.kernel.org/r/1548057848-15136-19-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Guo Ren [c-sky]
    Cc: Heiko Carstens
    Cc: Juergen Gross [Xen]
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Paul Burton
    Cc: Petr Mladek
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • do_proc_do[u]intvec_minmax_conv() had included open-coded versions of
    do_proc_do[u]intvec_conv(); the duplication led to buggy inconsistencies
    (missing range checks). To reduce the likelihood of such problems in the
    future, we can instead refactor both to be defined in terms of their
    non-bounded counterparts (plus the added check).

    Link: http://lkml.kernel.org/r/20190207165138.5oud57vq4ozwb4kh@hatter.bewilderbeest.net
    Signed-off-by: Zev Weiss
    Cc: Brendan Higgins
    Cc: Iurii Zaikin
    Cc: Kees Cook
    Cc: Luis Chamberlain
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zev Weiss
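
    The shape of the refactoring described above can be sketched in
    userspace C: the bounded converter calls the unbounded one, so the
    "does it fit in an int" check lives in one place, and then applies the
    min/max check on top. The function names and signatures here are
    hypothetical simplifications covering only the write direction; they do
    not reproduce the kernel's sysctl helpers.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Convert a sign flag plus magnitude into an int, rejecting values that
 * do not fit. This is the single place where the range check lives. */
int conv_int(bool neg, unsigned long mag, int *out)
{
	assert(out != NULL);
	if (neg) {
		if (mag > (unsigned long)INT_MAX + 1)
			return -EINVAL;
		*out = (int)(-(long long)mag);
	} else {
		if (mag > (unsigned long)INT_MAX)
			return -EINVAL;
		*out = (int)mag;
	}
	return 0;
}

/* Bounded variant defined in terms of conv_int() plus the added check.
 * *out is only written when the value passes every check. */
int conv_int_minmax(bool neg, unsigned long mag, int min, int max, int *out)
{
	int tmp;
	int ret;

	assert(out != NULL);
	ret = conv_int(neg, mag, &tmp);
	if (ret)
		return ret;
	if (tmp < min || tmp > max)
		return -EINVAL;
	*out = tmp;
	return 0;
}
```

    Converting into a temporary first keeps the caller's value unchanged on
    failure, and any future fix to the overflow check automatically applies
    to both variants.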
     
  • This bug has apparently existed since the introduction of this function
    in the pre-git era (4500e91754d3 in Thomas Gleixner's history.git,
    "[NET]: Add proc_dointvec_userhz_jiffies, use it for proper handling of
    neighbour sysctls.").

    As a minimal fix we can simply duplicate the corresponding check in
    do_proc_dointvec_conv().

    Link: http://lkml.kernel.org/r/20190207123426.9202-3-zev@bewilderbeest.net
    Signed-off-by: Zev Weiss
    Cc: Brendan Higgins
    Cc: Iurii Zaikin
    Cc: Kees Cook
    Cc: Luis Chamberlain
    Cc: [2.6.2+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zev Weiss
     

12 Mar, 2019

3 commits

  • CC kernel/trace/trace_kprobe.o
    kernel/trace/trace_kprobe.c:41: warning: cannot understand function prototype: 'struct trace_kprobe '

    The real problem is that a comment looked like kerneldoc when it shouldn't be...

    Link: http://lkml.kernel.org/r/2812.1552381112@turing-police

    Signed-off-by: Valdis Kletnieks
    Signed-off-by: Steven Rostedt (VMware)

    Valdis Klētnieks
     
  • sparse complains:
    CHECK kernel/trace/trace_probe.c
    kernel/trace/trace_probe.c:16:12: warning: symbol 'reserved_field_names' was not declared. Should it be static?

    Yes, it should be static.

    Link: http://lkml.kernel.org/r/2478.1552380778@turing-police

    Signed-off-by: Valdis Kletnieks
    Signed-off-by: Steven Rostedt (VMware)

    Valdis Klētnieks
     
  • Pull tracing updates from Steven Rostedt:
    "The biggest change for this release is in the histogram code:

    - Add "onchange(var)" histogram handler that executes an action when
    $var changes.

    - Add new "snapshot()" action for histogram handlers, that causes a
    snapshot of the ring buffer when triggered. ie.
    onchange(var).snapshot() will trigger a snapshot if var changes.

    - Add alternative for "trace()" action. Currently, to trigger a
    synthetic event, the name of that event is used as the handler
    name, which is inconsistent with the other actions. What was
    written as onchange(var).synthetic(param) can now be written as
    onchange(var).trace(synthetic, param). The older method will still
    be allowed, as long as the synthetic events do not overlap with
    other handler names.

    - The histogram documentation and testcases were updated for the new
    changes.

    Outside of the histogram code, we have:

    - Added a quicker way to enable set_ftrace_filter files, that will
    make it much quicker to bisect tracing a function that shouldn't be
    traced and crashes the kernel. (You can echo in numbers to
    set_ftrace_filter, and it will select the corresponding function
    that is in available_filter_functions).

    - Some better displaying of the tracing data (and more information
    was added).

    The rest are small fixes and more clean ups to the code"

    * tag 'trace-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (37 commits)
    tracing: Use strncpy instead of memcpy when copying comm in trace.c
    tracing: Use strncpy instead of memcpy when copying comm for hist triggers
    tracing: Use strncpy instead of memcpy for string keys in hist triggers
    tracing: Use str_has_prefix() in synth_event_create()
    x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ftrace_call_replace()
    tracing/perf: Use strndup_user() instead of buggy open-coded version
    doc: trace: Fix documentation for uprobe_profile
    tracing: Fix spelling mistake: "analagous" -> "analogous"
    tracing: Comment why cond_snapshot is checked outside of max_lock protection
    tracing: Add hist trigger action 'expected fail' test case
    tracing: Add alternative synthetic event trace action test case
    tracing: Add hist trigger onchange() handler test case
    tracing: Add hist trigger snapshot() action test case
    tracing: Add SPDX license GPL-2.0 license identifier to inter-event testcases
    tracing: Add alternative synthetic event trace action syntax
    tracing: Add hist trigger onchange() handler Documentation
    tracing: Add hist trigger onchange() handler
    tracing: Add hist trigger snapshot() action Documentation
    tracing: Add hist trigger snapshot() action
    tracing: Add conditional snapshot
    ...

    Linus Torvalds
     

11 Mar, 2019

7 commits

  • Pull networking fixes from David Miller:
    "First batch of fixes in the new merge window:

    1) Double dst_cache free in act_tunnel_key, from Wenxu.

    2) Avoid NULL deref in IN_DEV_MFORWARD() by failing early in the
    ip_route_input_rcu() path, from Paolo Abeni.

    3) Fix appletalk compile regression, from Arnd Bergmann.

    4) If SLAB objects reach the TCP sendpage method we are in serious
    trouble, so put a debugging check there. From Vasily Averin.

    5) Memory leak in hsr layer, from Mao Wenan.

    6) Only test GSO type on GSO packets, from Willem de Bruijn.

    7) Fix crash in xsk_diag_put_umem(), from Eric Dumazet.

    8) Fix VNIC mailbox length in nfp, from Dirk van der Merwe.

    9) Fix race in ipv4 route exception handling, from Xin Long.

    10) Missing DMA memory barrier in hns3 driver, from Jian Shen.

    11) Use after free in __tcf_chain_put(), from Vlad Buslov.

    12) Handle inet_csk_reqsk_queue_add() failures, from Guillaume Nault.

    13) Return value correction when ip_mc_may_pull() fails, from Eric
    Dumazet.

    14) Use after free in x25_device_event(), also from Eric"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
    gro_cells: make sure device is up in gro_cells_receive()
    vxlan: test dev->flags & IFF_UP before calling gro_cells_receive()
    net/x25: fix use-after-free in x25_device_event()
    isdn: mISDNinfineon: fix potential NULL pointer dereference
    net: hns3: fix to stop multiple HNS reset due to the AER changes
    ip: fix ip_mc_may_pull() return value
    net: keep refcount warning in reqsk_free()
    net: stmmac: Avoid one more sometimes uninitialized Clang warning
    net: dsa: mv88e6xxx: Set correct interface mode for CPU/DSA ports
    rxrpc: Fix client call queueing, waiting for channel
    tcp: handle inet_csk_reqsk_queue_add() failures
    net: ethernet: sun: Zero initialize class in default case in niu_add_ethtool_tcam_entry
    8139too : Add support for U.S. Robotics USR997901A 10/100 Cardbus NIC
    fou, fou6: avoid uninit-value in gue_err() and gue6_err()
    net: sched: fix potential use-after-free in __tcf_chain_put()
    vhost: silence an unused-variable warning
    vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock
    connector: fix unsafe usage of ->real_parent
    vxlan: do not need BH again in vxlan_cleanup()
    net: hns3: add dma_rmb() for rx description
    ...

    Linus Torvalds
     
  • Pull Kbuild updates from Masahiro Yamada:

    - do not generate unneeded top-level built-in.a

    - let git ignore O= directory entirely

    - optimize scripts/kallsyms slightly

    - exclude DWARF info from *.s regardless of config options

    - fix GCC toolchain search path for Clang to prepare ld.lld support

    - do not generate modules.order when CONFIG_MODULES is disabled

    - simplify single target rules and remove VPATH for external module
    build

    - allow to add optional flags to dpkg-buildpackage when building
    deb-pkg

    - move some compiler option tests from Makefile to Kconfig

    - various Makefile cleanups

    * tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
    kbuild: remove scripts/basic/% build target
    kbuild: use -Werror=implicit-... instead of -Werror-implicit-...
    kbuild: clean up scripts/gcc-version.sh
    kbuild: remove cc-version macro
    kbuild: update comment block of scripts/clang-version.sh
    kbuild: remove commented-out INITRD_COMPRESS
    kbuild: move -gsplit-dwarf, -gdwarf-4 option tests to Kconfig
    kbuild: [bin]deb-pkg: add DPKG_FLAGS variable
    kbuild: move ".config not found!" message from Kconfig to Makefile
    kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
    kbuild: simplify single target rules
    kbuild: remove empty rules for makefiles
    kbuild: make -r/-R effective in top Makefile for old Make versions
    kbuild: move tools_silent to a more relevant place
    kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
    kbuild: refactor cc-cross-prefix implementation
    kbuild: hardcode genksyms path and remove GENKSYMS variable
    scripts/gdb: refactor rules for symlink creation
    kbuild: create symlink to vmlinux-gdb.py in scripts_gdb target
    scripts/gdb: do not descend into scripts/gdb from scripts
    ...

    Linus Torvalds
     
  • Pull perf updates from Thomas Gleixner:
    "Perf updates and fixes:

    Kernel:
    - Handle events which have the bpf_event attribute set as side band
    events as they carry information about BPF programs.
    - Add missing switch-case fall-through comments

    Libraries:
    - Fix leaks and double frees in error code paths.
    - Prevent buffer overflows in libtraceevent

    Tools:
    - Improvements in handling Intel PT/BTS
    - Add BTF ELF markers to perf trace BPF programs to improve output
    - Support --time, --cpu, --pid and --tid filters for perf diff
    - Calculate the column width in perf annotate as the hardcoded 6
    characters for the instruction are not sufficient
    - Small fixes all over the place"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
    perf/core: Mark expected switch fall-through
    perf/x86/intel/uncore: Fix client IMC events return huge result
    perf/ring_buffer: Use high order allocations for AUX buffers optimistically
    perf data: Force perf_data__open|close zero data->file.path
    perf session: Fix double free in perf_data__close
    perf evsel: Probe for precise_ip with simple attr
    perf tools: Read and store caps/max_precise in perf_pmu
    perf hist: Fix memory leak of srcline
    perf hist: Add error path into hist_entry__init
    perf c2c: Fix c2c report for empty numa node
    perf script python: Add Python3 support to intel-pt-events.py
    perf script python: Add Python3 support to event_analyzing_sample.py
    perf script python: add Python3 support to check-perf-trace.py
    perf script python: Add Python3 support to futex-contention.py
    perf script python: Remove mixed indentation
    perf diff: Support --pid/--tid filter options
    perf diff: Support --cpu filter option
    perf diff: Support --time filter option
    perf thread: Generalize function to copy from thread addr space from intel-bts code
    perf annotate: Calculate the max instruction name, align column to that
    ...

    Linus Torvalds
     
  • Pull locking fixes from Thomas Gleixner:
    "A few fixes for lockdep:

    - initialize lockdep internal RCU head after initializing RCU

    - prevent use after free in an alloc_workqueue() error handling path

    - plug a memory leak in the workqueue core which fails to free a
    dynamically allocated lock name.

    - make Clang happy"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    workqueue, lockdep: Fix a memory leak in wq->lock_name
    workqueue, lockdep: Fix an alloc_workqueue() error path
    locking/lockdep: Only call init_rcu_head() after RCU has been initialized
    locking/lockdep: Avoid a Clang warning

    Linus Torvalds
     
  • Pull watchdog core update from Thomas Gleixner:
    "A single commit adding a command line parameter which allows to set
    the watchdog threshold on the kernel command-line, so kernels with
    massive debug facilities enabled won't trigger the watchdog during
    early boot and before the threshold can be changed via sysctl"

    * 'core-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    watchdog/core: Add watchdog_thresh command line parameter

    Linus Torvalds
     
  • Pull virtio updates from Michael Tsirkin:
    "Several fixes, most notably fix for virtio on swiotlb systems"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    vhost: silence an unused-variable warning
    virtio: hint if callbacks surprisingly might sleep
    virtio-ccw: wire up ->bus_name callback
    s390/virtio: handle find on invalid queue gracefully
    virtio-ccw: diag 500 may return a negative cookie
    virtio_balloon: remove the unnecessary 0-initialization
    virtio-balloon: improve update_balloon_size_func
    virtio-blk: Consider virtio_max_dma_size() for maximum segment size
    virtio: Introduce virtio_max_dma_size()
    dma: Introduce dma_max_mapping_size()
    swiotlb: Add is_swiotlb_active() function
    swiotlb: Introduce swiotlb_max_mapping_size()

    Linus Torvalds
     
  • Pull DMA mapping updates from Christoph Hellwig:

    - add debugfs support for dumping dma-debug information (Corentin
    Labbe)

    - Kconfig cleanups (Andy Shevchenko and me)

    - debugfs cleanups (Greg Kroah-Hartman)

    - improve dma_map_resource and use it in the media code

    - arch_setup_dma_ops / arch_teardown_dma_ops cleanups

    - various small cleanups and improvements for the per-device coherent
    allocator

    - make the DMA mask an upper bound and don't fail "too large" dma mask
    in the remaining two architectures - this will allow big driver
    cleanups in the following merge windows

    * tag 'dma-mapping-5.1' of git://git.infradead.org/users/hch/dma-mapping: (21 commits)
    Documentation/DMA-API-HOWTO: update dma_mask sections
    sparc64/pci_sun4v: allow large DMA masks
    sparc64/iommu: allow large DMA masks
    sparc64: refactor the ali DMA quirk
    ccio: allow large DMA masks
    dma-mapping: remove the DMA_MEMORY_EXCLUSIVE flag
    dma-mapping: remove dma_mark_declared_memory_occupied
    dma-mapping: move CONFIG_DMA_CMA to kernel/dma/Kconfig
    dma-mapping: improve selection of dma_declare_coherent availability
    dma-mapping: remove an incorrect __iommem annotation
    of: select OF_RESERVED_MEM automatically
    device.h: dma_mem is only needed for HAVE_GENERIC_DMA_COHERENT
    mfd/sm501: depend on HAS_DMA
    dma-mapping: add a kconfig symbol for arch_teardown_dma_ops availability
    dma-mapping: add a kconfig symbol for arch_setup_dma_ops availability
    dma-mapping: move debug configuration options to kernel/dma
    dma-debug: add dumping facility via debugfs
    dma: debug: no need to check return value of debugfs_create functions
    videobuf2: replace a layering violation with dma_map_resource
    dma-mapping: don't BUG when calling dma_map_resource on RAM
    ...

    Linus Torvalds
     

10 Mar, 2019

3 commits

  • Pull rdma updates from Jason Gunthorpe:
    "This has been a slightly more active cycle than normal with ongoing
    core changes and quite a lot of collected driver updates.

    - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe

    - A new data transfer mode for HFI1 giving higher performance

    - Significant functional and bug fix update to the mlx5
    On-Demand-Paging MR feature

    - A chip hang reset recovery system for hns

    - Change mm->pinned_vm to an atomic64

    - Update bnxt_re to support a new 57500 chip

    - A sane netlink 'rdma link add' method for creating rxe devices and
    fixing the various unregistration race conditions in rxe's
    unregister flow

    - Allow looking up objects by ID over netlink

    - Various reworking of the core to driver interface:
    - drivers should not assume umem SGLs are in PAGE_SIZE chunks
    - ucontext is accessed via udata not other means
    - start to make the core code responsible for object memory
    allocation
    - drivers should convert struct device to struct ib_device via a
    helper
    - drivers have more tools to avoid use after unregister problems"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (280 commits)
    net/mlx5: ODP support for XRC transport is not enabled by default in FW
    IB/hfi1: Close race condition on user context disable and close
    RDMA/umem: Revert broken 'off by one' fix
    RDMA/umem: minor bug fix in error handling path
    RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
    cxgb4: kfree mhp after the debug print
    IB/rdmavt: Fix concurrency panics in QP post_send and modify to error
    IB/rdmavt: Fix loopback send with invalidate ordering
    IB/iser: Fix dma_nents type definition
    IB/mlx5: Set correct write permissions for implicit ODP MR
    bnxt_re: Clean cq for kernel consumers only
    RDMA/uverbs: Don't do double free of allocated PD
    RDMA: Handle ucontext allocations by IB/core
    RDMA/core: Fix a WARN() message
    bnxt_re: fix the regression due to changes in alloc_pbl
    IB/mlx4: Increase the timeout for CM cache
    IB/core: Abort page fault handler silently during owning process exit
    IB/mlx5: Validate correct PD before prefetch MR
    IB/mlx5: Protect against prefetch of invalid MR
    RDMA/uverbs: Store PR pointer before it is overwritten
    ...

    Linus Torvalds
     
  • Pull printk updates from Petr Mladek:

    - Allow sorting mixed lines by extra information about the caller

    - Remove no longer used LOG_PREFIX.

    - Some clean up and documentation update.

    * tag 'printk-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
    printk/docs: Add extra integer types to printk-formats
    printk: Remove no longer used LOG_PREFIX.
    lib/vsprintf: Remove %pCr remnant in comment
    printk: Pass caller information to log_store().
    printk: Add caller information to printk() output.

    Linus Torvalds
     
  • …ux/kernel/git/acme/linux into perf/urgent

    Pull perf/core changes from Arnaldo Carvalho de Melo:

    perf bpf:

    Arnaldo Carvalho de Melo:

    - Automatically add BTF ELF markers to 'perf trace' BPF programs, so that
    tools such as 'bpftool map dump' can pretty print map keys and values.

    perf c2c:

    Jiri Olsa:

    - Fix report for empty NUMA node.

    perf diff:

    Jin Yao:

    - Support --time, --cpu, --pid and --tid filter options.

    perf probe:

    Arnaldo Carvalho de Melo:

    - Clarify error message about not finding kernel modules debuginfo.

    perf record:

    Jiri Olsa:

    - Fixup probing for max attr.precise_ip.

    perf trace:

    Arnaldo Carvalho de Melo:

    - Add missing %s lost in the 'msg_flags' recvmmsg arg when adding prefix suppression logic.

    perf annotate:

    Arnaldo Carvalho de Melo:

    - Calculate the max instruction name, align column to that, removing the
    hardcoded max 6 chars and cope with instructions with names longer than that,
    such as vpmovmskb, vpcmpeqb, etc.
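    The column-width idea can be sketched in plain C. This is an illustration only, not the perf implementation: toy_max_ins_name() and toy_format_ins() are hypothetical helpers, and the minimum width of 6 used in the usage note is an assumption taken from the "hardcoded max 6 chars" wording above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Width of the widest instruction name, never narrower than min_width. */
static int toy_max_ins_name(const char *const *names, size_t n, int min_width)
{
    int width = min_width;

    for (size_t i = 0; i < n; i++) {
        int len = (int)strlen(names[i]);

        if (len > width)
            width = len;
    }
    return width;
}

/* Write "<name padded to width> <operands>" into buf; returns snprintf()'s result. */
static int toy_format_ins(char *buf, size_t size, int width,
                          const char *name, const char *ops)
{
    assert(buf && name && ops);
    return snprintf(buf, size, "%-*s %s", width, name, ops);
}
```

    Calling toy_max_ins_name() once over all instruction names with a minimum of 6, and passing the result to toy_format_ins() for every line, keeps the operand column aligned even when a long name such as vpmovmskb is present.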

    kernel:

    Song Liu:

    - Consider events with attr.bpf_event set as side-band.

    Gustavo A. R. Silva:

    - Mark expected switch fall-through in perf_event_parse_addr_filter().

    Libraries:

    Jiri Olsa:

    - Fix leaks and double frees on error paths.

    libtraceevent:

    Tony Jones:

    - Fix buffer overflow in arg_eval().

    python scripting:

    Tony Jones:

    - More python3 fixes.

    Trivial:

    Yang Wei:

    - Remove needless extra semicolon in clang C++ glue code.

    Intel PT/BTS:

    Adrian Hunter:

    - Improve auxtrace address filter error message when there is no DSO.

    - Fix divide by zero when TSC is not available.

    - Further improvements to the export to sqlite/postgresql python scripts
    and to the GUI sqlviewer, exporting 'parent_id' so that we can enable
    the creation of call trees.

    Andi Kleen:

    - Generalize function to copy from thread addr space from intel-bts code.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

09 Mar, 2019

7 commits

  • The following commit:

    669de8bda87b ("kernel/workqueue: Use dynamic lockdep keys for workqueues")

    introduced a memory leak as wq_free_lockdep() calls kfree(wq->lock_name),
    but wq_init_lockdep() does not point wq->lock_name to the newly allocated
    slab object.
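
    The shape of this leak can be modelled in userspace C. This is a minimal sketch, not the kernel code: struct toy_wq, make_lock_name(), toy_init_buggy(), toy_init_fixed(), toy_free() and leaked_after_cycle() are hypothetical names, and malloc()/free() stand in for the kernel allocator. The buggy init formats the name but never stores the pointer in the struct, so the free path has nothing to release.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for a workqueue; every name in this sketch is hypothetical. */
struct toy_wq {
    char name[24];
    char *lock_name;    /* should own the formatted lock name */
};

static int live_allocs; /* tracked buffers that have not been freed yet */

/* Format "(wq_completion)<name>" into a fresh heap buffer. */
static char *make_lock_name(const struct toy_wq *wq)
{
    size_t len = sizeof("(wq_completion)") + sizeof(wq->name);
    char *s;

    assert(wq != NULL);
    s = malloc(len);
    if (s) {
        snprintf(s, len, "(wq_completion)%s", wq->name);
        live_allocs++;
    }
    return s;
}

/* Buggy init: the buffer is created but never recorded in the struct. */
static void toy_init_buggy(struct toy_wq *wq)
{
    char *lock_name = make_lock_name(wq);

    (void)lock_name;    /* pointer is lost here; wq->lock_name stays NULL */
}

/* Fixed init: the struct keeps the pointer so the free path can find it. */
static void toy_init_fixed(struct toy_wq *wq)
{
    wq->lock_name = make_lock_name(wq);
}

/* Free path: releases whatever wq->lock_name points to, if anything. */
static void toy_free(struct toy_wq *wq)
{
    if (wq->lock_name) {
        free(wq->lock_name);
        live_allocs--;
    }
    wq->lock_name = NULL;
}

/* Number of buffers still allocated after one init/free cycle. */
static int leaked_after_cycle(void (*init)(struct toy_wq *))
{
    struct toy_wq wq = { .name = "ext4-rsv-conversion" };
    int before = live_allocs;

    init(&wq);
    toy_free(&wq);
    return live_allocs - before;
}
```

    In this model leaked_after_cycle(toy_init_buggy) reports one leaked buffer per cycle, while leaked_after_cycle(toy_init_fixed) reports none.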

    This can be reproduced by running LTP fallocate04 followed by oom01 tests:

    unreferenced object 0xc0000005876384d8 (size 64):
    comm "fallocate04", pid 26972, jiffies 4297139141 (age 40370.480s)
    hex dump (first 32 bytes):
    28 77 71 5f 63 6f 6d 70 6c 65 74 69 6f 6e 29 65 (wq_completion)e
    78 74 34 2d 72 73 76 2d 63 6f 6e 76 65 72 73 69 xt4-rsv-conversi
    backtrace:
    kvasprintf+0x6c/0xe0
    kasprintf+0x34/0x60
    alloc_workqueue+0x1f8/0x6ac
    ext4_fill_super+0x23d4/0x3c80 [ext4]
    mount_bdev+0x25c/0x290
    ext4_mount+0x28/0x50 [ext4]
    legacy_get_tree+0x4c/0xb0
    vfs_get_tree+0x6c/0x190
    do_mount+0xb9c/0x1100
    ksys_mount+0x158/0x180
    sys_mount+0x20/0x30
    system_call+0x5c/0x70

    Signed-off-by: Qian Cai
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Bart Van Assche
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: catalin.marinas@arm.com
    Cc: jiangshanlai@gmail.com
    Cc: tj@kernel.org
    Fixes: 669de8bda87b ("kernel/workqueue: Use dynamic lockdep keys for workqueues")
    Link: https://lkml.kernel.org/r/20190307002731.47371-1-cai@lca.pw
    Signed-off-by: Ingo Molnar

    Qian Cai
     
  • This patch fixes a use-after-free and a memory leak in an alloc_workqueue()
    error path.

    Reported by syzkaller and KASAN:

    BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:197 [inline]
    BUG: KASAN: use-after-free in lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023
    Read of size 8 at addr ffff888090fc2698 by task syz-executor134/7858

    CPU: 1 PID: 7858 Comm: syz-executor134 Not tainted 5.0.0-rc8-next-20190301 #1
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
    kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
    __read_once_size include/linux/compiler.h:197 [inline]
    lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023
    wq_init_lockdep kernel/workqueue.c:3444 [inline]
    alloc_workqueue+0x427/0xe70 kernel/workqueue.c:4263
    ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732
    misc_open+0x398/0x4c0 drivers/char/misc.c:141
    chrdev_open+0x247/0x6b0 fs/char_dev.c:417
    do_dentry_open+0x488/0x1160 fs/open.c:771
    vfs_open+0xa0/0xd0 fs/open.c:880
    do_last fs/namei.c:3416 [inline]
    path_openat+0x10e9/0x46e0 fs/namei.c:3533
    do_filp_open+0x1a1/0x280 fs/namei.c:3563
    do_sys_open+0x3fe/0x5d0 fs/open.c:1063
    __do_sys_openat fs/open.c:1090 [inline]
    __se_sys_openat fs/open.c:1084 [inline]
    __x64_sys_openat+0x9d/0x100 fs/open.c:1084
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Allocated by task 7789:
    save_stack+0x45/0xd0 mm/kasan/common.c:75
    set_track mm/kasan/common.c:87 [inline]
    __kasan_kmalloc mm/kasan/common.c:497 [inline]
    __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
    kasan_kmalloc+0x9/0x10 mm/kasan/common.c:511
    __do_kmalloc mm/slab.c:3726 [inline]
    __kmalloc+0x15c/0x740 mm/slab.c:3735
    kmalloc include/linux/slab.h:553 [inline]
    kzalloc include/linux/slab.h:743 [inline]
    alloc_workqueue+0x13c/0xe70 kernel/workqueue.c:4236
    ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732
    misc_open+0x398/0x4c0 drivers/char/misc.c:141
    chrdev_open+0x247/0x6b0 fs/char_dev.c:417
    do_dentry_open+0x488/0x1160 fs/open.c:771
    vfs_open+0xa0/0xd0 fs/open.c:880
    do_last fs/namei.c:3416 [inline]
    path_openat+0x10e9/0x46e0 fs/namei.c:3533
    do_filp_open+0x1a1/0x280 fs/namei.c:3563
    do_sys_open+0x3fe/0x5d0 fs/open.c:1063
    __do_sys_openat fs/open.c:1090 [inline]
    __se_sys_openat fs/open.c:1084 [inline]
    __x64_sys_openat+0x9d/0x100 fs/open.c:1084
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 7789:
    save_stack+0x45/0xd0 mm/kasan/common.c:75
    set_track mm/kasan/common.c:87 [inline]
    __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
    kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
    __cache_free mm/slab.c:3498 [inline]
    kfree+0xcf/0x230 mm/slab.c:3821
    alloc_workqueue+0xc3e/0xe70 kernel/workqueue.c:4295
    ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732
    misc_open+0x398/0x4c0 drivers/char/misc.c:141
    chrdev_open+0x247/0x6b0 fs/char_dev.c:417
    do_dentry_open+0x488/0x1160 fs/open.c:771
    vfs_open+0xa0/0xd0 fs/open.c:880
    do_last fs/namei.c:3416 [inline]
    path_openat+0x10e9/0x46e0 fs/namei.c:3533
    do_filp_open+0x1a1/0x280 fs/namei.c:3563
    do_sys_open+0x3fe/0x5d0 fs/open.c:1063
    __do_sys_openat fs/open.c:1090 [inline]
    __se_sys_openat fs/open.c:1084 [inline]
    __x64_sys_openat+0x9d/0x100 fs/open.c:1084
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff888090fc2580
    which belongs to the cache kmalloc-512 of size 512
    The buggy address is located 280 bytes inside of
    512-byte region [ffff888090fc2580, ffff888090fc2780)

    Reported-by: syzbot+17335689e239ce135d8b@syzkaller.appspotmail.com
    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Fixes: 669de8bda87b ("kernel/workqueue: Use dynamic lockdep keys for workqueues")
    Link: https://lkml.kernel.org/r/20190303220046.29448-1-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • init_data_structures_once() is called for the first time before RCU has
    been initialized. Make sure that init_rcu_head() is called before the
    RCU head is used and after RCU has been initialized.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: longman@redhat.com
    Link: https://lkml.kernel.org/r/c20aa0f0-42ab-a884-d931-7d4ec2bf0cdc@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • Clang warns about a tentative array definition without a length:

    kernel/locking/lockdep.c:845:12: error: tentative array definition assumed to have one element [-Werror]

    There is no real reason to do this here, so just set the same length as
    in the real definition later in the same file. It has to be hidden in
    an #ifdef or annotated __maybe_unused though, to avoid the unused-variable
    warning if CONFIG_PROVE_LOCKING is disabled.
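
    A small C sketch of the declaration pattern, under stated assumptions: toy_table, struct toy_entry and TOY_NR_ENTRIES are hypothetical and do not correspond to the lockdep array involved. The forward declaration repeats the length of the real definition instead of using an empty `[]`. The unused-variable side of the fix is not modelled, because this sketch always uses the array.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_ENTRIES 8   /* hypothetical length shared by both declarations */

struct toy_entry {
    int id;
};

/*
 * Forward (tentative) definition. Written as `toy_table[]` with no length,
 * this is the form Clang reports; repeating the real length avoids it.
 */
static struct toy_entry toy_table[TOY_NR_ENTRIES];

/* Code between the two declarations can already use the complete type. */
static size_t toy_table_len(void)
{
    return sizeof(toy_table) / sizeof(toy_table[0]);
}

static int toy_get_id(size_t i)
{
    assert(i < toy_table_len());
    return toy_table[i].id;
}

/* The real definition, later in the same file, with the same length. */
static struct toy_entry toy_table[TOY_NR_ENTRIES] = {
    { .id = 1 },
    { .id = 2 },
};
```

    This relies on C tentative-definition rules, so it is meant to be compiled as C rather than C++.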

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Bart Van Assche
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Jiri Olsa
    Cc: Joel Fernandes (Google)
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stephane Eranian
    Cc: Steven Rostedt (VMware)
    Cc: Tetsuo Handa
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: Waiman Long
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/20190307075222.3424524-1-arnd@arndb.de
    Signed-off-by: Ingo Molnar

    Arnd Bergmann
     
  • In preparation to enabling -Wimplicit-fallthrough, mark switch cases
    where we are expecting to fall through.

    This patch fixes the following warning:

    kernel/events/core.c: In function ‘perf_event_parse_addr_filter’:
    kernel/events/core.c:9154:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    kernel = 1;
    ~~~~~~~^~~
    kernel/events/core.c:9156:3: note: here
    case IF_SRC_FILEADDR:
    ^~~~

    Warning level 3 was used: -Wimplicit-fallthrough=3

    This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.
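
    For illustration, a marked fall-through looks roughly like the sketch below. The enum and struct are hypothetical, only loosely modelled on the IF_SRC_* case label in the warning, and this is not the code of perf_event_parse_addr_filter(). With GCC, a comment of this form is accepted at -Wimplicit-fallthrough=3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter sources; not the kernel's definitions. */
enum toy_src { TOY_SRC_KERNELADDR, TOY_SRC_FILEADDR, TOY_SRC_OTHER };

struct toy_filter {
    int kernel;     /* set only for kernel address filters */
    int has_addr;   /* set for both kinds of address filter */
};

static void toy_parse(enum toy_src src, struct toy_filter *f)
{
    assert(f != NULL);
    f->kernel = 0;
    f->has_addr = 0;

    switch (src) {
    case TOY_SRC_KERNELADDR:
        f->kernel = 1;
        /* fall through */
    case TOY_SRC_FILEADDR:
        f->has_addr = 1;
        break;
    case TOY_SRC_OTHER:
        break;
    }
}
```

    The comment documents that the kernel case deliberately shares the address handling of the file case, so the compiler can keep warning about any fall-through that is not marked.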

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Jiri Olsa
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/20190212205430.GA8446@embeddedor
    Signed-off-by: Ingo Molnar

    Gustavo A. R. Silva
     
  • Currently, the AUX buffer allocator will use high-order allocations
    for PMUs that don't support hardware scatter-gather chaining to ensure
    large contiguous blocks of pages, and always use an array of single
    pages otherwise.

    There is, however, a tangible performance benefit in using larger chunks
    of contiguous memory even in the latter case, that comes from not having
    to fetch the next page's address at every page boundary. In particular,
    a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime
    penalty with a single multi-page output region in snapshot mode (no PMI)
    than with multiple single-page output regions, from ~6% down to ~4%. For
    the snapshot mode it does make a difference as it is intended to run over
    long periods of time.

    For this reason, change the allocation policy to always optimistically
    start with the highest possible order when allocating pages for the AUX
    buffer, descending until the allocation succeeds or the order-zero
    allocation fails.
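
    The policy can be sketched in userspace C as follows. This is a model under stated assumptions, not the ring buffer code: toy_alloc_fn, toy_alloc_descending(), toy_alloc_fragmented() and toy_alloc_never() are hypothetical, and malloc() stands in for the page allocator. An order-n request means 2^n contiguous pages.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for the sketch */

/* Hypothetical allocator hook: non-NULL on success for 2^order pages. */
typedef void *(*toy_alloc_fn)(unsigned int order);

/*
 * Try the largest order first and fall back to smaller ones.
 * On success *got_order holds the order that worked. Returns NULL only
 * when even the order-0 attempt fails.
 */
static void *toy_alloc_descending(toy_alloc_fn alloc, unsigned int max_order,
                                  unsigned int *got_order)
{
    unsigned int order = max_order;

    assert(alloc && got_order);
    for (;;) {
        void *chunk = alloc(order);

        if (chunk) {
            *got_order = order;
            return chunk;
        }
        if (order == 0)
            return NULL;
        order--;
    }
}

/* Pretends memory is fragmented: nothing above order 2 is available. */
static void *toy_alloc_fragmented(unsigned int order)
{
    if (order > 2)
        return NULL;
    return malloc((size_t)TOY_PAGE_SIZE << order);
}

/* Pretends memory is exhausted: every request fails. */
static void *toy_alloc_never(unsigned int order)
{
    (void)order;
    return NULL;
}
```

    With toy_alloc_fragmented() and a maximum order of 5, the loop settles on order 2. With toy_alloc_never() it gives up after the order-0 attempt.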

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/20190215114727.62648-2-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • Pull io_uring IO interface from Jens Axboe:
    "Second attempt at adding the io_uring interface.

    Since the first one, we've added basic unit testing of the three
    system calls, that resides in liburing like the other unit tests that
    we have so far. It'll take a while to get full coverage of it, but
    we're working towards it. I've also added two basic test programs to
    tools/io_uring. One uses the raw interface and has support for all the
    various features that io_uring supports outside of standard IO, like
    fixed files, fixed IO buffers, and polled IO. The other uses the
    liburing API, and is a simplified version of cp(1).

    This adds support for a new IO interface, io_uring.

    io_uring allows an application to communicate with the kernel through
    two rings, the submission queue (SQ) and completion queue (CQ) ring.
    This allows for very efficient handling of IOs, see the v5 posting for
    some basic numbers:

    https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/

    Outside of just efficiency, the interface is also flexible and
    extendable, and allows for future use cases like the upcoming NVMe
    key-value store API, networked IO, and so on. It also supports async
    buffered IO, something that we've always failed to support in the
    kernel.

    Outside of basic IO features, it supports async polled IO as well.
    This particular feature has already been tested at Facebook months ago
    for flash storage boxes, with 25-33% improvements. It makes polled IO
    actually useful for real world use cases, where even basic flash sees
    a nice win in terms of efficiency, latency, and performance. These
    boxes were IOPS bound before, now they are not.

    This series adds three new system calls. One for setting up an
    io_uring instance (io_uring_setup(2)), one for submitting/completing
    IO (io_uring_enter(2)), and one for aux functions like registering
    file sets, buffers, etc (io_uring_register(2)). Through the help of
    Arnd, I've coordinated the syscall numbers so merge on that front
    should be painless.

    Jon did a writeup of the interface a while back, which (except for
    minor details that have been tweaked) is still accurate. Find that
    here:

    https://lwn.net/Articles/776703/

    Huge thanks to Al Viro for helping get the reference cycle code
    correct, and to Jann Horn for his extensive reviews focused on both
    security and bugs in general.

    There's a userspace library that provides basic functionality for
    applications that don't need or want to care about how to fiddle with
    the rings directly. It has helpers to allow applications to easily set
    up an io_uring instance, and submit/complete IO through it without
    knowing about the intricacies of the rings. It also includes man pages
    (thanks to Jeff Moyer), and will continue to grow support helper
    functions and features as time progresses. Find it here:

    git://git.kernel.dk/liburing

    Fio has full support for the raw interface, both in the form of an IO
    engine (io_uring), but also with a small test application (t/io_uring)
    that can exercise and benchmark the interface"

    * tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block:
    io_uring: add a few test tools
    io_uring: allow workqueue item to handle multiple buffered requests
    io_uring: add support for IORING_OP_POLL
    io_uring: add io_kiocb ref count
    io_uring: add submission polling
    io_uring: add file set registration
    net: split out functions related to registering inflight socket files
    io_uring: add support for pre-mapped user IO buffers
    block: implement bio helper to add iter bvec pages to bio
    io_uring: batch io_kiocb allocation
    io_uring: use fget/fput_many() for file references
    fs: add fget_many() and fput_many()
    io_uring: support for IO polling
    io_uring: add fsync support
    Add io_uring IO interface

    Linus Torvalds
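
    The two-ring arrangement described in the io_uring pull above can be illustrated with a toy model. This is a single-threaded sketch under stated assumptions, not the io_uring ABI: struct toy_ring, toy_ring_push(), toy_ring_pop() and toy_process() are hypothetical, the entries are plain integers rather than real submission and completion entries, and the memory barriers a ring shared between an application and the kernel would need are left out.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_ENTRIES 8u                     /* must be a power of two */
#define TOY_RING_MASK    (TOY_RING_ENTRIES - 1u)

/* One ring: the producer advances tail, the consumer advances head. */
struct toy_ring {
    uint32_t head;                      /* free-running consumer index */
    uint32_t tail;                      /* free-running producer index */
    uint64_t slots[TOY_RING_ENTRIES];
};

/* Returns 0 on success, -1 if the ring is full. */
static int toy_ring_push(struct toy_ring *r, uint64_t v)
{
    assert(r);
    if (r->tail - r->head == TOY_RING_ENTRIES)
        return -1;
    r->slots[r->tail & TOY_RING_MASK] = v;
    r->tail++;
    return 0;
}

/* Returns 0 on success, -1 if the ring is empty. */
static int toy_ring_pop(struct toy_ring *r, uint64_t *v)
{
    assert(r && v);
    if (r->head == r->tail)
        return -1;
    *v = r->slots[r->head & TOY_RING_MASK];
    r->head++;
    return 0;
}

/*
 * Stand-in for the kernel side: drain the submission ring and post one
 * completion per request while the completion ring has room. The "result"
 * is just the request value plus one, so a round trip is easy to check.
 */
static unsigned int toy_process(struct toy_ring *sq, struct toy_ring *cq)
{
    unsigned int done = 0;
    uint64_t req;

    while (cq->tail - cq->head < TOY_RING_ENTRIES &&
           toy_ring_pop(sq, &req) == 0) {
        toy_ring_push(cq, req + 1);
        done++;
    }
    return done;
}
```

    The application side pushes requests onto the submission ring and later pops results from the completion ring. Neither direction needs a per-request copy through a system call argument, which is the efficiency argument made in the pull message.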