06 Jan, 2021

2 commits

  • [ Upstream commit fc6b6a872dcd48c6f39c7975836d75113db67d37 ]

    Internally, UBD treats each physical IO segment as a separate command to
    be submitted in the execution pipe. If the pipe returns a transient
    error after a few segments have already been written, UBD will tell the
    block layer to requeue the request, but there is no way to reclaim the
    segments already submitted. When a new attempt to dispatch the request
    is done, those segments already submitted will get duplicated, causing
    the WARN_ON below in the best case, and potentially data corruption.

    In my system, running a UML instance with 2GB of RAM and a 50M UBD disk,
    I can reproduce the WARN_ON by simply running mkfs.vfat against the
    disk on a freshly booted system.

    There are a few ways to work around this, like reducing the pressure on
    the pipe by reducing the queue depth, which almost eliminates the
    occurrence of the problem, increasing the pipe buffer size on the host
    system, or by limiting the request to one physical segment, which causes
    the block layer to submit way more requests to resolve a single
    operation.

    Instead, this patch modifies the format of a UBD command, such that all
    segments are sent through a single element in the communication pipe,
    turning the command submission atomic from the point of view of the
    block layer. The new format has a variable size, depending on the
    number of elements, and looks like this:

    +------------+-----------+-----------+------------
    | cmd_header | segment 0 | segment 1 | segment ...
    +------------+-----------+-----------+------------

    With this format, we push a pointer to cmd_header in the submission
    pipe.

    This has the advantage of reducing the memory footprint of executing a
    single request, since it allows us to merge some fields in the header.
    It is possible to reduce even further each segment memory footprint, by
    merging bitmap_words and cow_offset, for instance, but this is not the
    focus of this patch and is left as future work. One issue with the
    patch is that for a big number of segments, we now perform one big
    memory allocation instead of multiple small ones, but I wasn't able to
    trigger any real issues or -ENOMEM because of this change, that wouldn't
    be reproduced otherwise.

    This was tested using fio with the verify-crc32 option, and by running
    an ext4 filesystem over this UBD device.

    The original WARN_ON was:

    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x13f/0x141
    refcount_t: underflow; use-after-free.
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc6-00002-g2a5bb2cf75c8 #346
    Stack:
    6084eed0 6063dc77 00000009 6084ef60
    00000000 604b8d9f 6084eee0 6063dcbc
    6084ef40 6006ab8d e013d780 1c00000000
    Call Trace:
    [] ? printk+0x0/0x94
    [] show_stack+0x13b/0x155
    [] ? dump_stack_print_info+0xdf/0xe8
    [] ? refcount_warn_saturate+0x13f/0x141
    [] dump_stack+0x2a/0x2c
    [] __warn+0x107/0x134
    [] ? wake_up_process+0x17/0x19
    [] ? blk_queue_max_discard_sectors+0x0/0xd
    [] warn_slowpath_fmt+0xd1/0xdf
    [] ? warn_slowpath_fmt+0x0/0xdf
    [] ? raw_read_seqcount_begin.constprop.0+0x0/0x15
    [] ? os_nsecs+0x1d/0x2b
    [] refcount_warn_saturate+0x13f/0x141
    [] refcount_sub_and_test.constprop.0+0x2f/0x37
    [] blk_mq_free_request+0xf1/0x10d
    [] __blk_mq_end_request+0x10c/0x114
    [] ubd_intr+0xb5/0x169
    [] __handle_irq_event_percpu+0x6b/0x17e
    [] handle_irq_event_percpu+0x26/0x69
    [] handle_irq_event+0x26/0x34
    [] ? handle_irq_event+0x0/0x34
    [] ? unmask_irq+0x0/0x37
    [] handle_edge_irq+0xbc/0xd6
    [] generic_handle_irq+0x21/0x29
    [] do_IRQ+0x39/0x54
    [...]
    ---[ end trace c6e7444e55386c0f ]---

    Cc: Christopher Obbard
    Reported-by: Martyn Welch
    Signed-off-by: Gabriel Krisman Bertazi
    Tested-by: Christopher Obbard
    Acked-by: Anton Ivanov
    Signed-off-by: Richard Weinberger
    Signed-off-by: Sasha Levin

    Gabriel Krisman Bertazi
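
The variable-size command described in the commit above can be sketched in userspace C. This is only an illustration under stated assumptions: the struct and field names (`cmd_header`, `seg_desc`, `cmd_alloc`, `cmd_submit`) are hypothetical and are not taken from the UBD driver. The sketch allocates the header and all segments as one block and pushes a single pointer through a pipe, so the request either enters the pipe whole or not at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-segment descriptor; field names are illustrative only. */
struct seg_desc {
    uint64_t offset;   /* byte offset on the backing file */
    uint32_t length;   /* bytes covered by this segment */
    void *buffer;      /* data buffer for this segment */
};

/* Hypothetical header followed by a variable number of segments. */
struct cmd_header {
    int op;                  /* operation shared by all segments */
    int error;               /* completion status of the whole request */
    unsigned int nr_segs;    /* number of entries in segs[] */
    struct seg_desc segs[];  /* C99 flexible array member */
};

/* Allocate the header and all of its segments in one contiguous block. */
static struct cmd_header *cmd_alloc(unsigned int nr_segs)
{
    struct cmd_header *cmd;

    assert(nr_segs > 0);
    cmd = calloc(1, sizeof(*cmd) + nr_segs * sizeof(cmd->segs[0]));
    if (!cmd)
        return NULL;
    cmd->nr_segs = nr_segs;
    return cmd;
}

/*
 * Submit the whole request by writing one pointer to the pipe.
 * Returns 0 on success and -1 if the pointer could not be written.
 */
static int cmd_submit(int pipe_fd, struct cmd_header *cmd)
{
    ssize_t n = write(pipe_fd, &cmd, sizeof(cmd));

    return n == (ssize_t)sizeof(cmd) ? 0 : -1;
}
```

Because only one pointer-sized element is written per request, a failed write leaves nothing half-submitted, and the caller can requeue the request without duplicating segments.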
     
  • [ Upstream commit 72d3e093afae79611fa38f8f2cfab9a888fe66f2 ]

    The UML random driver creates a dummy device under the guest,
    /dev/hw_random. When this file is read from the guest, the driver
    reads from the host machine's /dev/random, in-turn reading from
    the host kernel's entropy pool. This entropy pool could have been
    filled by a hardware random number generator or just the host
    kernel's internal software entropy generator.

    Currently the driver does not fill the guest's kernel entropy pool;
    this requires a userspace tool running inside the guest (like
    rng-tools) to read from the dummy device provided by this driver,
    which then would fill the guest's internal entropy pool.

    This all seems quite pointless when we are already reading from an
    entropy pool, so this patch aims to register the device as a hwrng
    device using the hwrng-core framework. This not only improves and
    cleans up the driver, but also fills the guest's entropy pool
    without having to resort to using extra userspace tools in the guest.

    This is typically a nuisance when booting a guest: the random pool
    takes a long time (~200s) to build up enough entropy since the dummy
    hwrng is not used to fill the guest's pool.

    This port was originally attempted by Alexander Neville "dark" (in CC,
    discussion in Link), but the conversation there stalled since the
    handling of -EAGAIN errors was removed and they were no longer handled
    by the driver. This patch attempts to use the existing method of error
    handling but utilises the new hwrng core.

    The issue can be noticed when booting a UML guest:

    [ 2.560000] random: fast init done
    [ 214.000000] random: crng init done

    With the patch applied, filling the pool becomes a lot quicker:

    [ 2.560000] random: fast init done
    [ 12.000000] random: crng init done

    Cc: Alexander Neville
    Link: https://lore.kernel.org/lkml/20190828204609.02a7ff70@TheDarkness/
    Link: https://lore.kernel.org/lkml/20190829135001.6a5ff940@TheDarkness.local/
    Cc: Sjoerd Simons
    Signed-off-by: Christopher Obbard
    Acked-by: Anton Ivanov
    Signed-off-by: Richard Weinberger
    Signed-off-by: Sasha Levin

    Christopher Obbard
     

30 Dec, 2020

5 commits

  • commit ff9632d2a66512436d616ef4c380a0e73f748db1 upstream.

    Since the time-travel rework, basic time-travel mode hasn't worked
    properly, but there's no longer a need for this WARN_ON() so just
    remove it and thereby fix things.

    Cc: stable@vger.kernel.org
    Fixes: 4b786e24ca80 ("um: time-travel: Rewrite as an event scheduler")
    Signed-off-by: Johannes Berg
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Johannes Berg
     
  • commit 97be7ceaf7fea68104824b6aa874cff235333ac1 upstream.

    asprintf is not compatible with the existing uml memory allocation
    mechanism. Its use on the "user" side of UML results in a corrupt slab
    state.

    Fixes: 0d4e5ac7e780 ("um: remove uses of variable length arrays")
    Cc: stable@vger.kernel.org
    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Anton Ivanov
     
  • [ Upstream commit 9431f7c199ab0d02da1482d62255e0b4621cb1b5 ]

    xterm serial channel was leaking a fd used in setting up the
    port helper

    This bug is prehistoric - it predates switching to git. The "fixes"
    header here is really just to mark all the versions we would like this to
    apply to which is "Anything from the Cretaceous period onwards".

    No dinosaurs were harmed in fixing this bug.

    Fixes: b40997b872cd ("um: drivers/xterm.c: fix a file descriptor leak")
    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger
    Signed-off-by: Sasha Levin

    Anton Ivanov
     
  • [ Upstream commit 9b1c0c0e25dcccafd30e7d4c150c249cc65550eb ]

    Fix a logical error in tty reading. We get 0 and errno == EAGAIN
    on the first attempt to read from a closed file descriptor.

    By comparison, a true EAGAIN is errno == EAGAIN with a return value
    of -1.

    If we check errno for EAGAIN first, before checking the return
    value, we miss the fact that the descriptor is closed.

    This bug is as old as the driver. It was not showing up with
    the original POLL based IRQ controller, because it was
    producing multiple events. Switching to EPOLL unmasked it.

    Fixes: ff6a17989c08 ("Epoll based IRQ controller")
    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger
    Signed-off-by: Sasha Levin

    Anton Ivanov
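
The check order described in the commit above can be sketched as a small classifier. This is a minimal illustration, not the driver's code: the enum and the function name `classify_read` are hypothetical. It inspects the return value of read() before looking at errno, so a return of 0 is treated as a closed descriptor even when errno happens to be EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome of a single read() attempt. */
enum read_status {
    READ_DATA,    /* n > 0: bytes were read */
    READ_CLOSED,  /* n == 0: end of file, the descriptor is closed */
    READ_RETRY,   /* n == -1 and errno == EAGAIN: try again later */
    READ_ERROR    /* n == -1 with any other errno */
};

/*
 * Classify the result of read(). "n" is the return value and "err" is
 * the errno value sampled right after the call.
 */
static enum read_status classify_read(ssize_t n, int err)
{
    assert(n >= -1);  /* read() returns -1 or a non-negative count */

    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_CLOSED;  /* checked before errno on purpose */
    if (err == EAGAIN)
        return READ_RETRY;
    return READ_ERROR;
}
```

Checking errno first would map the (0, EAGAIN) case to a retry, which is the mistake the commit message describes.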
     
  • [ Upstream commit e3a01cbee9c5f2c6fc813dd6af007716e60257e7 ]

    Ensure that file closes, connection closes, etc are propagated
    as interrupts in the interrupt controller.

    Fixes: ff6a17989c08 ("Epoll based IRQ controller")
    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger
    Signed-off-by: Sasha Levin

    Anton Ivanov
     

30 Nov, 2020

1 commit


24 Nov, 2020

1 commit

  • We call arch_cpu_idle() with RCU disabled, but then use
    local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

    Switch all arch_cpu_idle() implementations to use
    raw_local_irq_{en,dis}able() and carefully manage the
    lockdep,rcu,tracing state like we do in entry.

    (XXX: we really should change arch_cpu_idle() to not return with
    interrupts enabled)

    Reported-by: Sven Schnelle
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Mark Rutland
    Tested-by: Mark Rutland
    Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org

    Peter Zijlstra
     

11 Nov, 2020

1 commit


27 Oct, 2020

1 commit

  • A couple of um files ended up not including the header file that defines
    the __section() macro, and the simplest fix is to just revert the change
    for those files.

    Fixes: 33def8498fdd treewide: Convert macro and uses of __section(foo) to __section("foo")
    Reported-and-tested-by: Guenter Roeck
    Cc: Joe Perches
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

26 Oct, 2020

1 commit

  • Use a more generic form for __section that requires quotes to avoid
    complications with clang and gcc differences.

    Remove the quote operator # from compiler_attributes.h __section macro.

    Convert all unquoted __section(foo) uses to quoted __section("foo").
    Also convert __attribute__((section("foo"))) uses to __section("foo")
    even if the __attribute__ has multiple list entry forms.

    Conversion done using the script at:

    https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl

    Signed-off-by: Joe Perches
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Miguel Ojeda
    Signed-off-by: Linus Torvalds

    Joe Perches
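
The difference between the two macro forms in the commit above can be sketched as follows. The macro names `section_unquoted` and `section_quoted` and the section names are hypothetical, chosen so the example does not clash with any system header; the sketch assumes GCC or Clang on an ELF target.

```c
#include <assert.h>

/* Older form: the macro stringifies its argument with the # operator. */
#define section_unquoted(name) __attribute__((__section__(#name)))

/* Newer form: the caller passes a string literal and no # is used. */
#define section_quoted(name) __attribute__((__section__(name)))

/* Both variables land in named sections; only the call syntax differs. */
static int old_style section_unquoted(demo_old) = 1;
static int new_style section_quoted("demo_new") = 2;

/* Read both variables so the compiler keeps them. */
static int demo_sum(void)
{
    return old_style + new_style;
}
```

With the quoted form the section name is an ordinary string literal at every call site, which is what the conversion script rewrote the unquoted uses into.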
     

24 Oct, 2020

1 commit

  • Pull arch task_work cleanups from Jens Axboe:
    "Two cleanups that don't fit other categories:

    - Finally get the task_work_add() cleanup done properly, so we don't
    have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
    all callers, and also fixes up the documentation for
    task_work_add().

    - While working on some TIF related changes for 5.11, this
    TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
    duplication for how that is handled"

    * tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
    task_work: cleanup notification modes
    tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()

    Linus Torvalds
     

23 Oct, 2020

2 commits

  • Pull Kbuild updates from Masahiro Yamada:

    - Support 'make compile_commands.json' to generate the compilation
    database more easily, avoiding stale entries

    - Support 'make clang-analyzer' and 'make clang-tidy' for static checks
    using clang-tidy

    - Preprocess scripts/modules.lds.S to allow CONFIG options in the
    module linker script

    - Drop cc-option tests from compiler flags supported by our minimal
    GCC/Clang versions

    - Always use a 12-digit commit hash for CONFIG_LOCALVERSION_AUTO=y

    - Use sha1 build id for both BFD linker and LLD

    - Improve deb-pkg for reproducible builds and rootless builds

    - Remove stale, useless scripts/namespace.pl

    - Turn -Wreturn-type warning into error

    - Fix build error of deb-pkg when CONFIG_MODULES=n

    - Replace 'hostname' command with more portable 'uname -n'

    - Various Makefile cleanups

    * tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
    kbuild: Use uname for LINUX_COMPILE_HOST detection
    kbuild: Only add -fno-var-tracking-assignments for old GCC versions
    kbuild: remove leftover comment for filechk utility
    treewide: remove DISABLE_LTO
    kbuild: deb-pkg: clean up package name variables
    kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
    kbuild: enforce -Werror=return-type
    scripts: remove namespace.pl
    builddeb: Add support for all required debian/rules targets
    builddeb: Enable rootless builds
    builddeb: Pass -n to gzip for reproducible packages
    kbuild: split the build log of kallsyms
    kbuild: explicitly specify the build id style
    scripts/setlocalversion: make git describe output more reliable
    kbuild: remove cc-option test of -Werror=date-time
    kbuild: remove cc-option test of -fno-stack-check
    kbuild: remove cc-option test of -fno-strict-overflow
    kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
    kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
    kbuild: do not create built-in objects for external module builds
    ...

    Linus Torvalds
     
  • Pull initial set_fs() removal from Al Viro:
    "Christoph's set_fs base series + fixups"

    * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Allow a NULL pos pointer to __kernel_read
    fs: Allow a NULL pos pointer to __kernel_write
    powerpc: remove address space overrides using set_fs()
    powerpc: use non-set_fs based maccess routines
    x86: remove address space overrides using set_fs()
    x86: make TASK_SIZE_MAX usable from assembly code
    x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
    lkdtm: remove set_fs-based tests
    test_bitmap: remove user bitmap tests
    uaccess: add infrastructure for kernel builds with set_fs()
    fs: don't allow splice read/write without explicit ops
    fs: don't allow kernel reads and writes without iter ops
    sysctl: Convert to iter interfaces
    proc: add a read_iter method to proc proc_ops
    proc: cleanup the compat vs no compat file ops
    proc: remove a level of indentation in proc_get_inode

    Linus Torvalds
     

19 Oct, 2020

1 commit

  • Pull UML updates from Richard Weinberger:

    - Improve support for non-glibc systems

    - Vector: Add support for scripting and dynamic tap devices

    - Various fixes for the vector networking driver

    - Various fixes for time travel mode

    * tag 'for-linus-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    um: vector: Add dynamic tap interfaces and scripting
    um: Clean up stacktrace dump
    um: Fix incorrect assumptions about max pid length
    um: Remove dead usage of TIF_IA32
    um: Remove redundant NULL check
    um: change sigio_spinlock to a mutex
    um: time-travel: Return the sequence number in ACK messages
    um: time-travel: Fix IRQ handling in time_travel_handle_message()
    um: Allow static linking for non-glibc implementations
    um: Some fixes to build UML with musl
    um: vector: Use GFP_ATOMIC under spin lock
    um: Fix null pointer dereference in vector_user_bpf

    Linus Torvalds
     

18 Oct, 2020

1 commit


14 Oct, 2020

1 commit

  • Pull seccomp updates from Kees Cook:
    "The bulk of the changes are with the seccomp selftests to accommodate
    some powerpc-specific behavioral characteristics. Additional cleanups,
    fixes, and improvements are also included:

    - heavily refactor seccomp selftests (and clone3 selftests
    dependency) to fix powerpc (Kees Cook, Thadeu Lima de Souza
    Cascardo)

    - fix style issue in selftests (Zou Wei)

    - upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich
    Felker)

    - replace task_pt_regs(current) with current_pt_regs() (Denis
    Efremov)

    - fix corner-case race in USER_NOTIF (Jann Horn)

    - make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)"

    * tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
    seccomp: Make duplicate listener detection non-racy
    seccomp: Move config option SECCOMP to arch/Kconfig
    selftests/clone3: Avoid OS-defined clone_args
    selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit
    selftests/seccomp: Allow syscall nr and ret value to be set separately
    selftests/seccomp: Record syscall during ptrace entry
    selftests/seccomp: powerpc: Fix seccomp return value testing
    selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of SYSCALL_RET_SET
    selftests/seccomp: Avoid redundant register flushes
    selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
    selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
    selftests/seccomp: Remove syscall setting #ifdefs
    selftests/seccomp: mips: Remove O32-specific macro
    selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
    selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
    selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
    selftests/seccomp: Provide generic syscall setting macro
    selftests/seccomp: Refactor arch register macros to avoid xtensa special case
    selftests/seccomp: Use __NR_mknodat instead of __NR_mknod
    selftests/seccomp: Use bitwise instead of arithmetic operator for flags
    ...

    Linus Torvalds
     

13 Oct, 2020

1 commit

  • Pull orphan section checking from Ingo Molnar:
    "Orphan link sections were a long-standing source of obscure bugs,
    because the heuristics that various linkers & compilers use to handle
    them (include these bits into the output image vs discarding them
    silently) are both highly idiosyncratic and also version dependent.

    Instead of this historically problematic mess, this tree by Kees Cook
    (et al) adds build time asserts and build time warnings if there's any
    orphan section in the kernel or if a section is not sized as expected.

    And because we relied on so many silent assumptions in this area, fix
    a metric ton of dependencies and some outright bugs related to this,
    before we can finally enable the checks on the x86, ARM and ARM64
    platforms"

    * tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    x86/boot/compressed: Warn on orphan section placement
    x86/build: Warn on orphan section placement
    arm/boot: Warn on orphan section placement
    arm/build: Warn on orphan section placement
    arm64/build: Warn on orphan section placement
    x86/boot/compressed: Add missing debugging sections to output
    x86/boot/compressed: Remove, discard, or assert for unwanted sections
    x86/boot/compressed: Reorganize zero-size section asserts
    x86/build: Add asserts for unwanted sections
    x86/build: Enforce an empty .got.plt section
    x86/asm: Avoid generating unused kprobe sections
    arm/boot: Handle all sections explicitly
    arm/build: Assert for unwanted sections
    arm/build: Add missing sections
    arm/build: Explicitly keep .ARM.attributes sections
    arm/build: Refactor linker script headers
    arm64/build: Assert for unwanted sections
    arm64/build: Add missing DWARF sections
    arm64/build: Use common DISCARDS in linker script
    arm64/build: Remove .eh_frame* sections due to unwind tables
    ...

    Linus Torvalds
     

12 Oct, 2020

11 commits

  • Provide functionality roughly compatible with the existing qemu
    ifup scripting:
    * invocation of an ifup script. The interface name is passed as the
    first and only argument
    * allocating tap interfaces on the fly if they are not explicitly
    specified

    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Anton Ivanov
     
  • We currently get a few stray newlines, due to the interaction
    between printk() and the code here. Remove a few explicit
    newline prints to neaten the output.

    Signed-off-by: Johannes Berg
    Signed-off-by: Richard Weinberger

    Johannes Berg
     
  • pids are no longer limited to 16-bits, bump to 32-bits,
    ie. 9 decimal characters. Additionally sizeof("/") already
    returns 2 - ie. it already accounts for trailing zero.

    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Anton Ivanov
    Cc: Linux UM Mailing List
    Signed-off-by: Maciej Żenczykowski
    Signed-off-by: Richard Weinberger

    Maciej Żenczykowski
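
The remark about sizeof in the commit above can be checked with a short userspace sketch. The names `PROC_PREFIX`, `PATH_BUF_LEN` and `format_proc_path` are hypothetical and the buffer sizing is an assumption of this example, not the sizing used in the patch: it reserves 3 * sizeof(int) characters for the number, a deliberately generous bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PROC_PREFIX "/proc/"

/*
 * sizeof on a string literal already counts the terminating NUL, so no
 * extra "+ 1" is needed. 3 * sizeof(int) is a conservative bound on the
 * decimal digits (plus sign) of an int.
 */
#define PATH_BUF_LEN (sizeof(PROC_PREFIX) + 3 * sizeof(int))

/*
 * Format "/proc/<pid>" into buf. Returns the string length, or -1 if the
 * buffer is too small.
 */
static int format_proc_path(char *buf, size_t len, int pid)
{
    int n;

    assert(pid >= 0);
    n = snprintf(buf, len, PROC_PREFIX "%d", pid);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

A caller would declare `char path[PATH_BUF_LEN];` and pass `sizeof(path)` as the length.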
     
  • Fix below warnings reported by coccicheck:
    ./arch/um/drivers/vector_user.c:403:2-7: WARNING: NULL check before some freeing functions is not needed.

    Fixes: bc8f8e4e6e7a ("um: Add a generic "fd" vector transport")
    Signed-off-by: Li Heng
    Signed-off-by: Richard Weinberger

    Li Heng
     
  • Lockdep complains at boot:

    =============================
    [ BUG: Invalid wait context ]
    5.7.0-05093-g46d91ecd597b #98 Not tainted
    -----------------------------
    swapper/1 is trying to lock:
    0000000060931b98 (&desc[i].request_mutex){+.+.}-{3:3}, at: __setup_irq+0x11d/0x623
    other info that might help us debug this:
    context-{4:4}
    1 lock held by swapper/1:
    #0: 000000006074fed8 (sigio_spinlock){+.+.}-{2:2}, at: sigio_lock+0x1a/0x1c
    stack backtrace:
    CPU: 0 PID: 1 Comm: swapper Not tainted 5.7.0-05093-g46d91ecd597b #98
    Stack:
    7fa4fab0 6028dfd1 0000002a 6008bea5
    7fa50700 7fa50040 7fa4fac0 6028e016
    7fa4fb50 6007f6da 60959c18 00000000
    Call Trace:
    [] show_stack+0x13b/0x155
    [] dump_stack+0x2a/0x2c
    [] __lock_acquire+0x515/0x15f2
    [] lock_acquire+0x245/0x273
    [] __mutex_lock+0xbd/0x325
    [] mutex_lock_nested+0x1d/0x1f
    [] __setup_irq+0x11d/0x623
    [] request_threaded_irq+0x169/0x1a6
    [] um_request_irq+0x1ee/0x24b
    [] write_sigio_irq+0x3b/0x76
    [] sigio_broken+0x146/0x2e4
    [] do_one_initcall+0xde/0x281

    This happens because we hold sigio_spinlock and then request an
    interrupt, which takes a mutex.

    Change the spinlock to a mutex to avoid that.

    Signed-off-by: Johannes Berg
    Signed-off-by: Richard Weinberger

    Johannes Berg
     
  • For external time travel, the protocol says to return the
    incoming sequence number in the ACK message to aid debugging,
    so do that.

    Signed-off-by: Johannes Berg
    Acked-By: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Johannes Berg
     
  • As the comment here indicates, we need to do the polling in the
    idle loop without blocking interrupts, since interrupts can be
    vhost-user messages that we must process even while in our idle
    loop.

    I don't know why I explained one thing and implemented another,
    but we have indeed observed random hangs due to this, depending
    on the timing of the messages.

    Fixes: 88ce64249233 ("um: Implement time-travel=ext")
    Signed-off-by: Johannes Berg
    Acked-By: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Johannes Berg
     
  • It is possible to produce a statically linked UML binary with UML_NET_VECTOR,
    UML_NET_VDE and UML_NET_PCAP options enabled using alternative libc
    implementations, which do not rely on NSS, such as musl.

    Allow static linking in this case.

    Signed-off-by: Ignat Korchagin
    Reviewed-by: Brendan Higgins
    Tested-by: Brendan Higgins
    Signed-off-by: Richard Weinberger

    Ignat Korchagin
     
    The musl toolchain and headers are a bit more strict. These fixes enable
    building UML with musl and do not seem to break builds with glibc.

    Signed-off-by: Ignat Korchagin
    Tested-by: Brendan Higgins
    Signed-off-by: Richard Weinberger

    Ignat Korchagin
     
  • Use GFP_ATOMIC instead of GFP_KERNEL under spin lock to fix possible
    sleep-in-atomic-context bugs.

    Fixes: 9807019a62dc ("um: Loadable BPF "Firmware" for vector drivers")
    Signed-off-by: Tiezhu Yang
    Acked-By: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Tiezhu Yang
     
    bpf_prog is checked for !NULL after uml_kmalloc, but later it is
    used directly, for example in bpf_prog->filter = bpf, and is also
    returned upon success. Fix this: do a NULL check and return right
    away.

    Signed-off-by: Gaurav Singh
    Acked-By: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Gaurav Singh
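
The early-return pattern described in the commit above can be sketched in userspace C. Everything here is a stand-in: `struct prog`, `prog_new` and `failing_alloc` are hypothetical, and a caller-supplied allocator replaces uml_kmalloc so that the failure path can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a filter program descriptor. */
struct prog {
    unsigned short len;
    void *filter;
};

/* Allocator that always fails, used only to exercise the error path. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/*
 * Allocate and fill a descriptor. The NULL check comes first and returns
 * right away, so the pointer is never dereferenced after a failed
 * allocation.
 */
static struct prog *prog_new(void *filter, unsigned short len,
                             void *(*alloc)(size_t))
{
    struct prog *p;

    assert(alloc != NULL);
    p = alloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    p->len = len;
    p->filter = filter;
    return p;
}
```

Passing `malloc` as the allocator gives the normal path, and passing `failing_alloc` shows that the function returns NULL without touching the descriptor.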
     

09 Oct, 2020

1 commit

  • In order to make adding configurable features into seccomp easier,
    it's better to have the options at one single location, considering
    especially that the bulk of seccomp code is arch-independent. A quick
    look also shows that many SECCOMP descriptions are outdated; they talk
    about /proc rather than prctl.

    As a result of moving the config option and keeping it default on,
    architectures arm, arm64, csky, riscv, sh, and xtensa did not have SECCOMP
    on by default prior to this and SECCOMP will be default in this change.

    Architectures microblaze, mips, powerpc, s390, sh, and sparc have an
    outdated depend on PROC_FS and this dependency is removed in this change.

    Suggested-by: Jann Horn
    Link: https://lore.kernel.org/lkml/CAG48ez1YWz9cnp08UZgeieYRhHdqh-ch7aNwc4JRBnGyrmgfMg@mail.gmail.com/
    Signed-off-by: YiFei Zhu
    [kees: added HAVE_ARCH_SECCOMP help text, tweaked wording]
    Signed-off-by: Kees Cook
    Link: https://lore.kernel.org/r/9ede6ef35c847e58d61e476c6a39540520066613.1600951211.git.yifeifz2@illinois.edu

    YiFei Zhu
     

24 Sep, 2020

1 commit

  • There was a request to preprocess the module linker script like we
    do for the vmlinux one. (https://lkml.org/lkml/2020/8/21/512)

    The difference between vmlinux.lds and module.lds is that the latter
    is needed for external module builds, thus must be cleaned up by
    'make mrproper' instead of 'make clean'. Also, it must be created
    by 'make modules_prepare'.

    You cannot put it in arch/$(SRCARCH)/kernel/, which is cleaned up by
    'make clean'. I moved arch/$(SRCARCH)/kernel/module.lds to
    arch/$(SRCARCH)/include/asm/module.lds.h, which is included from
    scripts/module.lds.S.

    scripts/module.lds is fine because 'make clean' keeps all the
    build artifacts under scripts/.

    You can add arch-specific sections in .

    Signed-off-by: Masahiro Yamada
    Tested-by: Jessica Yu
    Acked-by: Will Deacon
    Acked-by: Geert Uytterhoeven
    Acked-by: Palmer Dabbelt
    Reviewed-by: Kees Cook
    Acked-by: Jessica Yu

    Masahiro Yamada
     

09 Sep, 2020

1 commit

  • Add a CONFIG_SET_FS option that is selected by architectures that
    implement set_fs, which is all of them initially. If the option is not
    set, stubs for routines related to overriding the address space are
    provided so that architectures can start to opt out of providing set_fs.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Kees Cook
    Signed-off-by: Al Viro

    Christoph Hellwig
     

01 Sep, 2020

1 commit

  • The .comment section doesn't belong in STABS_DEBUG. Split it out into a
    new macro named ELF_DETAILS. This will gain other non-debug sections
    that need to be accounted for when linking with --orphan-handling=warn.

    Signed-off-by: Kees Cook
    Signed-off-by: Ingo Molnar
    Cc: linux-arch@vger.kernel.org
    Link: https://lore.kernel.org/r/20200821194310.3089815-5-keescook@chromium.org

    Kees Cook
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
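
A userspace sketch of the fallthrough pseudo-keyword is shown here. The macro definition follows the attribute-based form, with a no-op fallback for compilers lacking the attribute. The function `count_steps` is hypothetical and exists only to demonstrate the marking.

```c
#include <assert.h>

/* Define fallthrough via the compiler attribute when it is available. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)  /* fallthrough */
#endif

/* Count how many case bodies run when entering the switch at "start". */
static int count_steps(int start)
{
    int steps = 0;

    assert(start >= 0 && start <= 2);
    switch (start) {
    case 0:
        steps++;
        fallthrough;  /* intentional: continue into case 1 */
    case 1:
        steps++;
        fallthrough;  /* intentional: continue into case 2 */
    case 2:
        steps++;
        break;
    }
    return steps;
}
```

Unlike a comment, the macro is visible to the compiler, so an unmarked fall-through can be reported by -Wimplicit-fallthrough while the marked ones stay silent.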
     

16 Aug, 2020

1 commit

  • Pull arch/sh updates from Rich Felker:
    "Cleanup, SECCOMP_FILTER support, message printing fixes, and other
    changes to arch/sh"

    * tag 'sh-for-5.9' of git://git.libc.org/linux-sh: (34 commits)
    sh: landisk: Add missing initialization of sh_io_port_base
    sh: bring syscall_set_return_value in line with other architectures
    sh: Add SECCOMP_FILTER
    sh: Rearrange blocks in entry-common.S
    sh: switch to copy_thread_tls()
    sh: use the generic dma coherent remap allocator
    sh: don't allow non-coherent DMA for NOMMU
    dma-mapping: consolidate the NO_DMA definition in kernel/dma/Kconfig
    sh: unexport register_trapped_io and match_trapped_io_handler
    sh: don't include in
    sh: move the ioremap implementation out of line
    sh: move ioremap_fixed details out of
    sh: remove __KERNEL__ ifdefs from non-UAPI headers
    sh: sort the selects for SUPERH alphabetically
    sh: remove -Werror from Makefiles
    sh: Replace HTTP links with HTTPS ones
    arch/sh/configs: remove obsolete CONFIG_SOC_CAMERA*
    sh: stacktrace: Remove stacktrace_ops.stack()
    sh: machvec: Modernize printing of kernel messages
    sh: pci: Modernize printing of kernel messages
    ...

    Linus Torvalds
     

15 Aug, 2020

1 commit


13 Aug, 2020

3 commits

  • Merge more updates from Andrew Morton:

    - most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction,
    mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util,
    memory-hotplug, cleanups, uaccess, migration, gup, pagemap),

    - various other subsystems (alpha, misc, sparse, bitmap, lib, bitops,
    checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump,
    exec, kdump, rapidio, panic, kcov, kgdb, ipc).

    * emailed patches from Andrew Morton: (164 commits)
    mm/gup: remove task_struct pointer for all gup code
    mm: clean up the last pieces of page fault accountings
    mm/xtensa: use general page fault accounting
    mm/x86: use general page fault accounting
    mm/sparc64: use general page fault accounting
    mm/sparc32: use general page fault accounting
    mm/sh: use general page fault accounting
    mm/s390: use general page fault accounting
    mm/riscv: use general page fault accounting
    mm/powerpc: use general page fault accounting
    mm/parisc: use general page fault accounting
    mm/openrisc: use general page fault accounting
    mm/nios2: use general page fault accounting
    mm/nds32: use general page fault accounting
    mm/mips: use general page fault accounting
    mm/microblaze: use general page fault accounting
    mm/m68k: use general page fault accounting
    mm/ia64: use general page fault accounting
    mm/hexagon: use general page fault accounting
    mm/csky: use general page fault accounting
    ...

    Linus Torvalds
     
  • Here're the last pieces of page fault accounting that were still done
    outside handle_mm_fault() where we still have regs==NULL when calling
    handle_mm_fault():

    arch/powerpc/mm/copro_fault.c: copro_handle_mm_fault
    arch/sparc/mm/fault_32.c: force_user_fault
    arch/um/kernel/trap.c: handle_page_fault
    mm/gup.c: faultin_page
    fixup_user_fault
    mm/hmm.c: hmm_vma_fault
    mm/ksm.c: break_ksm

    Some of them has the issue of duplicated accounting for page fault
    retries. Some of them didn't do the accounting at all.

    This patch cleans all of these up by letting handle_mm_fault() do per-task
    page fault accounting even if regs==NULL (though we'll still skip the perf
    event accounting). With that, we can safely remove all the outliers now.
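
    The behaviour described above can be sketched as a small user-space
    model. This is only an illustration, not the kernel code: task_model,
    perf_model, regs_model and account_fault() are hypothetical names, and
    the counters merely stand in for task_struct.[maj|min]_flt and the
    PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN] events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct regs_model { int unused; };

struct task_model {
    unsigned long maj_flt;    /* models task_struct.maj_flt */
    unsigned long min_flt;    /* models task_struct.min_flt */
};

struct perf_model {
    unsigned long faults_maj; /* models PERF_COUNT_SW_PAGE_FAULTS_MAJ */
    unsigned long faults_min; /* models PERF_COUNT_SW_PAGE_FAULTS_MIN */
};

/* Account one completed fault; regs may be NULL (e.g. a gup-style caller). */
static void account_fault(struct task_model *task, struct perf_model *perf,
                          const struct regs_model *regs, bool major)
{
    assert(task != NULL && perf != NULL);

    /* Per-task counters are always updated, even when regs is NULL. */
    if (major)
        task->maj_flt++;
    else
        task->min_flt++;

    /* Perf events are skipped when no register state is available. */
    if (regs == NULL)
        return;

    if (major)
        perf->faults_maj++;
    else
        perf->faults_min++;
}
```

    In this model a caller that passes NULL for regs still bumps the
    per-task counter, while the perf counters change only when regs is
    non-NULL.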

    There's another functional change: we now account the page faults to the
    caller of gup, rather than to the task_struct that was passed into the gup
    code. More information on this can be found at [1].

    After this patch, the following should never be touched again outside
    handle_mm_fault():

    - task_struct.[maj|min]_flt
    - PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]

    [1] https://lore.kernel.org/lkml/CAHk-=wj_V2Tps2QrMn20_W0OJF9xqNh52XSGA42s-ZJ8Y+GyKw@mail.gmail.com/

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Albert Ou
    Cc: Alexander Gordeev
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Geert Uytterhoeven
    Cc: Gerald Schaefer
    Cc: Greentime Hu
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: John Hubbard
    Cc: Jonas Bonn
    Cc: Ley Foon Tan
    Cc: "Luck, Tony"
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Richard Henderson
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200707225021.200906-25-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Patch series "mm: Page fault accounting cleanups", v5.

    This is v5 of the pf accounting cleanup series. It originates from Gerald
    Schaefer's report a week ago of incorrect page fault accounting for
    retried page faults after commit 4064b9827063 ("mm: allow
    VM_FAULT_RETRY for multiple times"):

    https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

    What this series does:

    - Correct page fault accounting: we account a page fault (whether it
    comes from #PF handling, gup, or anything else) only on the attempt
    that completed the fault. For example, page fault retries should not
    be counted in page fault counters. The same applies to the perf
    events.

    - Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
    event is used in an ad hoc way across different archs.

    Case (1): for many archs it's done at the entry of a page fault
    handler, so that it will also cover e.g. erroneous faults.

    Case (2): for some other archs, it is only accounted when the page
    fault is resolved successfully.

    Case (3): there are still quite a few archs that have not enabled
    this perf event.

    Since this series will touch nearly all the archs, we unify this
    perf event to always follow case (1), which is the one that makes the
    most sense. And since we moved the accounting into handle_mm_fault(),
    the other two MAJ/MIN perf events are naturally taken care of.

    - Unify definition of "major faults": the definition of "major
    fault" is slightly changed when used in accounting (not
    VM_FAULT_MAJOR). More information in patch 1.

    - Always account the page fault to the task that triggered the page
    fault. This does not matter much for #PF handling, but it does for
    gup. More information on this in patch 25.
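
    The first item in the list above (count a fault only on the attempt
    that completes it) can be illustrated with a minimal user-space
    sketch. The names here (fault_result, fault_counters,
    account_if_completed(), resolve_fault()) are hypothetical and chosen
    for illustration; the kernel expresses the same idea with vm_fault_t
    flags rather than an enum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; the kernel uses vm_fault_t bit flags instead. */
enum fault_result { FAULT_DONE, FAULT_RETRY, FAULT_ERROR };

struct fault_counters {
    unsigned long attempts;  /* every call into the handler */
    unsigned long accounted; /* faults that were actually counted */
};

/* Count a fault only when this attempt completed it. */
static void account_if_completed(struct fault_counters *c,
                                 enum fault_result res)
{
    assert(c != NULL);
    c->attempts++;
    if (res == FAULT_DONE)
        c->accounted++;
}

/*
 * Model one logical fault that is retried `retries` times before it
 * completes. Returns how many times that fault was accounted.
 */
static unsigned long resolve_fault(struct fault_counters *c,
                                   unsigned int retries)
{
    unsigned long before = c->accounted;

    for (unsigned int i = 0; i < retries; i++)
        account_if_completed(c, FAULT_RETRY);
    account_if_completed(c, FAULT_DONE);

    return c->accounted - before;
}
```

    However many retries the model goes through, the logical fault is
    accounted exactly once, which is the property the series aims for.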

    Patchset layout:

    Patch 1: Introduce the accounting in handle_mm_fault(), not yet enabled.
    Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
    Patch 24: Enable the new accounting for the remaining outliers (gup, iommu, etc.)
    Patch 25: Clean up the GUP task_struct pointer since it's no longer needed

    This patch (of 25):

    This is a preparation patch to move page fault accounting into the
    general code in handle_mm_fault(). This includes both the per-task
    maj_flt/min_flt counters, and the major/minor page fault perf events. To
    do this, the pt_regs pointer is passed into handle_mm_fault().

    PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
    handlers.

    So far, every pt_regs pointer passed into handle_mm_fault() is NULL,
    which means this patch should have no intended functional change.

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Albert Ou
    Cc: Alexander Gordeev
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Geert Uytterhoeven
    Cc: Gerald Schaefer
    Cc: Greentime Hu
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: John Hubbard
    Cc: Jonas Bonn
    Cc: Ley Foon Tan
    Cc: "Luck, Tony"
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Richard Henderson
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
    Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu