30 Dec, 2020

3 commits

  • commit 454efcf82ea17d7efeb86ebaa20775a21ec87d27 upstream.

    When a machine check interrupt is triggered during idle, the code
    is using the async timer/clock for idle time calculation. It should use
    the machine check enter timer/clock which is passed to the macro.

    Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
    Cc: # 5.8
    Reviewed-by: Heiko Carstens
    Signed-off-by: Sven Schnelle
    Signed-off-by: Heiko Carstens
    Signed-off-by: Greg Kroah-Hartman

    Sven Schnelle
     
  • commit e259b3fafa7de362b04ecd86e7fa9a9e9273e5fb upstream.

    During removal of the critical section cleanup the calculation
    of mt_cycles during idle was removed. This causes invalid
    accounting on systems with SMT enabled.

    Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
    Cc: # 5.8
    Reviewed-by: Heiko Carstens
    Signed-off-by: Sven Schnelle
    Signed-off-by: Heiko Carstens
    Signed-off-by: Greg Kroah-Hartman

    Sven Schnelle
     
  • commit b5e438ebd7e808d1d2435159ac4742e01a94b8da upstream.

    Not resetting the SMT siblings might leave them in an unpredictable
    state. One of the observed problems was that the CPU timer wasn't
    reset and therefore large system time values were accounted during
    CPU bringup.

    Cc: # 4.0
    Fixes: 10ad34bc76dfb ("s390: add SMT support")
    Reviewed-by: Heiko Carstens
    Signed-off-by: Sven Schnelle
    Signed-off-by: Heiko Carstens
    Signed-off-by: Greg Kroah-Hartman

    Sven Schnelle
     

03 Dec, 2020

1 commit

  • With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs
    tracing") common code calls arch_cpu_idle() with a lockdep state that
    says interrupts are on.

    This doesn't work very well for s390: psw_idle() will enable interrupts
    to wait for an interrupt. As soon as an interrupt occurs the interrupt
    handler will verify if the old context was psw_idle(). If that is the
    case the interrupt enablement bits in the old program status word will
    be cleared.

    A subsequent test in both the external as well as the io interrupt
    handler checks if in the old context interrupts were enabled. Due to
    the above patching of the old program status word it is assumed the
    old context had interrupts disabled, and therefore a call to
    TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
    turn makes lockdep incorrectly "think" that interrupts are enabled
    within the interrupt handler.

    Fix this by unconditionally calling TRACE_IRQS_OFF when entering
    interrupt handlers, and unconditionally calling TRACE_IRQS_ON when
    leaving interrupt handlers.
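
    A C-level sketch of the resulting pattern (the actual change is in the
    s390 entry code; the handler and helper names below are illustrative
    only):

    void io_interrupt_handler(struct pt_regs *old_regs)
    {
            trace_hardirqs_off();        /* unconditional on entry */
            do_io_irq(old_regs);         /* hypothetical handler body */
            trace_hardirqs_on();         /* unconditional before leaving */
    }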

    This leaves the special psw_idle() case, which now returns with
    interrupts disabled, but has an "irqs on" lockdep state. So callers of
    psw_idle() must adjust the state on their own, if required. This is
    currently only __udelay_disabled().

    Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing")
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     

28 Nov, 2020

1 commit

  • Pull kvm fixes from Paolo Bonzini:
    "ARM:
    - Fix alignment of the new HYP sections
    - Fix GICR_TYPER access from userspace

    S390:
    - do not reset the global diag318 data for per-cpu reset
    - do not mark memory as protected too early
    - fix for destroy page ultravisor call

    x86:
    - fix for SEV debugging
    - fix incorrect return code
    - fix for 'noapic' with PIC in userspace and LAPIC in kernel
    - fix for 5-level paging"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT
    KVM: x86: Fix split-irqchip vs interrupt injection window request
    KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint
    MAINTAINERS: Update email address for Sean Christopherson
    MAINTAINERS: add uv.c also to KVM/s390
    s390/uv: handle destroy page legacy interface
    KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace
    KVM: SVM: fix error return code in svm_create_vcpu()
    KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().
    KVM: arm64: Correctly align nVHE percpu data
    KVM: s390: remove diag318 reset code
    KVM: s390: pv: Mark mm as protected after the set secure parameters and improve cleanup

    Linus Torvalds
     

24 Nov, 2020

1 commit

  • We call arch_cpu_idle() with RCU disabled, but then use
    local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

    Switch all arch_cpu_idle() implementations to use
    raw_local_irq_{en,dis}able() and carefully manage the
    lockdep,rcu,tracing state like we do in entry.

    (XXX: we really should change arch_cpu_idle() to not return with
    interrupts enabled)
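
    As a rough illustration of the shape of the change (not any specific
    arch's patch; the low-level wait helper here is hypothetical):

    void arch_cpu_idle(void)
    {
            enter_low_power_wait();      /* hypothetical low-level wait */
            raw_local_irq_enable();      /* was local_irq_enable(), which traces */
    }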

    Reported-by: Sven Schnelle
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Mark Rutland
    Tested-by: Mark Rutland
    Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org

    Peter Zijlstra
     

23 Nov, 2020

1 commit

  • We need to disable interrupts in load_fpu_regs(). Otherwise an
    interrupt might come in after the registers are loaded, but before
    CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns,
    CIF_FPU will be cleared and the registers will never be restored.

    The entry.S code usually saves the interrupt state in __SF_EMPTY on the
    stack when disabling/restoring interrupts. sie64a however saves the pointer
    to the sie control block in __SF_SIE_CONTROL, which references the same
    location. This is non-obvious to the reader. To avoid clobbering the
    sie control block pointer in load_fpu_regs(), move the __SIE_* offsets
    eight bytes after __SF_EMPTY on the stack.
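
    A C-level sketch of the interrupt-disable requirement (the real code is
    in entry.S; the register-restore helper below is hypothetical):

    void load_fpu_regs(void)
    {
            unsigned long flags;

            local_irq_save(flags);         /* no interrupt between restoring ... */
            restore_user_fpu_state();      /* hypothetical low-level restore */
            clear_cpu_flag(CIF_FPU);       /* ... and clearing CIF_FPU */
            local_irq_restore(flags);
    }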

    Cc: # 5.8
    Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
    Reported-by: Pierre Morel
    Signed-off-by: Sven Schnelle
    Acked-by: Christian Borntraeger
    Reviewed-by: Heiko Carstens
    Signed-off-by: Heiko Carstens

    Sven Schnelle
     

18 Nov, 2020

2 commits

  • Older firmware can return rc=0x107 rrc=0xd for destroy page if the
    page is already non-secure. This should be handled as a success, as
    newer firmware already does.
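
    A sketch of how such a check can look (hedged: the s390 ultravisor call
    helpers and structure layout are assumed here, not quoted from the
    patch):

    static int uv_destroy_page(unsigned long paddr)
    {
            struct uv_cb_cfs uvcb = {
                    .header.cmd = UVC_CMD_DESTR_SEC_STOR,
                    .header.len = sizeof(uvcb),
                    .paddr = paddr,
            };

            if (uv_call(0, (u64)&uvcb)) {
                    /* older firmware: 0x107/0xd means already non-secure */
                    if (uvcb.header.rc == 0x107 && uvcb.header.rrc == 0xd)
                            return 0;
                    return -EINVAL;
            }
            return 0;
    }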

    Signed-off-by: Christian Borntraeger
    Fixes: 1a80b54d1ce1 ("s390/uv: add destroy page call")
    Reviewed-by: David Hildenbrand
    Acked-by: Cornelia Huck
    Reviewed-by: Janosch Frank

    Christian Borntraeger
     
  • Pull s390 fixes from Heiko Carstens:

    - fix system call exit path; avoid return to user space with any
    TIF/CIF/PIF set

    - fix file permission for cpum_sfb_size parameter

    - another small defconfig update

    * tag 's390-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390/cpum_sf.c: fix file permission for cpum_sfb_size
    s390: update defconfigs
    s390: fix system call exit path

    Linus Torvalds
     

12 Nov, 2020

1 commit

  • This file is installed by the s390 CPU Measurement sampling
    facility device driver to export the supported minimum and
    maximum sample buffer sizes.
    It is read by the lscpumf tool to display the details
    of the device driver capabilities. The lscpumf tool might
    be invoked by a non-root user, in which case it does not
    print anything because the file contents cannot be read.

    Fix this by allowing read access for all users. Reading
    the file contents is ok, changing the file contents is
    left to the root user only.

    For further reference and details see:
    [1] https://github.com/ibm-s390-tools/s390-tools/issues/97
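
    In module-parameter terms this corresponds to a mode like 0644
    (world-readable, root-writable); a generic sketch, not the actual
    cpum_sf.c registration:

    #include <linux/module.h>
    #include <linux/moduleparam.h>

    static unsigned long sfb_size = 64;
    /* 0644: readable by everyone (so lscpumf works unprivileged), writable by root */
    module_param(sfb_size, ulong, 0644);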

    Fixes: 69f239ed335a ("s390/cpum_sf: Dynamically extend the sampling buffer if overflows occur")
    Cc: # 3.14
    Signed-off-by: Thomas Richter
    Acked-by: Sumanth Korikkar
    Signed-off-by: Heiko Carstens

    Thomas Richter
     

10 Nov, 2020

2 commits

  • struct perf_sample_data lives on-stack, so we should be careful about
    its size. Furthermore, the pt_regs copy in there is only because
    x86_64 is a trainwreck; solve it differently.

    Reported-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Steven Rostedt
    Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org

    Peter Zijlstra
     
  • __perf_output_begin() has an on-stack struct perf_sample_data in the
    unlikely case it needs to generate a LOST record. However, every call
    to perf_output_begin() must already have a perf_sample_data on-stack.

    Reported-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org

    Peter Zijlstra
     

09 Nov, 2020

1 commit

  • The system call exit path runs with interrupts enabled while checking
    for TIF/PIF/CIF bits which require special handling. Once all bits
    have been checked, interrupts are disabled and the kernel exits to
    user space.
    The problem is that after all bits have been checked, but before
    interrupts are disabled, bits can be set again due to interrupt
    handling.

    This means that the kernel can exit to user space with some
    TIF/PIF/CIF bits set, which should never happen. E.g. TIF_NEED_RESCHED
    might be set, which might lead to additional latencies, since that bit
    will only be recognized with the next exit to user space.

    Fix this by checking the corresponding bits only when interrupts are
    disabled.
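
    The usual shape of such an exit loop, as a hedged sketch (the helper
    names are illustrative, not the actual s390 entry code):

    static void exit_to_user(struct pt_regs *regs)
    {
            unsigned long work;

            local_irq_disable();
            work = pending_tif_cif_pif_bits();      /* hypothetical */
            while (work) {
                    local_irq_enable();
                    handle_exit_work(regs, work);   /* hypothetical */
                    local_irq_disable();
                    work = pending_tif_cif_pif_bits();
            }
            /* return to user space with irqs disabled and no work pending */
    }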

    Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
    Cc: # 5.8
    Acked-by: Sven Schnelle
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     

03 Nov, 2020

2 commits

  • The call to rcu_cpu_starting() in smp_init_secondary() is not early
    enough in the CPU-hotplug onlining process, which results in lockdep
    splats as follows:

    WARNING: suspicious RCU usage
    -----------------------------
    kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!

    other info that might help us debug this:

    RCU used illegally from offline CPU!
    rcu_scheduler_active = 1, debug_locks = 1
    no locks held by swapper/1/0.

    Call Trace:
    show_stack+0x158/0x1f0
    dump_stack+0x1f2/0x238
    __lock_acquire+0x2640/0x4dd0
    lock_acquire+0x3a8/0xd08
    _raw_spin_lock_irqsave+0xc0/0xf0
    clockevents_register_device+0xa8/0x528
    init_cpu_timer+0x33e/0x468
    smp_init_secondary+0x11a/0x328
    smp_start_secondary+0x82/0x88

    This is avoided by moving the call to rcu_cpu_starting up near the
    beginning of the smp_init_secondary() function. Note that the
    raw_smp_processor_id() is required in order to avoid calling into
    lockdep before RCU has declared the CPU to be watched for readers.
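
    A sketch of the shape of the fix, derived from the description above
    (abbreviated, not the literal patch):

    static void smp_init_secondary(void)
    {
            int cpu = raw_smp_processor_id();   /* safe before RCU watches this CPU */

            cpu_init();
            rcu_cpu_starting(cpu);              /* let RCU watch this CPU early, ... */
            preempt_disable();
            init_cpu_timer();                   /* ... before lock-taking init like this */
            /* ... */
    }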

    Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
    Signed-off-by: Qian Cai
    Acked-by: Paul E. McKenney
    Signed-off-by: Heiko Carstens

    Qian Cai
     
  • Signed-off-by: Heiko Carstens

    Heiko Carstens
     

24 Oct, 2020

1 commit

  • Pull arch task_work cleanups from Jens Axboe:
    "Two cleanups that don't fit other categories:

    - Finally get the task_work_add() cleanup done properly, so we don't
    have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
    all callers, and also fixes up the documentation for
    task_work_add().

    - While working on some TIF related changes for 5.11, this
    TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
    duplication for how that is handled"

    * tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
    task_work: cleanup notification modes
    tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()

    Linus Torvalds
     

23 Oct, 2020

1 commit

  • Pull Kbuild updates from Masahiro Yamada:

    - Support 'make compile_commands.json' to generate the compilation
    database more easily, avoiding stale entries

    - Support 'make clang-analyzer' and 'make clang-tidy' for static checks
    using clang-tidy

    - Preprocess scripts/modules.lds.S to allow CONFIG options in the
    module linker script

    - Drop cc-option tests from compiler flags supported by our minimal
    GCC/Clang versions

    - Always use a 12-digit commit hash for CONFIG_LOCALVERSION_AUTO=y

    - Use sha1 build id for both BFD linker and LLD

    - Improve deb-pkg for reproducible builds and rootless builds

    - Remove stale, useless scripts/namespace.pl

    - Turn -Wreturn-type warning into error

    - Fix build error of deb-pkg when CONFIG_MODULES=n

    - Replace 'hostname' command with more portable 'uname -n'

    - Various Makefile cleanups

    * tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
    kbuild: Use uname for LINUX_COMPILE_HOST detection
    kbuild: Only add -fno-var-tracking-assignments for old GCC versions
    kbuild: remove leftover comment for filechk utility
    treewide: remove DISABLE_LTO
    kbuild: deb-pkg: clean up package name variables
    kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
    kbuild: enforce -Werror=return-type
    scripts: remove namespace.pl
    builddeb: Add support for all required debian/rules targets
    builddeb: Enable rootless builds
    builddeb: Pass -n to gzip for reproducible packages
    kbuild: split the build log of kallsyms
    kbuild: explicitly specify the build id style
    scripts/setlocalversion: make git describe output more reliable
    kbuild: remove cc-option test of -Werror=date-time
    kbuild: remove cc-option test of -fno-stack-check
    kbuild: remove cc-option test of -fno-strict-overflow
    kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
    kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
    kbuild: do not create built-in objects for external module builds
    ...

    Linus Torvalds
     

19 Oct, 2020

1 commit

  • There is a use case where System Management Software (SMS) wants to
    give a memory hint like MADV_[COLD|PAGEOUT] to other processes; in the
    case of Android, it is the ActivityManagerService.

    The information required to make the reclaim decision is not known to the
    app. Instead, it is known to the centralized userspace
    daemon (ActivityManagerService), and that daemon must be able to initiate
    reclaim on its own without any app involvement.

    To solve the issue, this patch introduces a new syscall
    process_madvise(2). It uses the pidfd of an external process to give the
    hint. It also supports vector address ranges because an Android app has
    thousands of vmas due to zygote, so it is a waste of CPU and power to
    call the syscall one by one for each vma. (Testing a 2000-vma syscall vs
    a 1-vector syscall showed a 15% performance improvement; I think it would
    be bigger in real practice because the testing ran in a very cache
    friendly environment.)

    Another potential use case for the vector range is to amortize the cost
    of TLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
    benefit users like TCP receive zerocopy and malloc implementations. In
    the future, we could find more use cases for other advice values, so let's
    make this available as an API now that we are introducing a new syscall
    anyway. With that, existing madvise(2) users could replace it with
    process_madvise(2) with their own pid if they want batch address range
    support.

    Since it could affect another process's address range, only a privileged
    process (PTRACE_MODE_ATTACH_FSCREDS), or one that otherwise has the right
    to ptrace the process (e.g., by being the same UID), can use it
    successfully. The flags argument is reserved for future use if we need to
    extend the API.

    I think supporting every hint madvise(2) has or will support in
    process_madvise is rather risky: we are not sure all hints make sense
    coming from an external process, and the implementation of a hint may
    rely on the caller being in the current context, so it could be
    error-prone. Thus, I limited the hints to MADV_[COLD|PAGEOUT] in this
    patch.

    If someone wants to add other hints, we can hear the use case and review
    it for each hint. That is safer for maintenance than introducing a
    syscall that turns out to be buggy but hard to fix later.

    So finally, the API is as follows,

    ssize_t process_madvise(int pidfd, const struct iovec *iovec,
                            unsigned long vlen, int advice, unsigned int flags);

    DESCRIPTION
    The process_madvise() system call is used to give advice or directions
    to the kernel about the address ranges of an external process as well
    as the local process. It provides the advice for the address ranges of
    the process described by iovec and vlen. The goal of such advice is to
    improve system or application performance.

    The pidfd selects the process referred to by the PID file descriptor
    specified in pidfd. (See pidfd_open(2) for further information.)

    The pointer iovec points to an array of iovec structures, defined in
    <sys/uio.h> as:

    struct iovec {
            void   *iov_base;   /* starting address */
            size_t  iov_len;    /* number of bytes to be advised */
    };

    Each iovec entry describes an address range beginning at the address
    given by iov_base and spanning iov_len bytes.

    The vlen represents the number of elements in iovec.

    The advice is indicated in the advice argument which, if the target
    process specified by pidfd is external, is currently one of the
    following:

    MADV_COLD
    MADV_PAGEOUT

    Permission to provide a hint to an external process is governed by a
    ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

    process_madvise supports every advice madvise(2) has if the target
    process is in the same thread group as the calling process, so a user
    could use process_madvise(2) as an extension of the existing madvise(2)
    with support for vector address ranges.

    RETURN VALUE
    On success, process_madvise() returns the number of bytes advised.
    This return value may be less than the total number of requested
    bytes if an error occurred. The caller should check the return value
    to determine whether a partial advice occurred.
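
    As a usage illustration, a hedged userspace sketch (libc may not provide
    a wrapper yet, so the raw syscall numbers __NR_process_madvise and
    SYS_pidfd_open are assumed to be available in the installed headers):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/syscall.h>
    #include <sys/uio.h>
    #include <unistd.h>

    #ifndef MADV_PAGEOUT
    #define MADV_PAGEOUT 21                 /* reclaim hint added by this series */
    #endif

    int main(int argc, char **argv)
    {
            struct iovec vec;
            int pidfd;
            ssize_t ret;

            if (argc != 4) {
                    fprintf(stderr, "usage: %s <pid> <addr> <len>\n", argv[0]);
                    return 1;
            }
            pidfd = syscall(SYS_pidfd_open, atoi(argv[1]), 0);
            if (pidfd < 0) {
                    perror("pidfd_open");
                    return 1;
            }
            vec.iov_base = (void *)strtoul(argv[2], NULL, 0);
            vec.iov_len = strtoul(argv[3], NULL, 0);
            ret = syscall(__NR_process_madvise, pidfd, &vec, 1UL, MADV_PAGEOUT, 0);
            if (ret < 0) {
                    perror("process_madvise");
                    return 1;
            }
            printf("advised %zd bytes\n", ret);
            return 0;
    }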

    FAQ:

    Q.1 - Why does any external entity have better knowledge?

    Quote from Sandeep

    "For Android, every application (including the special SystemServer)
    are forked from Zygote. The reason of course is to share as many
    libraries and classes between the two as possible to benefit from the
    preloading during boot.

    After applications start, (almost) all of the APIs end up calling into
    this SystemServer process over IPC (binder) and back to the
    application.

    In a fully running system, the SystemServer monitors every single
    process periodically to calculate their PSS / RSS and also decides
    which process is "important" to the user for interactivity.

    So, because of how these processes start _and_ the fact that the
    SystemServer is looping to monitor each process, it does tend to *know*
    which address range of the application is not used / useful.

    Besides, we can never rely on applications to clean things up
    themselves. We've had the "hey app1, the system is low on memory,
    please trim your memory usage down" notifications for a long time[1].
    They rely on applications honoring the broadcasts and very few do.

    So, if we want to avoid the inevitable killing of the application and
    restarting it, some way to be able to tell the OS about unimportant
    memory in these applications will be useful.

    - ssp

    Q.2 - How is the race (i.e., object validation) handled between an
    external process giving a hint and the target process changing its
    address space?

    process_madvise operates on the target process's address space as it
    exists at the instant that process_madvise is called. If the target
    process can run between the time the calling process inspects the
    target process's address space and the time that process_madvise is
    actually called, process_madvise may operate on memory regions that
    the calling process does not expect. It's the responsibility of the
    process calling process_madvise to close this race condition. For
    example, the calling process can suspend the target process with
    ptrace, SIGSTOP, or the freezer cgroup so that it doesn't have an
    opportunity to change its own address space before process_madvise is
    called. Another option is to operate on memory regions that the caller
    knows a priori will be unchanged in the target process. Yet another
    option is to accept the race for certain process_madvise calls after
    reasoning that mistargeting will do no harm. The suggested API itself
    does not provide synchronization. The same applies to other APIs such
    as move_pages and process_vm_write.

    The race isn't really a problem though. Why is it so wrong to require
    that callers do their own synchronization in some manner? Nobody
    objects to write(2) merely because it's possible for two processes to
    open the same file and clobber each other's writes --- instead, we tell
    people to use flock or something. Think about mmap. It never
    guarantees newly allocated address space is still valid when the user
    tries to access it because other threads could unmap the memory right
    before. That's where we need synchronization, by using another API or
    by design on the user side; it shouldn't be part of the API itself. If
    someone needs synchronization more fine-grained than process level,
    there were two ideas suggested - cookie[2] and anon-fd[3]. Both are
    applicable via the last reserved argument of the API, but I don't
    think they are necessary right now since we already have ways to
    prevent the race, so I don't want to add additional complexity for a
    more fine-grained optimization model.

    To keep the API extensible, an unsigned long is reserved as the last
    argument so we could support this in the future if someone really
    needs it.

    Q.3 - Why doesn't ptrace work?

    Injecting an madvise in the target process using ptrace would not work
    for us because such an injected madvise would have to be executed by
    the target process, which means that process would have to be runnable,
    and that creates the risk of the abovementioned race and of hinting a
    wrong VMA. Furthermore, we want to act on the hint in the caller's
    context, not the callee's, because the callee is usually limited by
    cpuset/cgroups or even in a frozen state, so it can't act by itself
    quickly enough, which causes more thrashing/kills. It also doesn't work
    if the target process is being ptraced (e.g., by strace, a debugger, or
    minidump) because a process can have at most one ptracer.

    [1] https://developer.android.com/topic/performance/memory"

    [2] process_getinfo for getting the cookie which is updated whenever
    vma of process address layout are changed - Daniel Colascione -
    https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224

    [3] anonymous fd which is used for the object(i.e., address range)
    validation - Michal Hocko -
    https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

    [minchan@kernel.org: fix process_madvise build break for arm64]
    Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
    [minchan@kernel.org: fix build error for mips of process_madvise]
    Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
    [akpm@linux-foundation.org: fix patch ordering issue]
    [akpm@linux-foundation.org: fix arm64 whoops]
    [minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
    [akpm@linux-foundation.org: fix i386 build]
    [sfr@canb.auug.org.au: fix syscall numbering]
    Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
    [sfr@canb.auug.org.au: madvise.c needs compat.h]
    Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
    [minchan@kernel.org: fix mips build]
    Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
    [yuehaibing@huawei.com: remove duplicate header which is included twice]
    Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
    [minchan@kernel.org: do not use helper functions for process_madvise]
    Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
    [akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
    [sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
    Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au

    Signed-off-by: Minchan Kim
    Signed-off-by: YueHaibing
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Reviewed-by: Suren Baghdasaryan
    Reviewed-by: Vlastimil Babka
    Acked-by: David Rientjes
    Cc: Alexander Duyck
    Cc: Brian Geffon
    Cc: Christian Brauner
    Cc: Daniel Colascione
    Cc: Jann Horn
    Cc: Jens Axboe
    Cc: Joel Fernandes
    Cc: Johannes Weiner
    Cc: John Dias
    Cc: Kirill Tkhai
    Cc: Michal Hocko
    Cc: Oleksandr Natalenko
    Cc: Sandeep Patil
    Cc: SeongJae Park
    Cc: SeongJae Park
    Cc: Shakeel Butt
    Cc: Sonny Rao
    Cc: Tim Murray
    Cc: Christian Brauner
    Cc: Florian Weimer
    Cc:
    Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
    Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
    Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
    Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

17 Oct, 2020

1 commit

  • Pull s390 updates from Vasily Gorbik:

    - Remove address space overrides using set_fs()

    - Convert to generic vDSO

    - Convert to generic page table dumper

    - Add ARCH_HAS_DEBUG_WX support

    - Add leap seconds handling support

    - Add NVMe firmware-assisted kernel dump support

    - Extend NVMe boot support with memory clearing control and addition of
    kernel parameters

    - AP bus and zcrypt api code rework. Add adapter configure/deconfigure
    interface. Extend debug features. Add failure injection support

    - Add ECC secure private keys support

    - Add KASan support for running protected virtualization host with
    4-level paging

    - Utilize destroy page ultravisor call to speed up secure guests
    shutdown

    - Implement ioremap_wc() and ioremap_prot() with MIO in PCI code

    - Various checksum improvements

    - Other small various fixes and improvements all over the code

    * tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (85 commits)
    s390/uaccess: fix indentation
    s390/uaccess: add default cases for __put_user_fn()/__get_user_fn()
    s390/zcrypt: fix wrong format specifications
    s390/kprobes: move insn_page to text segment
    s390/sie: fix typo in SIGP code description
    s390/lib: fix kernel doc for memcmp()
    s390/zcrypt: Introduce Failure Injection feature
    s390/zcrypt: move ap_msg param one level up the call chain
    s390/ap/zcrypt: revisit ap and zcrypt error handling
    s390/ap: Support AP card SCLP config and deconfig operations
    s390/sclp: Add support for SCLP AP adapter config/deconfig
    s390/ap: add card/queue deconfig state
    s390/ap: add error response code field for ap queue devices
    s390/ap: split ap queue state machine state from device state
    s390/zcrypt: New config switch CONFIG_ZCRYPT_DEBUG
    s390/zcrypt: introduce msg tracking in zcrypt functions
    s390/startup: correct early pgm check info formatting
    s390: remove orphaned extern variables declarations
    s390/kasan: make sure int handler always run with DAT on
    s390/ipl: add support to control memory clearing for nvme re-IPL
    ...

    Linus Torvalds
     

16 Oct, 2020

1 commit

  • Pull dma-mapping updates from Christoph Hellwig:

    - rework the non-coherent DMA allocator

    - move private definitions out of

    - lower CMA_ALIGNMENT (Paul Cercueil)

    - remove the omap1 dma address translation in favor of the common code

    - make dma-direct aware of multiple dma offset ranges (Jim Quinlan)

    - support per-node DMA CMA areas (Barry Song)

    - increase the default seg boundary limit (Nicolin Chen)

    - misc fixes (Robin Murphy, Thomas Tai, Xu Wang)

    - various cleanups

    * tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
    ARM/ixp4xx: add a missing include of dma-map-ops.h
    dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
    dma-direct: factor out a dma_direct_alloc_from_pool helper
    dma-direct check for highmem pages in dma_direct_alloc_pages
    dma-mapping: merge into
    dma-mapping: move large parts of to kernel/dma
    dma-mapping: move dma-debug.h to kernel/dma/
    dma-mapping: remove
    dma-mapping: merge into
    dma-contiguous: remove dma_contiguous_set_default
    dma-contiguous: remove dev_set_cma_area
    dma-contiguous: remove dma_declare_contiguous
    dma-mapping: split
    cma: decrease CMA_ALIGNMENT lower limit to 2
    firewire-ohci: use dma_alloc_pages
    dma-iommu: implement ->alloc_noncoherent
    dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
    dma-mapping: add a new dma_alloc_pages API
    dma-mapping: remove dma_cache_sync
    53c700: convert to dma_alloc_noncoherent
    ...

    Linus Torvalds
     

14 Oct, 2020

2 commits

  • There are several occurrences of the following pattern:

    for_each_memblock(memory, reg) {
            start = __pfn_to_phys(memblock_region_memory_base_pfn(reg));
            end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));

            /* do something with start and end */
    }

    Using the for_each_mem_range() iterator is more appropriate in such cases
    and allows simpler and cleaner code.
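
    The converted form then looks roughly like this (a sketch, assuming the
    post-refactoring for_each_mem_range() signature):

    u64 i;
    phys_addr_t start, end;

    for_each_mem_range(i, &start, &end) {
            /* do something with start and end */
    }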

    [akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build]
    [rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring]
    Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Daniel Axtens
    Cc: Dave Hansen
    Cc: Emil Renner Berthing
    Cc: Hari Bathini
    Cc: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Jonathan Cameron
    Cc: Marek Szyprowski
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Miguel Ojeda
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The only user of memblock_dbg() outside memblock was the s390 setup code,
    and it is converted to use pr_debug() instead. This makes it possible to
    stop exposing memblock_debug and memblock_dbg() to the rest of the kernel.
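
    The conversion itself is mechanical; roughly (the message text below is
    illustrative, not quoted from the patch):

    /* before: depends on memblock's private debug switch */
    memblock_dbg("online range: [%pa-%pa]\n", &start, &end);

    /* after: standard dynamic debug */
    pr_debug("online range: [%pa-%pa]\n", &start, &end);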

    [akpm@linux-foundation.org: make memblock_dbg() safer and neater]

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Reviewed-by: Baoquan He
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Daniel Axtens
    Cc: Dave Hansen
    Cc: Emil Renner Berthing
    Cc: Hari Bathini
    Cc: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Jonathan Cameron
    Cc: Marek Szyprowski
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Miguel Ojeda
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: https://lkml.kernel.org/r/20200818151634.14343-10-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

13 Oct, 2020

5 commits

  • Pull compat mount cleanups from Al Viro:
    "The last remnants of mount(2) compat buried by Christoph.

    Buried into NFS, that is.

    Generally I'm less enthusiastic about "let's use in_compat_syscall()
    deep in call chain" kind of approach than Christoph seems to be, but
    in this case it's warranted - that had been an NFS-specific wart,
    hopefully not to be repeated in any other filesystems (read: any new
    filesystem introducing non-text mount options will get NAKed even if
    it doesn't mess the layout up).

    IOW, not worth trying to grow an infrastructure that would avoid that
    use of in_compat_syscall()..."

    * 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: remove compat_sys_mount
    fs,nfs: lift compat nfs4 mount data handling into the nfs code
    nfs: simplify nfs4_parse_monolithic

    Linus Torvalds
     
  • Pull compat iovec cleanups from Al Viro:
    "Christoph's series around import_iovec() and compat variant thereof"

    * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    security/keys: remove compat_keyctl_instantiate_key_iov
    mm: remove compat_process_vm_{readv,writev}
    fs: remove compat_sys_vmsplice
    fs: remove the compat readv/writev syscalls
    fs: remove various compat readv/writev helpers
    iov_iter: transparently handle compat iovecs in import_iovec
    iov_iter: refactor rw_copy_check_uvector and import_iovec
    iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
    compat.h: fix a spelling error in

    Linus Torvalds
     
  • Pull perf/kprobes updates from Ingo Molnar:
    "This prepares to unify the kretprobe trampoline handler and make
    kretprobe lockless (those patches are still work in progress)"

    * tag 'perf-kprobes-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
    kprobes: Make local functions static
    kprobes: Free kretprobe_instance with RCU callback
    kprobes: Remove NMI context check
    sparc: kprobes: Use generic kretprobe trampoline handler
    sh: kprobes: Use generic kretprobe trampoline handler
    s390: kprobes: Use generic kretprobe trampoline handler
    powerpc: kprobes: Use generic kretprobe trampoline handler
    parisc: kprobes: Use generic kretprobe trampoline handler
    mips: kprobes: Use generic kretprobe trampoline handler
    ia64: kprobes: Use generic kretprobe trampoline handler
    csky: kprobes: Use generic kretprobe trampoline handler
    arc: kprobes: Use generic kretprobe trampoline handler
    arm64: kprobes: Use generic kretprobe trampoline handler
    arm: kprobes: Use generic kretprobe trampoline handler
    x86/kprobes: Use generic kretprobe trampoline handler
    kprobes: Add generic kretprobe trampoline handler

    Linus Torvalds
     
  • Pull orphan section checking from Ingo Molnar:
    "Orphan link sections were a long-standing source of obscure bugs,
    because the heuristics that various linkers & compilers use to handle
    them (include these bits into the output image vs discarding them
    silently) are both highly idiosyncratic and also version dependent.

    Instead of this historically problematic mess, this tree by Kees Cook
    (et al) adds build time asserts and build time warnings if there's any
    orphan section in the kernel or if a section is not sized as expected.

    And because we relied on so many silent assumptions in this area, fix
    a metric ton of dependencies and some outright bugs related to this,
    before we can finally enable the checks on the x86, ARM and ARM64
    platforms"

    * tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    x86/boot/compressed: Warn on orphan section placement
    x86/build: Warn on orphan section placement
    arm/boot: Warn on orphan section placement
    arm/build: Warn on orphan section placement
    arm64/build: Warn on orphan section placement
    x86/boot/compressed: Add missing debugging sections to output
    x86/boot/compressed: Remove, discard, or assert for unwanted sections
    x86/boot/compressed: Reorganize zero-size section asserts
    x86/build: Add asserts for unwanted sections
    x86/build: Enforce an empty .got.plt section
    x86/asm: Avoid generating unused kprobe sections
    arm/boot: Handle all sections explicitly
    arm/build: Assert for unwanted sections
    arm/build: Add missing sections
    arm/build: Explicitly keep .ARM.attributes sections
    arm/build: Refactor linker script headers
    arm64/build: Assert for unwanted sections
    arm64/build: Add missing DWARF sections
    arm64/build: Use common DISCARDS in linker script
    arm64/build: Remove .eh_frame* sections due to unwind tables
    ...

    Linus Torvalds
     
  • Pull arm64 updates from Will Deacon:
    "There's quite a lot of code here, but much of it is due to the
    addition of a new PMU driver as well as some arm64-specific selftests
    which is an area where we've traditionally been lagging a bit.

    In terms of exciting features, this includes support for the Memory
    Tagging Extension which narrowly missed 5.9, hopefully allowing
    userspace to run with use-after-free detection in production on CPUs
    that support it. Work is ongoing to integrate the feature with KASAN
    for 5.11.

    Another change that I'm excited about (assuming they get the hardware
    right) is preparing the ASID allocator for sharing the CPU page-table
    with the SMMU. Those changes will also come in via Joerg with the
    IOMMU pull.

    We do stray outside of our usual directories in a few places, mostly
    due to core changes required by MTE. Although much of this has been
    Acked, there were a couple of places where we unfortunately didn't get
    any review feedback.

    Other than that, we ran into a handful of minor conflicts in -next,
    but nothing that should pose any issues.

    Summary:

    - Userspace support for the Memory Tagging Extension introduced by
    Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.

    - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
    switching.

    - Fix and subsequent rewrite of our Spectre mitigations, including
    the addition of support for PR_SPEC_DISABLE_NOEXEC.

    - Support for the Armv8.3 Pointer Authentication enhancements.

    - Support for ASID pinning, which is required when sharing
    page-tables with the SMMU.

    - MM updates, including treating flush_tlb_fix_spurious_fault() as a
    no-op.

    - Perf/PMU driver updates, including addition of the ARM CMN PMU
    driver and also support to handle CPU PMU IRQs as NMIs.

    - Allow prefetchable PCI BARs to be exposed to userspace using normal
    non-cacheable mappings.

    - Implementation of ARCH_STACKWALK for unwinding.

    - Improve reporting of unexpected kernel traps due to BPF JIT
    failure.

    - Improve robustness of user-visible HWCAP strings and their
    corresponding numerical constants.

    - Removal of TEXT_OFFSET.

    - Removal of some unused functions, parameters and prototypes.

    - Removal of MPIDR-based topology detection in favour of firmware
    description.

    - Cleanups to handling of SVE and FPSIMD register state in
    preparation for potential future optimisation of handling across
    syscalls.

    - Cleanups to the SDEI driver in preparation for support in KVM.

    - Miscellaneous cleanups and refactoring work"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
    Revert "arm64: initialize per-cpu offsets earlier"
    arm64: random: Remove no longer needed prototypes
    arm64: initialize per-cpu offsets earlier
    kselftest/arm64: Check mte tagged user address in kernel
    kselftest/arm64: Verify KSM page merge for MTE pages
    kselftest/arm64: Verify all different mmap MTE options
    kselftest/arm64: Check forked child mte memory accessibility
    kselftest/arm64: Verify mte tag inclusion via prctl
    kselftest/arm64: Add utilities and a test to validate mte memory
    perf: arm-cmn: Fix conversion specifiers for node type
    perf: arm-cmn: Fix unsigned comparison to less than zero
    arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
    arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
    arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
    arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
    KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
    arm64: Get rid of arm64_ssbd_state
    KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
    KVM: arm64: Get rid of kvm_arm_have_ssbd()
    KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
    ...

    Linus Torvalds
     

10 Oct, 2020

1 commit

  • Move the in-kernel kprobes insn page to the text segment. Rationale:
    having that page in the rw data segment is suboptimal, since as soon as
    a kprobe is set, this will split the 1:1 kernel mapping for a single
    page which gets new permissions.

    Note: there is always at least one kprobe present for the kretprobe
    trampoline; so the mapping will always be split into smaller 4k
    mappings because of this.

    Moving the kprobes insn page into text segment makes sure that the
    page is mapped RO/X in any case, and avoids that the 1:1 mapping is
    split.

    The kprobe insn_page is defined as a dummy function which is filled
    with "br %r14" instructions.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik

    Heiko Carstens
     

02 Oct, 2020

2 commits

  • arch/s390/kernel/entry.h: suspend_zero_pages - only declaration left
    after commit 394216275c7d ("s390: remove broken hibernate / power
    management support")

    arch/s390/include/asm/setup.h: vmhalt_cmd - only declaration left after
    commit 99ca4e582d4a ("[S390] kernel: Shutdown Actions Interface")

    arch/s390/include/asm/setup.h: vmpoff_cmd - only declaration left after
    commit 99ca4e582d4a ("[S390] kernel: Shutdown Actions Interface")

    Signed-off-by: Vasily Gorbik

    Vasily Gorbik
     
  • Since commit 998f5bbe3dbd ("s390/kasan: fix early pgm check handler
    execution") the early pgm check handler is executed with DAT on if
    Kasan is enabled.

    Still, there is a window between setup_lowcore_dat_off() and
    setup_lowcore_dat_on() when interrupt handlers could be executed with
    DAT off under Kasan. If this happens the kernel ends up in a pgm check
    loop due to Kasan shadow memory access attempts.

    With Kasan enabled, paging is initialized much earlier and the DAT flag
    has to be on at all times while instrumented code is executed. Make
    sure interrupt handlers are set up to be called with DAT on right away
    in this case.

    Signed-off-by: Vasily Gorbik

    Vasily Gorbik