20 Jan, 2021

5 commits

  • commit 69e976831cd53f9ba304fd20305b2025ecc78eab upstream.

    LLVM-built Linux triggered a boot hangup with KASLR enabled.

    arch/mips/kernel/relocate.c:get_random_boot() uses linux_banner,
    which is a string constant, as a random seed, but accesses it
    as an array of unsigned long (in rotate_xor()).
    When the address of linux_banner is not aligned to sizeof(long),
    such an access raises an unaligned access exception and hangs the kernel.

    Use PTR_ALIGN() to align input address to sizeof(long) and also
    align down the input length to prevent possible access-beyond-end.

    Fixes: 405bc8fd12f5 ("MIPS: Kernel: Implement KASLR using CONFIG_RELOCATABLE")
    Cc: stable@vger.kernel.org # 4.7+
    Signed-off-by: Alexander Lobakin
    Tested-by: Nathan Chancellor
    Reviewed-by: Kees Cook
    Signed-off-by: Thomas Bogendoerfer
    Signed-off-by: Greg Kroah-Hartman

    Alexander Lobakin
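
    The alignment fix described above can be sketched in userspace as
    follows. This is a minimal illustration, not the kernel code: align_up()
    and hash_aligned_words() are hypothetical names standing in for
    PTR_ALIGN() and rotate_xor(), and the one-bit rotation is a
    simplification chosen for readability.

```c
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
    return (addr + align - 1) & ~(uintptr_t)(align - 1);
}

/*
 * Hash only the word-aligned part of buf[0..len).
 * Assumption: the memory behind buf may legally be read as unsigned long.
 */
static unsigned long hash_aligned_words(const void *buf, size_t len)
{
    const size_t word = sizeof(unsigned long);
    uintptr_t start = (uintptr_t)buf;
    uintptr_t first = align_up(start, word);   /* align the address up */
    size_t skipped = first - start;
    unsigned long hash = 0;

    if (skipped >= len)
        return 0;

    /* Align the remaining length down so no read goes past the end. */
    size_t nwords = (len - skipped) / word;
    const unsigned long *p = (const unsigned long *)first;

    for (size_t i = 0; i < nwords; i++) {
        /* Rotate right by one bit, then XOR in the next word. */
        hash = (hash << (sizeof(hash) * 8 - 1)) | (hash >> 1);
        hash ^= p[i];
    }
    return hash;
}
```

    Bytes before the first aligned word and after the last whole word are
    ignored, which is acceptable for a seed but would not be for a checksum.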
     
  • commit 698222457465ce343443be81c5512edda86e5914 upstream.

    Patches that introduced NT_FILE and NT_SIGINFO notes back in 2012
    had taken care of native (fs/binfmt_elf.c) and compat (fs/compat_binfmt_elf.c)
    coredumps; unfortunately, compat on mips (which does not go through the
    usual compat_binfmt_elf.c) had not been noticed.

    As a result, both N32 and O32 coredumps on 64bit mips kernels
    have those sections malformed enough to confuse the living hell out of
    all gdb and readelf versions (up to and including the tip of binutils-gdb.git).

    The longer-term solution is to make both O32 and N32 compat use the
    regular compat_binfmt_elf.c, but that's too much for backports. The minimal
    solution is to do in arch/mips/kernel/binfmt_elf[on]32.c the same thing
    those patches have done in fs/compat_binfmt_elf.c.

    Cc: stable@kernel.org # v3.7+
    Signed-off-by: Al Viro
    Signed-off-by: Thomas Bogendoerfer
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit 4d4f9c1a17a3480f8fe523673f7232b254d724b7 upstream.

    The compressed payload is not necessarily 4-byte aligned, at least when
    compiling with Clang. In that case, the 4-byte value appended to the
    compressed payload that corresponds to the uncompressed kernel image
    size must be read using get_unaligned_le32().

    This fixes Clang-built kernels not booting on MIPS (tested on an Ingenic
    JZ4770 board).

    Fixes: b8f54f2cde78 ("MIPS: ZBOOT: copy appended dtb to the end of the kernel")
    Cc: # v4.7
    Signed-off-by: Paul Cercueil
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Philippe Mathieu-Daudé
    Signed-off-by: Thomas Bogendoerfer
    Signed-off-by: Greg Kroah-Hartman

    Paul Cercueil
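
    A portable sketch of what an unaligned little-endian 32-bit read does,
    in the spirit of get_unaligned_le32(). The helper names
    read_le32_unaligned() and appended_image_size() are hypothetical and
    the code is a userspace illustration, not the ZBOOT decompressor.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from p, which may have any alignment. */
static uint32_t read_le32_unaligned(const void *p)
{
    const uint8_t *b = p;

    /* Byte-wise loads never fault on alignment, whatever the address. */
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Return the size trailer stored in the last four bytes of image[0..len). */
static uint32_t appended_image_size(const uint8_t *image, size_t len)
{
    return len >= 4 ? read_le32_unaligned(image + len - 4) : 0;
}
```

    Assembling the value from bytes also makes the result independent of
    the host byte order.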
     
  • commit 5b058973d3205578aa6c9a71392e072a11ca44ef upstream.

    When building mips tinyconfig with clang the following warning shows up:

    arch/mips/lib/uncached.c:45:6: warning: variable 'sp' is uninitialized when used here [-Wuninitialized]
    if (sp >= (long)CKSEG0 && sp < (long)CKSEG2)
    ^~
    arch/mips/lib/uncached.c:40:18: note: initialize the variable 'sp' to silence this warning
    register long sp __asm__("$sp");
    ^
    = 0
    1 warning generated.

    Rework to make an explicit inline move, instead of the non-standard use
    of specifying registers for local variables. This is what the gcc-10
    manual [1] says about specifying registers for local variables:

    "6.47.5.2 Specifying Registers for Local Variables
    .................................................
    [...]

    "The only supported use for this feature is to specify registers for
    input and output operands when calling Extended 'asm' (*note Extended
    Asm::). [...]".

    [1] https://docs.w3cub.com/gcc~10/local-register-variables
    Signed-off-by: Anders Roxell
    Reported-by: Nathan Chancellor
    Reported-by: Naresh Kamboju
    Reviewed-by: Nick Desaulniers
    Signed-off-by: Thomas Bogendoerfer
    Signed-off-by: Greg Kroah-Hartman

    Anders Roxell
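
    A sketch of the "explicit inline move" idea, as opposed to binding a
    local variable to a register. It is an illustration under assumptions,
    not the upstream patch: current_stack_pointer() is a hypothetical
    helper, and the non-MIPS branches exist only so the example builds on
    other hosts with GCC or Clang.

```c
#include <stdint.h>

/* Return the current stack pointer (or a close approximation of it). */
static uintptr_t current_stack_pointer(void)
{
    uintptr_t sp;

#if defined(__mips__)
    /* Copy $sp into an ordinary output operand. */
    __asm__ volatile("move %0, $sp" : "=r"(sp));
#elif defined(__x86_64__)
    __asm__ volatile("mov %%rsp, %0" : "=r"(sp));
#elif defined(__aarch64__)
    __asm__ volatile("mov %0, sp" : "=r"(sp));
#else
    /* Fallback: the address of a local lies close to the stack pointer. */
    sp = (uintptr_t)&sp;
#endif
    return sp;
}
```

    Because the value arrives through an output operand, the variable is
    always initialized from the compiler's point of view.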
     
  • commit ad4fddef5f2345aa9214e979febe2f47639c10d9 upstream.

    When building mips tinyconfig with clang the following error shows up:

    WARNING: modpost: vmlinux.o(.text+0x1940c): Section mismatch in reference from the function r4k_cache_init() to the function .init.text:loongson3_sc_init()
    The function r4k_cache_init() references
    the function __init loongson3_sc_init().
    This is often because r4k_cache_init lacks a __init
    annotation or the annotation of loongson3_sc_init is wrong.

    Remove the __init annotation from the functions loongson3_sc_init(),
    mips_sc_probe_cm3(), and mips_sc_probe().

    Signed-off-by: Anders Roxell
    Reviewed-by: Nick Desaulniers
    Signed-off-by: Thomas Bogendoerfer
    Signed-off-by: Greg Kroah-Hartman

    Anders Roxell
     

13 Jan, 2021

1 commit

  • [ Upstream commit 87dbc209ea04645fd2351981f09eff5d23f8e2e9 ]

    Make mandatory in include/asm-generic/Kbuild and
    remove all arch/*/include/asm/local64.h arch-specific files since they
    only #include .

    This fixes build errors on arch/c6x/ and arch/nios2/ for
    block/blk-iocost.c.

    Build-tested on 21 of 25 arch-es. (tools problems on the others)

    Yes, we could even rename to
    and change all #includes to use
    instead.

    Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org
    Signed-off-by: Randy Dunlap
    Suggested-by: Christoph Hellwig
    Reviewed-by: Masahiro Yamada
    Cc: Jens Axboe
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Aurelien Jacquiot
    Cc: Peter Zijlstra
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Randy Dunlap
     

30 Dec, 2020

2 commits

  • [ Upstream commit d121f125af22a16f0f679293756d28a9691fa46d ]

    Linux doesn't own the memory immediately after the kernel image. On Octeon,
    the bootloader places a shared structure right after the kernel _end;
    refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c.

    If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc()
    inside early_init_dt_alloc_memory_arch()
    Signed-off-by: Thomas Bogendoerfer
    Signed-off-by: Sasha Levin

    Alexander Sverdlin
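
    The arithmetic behind the concern can be sketched as follows: rounding
    an end address up to a page frame boundary covers bytes past that
    address. The 4 KiB page size and all names here are assumptions made
    for the illustration; they are not taken from the MIPS code.

```c
#include <stdint.h>

#define EX_PAGE_SHIFT 12                            /* assumed 4 KiB pages */
#define EX_PAGE_SIZE  (UINT64_C(1) << EX_PAGE_SHIFT)

/* First page frame number at or above addr. */
static uint64_t pfn_up(uint64_t addr)
{
    return (addr + EX_PAGE_SIZE - 1) >> EX_PAGE_SHIFT;
}

/* Page frame number containing addr. */
static uint64_t pfn_down(uint64_t addr)
{
    return addr >> EX_PAGE_SHIFT;
}

/* Bytes past 'end' that get covered when 'end' is rounded up to a page. */
static uint64_t bytes_claimed_past_end(uint64_t end)
{
    return (pfn_up(end) << EX_PAGE_SHIFT) - end;
}
```

    For an end address of 0x1234, rounding up covers 0xDCC bytes that lie
    beyond the image, which is where a neighbouring structure could live.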
     
  • [ Upstream commit 3a5fe2fb9635c43359c9729352f45044f3c8df6b ]

    When BCM47XX_BCMA is enabled and BCMA_DRIVER_PCI is disabled, it results
    in the following Kbuild warning:

    WARNING: unmet direct dependencies detected for BCMA_DRIVER_PCI_HOSTMODE
    Depends on [n]: MIPS [=y] && BCMA_DRIVER_PCI [=n] && PCI_DRIVERS_LEGACY [=y] && BCMA [=y]=y
    Selected by [y]:
    - BCM47XX_BCMA [=y] && BCM47XX [=y] && PCI [=y]

    The reason is that BCM47XX_BCMA selects BCMA_DRIVER_PCI_HOSTMODE without
    depending on or selecting BCMA_DRIVER_PCI while BCMA_DRIVER_PCI_HOSTMODE
    depends on BCMA_DRIVER_PCI. This can also fail building the kernel.

    Honor the kconfig dependency to remove unmet direct dependency warnings
    and avoid any potential build failures.

    Fixes: c1d1c5d4213e ("bcm47xx: add support for bcma bus")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=209879
    Signed-off-by: Necip Fazil Yildiran
    Signed-off-by: Thomas Bogendoerfer
    Signed-off-by: Sasha Levin

    Necip Fazil Yildiran
     

30 Nov, 2020

1 commit


28 Nov, 2020

1 commit


24 Nov, 2020

1 commit

  • We call arch_cpu_idle() with RCU disabled, but then use
    local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

    Switch all arch_cpu_idle() implementations to use
    raw_local_irq_{en,dis}able() and carefully manage the
    lockdep,rcu,tracing state like we do in entry.

    (XXX: we really should change arch_cpu_idle() to not return with
    interrupts enabled)

    Reported-by: Sven Schnelle
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Mark Rutland
    Tested-by: Mark Rutland
    Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org

    Peter Zijlstra
     

17 Nov, 2020

2 commits


16 Nov, 2020

1 commit

  • Stefan Agner reported a bug when using zsram on 32-bit Arm machines
    with RAM above the 4GB address boundary:

    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    pgd = a27bd01c
    [00000000] *pgd=236a0003, *pmd=1ffa64003
    Internal error: Oops: 207 [#1] SMP ARM
    Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
    CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
    Hardware name: BCM2711
    PC is at zs_map_object+0x94/0x338
    LR is at zram_bvec_rw.constprop.0+0x330/0xa64
    pc : [] lr : [] psr: 60000013
    sp : e376bbe0 ip : 00000000 fp : c1e2921c
    r10: 00000002 r9 : c1dda730 r8 : 00000000
    r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000
    r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000
    Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
    Control: 30c5383d Table: 235c2a80 DAC: fffffffd
    Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
    Stack: (0xe376bbe0 to 0xe376c000)

    As it turns out, zsram needs to know the maximum memory size, which
    is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
    MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.

    The same problem will be hit on all 32-bit architectures that have a
    physical address space larger than 4GB and happen to not enable sparsemem
    and include asm/sparsemem.h from asm/pgtable.h.

    After the initial discussion, I suggested just always defining
    MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
    set, or provoking a build error otherwise. This addresses all
    configurations that can currently have this runtime bug, but
    leaves all other configurations unchanged.

    I looked up the possible number of bits in source code and
    datasheets, here is what I found:

    - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used
    - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never
    support more than 32 bits, even though supersections in theory allow
    up to 40 bits as well.
    - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
    XPA supports up to 60 bits in theory, but 40 bits are more than
    anyone will ever ship
    - On PowerPC, there are three different implementations of 36 bit
    addressing, but 32-bit is used without CONFIG_PTE_64BIT
    - On RISC-V, the normal page table format can support 34 bit
    addressing. There is no highmem support on RISC-V, so anything
    above 2GB is unused, but it might be useful to eventually support
    CONFIG_ZRAM for high pages.

    Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
    Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
    Acked-by: Thomas Bogendoerfer
    Reviewed-by: Stefan Agner
    Tested-by: Stefan Agner
    Acked-by: Mike Rapoport
    Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
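
    A simplified model of why the physical address width matters, assuming
    an allocator that stores a page frame number in a field of
    (physical address bits - page shift) bits. pfn_bits() and pfn_fits()
    are hypothetical helpers, not zsmalloc code.

```c
#include <stdint.h>

/* Width of the field needed to hold any PFN for the given address width. */
static unsigned pfn_bits(unsigned phys_bits, unsigned page_shift)
{
    return phys_bits - page_shift;
}

/* Return 1 if the PFN of phys_addr fits in the field implied by phys_bits. */
static int pfn_fits(uint64_t phys_addr, unsigned phys_bits, unsigned page_shift)
{
    uint64_t pfn = phys_addr >> page_shift;

    return pfn < (UINT64_C(1) << pfn_bits(phys_bits, page_shift));
}
```

    With 4 KiB pages, a page at the 4GB boundary does not fit when only 32
    physical address bits are assumed, but fits once 40 bits are assumed.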
     

27 Oct, 2020

1 commit

  • MIPS should export its local version of "has_transparent_hugepage"
    so that loadable modules (dax) can use it.

    Fixes this build error:
    ERROR: modpost: "has_transparent_hugepage" [drivers/dax/dax.ko] undefined!

    Fixes: fd8cfd300019 ("arch: fix has_transparent_hugepage()")
    Reported-by: kernel test robot
    Signed-off-by: Randy Dunlap
    Cc: Thomas Bogendoerfer
    Cc: linux-mips@vger.kernel.org
    Cc: Dan Williams
    Cc: Vishal Verma
    Cc: Dave Jiang
    Cc: linux-nvdimm@lists.01.org
    Cc: Hugh Dickins
    Cc: Andrew Morton
    Signed-off-by: Thomas Bogendoerfer

    Randy Dunlap
     

26 Oct, 2020

1 commit

  • Use a more generic form for __section that requires quotes to avoid
    complications with clang and gcc differences.

    Remove the quote operator # from compiler_attributes.h __section macro.

    Convert all unquoted __section(foo) uses to quoted __section("foo").
    Also convert __attribute__((section("foo"))) uses to __section("foo")
    even if the __attribute__ has multiple list entry forms.

    Conversion done using the script at:

    https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl

    Signed-off-by: Joe Perches
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Miguel Ojeda
    Signed-off-by: Linus Torvalds

    Joe Perches
     

25 Oct, 2020

1 commit

  • Pull ARM SoC-related driver updates from Olof Johansson:
    "Various driver updates for platforms. A bulk of this is smaller fixes
    or cleanups, but some of the new material this time around is:

    - Support for Nvidia Tegra234 SoC

    - Ring accelerator support for TI AM65x

    - PRUSS driver for TI platforms

    - Renesas support for R-Car V3U SoC

    - Reset support for Cortex-M4 processor on i.MX8MQ

    There are also new socinfo entries for a handful of different SoCs and
    platforms"

    * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (131 commits)
    drm/mediatek: reduce clear event
    soc: mediatek: cmdq: add clear option in cmdq_pkt_wfe api
    soc: mediatek: cmdq: add jump function
    soc: mediatek: cmdq: add write_s_mask value function
    soc: mediatek: cmdq: add write_s value function
    soc: mediatek: cmdq: add read_s function
    soc: mediatek: cmdq: add write_s_mask function
    soc: mediatek: cmdq: add write_s function
    soc: mediatek: cmdq: add address shift in jump
    soc: mediatek: mtk-infracfg: Fix kerneldoc
    soc: amlogic: pm-domains: use always-on flag
    reset: sti: reset-syscfg: fix struct description warnings
    reset: imx7: add the cm4 reset for i.MX8MQ
    dt-bindings: reset: imx8mq: add m4 reset
    reset: Fix and extend kerneldoc
    reset: reset-zynqmp: Added support for Versal platform
    dt-bindings: reset: Updated binding for Versal reset driver
    reset: imx7: Support module build
    soc: fsl: qe: Remove unnessesary check in ucc_set_tdm_rxtx_clk
    soc: fsl: qman: convert to use be32_add_cpu()
    ...

    Linus Torvalds
     

24 Oct, 2020

2 commits

  • Pull KVM updates from Paolo Bonzini:
    "For x86, there is a new alternative and (in the future) more scalable
    implementation of extended page tables that does not need a reverse
    map from guest physical addresses to host physical addresses.

    For now it is disabled by default because it is still lacking a few of
    the existing MMU's bells and whistles. However it is a very solid
    piece of work and it is already available for people to hammer on it.

    Other updates:

    ARM:
    - New page table code for both hypervisor and guest stage-2
    - Introduction of a new EL2-private host context
    - Allow EL2 to have its own private per-CPU variables
    - Support of PMU event filtering
    - Complete rework of the Spectre mitigation

    PPC:
    - Fix for running nested guests with in-kernel IRQ chip
    - Fix race condition causing occasional host hard lockup
    - Minor cleanups and bugfixes

    x86:
    - allow trapping unknown MSRs to userspace
    - allow userspace to force #GP on specific MSRs
    - INVPCID support on AMD
    - nested AMD cleanup, on demand allocation of nested SVM state
    - hide PV MSRs and hypercalls for features not enabled in CPUID
    - new test for MSR_IA32_TSC writes from host and guest
    - cleanups: MMU, CPUID, shared MSRs
    - LAPIC latency optimizations and bugfixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits)
    kvm: x86/mmu: NX largepage recovery for TDP MMU
    kvm: x86/mmu: Don't clear write flooding count for direct roots
    kvm: x86/mmu: Support MMIO in the TDP MMU
    kvm: x86/mmu: Support write protection for nesting in tdp MMU
    kvm: x86/mmu: Support disabling dirty logging for the tdp MMU
    kvm: x86/mmu: Support dirty logging for the TDP MMU
    kvm: x86/mmu: Support changed pte notifier in tdp MMU
    kvm: x86/mmu: Add access tracking for tdp_mmu
    kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU
    kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
    kvm: x86/mmu: Add TDP MMU PF handler
    kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
    kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
    KVM: Cache as_id in kvm_memory_slot
    kvm: x86/mmu: Add functions to handle changed TDP SPTEs
    kvm: x86/mmu: Allocate and free TDP MMU roots
    kvm: x86/mmu: Init / Uninit the TDP MMU
    kvm: x86/mmu: Introduce tdp_iter
    KVM: mmu: extract spte.h and spte.c
    KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
    ...

    Linus Torvalds
     
  • Pull arch task_work cleanups from Jens Axboe:
    "Two cleanups that don't fit other categories:

    - Finally get the task_work_add() cleanup done properly, so we don't
    have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
    all callers, and also fixes up the documentation for
    task_work_add().

    - While working on some TIF related changes for 5.11, this
    TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
    duplication for how that is handled"

    * tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
    task_work: cleanup notification modes
    tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()

    Linus Torvalds
     

23 Oct, 2020

2 commits

  • Pull Kbuild updates from Masahiro Yamada:

    - Support 'make compile_commands.json' to generate the compilation
    database more easily, avoiding stale entries

    - Support 'make clang-analyzer' and 'make clang-tidy' for static checks
    using clang-tidy

    - Preprocess scripts/modules.lds.S to allow CONFIG options in the
    module linker script

    - Drop cc-option tests from compiler flags supported by our minimal
    GCC/Clang versions

    - Always use a 12-digit commit hash for CONFIG_LOCALVERSION_AUTO=y

    - Use sha1 build id for both BFD linker and LLD

    - Improve deb-pkg for reproducible builds and rootless builds

    - Remove stale, useless scripts/namespace.pl

    - Turn -Wreturn-type warning into error

    - Fix build error of deb-pkg when CONFIG_MODULES=n

    - Replace 'hostname' command with more portable 'uname -n'

    - Various Makefile cleanups

    * tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
    kbuild: Use uname for LINUX_COMPILE_HOST detection
    kbuild: Only add -fno-var-tracking-assignments for old GCC versions
    kbuild: remove leftover comment for filechk utility
    treewide: remove DISABLE_LTO
    kbuild: deb-pkg: clean up package name variables
    kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
    kbuild: enforce -Werror=return-type
    scripts: remove namespace.pl
    builddeb: Add support for all required debian/rules targets
    builddeb: Enable rootless builds
    builddeb: Pass -n to gzip for reproducible packages
    kbuild: split the build log of kallsyms
    kbuild: explicitly specify the build id style
    scripts/setlocalversion: make git describe output more reliable
    kbuild: remove cc-option test of -Werror=date-time
    kbuild: remove cc-option test of -fno-stack-check
    kbuild: remove cc-option test of -fno-strict-overflow
    kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
    kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
    kbuild: do not create built-in objects for external module builds
    ...

    Linus Torvalds
     
  • Pull initial set_fs() removal from Al Viro:
    "Christoph's set_fs base series + fixups"

    * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Allow a NULL pos pointer to __kernel_read
    fs: Allow a NULL pos pointer to __kernel_write
    powerpc: remove address space overrides using set_fs()
    powerpc: use non-set_fs based maccess routines
    x86: remove address space overrides using set_fs()
    x86: make TASK_SIZE_MAX usable from assembly code
    x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
    lkdtm: remove set_fs-based tests
    test_bitmap: remove user bitmap tests
    uaccess: add infrastructure for kernel builds with set_fs()
    fs: don't allow splice read/write without explicit ops
    fs: don't allow kernel reads and writes without iter ops
    sysctl: Convert to iter interfaces
    proc: add a read_iter method to proc proc_ops
    proc: cleanup the compat vs no compat file ops
    proc: remove a level of indentation in proc_get_inode

    Linus Torvalds
     

19 Oct, 2020

1 commit

  • There is a use case where System Management Software (SMS) wants to give
    a memory hint like MADV_[COLD|PAGEOUT] to other processes; in the
    case of Android, it is the ActivityManagerService.

    The information required to make the reclaim decision is not known to the
    app. Instead, it is known to the centralized userspace
    daemon(ActivityManagerService), and that daemon must be able to initiate
    reclaim on its own without any app involvement.

    To solve the issue, this patch introduces a new syscall
    process_madvise(2). It uses the pidfd of an external process to give the
    hint. It also supports a vector of address ranges because an Android app
    has thousands of vmas due to zygote, so it is a total waste of CPU and
    power to call the syscall once for each vma. (Testing 2000 per-vma
    syscalls against 1 vectored syscall showed a 15% performance improvement.
    I think it would be bigger in real practice because the testing ran in a
    very cache-friendly environment.)

    Another potential use case for the vector range is to amortize the cost
    of TLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
    benefit users like TCP receive zerocopy and malloc implementations. In
    the future, we could find more use cases for other advice values, so
    let's make it happen in the API now, while we are introducing a new
    syscall. With that, an existing madvise(2) user could replace it with
    process_madvise(2) with their own pid if they want batched address range
    support.

    Since it could affect another process's address range, only a privileged
    process (PTRACE_MODE_ATTACH_FSCREDS) or something else (e.g., being the
    same UID) that gives it the right to ptrace the process can use it
    successfully. The flag argument is reserved for future use if we need to
    extend the API.

    I think supporting in process_madvise every hint that madvise supports or
    will support is rather risky. We are not sure that all hints make sense
    from an external process, and the implementation of a hint may rely on
    the caller being in the current context, so it could be error-prone. Thus,
    I limited the hints to MADV_[COLD|PAGEOUT] in this patch.

    If someone wants to add other hints, we can hear the use case and review
    it for each hint. That is safer for maintenance than introducing a
    buggy syscall that is hard to fix later.

    So finally, the API is as follows,

    ssize_t process_madvise(int pidfd, const struct iovec *iovec,
    unsigned long vlen, int advice, unsigned int flags);

    DESCRIPTION
    The process_madvise() system call is used to give advice or directions
    to the kernel about the address ranges from external process as well as
    local process. It provides the advice to address ranges of process
    described by iovec and vlen. The goal of such advice is to improve
    system or application performance.

    The pidfd selects the process referred to by the PID file descriptor
    specified in pidfd. (See pidfd_open(2) for further information.)

    The pointer iovec points to an array of iovec structures, defined in
    as:

    struct iovec {
    void *iov_base; /* starting address */
    size_t iov_len; /* number of bytes to be advised */
    };

    The iovec describes address ranges beginning at address (iov_base)
    and with a length of iov_len bytes.

    The vlen represents the number of elements in iovec.

    The advice is indicated in the advice argument, which is one of the
    following at this moment if the target process specified by pidfd is
    external.

    MADV_COLD
    MADV_PAGEOUT

    Permission to provide a hint to external process is governed by a
    ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

    process_madvise supports every advice value that madvise(2) has if the
    target process is in the same thread group as the calling process, so a
    user could use process_madvise(2) to extend existing madvise(2) to
    support vector address ranges.

    RETURN VALUE
    On success, process_madvise() returns the number of bytes advised.
    This return value may be less than the total number of requested
    bytes, if an error occurred. The caller should check return value
    to determine whether a partial advice occurred.

    FAQ:

    Q.1 - Why does any external entity have better knowledge?

    Quote from Sandeep

    "For Android, every application (including the special SystemServer)
    are forked from Zygote. The reason of course is to share as many
    libraries and classes between the two as possible to benefit from the
    preloading during boot.

    After applications start, (almost) all of the APIs end up calling into
    this SystemServer process over IPC (binder) and back to the
    application.

    In a fully running system, the SystemServer monitors every single
    process periodically to calculate their PSS / RSS and also decides
    which process is "important" to the user for interactivity.

    So, because of how these processes start _and_ the fact that the
    SystemServer is looping to monitor each process, it does tend to *know*
    which address range of the application is not used / useful.

    Besides, we can never rely on applications to clean things up
    themselves. We've had the "hey app1, the system is low on memory,
    please trim your memory usage down" notifications for a long time[1].
    They rely on applications honoring the broadcasts and very few do.

    So, if we want to avoid the inevitable killing of the application and
    restarting it, some way to be able to tell the OS about unimportant
    memory in these applications will be useful.

    - ssp

    Q.2 - How is the race (i.e., object validation) handled between giving a
    hint from an external process and the target process getting the hint?

    process_madvise operates on the target process's address space as it
    exists at the instant that process_madvise is called. If the
    target process can run between the time the process_madvise process
    inspects the target process address space and the time that
    process_madvise is actually called, process_madvise may operate on
    memory regions that the calling process does not expect. It's the
    responsibility of the process calling process_madvise to close this
    race condition. For example, the calling process can suspend the
    target process with ptrace, SIGSTOP, or the freezer cgroup so that it
    doesn't have an opportunity to change its own address space before
    process_madvise is called. Another option is to operate on memory
    regions that the caller knows a priori will be unchanged in the target
    process. Yet another option is to accept the race for certain
    process_madvise calls after reasoning that mistargeting will do no
    harm. The suggested API itself does not provide synchronization. The
    same applies to other APIs like move_pages and process_vm_write.

    The race isn't really a problem though. Why is it so wrong to require
    that callers do their own synchronization in some manner? Nobody
    objects to write(2) merely because it's possible for two processes to
    open the same file and clobber each other's writes --- instead, we tell
    people to use flock or something. Think about mmap. It never
    guarantees newly allocated address space is still valid when the user
    tries to access it because other threads could unmap the memory right
    before. That's where we need synchronization by using another API or
    design on the user side. It shouldn't be part of the API itself. If someone
    needs more fine-grained synchronization than process level, there were
    two ideas suggested - cookie[2] and anon-fd[3]. Both are applicable
    via the last reserved argument of the API, but I don't think it's
    necessary right now since we already have ways to prevent the race, so I
    don't want to add additional complexity with a more fine-grained
    optimization model.

    To keep the API extensible, it reserves an unsigned long as the last
    argument so we could support it in the future if someone really needs it.

    Q.3 - Why doesn't ptrace work?

    Injecting an madvise in the target process using ptrace would not work
    for us because such injected madvise would have to be executed by the
    target process, which means that process would have to be runnable and
    that creates the risk of the abovementioned race and hinting a wrong
    VMA. Furthermore, we want to act on the hint in the caller's context, not
    the callee's, because the callee is usually limited by cpuset/cgroups or
    even in a frozen state, so it can't act by itself quickly enough, which
    causes more thrashing/kills. It also doesn't work if the target process
    is already ptraced (e.g., strace, debugger, minidump) because a process
    can have at most one ptracer.

    [1] https://developer.android.com/topic/performance/memory"

    [2] process_getinfo for getting the cookie which is updated whenever
    vma of process address layout are changed - Daniel Colascione -
    https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224

    [3] anonymous fd which is used for the object(i.e., address range)
    validation - Michal Hocko -
    https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

    [minchan@kernel.org: fix process_madvise build break for arm64]
    Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
    [minchan@kernel.org: fix build error for mips of process_madvise]
    Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
    [akpm@linux-foundation.org: fix patch ordering issue]
    [akpm@linux-foundation.org: fix arm64 whoops]
    [minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
    [akpm@linux-foundation.org: fix i386 build]
    [sfr@canb.auug.org.au: fix syscall numbering]
    Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
    [sfr@canb.auug.org.au: madvise.c needs compat.h]
    Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
    [minchan@kernel.org: fix mips build]
    Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
    [yuehaibing@huawei.com: remove duplicate header which is included twice]
    Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
    [minchan@kernel.org: do not use helper functions for process_madvise]
    Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
    [akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
    [sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
    Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au

    Signed-off-by: Minchan Kim
    Signed-off-by: YueHaibing
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Reviewed-by: Suren Baghdasaryan
    Reviewed-by: Vlastimil Babka
    Acked-by: David Rientjes
    Cc: Alexander Duyck
    Cc: Brian Geffon
    Cc: Christian Brauner
    Cc: Daniel Colascione
    Cc: Jann Horn
    Cc: Jens Axboe
    Cc: Joel Fernandes
    Cc: Johannes Weiner
    Cc: John Dias
    Cc: Kirill Tkhai
    Cc: Michal Hocko
    Cc: Oleksandr Natalenko
    Cc: Sandeep Patil
    Cc: SeongJae Park
    Cc: Shakeel Butt
    Cc: Sonny Rao
    Cc: Tim Murray
    Cc: Florian Weimer
    Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
    Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
    Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
    Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

18 Oct, 2020

1 commit


17 Oct, 2020

1 commit

  • Pull MIPS updates from Thomas Bogendoerfer:

    - removed support for PNX833x alias NXT_STB22x

    - included Ingenic SoC support into generic MIPS kernels

    - added support for new Ingenic SoCs

    - converted workaround selection to use Kconfig

    - replaced old boot mem functions by memblock_*

    - enabled COP2 usage in kernel for Loongson64 to make use
    of 16byte load/stores possible

    - cleanups and fixes

    * tag 'mips_5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (92 commits)
    MIPS: DEC: Restore bootmem reservation for firmware working memory area
    MIPS: dec: fix section mismatch
    bcm963xx_tag.h: fix duplicated word
    mips: ralink: enable zboot support
    MIPS: ingenic: Remove CPU_SUPPORTS_HUGEPAGES
    MIPS: cpu-probe: remove MIPS_CPU_BP_GHIST option bit
    MIPS: cpu-probe: introduce exclusive R3k CPU probe
    MIPS: cpu-probe: move fpu probing/handling into its own file
    MIPS: replace add_memory_region with memblock
    MIPS: Loongson64: Clean up numa.c
    MIPS: Loongson64: Select SMP in Kconfig to avoid build error
    mips: octeon: Add Ubiquiti E200 and E220 boards
    MIPS: SGI-IP28: disable use of ll/sc in kernel
    MIPS: tx49xx: move tx4939_add_memory_regions into only user
    MIPS: pgtable: Remove used PAGE_USERIO define
    MIPS: alchemy: Share prom_init implementation
    MIPS: alchemy: Fix build breakage, if TOUCHSCREEN_WM97XX is disabled
    MIPS: process: include exec.h header in process.c
    MIPS: process: Add prototype for function arch_dup_task_struct
    MIPS: idle: Add prototype for function check_wait
    ...

    Linus Torvalds
     

16 Oct, 2020

2 commits

  • Pull networking updates from Jakub Kicinski:

    - Add redirect_neigh() BPF packet redirect helper, allowing to limit
    stack traversal in common container configs and improving TCP
    back-pressure.

    Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

    - Expand netlink policy support and improve policy export to user
    space. (Ge)netlink core performs request validation according to
    declared policies. Expand the expressiveness of those policies
    (min/max length and bitmasks). Allow dumping policies for particular
    commands. This is used for feature discovery by user space (instead
    of kernel version parsing or trial and error).

    - Support IGMPv3/MLDv2 multicast listener discovery protocols in
    bridge.

    - Allow more than 255 IPv4 multicast interfaces.

    - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
    packets of TCPv6.

    - In Multipath TCP (MPTCP) support concurrent transmission of data on
    multiple subflows in a load balancing scenario. Enhance advertising
    addresses via the RM_ADDR/ADD_ADDR options.

    - Support SMC-Dv2 version of SMC, which enables multi-subnet
    deployments.

    - Allow more calls to same peer in RxRPC.

    - Support two new Controller Area Network (CAN) protocols - CAN-FD and
    ISO 15765-2:2016.

    - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
    kernel problem.

    - Add TC actions for implementing MPLS L2 VPNs.

    - Improve nexthop code - e.g. handle various corner cases when nexthop
    objects are removed from groups better, skip unnecessary
    notifications and make it easier to offload nexthops into HW by
    converting to a blocking notifier.

    - Support adding and consuming TCP header options by BPF programs,
    opening the doors for easy experimental and deployment-specific TCP
    option use.

    - Reorganize TCP congestion control (CC) initialization to simplify
    life of TCP CC implemented in BPF.

    - Add support for shipping BPF programs with the kernel and loading
    them early on boot via the User Mode Driver mechanism, hence reusing
    all the user space infra we have.

    - Support sleepable BPF programs, initially targeting LSM and tracing.

    - Add bpf_d_path() helper for returning full path for given 'struct
    path'.

    - Make bpf_tail_call compatible with bpf-to-bpf calls.

    - Allow BPF programs to call map_update_elem on sockmaps.

    - Add BPF Type Format (BTF) support for type and enum discovery, as
    well as support for using BTF within the kernel itself (current use
    is for pretty printing structures).

    - Support listing and getting information about bpf_links via the bpf
    syscall.

    - Enhance kernel interfaces around NIC firmware update. Allow
    specifying overwrite mask to control if settings etc. are reset
    during update; report expected max time operation may take to users;
    support firmware activation without machine reboot incl. limits of
    how much impact reset may have (e.g. dropping link or not).

    - Extend ethtool configuration interface to report IEEE-standard
    counters, to limit the need for per-vendor logic in user space.

    - Adopt or extend devlink use for debug, monitoring, fw update in many
    drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
    dpaa2-eth).

    - In mlxsw expose critical and emergency SFP module temperature alarms.
    Refactor port buffer handling to make the defaults more suitable and
    support setting these values explicitly via the DCBNL interface.

    - Add XDP support for Intel's igb driver.

    - Support offloading TC flower classification and filtering rules to
    mscc_ocelot switches.

    - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
    fixed interval period pulse generator and one-step timestamping in
    dpaa-eth.

    - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
    offload.

    - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
    this HW to use it. Convert mvpp2 to split PCS.

    - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
    7-port Mediatek MT7531 IP.

    - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
    and wcn3680 support in wcn36xx.

    - Improve performance for packets which don't require many offloads on
    recent Mellanox NICs by 20% by making multiple packets share a
    descriptor entry.

    - Move chelsio inline crypto drivers (for TLS and IPsec) from the
    crypto subtree to drivers/net. Move MDIO drivers out of the phy
    directory.

    - Clean up a lot of W=1 warnings; reportedly the actively developed
    subsections of networking drivers should now build W=1 warning free.

    - Make sure drivers don't use in_interrupt() to dynamically adapt their
    code. Convert tasklets to use new tasklet_setup API (sadly this
    conversion is not yet complete).

    * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
    Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
    net, sockmap: Don't call bpf_prog_put() on NULL pointer
    bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
    bpf, sockmap: Add locking annotations to iterator
    netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
    net: fix pos incrementment in ipv6_route_seq_next
    net/smc: fix invalid return code in smcd_new_buf_create()
    net/smc: fix valid DMBE buffer sizes
    net/smc: fix use-after-free of delayed events
    bpfilter: Fix build error with CONFIG_BPFILTER_UMH
    cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
    net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
    bpf: Fix register equivalence tracking.
    rxrpc: Fix loss of final ack on shutdown
    rxrpc: Fix bundle counting for exclusive connections
    netfilter: restore NF_INET_NUMHOOKS
    ibmveth: Identify ingress large send packets.
    ibmveth: Switch order of ibmveth_helper calls.
    cxgb4: handle 4-tuple PEDIT to NAT mode translation
    selftests: Add VRF route leaking tests
    ...

    Linus Torvalds
     
  • Pull dma-mapping updates from Christoph Hellwig:

    - rework the non-coherent DMA allocator

    - move private definitions out of

    - lower CMA_ALIGNMENT (Paul Cercueil)

    - remove the omap1 dma address translation in favor of the common code

    - make dma-direct aware of multiple dma offset ranges (Jim Quinlan)

    - support per-node DMA CMA areas (Barry Song)

    - increase the default seg boundary limit (Nicolin Chen)

    - misc fixes (Robin Murphy, Thomas Tai, Xu Wang)

    - various cleanups

    * tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
    ARM/ixp4xx: add a missing include of dma-map-ops.h
    dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
    dma-direct: factor out a dma_direct_alloc_from_pool helper
    dma-direct check for highmem pages in dma_direct_alloc_pages
    dma-mapping: merge into
    dma-mapping: move large parts of to kernel/dma
    dma-mapping: move dma-debug.h to kernel/dma/
    dma-mapping: remove
    dma-mapping: merge into
    dma-contiguous: remove dma_contiguous_set_default
    dma-contiguous: remove dev_set_cma_area
    dma-contiguous: remove dma_declare_contiguous
    dma-mapping: split
    cma: decrease CMA_ALIGNMENT lower limit to 2
    firewire-ohci: use dma_alloc_pages
    dma-iommu: implement ->alloc_noncoherent
    dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
    dma-mapping: add a new dma_alloc_pages API
    dma-mapping: remove dma_cache_sync
    53c700: convert to dma_alloc_noncoherent
    ...

    Linus Torvalds
     

15 Oct, 2020

3 commits

  • Fix a crash on DEC platforms starting with:

    VFS: Mounted root (nfs filesystem) on device 0:11.
    Freeing unused PROM memory: 124k freed
    BUG: Bad page state in process swapper pfn:00001
    page:(ptrval) refcount:0 mapcount:-128 mapping:00000000 index:0x1 pfn:0x1
    flags: 0x0()
    raw: 00000000 00000100 00000122 00000000 00000001 00000000 ffffff7f 00000000
    page dumped because: nonzero mapcount
    Modules linked in:
    CPU: 0 PID: 1 Comm: swapper Not tainted 5.9.0-00858-g865c50e1d279 #1
    Stack : 8065dc48 0000000b 8065d2b8 9bc27dcc 80645bfc 9bc259a4 806a1b97 80703124
    80710000 8064a900 00000001 80099574 806b116c 1000ec00 9bc27d88 806a6f30
    00000000 00000000 80645bfc 00000000 31232039 80706ba4 2e392e35 8039f348
    2d383538 00000070 0000000a 35363867 00000000 806c2830 80710000 806b0000
    80710000 8064a900 00000001 81000000 00000000 00000000 8035af2c 80700000
    ...
    Call Trace:
    [] show_stack+0x34/0x104
    [] bad_page+0xfc/0x128
    [] free_pcppages_bulk+0x1f4/0x5dc
    [] free_unref_page+0xc0/0x130
    [] free_reserved_area+0x144/0x1d8
    [] kernel_init+0x20/0x100
    [] ret_from_kernel_thread+0x14/0x1c
    Disabling lock debugging due to kernel taint

    caused by an attempt to free bootmem space that, as of
    commit b93ddc4f9156 ("mips: Reserve memory for the kernel image resources"),
    is no longer reserved, due to the removal of the generic MIPS arch code
    that used to reserve all the memory from the beginning of RAM up to the
    kernel load address.

    This memory does need to be reserved on DEC platforms however as it is
    used by REX firmware as working area, as per the TURBOchannel firmware
    specification[1]:

    Table 2-2 REX Memory Regions
    -------------------------------------------------------------------------
            Starting    Ending
    Region  Address     Address        Use
    -------------------------------------------------------------------------
    0       0xa0000000  0xa000ffff     Restart block, exception vectors,
                                       REX stack and bss
    1       0xa0010000  0xa0017fff     Keyboard or tty drivers

    2       0xa0018000  0xa001f3ff 1)  CRT driver

    3       0xa0020000  0xa002ffff     boot, cnfg, init and t objects

    4       0xa0020000  0xa002ffff     64KB scratch space
    -------------------------------------------------------------------------
    1) Note that the last 3 Kbytes of region 2 are reserved for backward
       compatibility with previous system software.
    -------------------------------------------------------------------------

    (this table uses KSEG1 unmapped virtual addresses, which in the MIPS
    architecture are offset from physical addresses by a fixed value of
    0xa0000000 and therefore the regions referred to do correspond to the
    beginning of the physical address space) and we call into the firmware
    on several occasions throughout the bootstrap process. It is believed
    that pre-REX firmware used with non-TURBOchannel DEC platforms has the
    same requirements, as hinted by note #1 cited.

    Recreate the discarded reservation then, in DEC platform code, removing
    the crash.
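
    The address arithmetic behind the table can be sketched as follows. This
    is only an illustration, not the code of the fix: the macro and function
    names are hypothetical, and the segment based at 0xa0000000 is assumed to
    be the 512 MiB unmapped window that MIPS32 calls KSEG1. It shows that the
    addresses printed in Table 2-2 land at the very start of physical memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the kernel's own macros. */
#define KSEG1_BASE 0xa0000000u  /* start of the unmapped segment used by the table */
#define KSEG1_SIZE 0x20000000u  /* assumed 512 MiB window */

/* Translate an address inside the unmapped window to a physical address. */
static uint32_t kseg1_to_phys(uint32_t vaddr)
{
    /* Reject addresses that lie outside the window. */
    assert(vaddr >= KSEG1_BASE && vaddr - KSEG1_BASE < KSEG1_SIZE);
    return vaddr - KSEG1_BASE;
}

/*
 * Size in bytes of the low physical memory spanned by the table as printed,
 * whose highest ending address is 0xa002ffff.
 */
static uint32_t rex_reserved_bytes(void)
{
    return kseg1_to_phys(0xa002ffffu) + 1;
}
```

    Under these assumptions the start of region 2 (0xa0018000) translates to
    physical address 0x18000, and the span covered by the printed table is
    0x30000 bytes from physical address 0.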

    References:

    [1] "TURBOchannel Firmware Specification", On-line version,
    EK-TCAAD-FS-004, Digital Equipment Corporation, January 1993,
    Chapter 2 "System Module Firmware", p. 2-5

    Signed-off-by: Maciej W. Rozycki
    Fixes: b93ddc4f9156 ("mips: Reserve memory for the kernel image resources")
    Cc: stable@vger.kernel.org # v5.2+

    Signed-off-by: Thomas Bogendoerfer

    Maciej W. Rozycki
     
  • Drop inline for memory setup functions and mark them __init to
    fix section mismatch of pmax_setup_memory_region.

    Signed-off-by: Thomas Bogendoerfer
    Acked-by: Maciej W. Rozycki

    Thomas Bogendoerfer
     
  • Merge misc updates from Andrew Morton:
    "181 patches.

    Subsystems affected by this patch series: kbuild, scripts, ntfs,
    ocfs2, vfs, mm (slab, slub, kmemleak, dax, debug, pagecache, fadvise,
    gup, swap, memremap, memcg, selftests, pagemap, mincore, hmm, dma,
    memory-failure, vmalloc and migration)"

    * emailed patches from Andrew Morton: (181 commits)
    mm/migrate: remove obsolete comment about device public
    mm/migrate: remove cpages-- in migrate_vma_finalize()
    mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
    memblock: use separate iterators for memory and reserved regions
    memblock: implement for_each_reserved_mem_region() using __next_mem_region()
    memblock: remove unused memblock_mem_size()
    x86/setup: simplify reserve_crashkernel()
    x86/setup: simplify initrd relocation and reservation
    arch, drivers: replace for_each_membock() with for_each_mem_range()
    arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()
    memblock: reduce number of parameters in for_each_mem_range()
    memblock: make memblock_debug and related functionality private
    memblock: make for_each_memblock_type() iterator private
    mircoblaze: drop unneeded NUMA and sparsemem initializations
    riscv: drop unneeded node initialization
    h8300, nds32, openrisc: simplify detection of memory extents
    arm64: numa: simplify dummy_numa_init()
    arm, xtensa: simplify initialization of high memory pages
    dma-contiguous: simplify cma_early_percent_memory()
    KVM: PPC: Book3S HV: simplify kvm_cma_reserve()
    ...

    Linus Torvalds
     

14 Oct, 2020

3 commits

  • for_each_memblock() is used to iterate over memblock.memory in a few
    places that use data from memblock_region rather than the memory ranges.

    Introduce separate for_each_mem_region() and
    for_each_reserved_mem_region() to improve encapsulation of memblock
    internals from its users.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Reviewed-by: Baoquan He
    Acked-by: Ingo Molnar [x86]
    Acked-by: Thomas Bogendoerfer [MIPS]
    Acked-by: Miguel Ojeda [.clang-format]
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Daniel Axtens
    Cc: Dave Hansen
    Cc: Emil Renner Berthing
    Cc: Hari Bathini
    Cc: Ingo Molnar
    Cc: Jonathan Cameron
    Cc: Marek Szyprowski
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: https://lkml.kernel.org/r/20200818151634.14343-18-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • There are several occurrences of the following pattern:

    for_each_memblock(memory, reg) {
        start = __pfn_to_phys(memblock_region_memory_base_pfn(reg));
        end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));

        /* do something with start and end */
    }

    Using for_each_mem_range() iterator is more appropriate in such cases and
    allows simpler and cleaner code.
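
    The idea can be illustrated with a small user-space model. This is a
    sketch only: struct mem_region and the for_each_range() macro below are
    hypothetical stand-ins invented for this example, not the kernel's
    memblock types or its for_each_mem_range() iterator. The point is that
    the iterator hands the loop body a start and an exclusive end directly,
    so callers no longer convert page frame numbers by hand.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Simplified stand-in for a memory region: base address and size in bytes. */
struct mem_region {
    phys_addr_t base;
    phys_addr_t size;
};

/*
 * Hypothetical range iterator: for each of the n regions, store its start
 * and exclusive end through p_start and p_end before running the loop body.
 */
#define for_each_range(i, regs, n, p_start, p_end)                        \
    for ((i) = 0;                                                         \
         (i) < (n) && (*(p_start) = (regs)[i].base,                       \
                       *(p_end) = (regs)[i].base + (regs)[i].size, 1);    \
         (i)++)

/* Example user: add up the bytes covered by all regions. */
static phys_addr_t total_bytes(const struct mem_region *regs, size_t n)
{
    phys_addr_t start, end, total = 0;
    size_t i;

    for_each_range(i, regs, n, &start, &end)
        total += end - start;

    return total;
}
```

    A caller written this way never touches the region structure itself,
    which is the encapsulation benefit the conversion is after.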

    [akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build]
    [rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring]
    Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Daniel Axtens
    Cc: Dave Hansen
    Cc: Emil Renner Berthing
    Cc: Hari Bathini
    Cc: Ingo Molnar
    Cc: Jonathan Cameron
    Cc: Marek Szyprowski
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Miguel Ojeda
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Pull seccomp updates from Kees Cook:
    "The bulk of the changes are with the seccomp selftests to accommodate
    some powerpc-specific behavioral characteristics. Additional cleanups,
    fixes, and improvements are also included:

    - heavily refactor seccomp selftests (and clone3 selftests
    dependency) to fix powerpc (Kees Cook, Thadeu Lima de Souza
    Cascardo)

    - fix style issue in selftests (Zou Wei)

    - upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich
    Felker)

    - replace task_pt_regs(current) with current_pt_regs() (Denis
    Efremov)

    - fix corner-case race in USER_NOTIF (Jann Horn)

    - make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)"

    * tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
    seccomp: Make duplicate listener detection non-racy
    seccomp: Move config option SECCOMP to arch/Kconfig
    selftests/clone3: Avoid OS-defined clone_args
    selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit
    selftests/seccomp: Allow syscall nr and ret value to be set separately
    selftests/seccomp: Record syscall during ptrace entry
    selftests/seccomp: powerpc: Fix seccomp return value testing
    selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of SYSCALL_RET_SET
    selftests/seccomp: Avoid redundant register flushes
    selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
    selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
    selftests/seccomp: Remove syscall setting #ifdefs
    selftests/seccomp: mips: Remove O32-specific macro
    selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
    selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
    selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
    selftests/seccomp: Provide generic syscall setting macro
    selftests/seccomp: Refactor arch register macros to avoid xtensa special case
    selftests/seccomp: Use __NR_mknodat instead of __NR_mknod
    selftests/seccomp: Use bitwise instead of arithmetic operator for flags
    ...

    Linus Torvalds
     

13 Oct, 2020

8 commits

  • Some of these ralink devices come with an ancient u-boot which can't
    extract LZMA properly when image gets too big.
    Enable zboot support to get a self-extracting kernel instead of relying
    on broken u-boot support.

    Signed-off-by: Chuanhong Guo
    Signed-off-by: Thomas Bogendoerfer

    Chuanhong Guo
     
  • While it is true that Ingenic SoCs support huge pages, we cannot use
    them yet as PTEs don't have any single bit that is free. Right now,
    having that symbol only causes build errors, so remove it until the
    situation with PTEs is resolved.

    Fixes: f0f4a753079c ("MIPS: generic: Add support for Ingenic SoCs")
    Signed-off-by: Paul Cercueil
    Reviewed-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Signed-off-by: Thomas Bogendoerfer

    Paul Cercueil
     
  • Pull compat mount cleanups from Al Viro:
    "The last remnants of mount(2) compat buried by Christoph.

    Buried into NFS, that is.

    Generally I'm less enthusiastic about "let's use in_compat_syscall()
    deep in call chain" kind of approach than Christoph seems to be, but
    in this case it's warranted - that had been an NFS-specific wart,
    hopefully not to be repeated in any other filesystems (read: any new
    filesystem introducing non-text mount options will get NAKed even if
    it doesn't mess the layout up).

    IOW, not worth trying to grow an infrastructure that would avoid that
    use of in_compat_syscall()..."

    * 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: remove compat_sys_mount
    fs,nfs: lift compat nfs4 mount data handling into the nfs code
    nfs: simplify nfs4_parse_monolithic

    Linus Torvalds
     
  • Pull compat quotactl cleanups from Al Viro:
    "More of Christoph's compat cleanups: quotactl(2)"

    * 'work.quota-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    quota: simplify the quotactl compat handling
    compat: add a compat_need_64bit_alignment_fixup() helper
    compat: lift compat_s64 and compat_u64 to

    Linus Torvalds
     
  • Pull compat iovec cleanups from Al Viro:
    "Christoph's series around import_iovec() and compat variant thereof"

    * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    security/keys: remove compat_keyctl_instantiate_key_iov
    mm: remove compat_process_vm_{readv,writev}
    fs: remove compat_sys_vmsplice
    fs: remove the compat readv/writev syscalls
    fs: remove various compat readv/writev helpers
    iov_iter: transparently handle compat iovecs in import_iovec
    iov_iter: refactor rw_copy_check_uvector and import_iovec
    iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
    compat.h: fix a spelling error in

    Linus Torvalds
     
  • Pull copy_and_csum cleanups from Al Viro:
    "Saner calling conventions for csum_and_copy_..._user() and friends"

    [ Removing 800+ lines of code and cleaning stuff up is good - Linus ]

    * 'work.csum_and_copy' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    ppc: propagate the calling conventions change down to csum_partial_copy_generic()
    amd64: switch csum_partial_copy_generic() to new calling conventions
    sparc64: propagate the calling convention changes down to __csum_partial_copy_...()
    xtensa: propagate the calling conventions change down into csum_partial_copy_generic()
    mips: propagate the calling convention change down into __csum_partial_copy_..._user()
    mips: __csum_partial_copy_kernel() has no users left
    mips: csum_and_copy_{to,from}_user() are never called under KERNEL_DS
    sparc32: propagate the calling conventions change down to __csum_partial_copy_sparc_generic()
    i386: propagate the calling conventions change down to csum_partial_copy_generic()
    sh: propage the calling conventions change down to csum_partial_copy_generic()
    m68k: get rid of zeroing destination on error in csum_and_copy_from_user()
    arm: propagate the calling convention changes down to csum_partial_copy_from_user()
    alpha: propagate the calling convention changes down to csum_partial_copy.c helpers
    saner calling conventions for csum_and_copy_..._user()
    csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum
    csum_partial_copy_nocheck(): drop the last argument
    unify generic instances of csum_partial_copy_nocheck()
    icmp_push_reply(): reorder adding the checksum up
    skb_copy_and_csum_bits(): don't bother with the last argument

    Linus Torvalds
     
  • Pull perf/kprobes updates from Ingo Molnar:
    "This prepares to unify the kretprobe trampoline handler and make
    kretprobe lockless (those patches are still work in progress)"

    * tag 'perf-kprobes-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
    kprobes: Make local functions static
    kprobes: Free kretprobe_instance with RCU callback
    kprobes: Remove NMI context check
    sparc: kprobes: Use generic kretprobe trampoline handler
    sh: kprobes: Use generic kretprobe trampoline handler
    s390: kprobes: Use generic kretprobe trampoline handler
    powerpc: kprobes: Use generic kretprobe trampoline handler
    parisc: kprobes: Use generic kretprobe trampoline handler
    mips: kprobes: Use generic kretprobe trampoline handler
    ia64: kprobes: Use generic kretprobe trampoline handler
    csky: kprobes: Use generic kretprobe trampoline handler
    arc: kprobes: Use generic kretprobe trampoline handler
    arm64: kprobes: Use generic kretprobe trampoline handler
    arm: kprobes: Use generic kretprobe trampoline handler
    x86/kprobes: Use generic kretprobe trampoline handler
    kprobes: Add generic kretprobe trampoline handler

    Linus Torvalds
     
  • Pull orphan section checking from Ingo Molnar:
    "Orphan link sections were a long-standing source of obscure bugs,
    because the heuristics that various linkers & compilers use to handle
    them (include these bits into the output image vs discarding them
    silently) are both highly idiosyncratic and also version dependent.

    Instead of this historically problematic mess, this tree by Kees Cook
    (et al) adds build time asserts and build time warnings if there's any
    orphan section in the kernel or if a section is not sized as expected.

    And because we relied on so many silent assumptions in this area, fix
    a metric ton of dependencies and some outright bugs related to this,
    before we can finally enable the checks on the x86, ARM and ARM64
    platforms"

    * tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    x86/boot/compressed: Warn on orphan section placement
    x86/build: Warn on orphan section placement
    arm/boot: Warn on orphan section placement
    arm/build: Warn on orphan section placement
    arm64/build: Warn on orphan section placement
    x86/boot/compressed: Add missing debugging sections to output
    x86/boot/compressed: Remove, discard, or assert for unwanted sections
    x86/boot/compressed: Reorganize zero-size section asserts
    x86/build: Add asserts for unwanted sections
    x86/build: Enforce an empty .got.plt section
    x86/asm: Avoid generating unused kprobe sections
    arm/boot: Handle all sections explicitly
    arm/build: Assert for unwanted sections
    arm/build: Add missing sections
    arm/build: Explicitly keep .ARM.attributes sections
    arm/build: Refactor linker script headers
    arm64/build: Assert for unwanted sections
    arm64/build: Add missing DWARF sections
    arm64/build: Use common DISCARDS in linker script
    arm64/build: Remove .eh_frame* sections due to unwind tables
    ...

    Linus Torvalds