13 Jan, 2021

1 commit

  • [ Upstream commit 87dbc209ea04645fd2351981f09eff5d23f8e2e9 ]

    Make mandatory in include/asm-generic/Kbuild and
    remove all arch/*/include/asm/local64.h arch-specific files since they
    only #include .

    This fixes build errors on arch/c6x/ and arch/nios2/ for
    block/blk-iocost.c.

    Build-tested on 21 of 25 arch-es. (tools problems on the others)

    Yes, we could even rename to
    and change all #includes to use
    instead.

    Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org
    Signed-off-by: Randy Dunlap
    Suggested-by: Christoph Hellwig
    Reviewed-by: Masahiro Yamada
    Cc: Jens Axboe
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Aurelien Jacquiot
    Cc: Peter Zijlstra
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Randy Dunlap
     

24 Nov, 2020

1 commit

  • We call arch_cpu_idle() with RCU disabled, but then use
    local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

    Switch all arch_cpu_idle() implementations to use
    raw_local_irq_{en,dis}able() and carefully manage the
    lockdep,rcu,tracing state like we do in entry.

    (XXX: we really should change arch_cpu_idle() to not return with
    interrupts enabled)

    Reported-by: Sven Schnelle
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Mark Rutland
    Tested-by: Mark Rutland
    Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org

    Peter Zijlstra
     

26 Oct, 2020

1 commit

  • Use a more generic form for __section that requires quotes to avoid
    complications with clang and gcc differences.

    Remove the quote operator # from compiler_attributes.h __section macro.

    Convert all unquoted __section(foo) uses to quoted __section("foo").
    Also convert __attribute__((section("foo"))) uses to __section("foo")
    even if the __attribute__ has multiple list entry forms.

    Conversion done using the script at:

    https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl

    Signed-off-by: Joe Perches
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Miguel Ojeda
    Signed-off-by: Linus Torvalds

    Joe Perches
     

24 Oct, 2020

1 commit

  • Pull arch task_work cleanups from Jens Axboe:
    "Two cleanups that don't fit other categories:

    - Finally get the task_work_add() cleanup done properly, so we don't
    have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
    all callers, and also fixes up the documentation for
    task_work_add().

    - While working on some TIF related changes for 5.11, this
    TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
    duplication for how that is handled"

    * tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
    task_work: cleanup notification modes
    tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()

    Linus Torvalds
     

23 Oct, 2020

1 commit

  • Pull initial set_fs() removal from Al Viro:
    "Christoph's set_fs base series + fixups"

    * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Allow a NULL pos pointer to __kernel_read
    fs: Allow a NULL pos pointer to __kernel_write
    powerpc: remove address space overrides using set_fs()
    powerpc: use non-set_fs based maccess routines
    x86: remove address space overrides using set_fs()
    x86: make TASK_SIZE_MAX usable from assembly code
    x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
    lkdtm: remove set_fs-based tests
    test_bitmap: remove user bitmap tests
    uaccess: add infrastructure for kernel builds with set_fs()
    fs: don't allow splice read/write without explicit ops
    fs: don't allow kernel reads and writes without iter ops
    sysctl: Convert to iter interfaces
    proc: add a read_iter method to proc proc_ops
    proc: cleanup the compat vs no compat file ops
    proc: remove a level of indentation in proc_get_inode

    Linus Torvalds
     

19 Oct, 2020

1 commit

  • There is usecase that System Management Software(SMS) want to give a
    memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the
    case of Android, it is the ActivityManagerService.

    The information required to make the reclaim decision is not known to the
    app. Instead, it is known to the centralized userspace
    daemon(ActivityManagerService), and that daemon must be able to initiate
    reclaim on its own without any app involvement.

    To solve the issue, this patch introduces a new syscall
    process_madvise(2). It uses pidfd of an external process to give the
    hint. It also supports vector address range because Android app has
    thousands of vmas due to zygote so it's totally waste of CPU and power if
    we should call the syscall one by one for each vma.(With testing 2000-vma
    syscall vs 1-vector syscall, it showed 15% performance improvement. I
    think it would be bigger in real practice because the testing ran very
    cache friendly environment).

    Another potential use case for the vector range is to amortize the cost
    ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
    benefit users like TCP receive zerocopy and malloc implementations. In
    future, we could find more usecases for other advises so let's make it
    happens as API since we introduce a new syscall at this moment. With
    that, existing madvise(2) user could replace it with process_madvise(2)
    with their own pid if they want to have batch address ranges support
    feature.

    ince it could affect other process's address range, only privileged
    process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same
    UID) gives it the right to ptrace the process could use it successfully.
    The flag argument is reserved for future use if we need to extend the API.

    I think supporting all hints madvise has/will supported/support to
    process_madvise is rather risky. Because we are not sure all hints make
    sense from external process and implementation for the hint may rely on
    the caller being in the current context so it could be error-prone. Thus,
    I just limited hints as MADV_[COLD|PAGEOUT] in this patch.

    If someone want to add other hints, we could hear the usecase and review
    it for each hint. It's safer for maintenance rather than introducing a
    buggy syscall but hard to fix it later.

    So finally, the API is as follows,

    ssize_t process_madvise(int pidfd, const struct iovec *iovec,
    unsigned long vlen, int advice, unsigned int flags);

    DESCRIPTION
    The process_madvise() system call is used to give advice or directions
    to the kernel about the address ranges from external process as well as
    local process. It provides the advice to address ranges of process
    described by iovec and vlen. The goal of such advice is to improve
    system or application performance.

    The pidfd selects the process referred to by the PID file descriptor
    specified in pidfd. (See pidofd_open(2) for further information)

    The pointer iovec points to an array of iovec structures, defined in
    as:

    struct iovec {
    void *iov_base; /* starting address */
    size_t iov_len; /* number of bytes to be advised */
    };

    The iovec describes address ranges beginning at address(iov_base)
    and with size length of bytes(iov_len).

    The vlen represents the number of elements in iovec.

    The advice is indicated in the advice argument, which is one of the
    following at this moment if the target process specified by pidfd is
    external.

    MADV_COLD
    MADV_PAGEOUT

    Permission to provide a hint to external process is governed by a
    ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

    The process_madvise supports every advice madvise(2) has if target
    process is in same thread group with calling process so user could
    use process_madvise(2) to extend existing madvise(2) to support
    vector address ranges.

    RETURN VALUE
    On success, process_madvise() returns the number of bytes advised.
    This return value may be less than the total number of requested
    bytes, if an error occurred. The caller should check return value
    to determine whether a partial advice occurred.

    FAQ:

    Q.1 - Why does any external entity have better knowledge?

    Quote from Sandeep

    "For Android, every application (including the special SystemServer)
    are forked from Zygote. The reason of course is to share as many
    libraries and classes between the two as possible to benefit from the
    preloading during boot.

    After applications start, (almost) all of the APIs end up calling into
    this SystemServer process over IPC (binder) and back to the
    application.

    In a fully running system, the SystemServer monitors every single
    process periodically to calculate their PSS / RSS and also decides
    which process is "important" to the user for interactivity.

    So, because of how these processes start _and_ the fact that the
    SystemServer is looping to monitor each process, it does tend to *know*
    which address range of the application is not used / useful.

    Besides, we can never rely on applications to clean things up
    themselves. We've had the "hey app1, the system is low on memory,
    please trim your memory usage down" notifications for a long time[1].
    They rely on applications honoring the broadcasts and very few do.

    So, if we want to avoid the inevitable killing of the application and
    restarting it, some way to be able to tell the OS about unimportant
    memory in these applications will be useful.

    - ssp

    Q.2 - How to guarantee the race(i.e., object validation) between when
    giving a hint from an external process and get the hint from the target
    process?

    process_madvise operates on the target process's address space as it
    exists at the instant that process_madvise is called. If the space
    target process can run between the time the process_madvise process
    inspects the target process address space and the time that
    process_madvise is actually called, process_madvise may operate on
    memory regions that the calling process does not expect. It's the
    responsibility of the process calling process_madvise to close this
    race condition. For example, the calling process can suspend the
    target process with ptrace, SIGSTOP, or the freezer cgroup so that it
    doesn't have an opportunity to change its own address space before
    process_madvise is called. Another option is to operate on memory
    regions that the caller knows a priori will be unchanged in the target
    process. Yet another option is to accept the race for certain
    process_madvise calls after reasoning that mistargeting will do no
    harm. The suggested API itself does not provide synchronization. It
    also apply other APIs like move_pages, process_vm_write.

    The race isn't really a problem though. Why is it so wrong to require
    that callers do their own synchronization in some manner? Nobody
    objects to write(2) merely because it's possible for two processes to
    open the same file and clobber each other's writes --- instead, we tell
    people to use flock or something. Think about mmap. It never
    guarantees newly allocated address space is still valid when the user
    tries to access it because other threads could unmap the memory right
    before. That's where we need synchronization by using other API or
    design from userside. It shouldn't be part of API itself. If someone
    needs more fine-grained synchronization rather than process level,
    there were two ideas suggested - cookie[2] and anon-fd[3]. Both are
    applicable via using last reserved argument of the API but I don't
    think it's necessary right now since we have already ways to prevent
    the race so don't want to add additional complexity with more
    fine-grained optimization model.

    To make the API extend, it reserved an unsigned long as last argument
    so we could support it in future if someone really needs it.

    Q.3 - Why doesn't ptrace work?

    Injecting an madvise in the target process using ptrace would not work
    for us because such injected madvise would have to be executed by the
    target process, which means that process would have to be runnable and
    that creates the risk of the abovementioned race and hinting a wrong
    VMA. Furthermore, we want to act the hint in caller's context, not the
    callee's, because the callee is usually limited in cpuset/cgroups or
    even freezed state so they can't act by themselves quick enough, which
    causes more thrashing/kill. It doesn't work if the target process are
    ptraced(e.g., strace, debugger, minidump) because a process can have at
    most one ptracer.

    [1] https://developer.android.com/topic/performance/memory"

    [2] process_getinfo for getting the cookie which is updated whenever
    vma of process address layout are changed - Daniel Colascione -
    https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224

    [3] anonymous fd which is used for the object(i.e., address range)
    validation - Michal Hocko -
    https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

    [minchan@kernel.org: fix process_madvise build break for arm64]
    Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
    [minchan@kernel.org: fix build error for mips of process_madvise]
    Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
    [akpm@linux-foundation.org: fix patch ordering issue]
    [akpm@linux-foundation.org: fix arm64 whoops]
    [minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
    [akpm@linux-foundation.org: fix i386 build]
    [sfr@canb.auug.org.au: fix syscall numbering]
    Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
    [sfr@canb.auug.org.au: madvise.c needs compat.h]
    Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
    [minchan@kernel.org: fix mips build]
    Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
    [yuehaibing@huawei.com: remove duplicate header which is included twice]
    Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
    [minchan@kernel.org: do not use helper functions for process_madvise]
    Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
    [akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
    [sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
    Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au

    Signed-off-by: Minchan Kim
    Signed-off-by: YueHaibing
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Reviewed-by: Suren Baghdasaryan
    Reviewed-by: Vlastimil Babka
    Acked-by: David Rientjes
    Cc: Alexander Duyck
    Cc: Brian Geffon
    Cc: Christian Brauner
    Cc: Daniel Colascione
    Cc: Jann Horn
    Cc: Jens Axboe
    Cc: Joel Fernandes
    Cc: Johannes Weiner
    Cc: John Dias
    Cc: Kirill Tkhai
    Cc: Michal Hocko
    Cc: Oleksandr Natalenko
    Cc: Sandeep Patil
    Cc: SeongJae Park
    Cc: SeongJae Park
    Cc: Shakeel Butt
    Cc: Sonny Rao
    Cc: Tim Murray
    Cc: Christian Brauner
    Cc: Florian Weimer
    Cc:
    Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
    Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
    Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
    Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

18 Oct, 2020

1 commit


16 Oct, 2020

1 commit

  • Pull dma-mapping updates from Christoph Hellwig:

    - rework the non-coherent DMA allocator

    - move private definitions out of

    - lower CMA_ALIGNMENT (Paul Cercueil)

    - remove the omap1 dma address translation in favor of the common code

    - make dma-direct aware of multiple dma offset ranges (Jim Quinlan)

    - support per-node DMA CMA areas (Barry Song)

    - increase the default seg boundary limit (Nicolin Chen)

    - misc fixes (Robin Murphy, Thomas Tai, Xu Wang)

    - various cleanups

    * tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
    ARM/ixp4xx: add a missing include of dma-map-ops.h
    dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
    dma-direct: factor out a dma_direct_alloc_from_pool helper
    dma-direct check for highmem pages in dma_direct_alloc_pages
    dma-mapping: merge into
    dma-mapping: move large parts of to kernel/dma
    dma-mapping: move dma-debug.h to kernel/dma/
    dma-mapping: remove
    dma-mapping: merge into
    dma-contiguous: remove dma_contiguous_set_default
    dma-contiguous: remove dev_set_cma_area
    dma-contiguous: remove dma_declare_contiguous
    dma-mapping: split
    cma: decrease CMA_ALIGNMENT lower limit to 2
    firewire-ohci: use dma_alloc_pages
    dma-iommu: implement ->alloc_noncoherent
    dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
    dma-mapping: add a new dma_alloc_pages API
    dma-mapping: remove dma_cache_sync
    53c700: convert to dma_alloc_noncoherent
    ...

    Linus Torvalds
     

15 Oct, 2020

1 commit

  • Merge misc updates from Andrew Morton:
    "181 patches.

    Subsystems affected by this patch series: kbuild, scripts, ntfs,
    ocfs2, vfs, mm (slab, slub, kmemleak, dax, debug, pagecache, fadvise,
    gup, swap, memremap, memcg, selftests, pagemap, mincore, hmm, dma,
    memory-failure, vmallo and migration)"

    * emailed patches from Andrew Morton : (181 commits)
    mm/migrate: remove obsolete comment about device public
    mm/migrate: remove cpages-- in migrate_vma_finalize()
    mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
    memblock: use separate iterators for memory and reserved regions
    memblock: implement for_each_reserved_mem_region() using __next_mem_region()
    memblock: remove unused memblock_mem_size()
    x86/setup: simplify reserve_crashkernel()
    x86/setup: simplify initrd relocation and reservation
    arch, drivers: replace for_each_membock() with for_each_mem_range()
    arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()
    memblock: reduce number of parameters in for_each_mem_range()
    memblock: make memblock_debug and related functionality private
    memblock: make for_each_memblock_type() iterator private
    mircoblaze: drop unneeded NUMA and sparsemem initializations
    riscv: drop unneeded node initialization
    h8300, nds32, openrisc: simplify detection of memory extents
    arm64: numa: simplify dummy_numa_init()
    arm, xtensa: simplify initialization of high memory pages
    dma-contiguous: simplify cma_early_percent_memory()
    KVM: PPC: Book3S HV: simplify kvm_cma_reserve()
    ...

    Linus Torvalds
     

14 Oct, 2020

3 commits

  • There are several occurrences of the following pattern:

    for_each_memblock(memory, reg) {
    start = __pfn_to_phys(memblock_region_memory_base_pfn(reg);
    end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));

    /* do something with start and end */
    }

    Using for_each_mem_range() iterator is more appropriate in such cases and
    allows simpler and cleaner code.

    [akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build]
    [rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring]
    Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Daniel Axtens
    Cc: Dave Hansen
    Cc: Emil Renner Berthing
    Cc: Hari Bathini
    Cc: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Jonathan Cameron
    Cc: Marek Szyprowski
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Miguel Ojeda
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • microblaze does not support neither NUMA not SPARSMEM, so there is no
    point to call memblock_set_node() and
    sparse_memory_present_with_active_regions() functions during microblaze
    memory initialization.

    Remove these calls and the surrounding code.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Daniel Axtens
    Cc: Dave Hansen
    Cc: Emil Renner Berthing
    Cc: Hari Bathini
    Cc: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Jonathan Cameron
    Cc: Marek Szyprowski
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Miguel Ojeda
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: https://lkml.kernel.org/r/20200818151634.14343-8-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Pull seccomp updates from Kees Cook:
    "The bulk of the changes are with the seccomp selftests to accommodate
    some powerpc-specific behavioral characteristics. Additional cleanups,
    fixes, and improvements are also included:

    - heavily refactor seccomp selftests (and clone3 selftests
    dependency) to fix powerpc (Kees Cook, Thadeu Lima de Souza
    Cascardo)

    - fix style issue in selftests (Zou Wei)

    - upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich
    Felker)

    - replace task_pt_regs(current) with current_pt_regs() (Denis
    Efremov)

    - fix corner-case race in USER_NOTIF (Jann Horn)

    - make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)"

    * tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
    seccomp: Make duplicate listener detection non-racy
    seccomp: Move config option SECCOMP to arch/Kconfig
    selftests/clone3: Avoid OS-defined clone_args
    selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit
    selftests/seccomp: Allow syscall nr and ret value to be set separately
    selftests/seccomp: Record syscall during ptrace entry
    selftests/seccomp: powerpc: Fix seccomp return value testing
    selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of SYSCALL_RET_SET
    selftests/seccomp: Avoid redundant register flushes
    selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
    selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
    selftests/seccomp: Remove syscall setting #ifdefs
    selftests/seccomp: mips: Remove O32-specific macro
    selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
    selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
    selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
    selftests/seccomp: Provide generic syscall setting macro
    selftests/seccomp: Refactor arch register macros to avoid xtensa special case
    selftests/seccomp: Use __NR_mknodat instead of __NR_mknod
    selftests/seccomp: Use bitwise instead of arithmetic operator for flags
    ...

    Linus Torvalds
     

13 Oct, 2020

1 commit


09 Oct, 2020

1 commit

  • In order to make adding configurable features into seccomp easier,
    it's better to have the options at one single location, considering
    especially that the bulk of seccomp code is arch-independent. An quick
    look also show that many SECCOMP descriptions are outdated; they talk
    about /proc rather than prctl.

    As a result of moving the config option and keeping it default on,
    architectures arm, arm64, csky, riscv, sh, and xtensa did not have SECCOMP
    on by default prior to this and SECCOMP will be default in this change.

    Architectures microblaze, mips, powerpc, s390, sh, and sparc have an
    outdated depend on PROC_FS and this dependency is removed in this change.

    Suggested-by: Jann Horn
    Link: https://lore.kernel.org/lkml/CAG48ez1YWz9cnp08UZgeieYRhHdqh-ch7aNwc4JRBnGyrmgfMg@mail.gmail.com/
    Signed-off-by: YiFei Zhu
    [kees: added HAVE_ARCH_SECCOMP help text, tweaked wording]
    Signed-off-by: Kees Cook
    Link: https://lore.kernel.org/r/9ede6ef35c847e58d61e476c6a39540520066613.1600951211.git.yifeifz2@illinois.edu

    YiFei Zhu
     

06 Oct, 2020

3 commits


09 Sep, 2020

1 commit

  • Add a CONFIG_SET_FS option that is selected by architecturess that
    implement set_fs, which is all of them initially. If the option is not
    set stubs for routines related to overriding the address space are
    provided so that architectures can start to opt out of providing set_fs.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Kees Cook
    Signed-off-by: Al Viro

    Christoph Hellwig
     

01 Sep, 2020

2 commits

  • Fix build warning since this file is already listed in
    include/asm-generic/Kbuild.

    ../scripts/Makefile.asm-generic:25: redundant generic-y found in arch/microblaze/include/asm/Kbuild: hw_irq.h

    Fixes: 630f289b7114 ("asm-generic: make more kernel-space headers mandatory")
    Signed-off-by: Randy Dunlap
    Cc: Michal Simek
    Cc: Michal Simek
    Cc: Masahiro Yamada
    Reviewed-by: Masahiro Yamada
    Link: https://lore.kernel.org/r/4d992aee-8a69-1769-e622-8d6d6e316346@infradead.org
    Signed-off-by: Michal Simek

    Randy Dunlap
     
  • Fix min_low_pfn/max_low_pfn build errors for arch/microblaze/: (e.g.)

    ERROR: "min_low_pfn" [drivers/rpmsg/virtio_rpmsg_bus.ko] undefined!
    ERROR: "min_low_pfn" [drivers/hwtracing/intel_th/intel_th_msu_sink.ko] undefined!
    ERROR: "min_low_pfn" [drivers/hwtracing/intel_th/intel_th_msu.ko] undefined!
    ERROR: "min_low_pfn" [drivers/mmc/core/mmc_core.ko] undefined!
    ERROR: "min_low_pfn" [drivers/md/dm-crypt.ko] undefined!
    ERROR: "min_low_pfn" [drivers/net/wireless/ath/ath6kl/ath6kl_sdio.ko] undefined!
    ERROR: "min_low_pfn" [crypto/tcrypt.ko] undefined!
    ERROR: "min_low_pfn" [crypto/asymmetric_keys/asym_tpm.ko] undefined!

    Mike had/has an alternate patch for Microblaze:
    https://lore.kernel.org/lkml/20200630111519.GA1951986@linux.ibm.com/

    David suggested just exporting min_low_pfn & max_low_pfn in
    mm/memblock.c:
    https://lore.kernel.org/lkml/alpine.DEB.2.22.394.2006291911220.1118534@chino.kir.corp.google.com/

    Reported-by: kernel test robot
    Signed-off-by: Randy Dunlap
    Acked-by: Michal Simek
    Acked-by: David Rientjes
    Cc: linux-mm@kvack.org
    Cc: Andrew Morton
    Cc: David Rientjes
    Cc: Mike Rapoport
    Cc: Michal Simek
    Cc: Michal Simek
    Signed-off-by: Mike Rapoport

    Randy Dunlap
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and its variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings when it is the case.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

15 Aug, 2020

1 commit

  • Since commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
    sys_sysctl is actually unavailable: any input can only return an error.

    We have been warning about people using the sysctl system call for years
    and believe there are no more users. Even if there are users of this
    interface if they have not complained or fixed their code by now they
    probably are not going to, so there is no point in warning them any
    longer.

    So completely remove sys_sysctl on all architectures.

    [nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
    Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.com

    Signed-off-by: Xiaoming Ni
    Signed-off-by: Andrew Morton
    Acked-by: Will Deacon [arm/arm64]
    Acked-by: "Eric W. Biederman"
    Cc: Aleksa Sarai
    Cc: Alexander Shishkin
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Bin Meng
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Catalin Marinas
    Cc: chenzefeng
    Cc: Christian Borntraeger
    Cc: Christian Brauner
    Cc: Chris Zankel
    Cc: David Howells
    Cc: David S. Miller
    Cc: Diego Elio Pettenò
    Cc: Dmitry Vyukov
    Cc: Dominik Brodowski
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Iurii Zaikin
    Cc: Ivan Kokshaysky
    Cc: James Bottomley
    Cc: Jens Axboe
    Cc: Jiri Olsa
    Cc: Kars de Jong
    Cc: Kees Cook
    Cc: Krzysztof Kozlowski
    Cc: Luis Chamberlain
    Cc: Marco Elver
    Cc: Mark Rutland
    Cc: Martin K. Petersen
    Cc: Masahiro Yamada
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Miklos Szeredi
    Cc: Minchan Kim
    Cc: Namhyung Kim
    Cc: Naveen N. Rao
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Olof Johansson
    Cc: Paul Burton
    Cc: "Paul E. McKenney"
    Cc: Paul Mackerras
    Cc: Peter Zijlstra (Intel)
    Cc: Randy Dunlap
    Cc: Ravi Bangoria
    Cc: Richard Henderson
    Cc: Rich Felker
    Cc: Russell King
    Cc: Sami Tolvanen
    Cc: Sargun Dhillon
    Cc: Stephen Rothwell
    Cc: Sudeep Holla
    Cc: Sven Schnelle
    Cc: Thiago Jung Bauermann
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vasily Gorbik
    Cc: Vlastimil Babka
    Cc: Yoshinori Sato
    Cc: Zhou Yanjie
    Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.com
    Signed-off-by: Linus Torvalds

    Xiaoming Ni
     

13 Aug, 2020

3 commits

  • Use the general page fault accounting by passing regs into
    handle_mm_fault(). It naturally solve the issue of multiple page fault
    accounting when page fault retry happened.

    Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the
    other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in
    handle_mm_fault().

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Michal Simek
    Link: http://lkml.kernel.org/r/20200707225021.200906-11-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Patch series "mm: Page fault accounting cleanups", v5.

    This is v5 of the pf accounting cleanup series. It originates from Gerald
    Schaefer's report on an issue a week ago regarding to incorrect page fault
    accountings for retried page fault after commit 4064b9827063 ("mm: allow
    VM_FAULT_RETRY for multiple times"):

    https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

    What this series did:

    - Correct page fault accounting: we do accounting for a page fault
    (no matter whether it's from #PF handling, or gup, or anything else)
    only with the one that completed the fault. For example, page fault
    retries should not be counted in page fault counters. Same to the
    perf events.

    - Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
    event is used in an adhoc way across different archs.

    Case (1): for many archs it's done at the entry of a page fault
    handler, so that it will also cover e.g. errornous faults.

    Case (2): for some other archs, it is only accounted when the page
    fault is resolved successfully.

    Case (3): there're still quite some archs that have not enabled
    this perf event.

    Since this series will touch merely all the archs, we unify this
    perf event to always follow case (1), which is the one that makes most
    sense. And since we moved the accounting into handle_mm_fault, the
    other two MAJ/MIN perf events are well taken care of naturally.

    - Unify definition of "major faults": the definition of "major
    fault" is slightly changed when used in accounting (not
    VM_FAULT_MAJOR). More information in patch 1.

    - Always account the page fault onto the one that triggered the page
    fault. This does not matter much for #PF handlings, but mostly for
    gup. More information on this in patch 25.

    Patchset layout:

    Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
    Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
    Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
    Patch 25: Cleanup GUP task_struct pointer since it's not needed any more

    This patch (of 25):

    This is a preparation patch to move page fault accountings into the
    general code in handle_mm_fault(). This includes both the per task
    flt_maj/flt_min counters, and the major/minor page fault perf events. To
    do this, the pt_regs pointer is passed into handle_mm_fault().

    PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
    handlers.

    So far, all the pt_regs pointer that passed into handle_mm_fault() is
    NULL, which means this patch should have no intented functional change.

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Albert Ou
    Cc: Alexander Gordeev
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Geert Uytterhoeven
    Cc: Gerald Schaefer
    Cc: Greentime Hu
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: John Hubbard
    Cc: Jonas Bonn
    Cc: Ley Foon Tan
    Cc: "Luck, Tony"
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Richard Henderson
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
    Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • segment_eq is only used to implement uaccess_kernel. Just open code
    uaccess_kernel in the arch uaccess headers and remove one layer of
    indirection.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Acked-by: Greentime Hu
    Acked-by: Geert Uytterhoeven
    Cc: Nick Hu
    Cc: Vincent Chen
    Cc: Paul Walmsley
    Cc: Palmer Dabbelt
    Link: http://lkml.kernel.org/r/20200710135706.537715-5-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

08 Aug, 2020

3 commits

  • After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
    functions that call memory_present() for each region in memblock.memory:
    sparse_memory_present_with_active_regions() and membocks_present().

    Moreover, all architectures have a call to either of these functions
    preceding the call to sparse_init() and in the most cases they are called
    one after the other.

    Mark the regions from memblock.memory as present during sparce_init() by
    making sparse_init() call memblocks_present(), make memblocks_present()
    and memory_present() functions static and remove redundant
    sparse_memory_present_with_active_regions() function.

    Also remove no longer required HAVE_MEMORY_PRESENT configuration option.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Most architectures define pgd_free() as a wrapper for free_page().

    Provide a generic version in asm-generic/pgalloc.h and enable its use for
    most architectures.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Acked-by: Geert Uytterhoeven [m68k]
    Cc: Abdul Haleem
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Christophe Leroy
    Cc: Joerg Roedel
    Cc: Joerg Roedel
    Cc: Max Filippov
    Cc: Peter Zijlstra (Intel)
    Cc: Satheesh Rajendran
    Cc: Stafford Horne
    Cc: Stephen Rothwell
    Cc: Steven Rostedt
    Cc: Matthew Wilcox
    Link: http://lkml.kernel.org/r/20200627143453.31835-7-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "mm: cleanup usage of "

    Most architectures have very similar versions of pXd_alloc_one() and
    pXd_free_one() for intermediate levels of page table. These patches add
    generic versions of these functions in and enable
    use of the generic functions where appropriate.

    In addition, functions declared and defined in headers are
    used mostly by core mm and early mm initialization in arch and there is no
    actual reason to have the included all over the place.
    The first patch in this series removes unneeded includes of

    In the end it didn't work out as neatly as I hoped and moving
    pXd_alloc_track() definitions to would require
    unnecessary changes to arches that have custom page table allocations, so
    I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
    to mm/.

    This patch (of 8):

    In most cases header is required only for allocations of
    page table memory. Most of the .c files that include that header do not
    use symbols declared in and do not require that header.

    As for the other header files that used to include , it is
    possible to move that include into the .c file that actually uses symbols
    from and drop the include from the header file.

    The process was somewhat automated using

    sed -i -E '/[
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Acked-by: Geert Uytterhoeven [m68k]
    Cc: Abdul Haleem
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Christophe Leroy
    Cc: Joerg Roedel
    Cc: Max Filippov
    Cc: Peter Zijlstra
    Cc: Satheesh Rajendran
    Cc: Stafford Horne
    Cc: Stephen Rothwell
    Cc: Steven Rostedt
    Cc: Joerg Roedel
    Cc: Matthew Wilcox
    Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

05 Aug, 2020

1 commit

  • Pull close_range() implementation from Christian Brauner:
    "This adds the close_range() syscall. It allows to efficiently close a
    range of file descriptors up to all file descriptors of a calling
    task.

    This is coordinated with the FreeBSD folks which have copied our
    version of this syscall and in the meantime have already merged it in
    April 2019:

    https://reviews.freebsd.org/D21627
    https://svnweb.freebsd.org/base?view=revision&revision=359836

    The syscall originally came up in a discussion around the new mount
    API and making new file descriptor types cloexec by default. During
    this discussion, Al suggested the close_range() syscall.

    First, it helps to close all file descriptors of an exec()ing task.
    This can be done safely via (quoting Al's example from [1] verbatim):

    /* that exec is sensitive */
    unshare(CLONE_FILES);
    /* we don't want anything past stderr here */
    close_range(3, ~0U);
    execve(....);

    The code snippet above is one way of working around the problem that
    file descriptors are not cloexec by default. This is aggravated by the
    fact that we can't just switch them over without massively regressing
    userspace. For a whole class of programs having an in-kernel method of
    closing all file descriptors is very helpful (e.g. demons, service
    managers, programming language standard libraries, container managers
    etc.).

    Second, it allows userspace to avoid implementing closing all file
    descriptors by parsing through /proc//fd/* and calling close() on
    each file descriptor and other hacks. From looking at various
    large(ish) userspace code bases this or similar patterns are very
    common in service managers, container runtimes, and programming
    language runtimes/standard libraries such as Python or Rust.

    In addition, the syscall will also work for tasks that do not have
    procfs mounted and on kernels that do not have procfs support compiled
    in. In such situations the only way to make sure that all file
    descriptors are closed is to call close() on each file descriptor up
    to UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery.

    Based on Linus' suggestion close_range() also comes with a new flag
    CLOSE_RANGE_UNSHARE to more elegantly handle file descriptor dropping
    right before exec. This would usually be expressed in the sequence:

    unshare(CLONE_FILES);
    close_range(3, ~0U);

    as pointed out by Linus it might be desirable to have this be a part
    of close_range() itself under a new flag CLOSE_RANGE_UNSHARE which
    gets especially handy when we're closing all file descriptors above a
    certain threshold.

    Test-suite as always included"

    * tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    tests: add CLOSE_RANGE_UNSHARE tests
    close_range: add CLOSE_RANGE_UNSHARE
    tests: add close_range() tests
    arch: wire-up close_range()
    open: add close_range()

    Linus Torvalds
     

05 Jul, 2020

3 commits

  • Now that HAVE_COPY_THREAD_TLS has been removed, rename copy_thread_tls()
    back simply copy_thread(). It's a simpler name, and doesn't imply that only
    tls is copied here. This finishes an outstanding chunk of internal process
    creation work since we've added clone3().

    Cc: linux-arch@vger.kernel.org
    Acked-by: Thomas Bogendoerfer A
    Acked-by: Stafford Horne
    Acked-by: Greentime Hu
    Acked-by: Geert Uytterhoeven A
    Reviewed-by: Kees Cook
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • All architectures support copy_thread_tls() now, so remove the legacy
    copy_thread() function and the HAVE_COPY_THREAD_TLS config option. Everyone
    uses the same process creation calling convention based on
    copy_thread_tls() and struct kernel_clone_args. This will make it easier to
    maintain the core process creation code under kernel/, simplifies the
    callpaths and makes the identical for all architectures.

    Cc: linux-arch@vger.kernel.org
    Acked-by: Thomas Bogendoerfer
    Acked-by: Greentime Hu
    Acked-by: Geert Uytterhoeven
    Reviewed-by: Kees Cook
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • Use the copy_thread_tls() calling convention which passes tls through a
    register. This is required so we can remove the copy_thread{_tls}() split
    and remove the HAVE_COPY_THREAD_TLS macro.

    Cc: Michal Simek
    Signed-off-by: Christian Brauner

    Christian Brauner
     

17 Jun, 2020

1 commit

  • This wires up the close_range() syscall into all arches at once.

    Suggested-by: Arnd Bergmann
    Signed-off-by: Christian Brauner
    Reviewed-by: Oleg Nesterov
    Acked-by: Arnd Bergmann
    Acked-by: Michael Ellerman (powerpc)
    Cc: Jann Horn
    Cc: David Howells
    Cc: Dmitry V. Levin
    Cc: Linus Torvalds
    Cc: Al Viro
    Cc: Florian Weimer
    Cc: linux-api@vger.kernel.org
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-ia64@vger.kernel.org
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: linux-mips@vger.kernel.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-sh@vger.kernel.org
    Cc: sparclinux@vger.kernel.org
    Cc: linux-xtensa@linux-xtensa.org
    Cc: linux-arch@vger.kernel.org
    Cc: x86@kernel.org

    Christian Brauner
     

10 Jun, 2020

7 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Convert comments that reference old mmap_sem APIs to reference
    corresponding new mmap locking APIs instead.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Daniel Jordan
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • All architectures define pte_index() as

    (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)

    and all architectures define pte_offset_kernel() as an entry in the array
    of PTEs indexed by the pte_index().

    For the most architectures the pte_offset_kernel() implementation relies
    on the availability of pmd_page_vaddr() that converts a PMD entry value to
    the virtual address of the page containing PTEs array.

    Let's move x86 definitions of the PTE accessors to the generic place in
    and then simply drop the respective definitions from the
    other architectures.

    The architectures that didn't provide pmd_page_vaddr() are updated to have
    that defined.

    The generic implementation of pte_offset_kernel() can be overridden by an
    architecture and alpha makes use of this because it has special ordering
    requirements for its version of pte_offset_kernel().

    [rppt@linux.ibm.com: v2]
    Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org
    [rppt@linux.ibm.com: update]
    Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org
    [rppt@linux.ibm.com: update]
    Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org
    [akpm@linux-foundation.org: fix x86 warning]
    [sfr@canb.auug.org.au: fix powerpc build]
    Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com

    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The powerpc 32-bit implementation of pgtable has nice shortcuts for
    accessing kernel PMD and PTE for a given virtual address. Make these
    helpers available for all architectures.

    [rppt@linux.ibm.com: microblaze: fix page table traversal in setup_rt_frame()]
    Link: http://lkml.kernel.org/r/20200518191511.GD1118872@kernel.org
    [akpm@linux-foundation.org: s/pmd_ptr_k/pmd_off_k/ in various powerpc places]

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-9-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The replacement of with made the include
    of the latter in the middle of asm includes. Fix this up with the aid of
    the below script and manual adjustments here and there.

    import sys
    import re

    if len(sys.argv) is not 3:
    print "USAGE: %s " % (sys.argv[0])
    sys.exit(1)

    hdr_to_move="#include " % sys.argv[2]
    moved = False
    in_hdrs = False

    with open(sys.argv[1], "r") as f:
    lines = f.readlines()
    for _line in lines:
    line = _line.rstrip('
    ')
    if line == hdr_to_move:
    continue
    if line.startswith("#include
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The include/linux/pgtable.h is going to be the home of generic page table
    manipulation functions.

    Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
    make the latter include asm/pgtable.h.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport