15 Jan, 2020

1 commit

  • commit c346b94f8c5d1b7d637522c908209de93305a8eb upstream.

    This is required for clone3, which passes the TLS value through a
    struct rather than a register.

    Signed-off-by: Amanieu d'Antras
    Cc: linux-xtensa@linux-xtensa.org
    Cc: # 5.3.x
    Link: https://lore.kernel.org/r/20200102172413.654385-7-amanieu@gmail.com
    Signed-off-by: Christian Brauner
    Signed-off-by: Greg Kroah-Hartman

    Amanieu d'Antras
     

21 Dec, 2019

3 commits

  • commit c2d9aa3b6e56de56c7f1ed9026ca6ec7cfbeef19 upstream.

    The syscall return value is in register a2, not a0.

    Cc: stable@vger.kernel.org # v5.0+
    Fixes: 9f24f3c1067c ("xtensa: implement tracehook functions and enable HAVE_ARCH_TRACEHOOK")
    Signed-off-by: Max Filippov
    Signed-off-by: Greg Kroah-Hartman

    Max Filippov
     
  • commit 36de10c4788efc6efe6ff9aa10d38cb7eea4c818 upstream.

    Virtual and translated addresses retrieved by the xtensa TLB sanity
    checker must be consistent, i.e. correspond to the same state of the
    checked TLB entry. KASAN shadow memory is mapped dynamically using
    auto-refill TLB entries and thus may change TLB state between the
    virtual and translated address retrieval, resulting in a false TLB
    insanity report.
    Move read_xtlb_translation close to read_xtlb_virtual to make sure that
    the values read are consistent.

    Cc: stable@vger.kernel.org
    Fixes: a99e07ee5e88 ("xtensa: check TLB sanity on return to userspace")
    Signed-off-by: Max Filippov
    Signed-off-by: Greg Kroah-Hartman

    Max Filippov
     
  • commit e64681b487c897ec871465083bf0874087d47b66 upstream.

    KASAN shadow map doesn't need to be accessible through the linear kernel
    mapping, so allocate its pages with MEMBLOCK_ALLOC_ANYWHERE to allow high
    memory to be used. This frees up to ~100MB of low memory on xtensa
    configurations with KASAN and high memory.

    Cc: stable@vger.kernel.org # v5.1+
    Fixes: f240ec09bb8a ("memblock: replace memblock_alloc_base(ANYWHERE) with memblock_phys_alloc")
    Reviewed-by: Mike Rapoport
    Signed-off-by: Max Filippov
    Signed-off-by: Greg Kroah-Hartman

    Max Filippov
     

16 Oct, 2019

2 commits

  • The change_bit implementation for the XCHAL_HAVE_EXCLUSIVE case changes
    all bits except the requested one, due to a copy-paste error from
    clear_bit.
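    A minimal, non-atomic C sketch of the difference between the two
    operations on one 32-bit word. The real xtensa code wraps this in an
    exclusive load/store retry loop, omitted here, and all helper names are
    hypothetical. change_bit_buggy() shows one way the described symptom
    arises: XOR-ing with the inverted mask that clear_bit uses flips every
    bit except the requested one.

```c
#include <stdint.h>

/* Hypothetical, non-atomic helpers operating on a single 32-bit word. */
static uint32_t clear_bit_word(uint32_t old, unsigned int nr)
{
    uint32_t mask = 1u << (nr & 31);
    return old & ~mask;   /* clear_bit: AND with the inverted mask */
}

static uint32_t change_bit_word(uint32_t old, unsigned int nr)
{
    uint32_t mask = 1u << (nr & 31);
    return old ^ mask;    /* change_bit: XOR with the mask itself */
}

/* Copy-paste variant: XOR with clear_bit's inverted mask flips every
 * bit except bit nr. */
static uint32_t change_bit_buggy(uint32_t old, unsigned int nr)
{
    uint32_t mask = 1u << (nr & 31);
    return old ^ ~mask;
}
```

    For example, toggling bit 3 of a zero word should give 0x8; the buggy
    variant gives 0xfffffff7 instead.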

    Cc: stable@vger.kernel.org # v5.2+
    Fixes: f7c34874f04a ("xtensa: add exclusive atomics support")
    Signed-off-by: Max Filippov

    Max Filippov
     
  • The virt device tree incorrectly uses 0xf0000000 on both sides of the PCI
    IO ports address space mapping. This results in incorrect port address
    assignment in PCI IO BARs and a subsequent crash on any attempt to access
    them. Use 0 as the base address of the PCI IO ports address space.
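    A simplified model of how a DT "ranges" entry translates PCI I/O port
    numbers to CPU addresses, assuming a 64 KiB window at CPU address
    0xf0000000 (the window size and the helper names are assumptions, not
    taken from the DTS). With 0xf0000000 as the child base, small port
    numbers fall outside the window; with 0, port N maps to 0xf0000000 + N.

```c
#include <stdint.h>

/* Hypothetical model of one DT "ranges" entry for PCI I/O space:
 * child (PCI port) addresses [child_base, child_base + size) map to
 * CPU addresses starting at cpu_base. */
struct io_range {
    uint64_t child_base;
    uint64_t cpu_base;
    uint64_t size;
};

/* Translate a PCI I/O port to a CPU address. Returns 0 on success,
 * -1 if the port lies outside the window. */
static int io_port_to_cpu(const struct io_range *r, uint64_t port,
                          uint64_t *cpu)
{
    if (port < r->child_base || port - r->child_base >= r->size)
        return -1;
    *cpu = r->cpu_base + (port - r->child_base);
    return 0;
}
```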

    Signed-off-by: Max Filippov

    Max Filippov
     

15 Oct, 2019

4 commits

  • Custom outs*/ins* implementations are long gone from the xtensa port,
    so remove the matching EXPORT_SYMBOLs.
    This fixes the following build warnings issued by modpost since commit
    15bfc2348d54 ("modpost: check for static EXPORT_SYMBOL* functions"):

    WARNING: "insb" [vmlinux] is a static EXPORT_SYMBOL
    WARNING: "insw" [vmlinux] is a static EXPORT_SYMBOL
    WARNING: "insl" [vmlinux] is a static EXPORT_SYMBOL
    WARNING: "outsb" [vmlinux] is a static EXPORT_SYMBOL
    WARNING: "outsw" [vmlinux] is a static EXPORT_SYMBOL
    WARNING: "outsl" [vmlinux] is a static EXPORT_SYMBOL

    Cc: stable@vger.kernel.org
    Fixes: d38efc1f150f ("xtensa: adopt generic io routines")
    Signed-off-by: Max Filippov

    Max Filippov
     
  • __get_user_[no]check uses a temporary buffer of type long to store the
    result of __get_user_size and does sign extension on it when necessary.
    This doesn't work correctly for 64-bit data. Fix it by moving the
    temporary buffer/sign extension logic to __get_user_asm.

    Don't assign the result of __get_user_bad to (x), as (x) may no longer
    always be integer-compatible and the assignment would issue a warning
    even when it is going to be optimized away. Instead, do (x) = 0; and
    call __get_user_bad separately.

    Zero-initialize __x in __get_user_asm and use the '+' constraint for its
    assembly argument, so that its value is preserved in error cases. This
    may add at most 1 cycle to the fast path, but saves an instruction and
    two padding bytes in the fixup section for each use of this macro and
    works for both misaligned store and store exception.
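    A host-side sketch of the first problem, assuming a 32-bit long (as on
    xtensa), modelled with int32_t so the effect also shows on 64-bit hosts.
    The macro names are hypothetical and a plain dereference stands in for
    the user-space load; the real macros use inline assembly and exception
    fixups. Routing a 64-bit value through a long-sized temporary drops the
    upper half, while a temporary of the pointee's own type keeps it and
    lets C conversion handle sign extension.

```c
#include <stdint.h>

/* Stand-in for a 32-bit 'long', as on xtensa. */
typedef int32_t long32;

/* Old scheme (sketch): every value goes through a long-sized temporary.
 * *(ptr) stands in for the user-space load. */
#define GET_VIA_LONG(x, ptr)                                      \
    ({                                                            \
        long32 __tmp = (long32)*(ptr); /* upper 32 bits lost */   \
        (x) = (__typeof__(x))__tmp;                               \
        0;                                                        \
    })

/* New scheme (sketch): a zero-initialized temporary of the pointee's
 * own type; the final assignment performs any sign extension. */
#define GET_VIA_TYPED(x, ptr)                                     \
    ({                                                            \
        __typeof__(*(ptr)) __x = 0;                               \
        __x = *(ptr);                                             \
        (x) = __x;                                                \
        0;                                                        \
    })
```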

    Signed-off-by: Max Filippov

    Max Filippov
     
  • Numeric assembly arguments are hard to understand, and assembly code that
    uses them is hard to modify. Use named arguments in __check_align_*,
    __get_user_asm and __put_user_asm. Modify macro parameter names so that
    they don't affect argument names. Use the '+' constraint for the [err]
    argument instead of having it as both input and output.
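    A GNU C sketch contrasting the two operand styles. The template is only
    an assembler comment, which GNU as and Clang's integrated assembler both
    accept, so it builds on any architecture; the real uaccess macros emit
    xtensa load/store instructions, and the function names here are
    hypothetical.

```c
/* Old style: numeric operands, with each value passed as both an output
 * ("=r") and a matching input ("0", "1"). */
static inline void touch_numeric(int *err, unsigned long *val)
{
    __asm__ volatile("/* err=%0 val=%1 */"
                     : "=r"(*err), "=r"(*val)
                     : "0"(*err), "1"(*val));
}

/* New style: named operands and a single read-write ("+r") constraint.
 * The template refers to operands by name, so reordering the operand
 * list cannot silently change what it refers to. */
static inline void touch_named(int *err, unsigned long *val)
{
    __asm__ volatile("/* err=%[err] val=%[val] */"
                     : [err] "+r"(*err), [val] "+r"(*val));
}
```

    In both versions the values are preserved when the template leaves the
    registers untouched, which is what the '+' constraint guarantees for
    the [err] argument.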

    Signed-off-by: Max Filippov

    Max Filippov
     
  • First of all, on short copies __copy_{to,from}_user() return the number
    of bytes left uncopied, *not* -EFAULT. get_user() and put_user() are
    expected to return -EFAULT on failure.

    Another problem is get_user(v32, (__u64 __user *)p); that should
    fetch a 64-bit value and then assign it to v32, truncating it in the
    process. Current code, OTOH, reads 8 bytes of data and stores them at
    the address of v32, stomping on the 4 bytes that follow v32 itself.
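    A host-side sketch of the intended semantics, with hypothetical helper
    names: the copy routine reports bytes left uncopied, the
    get_user()-style wrapper turns any remainder into -EFAULT, and the
    64-bit value is read into a correctly sized temporary so that
    truncation happens in the final assignment instead of by overwriting
    memory past v32.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __copy_from_user(): copies up to n bytes,
 * simulating a fault after 'avail' bytes, and returns the number of
 * bytes NOT copied (0 on success). */
static size_t fake_copy_from_user(void *dst, const void *src, size_t n,
                                  size_t avail)
{
    size_t done = n < avail ? n : avail;
    memcpy(dst, src, done);
    return n - done;
}

/* get_user()-style read of a __u64 into a 32-bit destination. */
static int get_user_u64_into_u32(uint32_t *dst, const uint64_t *uptr,
                                 size_t avail)
{
    uint64_t tmp;  /* sized for the source type, not the destination */

    if (fake_copy_from_user(&tmp, uptr, sizeof(tmp), avail) != 0) {
        *dst = 0;
        return -EFAULT;  /* "bytes left" becomes -EFAULT */
    }
    *dst = (uint32_t)tmp;  /* truncation happens here */
    return 0;
}
```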

    Signed-off-by: Al Viro
    Signed-off-by: Max Filippov

    Al Viro
     

27 Sep, 2019

1 commit

  • The naming of pgtable_page_{ctor,dtor}() seems to have confused a few
    people, and until recently arm64 used these erroneously/pointlessly for
    other levels of page table.

    To make it incredibly clear that these only apply to the PTE level, and to
    align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them
    to pgtable_pte_page_{ctor,dtor}().

    These changes were generated with the following shell script:

    ----
    git grep -lw 'pgtable_page_.tor' | while read FILE; do
    sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
    sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
    done
    ----

    ... with the documentation re-flowed to remain under 80 columns, and
    whitespace fixed up in macros to keep backslashes aligned.

    There should be no functional change as a result of this patch.

    Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com
    Signed-off-by: Mark Rutland
    Reviewed-by: Mike Rapoport
    Acked-by: Geert Uytterhoeven [m68k]
    Cc: Anshuman Khandual
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Yu Zhao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Rutland
     

26 Sep, 2019

2 commits

  • When a process expects no accesses to a certain memory range for a long
    time, it could hint to the kernel that the pages can be reclaimed
    instantly but that the data should be preserved for future use. This
    could reduce working set eviction and so improve performance.

    This patch introduces the new MADV_PAGEOUT hint to the madvise(2) syscall.
    MADV_PAGEOUT can be used by a process to mark a memory range as not
    expected to be used for a long time so that the kernel reclaims *any LRU*
    pages instantly. The hint can help the kernel decide which pages to
    evict proactively.

    Note: it intentionally doesn't apply the SWAP_CLUSTER_MAX LRU page
    isolation limit, because it's automatically bounded by the PMD size. If
    the PMD size (e.g., 256) causes trouble, we could fix it later by limiting
    it to SWAP_CLUSTER_MAX [1].

    - man-page material

    MADV_PAGEOUT (since Linux x.x)

    Do not expect access in the near future, so pages in the specified
    regions could be reclaimed instantly regardless of memory pressure.
    Thus, access in the range after a successful operation could cause a
    major page fault, but unlike MADV_DONTNEED the up-to-date contents are
    never lost. Pages belonging to a shared mapping are only processed
    if a write access is allowed for the calling process.

    MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or
    VM_PFNMAP pages.
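    A userspace sketch of the behaviour described above: fill an anonymous
    mapping, ask for it to be paged out, then check that the contents
    survived. It assumes a kernel with MADV_PAGEOUT (Linux 5.4 and later);
    the fallback value 21 is the asm-generic one and the helper name is
    hypothetical. Older kernels reject unknown advice with EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21  /* asm-generic/mman-common.h value */
#endif

/* Returns 1 if the data is intact after MADV_PAGEOUT, 0 if it changed,
 * or -errno if mmap() or madvise() failed. */
static int pageout_preserves_data(size_t pages)
{
    long psz = sysconf(_SC_PAGESIZE);
    size_t len = pages * (size_t)psz;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -errno;

    for (size_t i = 0; i < len; i++)
        p[i] = (unsigned char)(i * 31u);

    if (madvise(p, len, MADV_PAGEOUT) != 0) {
        int err = errno;
        munmap(p, len);
        return -err;
    }

    /* Re-reading may take major faults, but contents must be unchanged. */
    int ok = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != (unsigned char)(i * 31u)) {
            ok = 0;
            break;
        }
    }
    munmap(p, len);
    return ok;
}
```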

    [1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/

    [minchan@kernel.org: clear PG_active on MADV_PAGEOUT]
    Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com
    [akpm@linux-foundation.org: resolve conflicts with hmm.git]
    Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reported-by: kbuild test robot
    Acked-by: Michal Hocko
    Cc: James E.J. Bottomley
    Cc: Richard Henderson
    Cc: Ralf Baechle
    Cc: Chris Zankel
    Cc: Daniel Colascione
    Cc: Dave Hansen
    Cc: Hillf Danton
    Cc: Joel Fernandes (Google)
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Oleksandr Natalenko
    Cc: Shakeel Butt
    Cc: Sonny Rao
    Cc: Suren Baghdasaryan
    Cc: Tim Murray
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.

    - Background

    The Android terminology used for forking a new process and starting an app
    from scratch is a cold start, while resuming an existing app is a hot
    start. While we continually try to improve the performance of cold
    starts, hot starts will always be significantly less power hungry as well
    as faster so we are trying to make hot start more likely than cold start.

    To increase hot start, Android userspace manages the order that apps
    should be killed in a process called ActivityManagerService.
    ActivityManagerService tracks every Android app or service that the user
    could be interacting with at any time and translates that into a ranked
    list for lmkd (low memory killer daemon). They are likely to be killed by
    lmkd if the system has to reclaim memory. In that sense they are similar
    to entries in any other cache. Those apps are kept alive for
    opportunistic performance improvements but those performance improvements
    will vary based on the memory requirements of individual workloads.

    - Problem

    Naturally, cached apps were dominant consumers of memory on the system.
    However, they were not significant consumers of swap even though they are
    good candidates for swap. On investigation, swapping out only begins
    once the low zone watermark is hit and kswapd wakes up, but the overall
    allocation rate in the system might trip lmkd thresholds and cause a
    cached process to be killed (we measured the performance of swapping out
    vs. zapping the memory by killing a process; unsurprisingly, zapping is
    10x faster even though we use zram, which is much faster than real
    storage), so a kill from lmkd will often satisfy the high zone watermark,
    resulting in very few pages actually being moved to swap.

    - Approach

    The approach we chose was to use a new interface to allow userspace to
    proactively reclaim entire processes by leveraging platform information.
    This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
    that are known to be cold from userspace and to avoid races with lmkd by
    reclaiming apps as soon as they entered the cached state. Additionally,
    it could give the platform more opportunities to use its information to
    optimize memory efficiency.

    To achieve the goal, the patchset introduces two new options for madvise.
    One is MADV_COLD which will deactivate activated pages and the other is
    MADV_PAGEOUT which will reclaim private pages instantly. These new
    options complement MADV_DONTNEED and MADV_FREE by adding non-destructive
    ways to gain some free memory space. MADV_PAGEOUT is similar to
    MADV_DONTNEED in a way that it hints the kernel that memory region is not
    currently needed and should be reclaimed immediately; MADV_COLD is similar
    to MADV_FREE in a way that it hints the kernel that memory region is not
    currently needed and should be reclaimed when memory pressure rises.

    This patch (of 5):

    When a process expects no accesses to a certain memory range, it could
    give a hint to the kernel that the pages can be reclaimed when memory
    pressure happens but that the data should be preserved for future use.
    This could reduce working set eviction and so improve performance.

    This patch introduces the new MADV_COLD hint to the madvise(2) syscall.
    MADV_COLD can be used by a process to mark a memory range as not expected
    to be used in the near future. The hint can help the kernel decide which
    pages to evict early during memory pressure.

    It works for all LRU pages, like MADV_[DONTNEED|FREE]. IOW, it moves

    active file page -> inactive file LRU
    active anon page -> inactive anon LRU

    Unlike MADV_FREE, it doesn't move active anonymous pages to the head of
    the inactive file LRU, because MADV_COLD has slightly different semantics.
    MADV_FREE means it's okay to discard the page under memory pressure
    because its content is *garbage*, so freeing such pages has almost zero
    overhead: we don't need to swap them out, and a later access causes just
    a minor fault. Thus, it makes sense to put those freeable pages in the
    inactive file LRU to compete with other used-once pages. It makes sense
    from an implementation point of view, too, because the memory is no
    longer swap-backed until it is re-dirtied. It could even give a bonus by
    letting them be reclaimed on a swapless system. However, MADV_COLD
    doesn't mean garbage, so reclaiming those pages requires swap-out/in in
    the end, which is a bigger cost. Since VM LRU aging is designed around a
    cost model, anonymous cold pages are better positioned on the inactive
    anon LRU list, not the file LRU. Furthermore, this helps avoid
    unnecessary scanning if the system doesn't have a swap device. Let's
    start with the simpler way without adding complexity at this moment.
    However, keep in mind the caveat that workloads with a lot of page cache
    are likely to ignore MADV_COLD on anonymous memory, because we rarely age
    anonymous LRU lists.

    * man-page material

    MADV_COLD (since Linux x.x)

    Pages in the specified regions will be treated as less-recently-accessed
    compared to pages in the system with similar access frequencies. In
    contrast to MADV_FREE, the contents of the region are preserved regardless
    of subsequent writes to pages.

    MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
    pages.
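    A sketch of applying MADV_COLD to an arbitrary buffer. madvise(2)
    requires a page-aligned start address, so this hypothetical helper
    shrinks the range inward to whole pages and leaves partial pages at
    either end untouched. The fallback value 20 is the asm-generic one;
    kernels without MADV_COLD return EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLD
#define MADV_COLD 20  /* asm-generic/mman-common.h value */
#endif

/* Mark the whole pages inside [addr, addr + len) as cold. Returns 0 on
 * success or when no whole page is covered, -errno on failure. */
static int advise_cold_range(void *addr, size_t len)
{
    uintptr_t psz = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = ((uintptr_t)addr + psz - 1) & ~(psz - 1); /* round up */
    uintptr_t end = ((uintptr_t)addr + len) & ~(psz - 1);       /* round down */

    if (end <= start)
        return 0;
    if (madvise((void *)start, end - start, MADV_COLD) != 0)
        return -errno;
    return 0;
}
```

    Rounding inward rather than outward avoids deactivating unrelated data
    that happens to share a page with the buffer. Unlike MADV_FREE, the
    contents stay valid afterwards.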

    [akpm@linux-foundation.org: resolve conflicts with hmm.git]
    Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reported-by: kbuild test robot
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Cc: James E.J. Bottomley
    Cc: Richard Henderson
    Cc: Ralf Baechle
    Cc: Chris Zankel
    Cc: Johannes Weiner
    Cc: Daniel Colascione
    Cc: Dave Hansen
    Cc: Hillf Danton
    Cc: Joel Fernandes (Google)
    Cc: Kirill A. Shutemov
    Cc: Oleksandr Natalenko
    Cc: Shakeel Butt
    Cc: Sonny Rao
    Cc: Suren Baghdasaryan
    Cc: Tim Murray
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

25 Sep, 2019

2 commits

  • Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem
    cache for page table allocations on several architectures that do not use
    PAGE_SIZE tables for one or more levels of the page table hierarchy.

    Most architectures do not implement these functions and use __weak default
    NOP implementation of pgd_cache_init(). Since there is no such default
    for pgtable_cache_init(), its empty stub is duplicated among most
    architectures.

    Rename the definitions of pgd_cache_init() to pgtable_cache_init() and
    drop empty stubs of pgtable_cache_init().

    Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Will Deacon [arm64]
    Acked-by: Thomas Gleixner [x86]
    Cc: Catalin Marinas
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "mm: remove quicklist page table caches".

    A while ago Nicholas proposed to remove quicklist page table caches [1].

    I've rebased his patch on the current upstream and switched ia64 and sh to
    use generic versions of PTE allocation.

    [1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com

    This patch (of 3):

    Remove page table allocator "quicklists". These have been around for a
    long time, but have not got much traction in the last decade and are only
    used on ia64 and sh architectures.

    The numbers in the initial commit look interesting but probably don't
    apply anymore. If anybody wants to resurrect this it's in the git
    history, but it's unhelpful to have this code and divergent allocator
    behaviour for minor archs.

    Also it might be better to instead make more general improvements to page
    allocator if this is still so slow.

    Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Nicholas Piggin
    Signed-off-by: Mike Rapoport
    Cc: Tony Luck
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     

20 Sep, 2019

1 commit

  • Pull dma-mapping updates from Christoph Hellwig:

    - add dma-mapping and block layer helpers to take care of IOMMU merging
    for mmc plus subsequent fixups (Yoshihiro Shimoda)

    - rework handling of the pgprot bits for remapping (me)

    - take care of the dma direct infrastructure for swiotlb-xen (me)

    - improve the dma noncoherent remapping infrastructure (me)

    - better defaults for ->mmap, ->get_sgtable and ->get_required_mask
    (me)

    - clean up mmapping of coherent DMA allocations (me)

    - various misc cleanups (Andy Shevchenko, me)

    * tag 'dma-mapping-5.4' of git://git.infradead.org/users/hch/dma-mapping: (41 commits)
    mmc: renesas_sdhi_internal_dmac: Add MMC_CAP2_MERGE_CAPABLE
    mmc: queue: Fix bigger segments usage
    arm64: use asm-generic/dma-mapping.h
    swiotlb-xen: merge xen_unmap_single into xen_swiotlb_unmap_page
    swiotlb-xen: simplify cache maintainance
    swiotlb-xen: use the same foreign page check everywhere
    swiotlb-xen: remove xen_swiotlb_dma_mmap and xen_swiotlb_dma_get_sgtable
    xen: remove the exports for xen_{create,destroy}_contiguous_region
    xen/arm: remove xen_dma_ops
    xen/arm: simplify dma_cache_maint
    xen/arm: use dev_is_dma_coherent
    xen/arm: consolidate page-coherent.h
    xen/arm: use dma-noncoherent.h calls for xen-swiotlb cache maintainance
    arm: remove wrappers for the generic dma remap helpers
    dma-mapping: introduce a dma_common_find_pages helper
    dma-mapping: always use VM_DMA_COHERENT for generic DMA remap
    vmalloc: lift the arm flag for coherent mappings to common code
    dma-mapping: provide a better default ->get_required_mask
    dma-mapping: remove the dma_declare_coherent_memory export
    remoteproc: don't allow modular build
    ...

    Linus Torvalds
     

04 Sep, 2019

2 commits

  • Currently the generic dma remap allocator is passed a vm_flags argument
    by the caller, which is a little confusing. We just introduced a generic
    vmalloc-level flag to identify the dma coherent allocations, so use
    that everywhere and remove the now pointless argument.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     
  • CONFIG_ARCH_NO_COHERENT_DMA_MMAP is now functionally identical to
    !CONFIG_MMU, so remove the separate symbol. The only difference is that
    arm did not set it for !CONFIG_MMU, but arm uses a separate dma mapping
    implementation including its own mmap method, which is handled by moving
    the CONFIG_MMU check in dma_can_mmap so that it only applies to the
    dma-direct case, just like the other ifdefs for it.

    Signed-off-by: Christoph Hellwig
    Acked-by: Geert Uytterhoeven # m68k

    Christoph Hellwig
     

02 Sep, 2019

3 commits

  • Move PCI configuration space, MMIO and memory to the KIO range to free
    up the vmalloc area, and use the static TLB to access them. Move MMIO to
    the beginning of KIO and define PCI_IOBASE as XCHAL_KIO_BYPASS_VADDR to
    match it. Reduce the number of supported PCI buses to 0x3f so that the
    ECAM window fits into the first 64MB of the KIO. Reduce the size of the
    PCI memory window to 128MB so that it fits into KIO.
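    A quick check of the ECAM arithmetic, assuming the standard PCIe ECAM
    layout (1MB of configuration space per bus: 32 devices x 8 functions x
    4KB). Bus numbers 0 through 0x3f therefore need exactly 64 x 1MB = 64MB.
    The helper name is hypothetical.

```c
#include <stdint.h>

/* Offset of a config-space register within a standard ECAM window:
 * bus[27:20] | device[19:15] | function[14:12] | register[11:0]. */
static uint32_t ecam_offset(unsigned bus, unsigned dev, unsigned fn,
                            unsigned reg)
{
    return ((uint32_t)bus << 20) | ((uint32_t)(dev & 0x1f) << 15) |
           ((uint32_t)(fn & 0x7) << 12) | (reg & 0xfff);
}
```

    The last byte of bus 0x3f sits at offset 0x3ffffff, just below 64MB,
    while bus 0x40 would start exactly at the 64MB boundary.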

    Signed-off-by: Max Filippov

    Max Filippov
     
  • Provide a Kconfig choice to select whether only the default ABI, only
    the call0 ABI or both are supported. The default for XEA2 is windowed,
    but it may change for XEA3. Call0 only runs userspace with PS.WOE
    disabled. Supporting both windowed and call0 ABIs is tricky, as there's
    no indication in the ELF binaries of which ABI they use. So it is done by
    probing: each process is started with PS.WOE disabled, but the handler
    of an illegal instruction exception taken with PS.WOE disabled retries
    the faulting instruction after enabling PS.WOE. This must happen before
    any signal is delivered to the process, otherwise the signal may be
    delivered incorrectly.

    Signed-off-by: Max Filippov

    Max Filippov
     
  • PS_WOE_BIT is mainly used to generate PS.WOE mask in the code. Introduce
    PS_WOE_MASK macro and use it instead.

    Signed-off-by: Max Filippov

    Max Filippov
     

27 Aug, 2019

1 commit

  • The xtensa free_initrd_mem() verifies that initrd is mapped and then
    frees its memory using free_reserved_area().

    The initrd is considered mapped when its memory was successfully reserved
    with mem_reserve().

    Resetting initrd_start to 0 in case of mem_reserve() failure allows
    switching to the generic free_initrd_mem() implementation.

    Signed-off-by: Mike Rapoport
    Message-Id:
    Signed-off-by: Max Filippov

    Mike Rapoport
     

13 Aug, 2019

1 commit

  • ITLB entry modifications must be followed by the isync instruction
    before the new entries are possibly used. cpu_reset lacks one isync
    between the ITLB way 6 initialization and the jump to the identity
    mapping. Add the missing isync to xtensa cpu_reset.

    Cc: stable@vger.kernel.org
    Signed-off-by: Max Filippov

    Max Filippov
     

25 Jul, 2019

1 commit

  • The assembly entry/return abstraction change didn't add an asmmacro.h
    include statement to coprocessor.S, resulting in references to the
    undefined macros abi_entry and abi_ret on cores that define
    XTENSA_HAVE_COPROCESSORS. Fix that by including asm/asmmacro.h from
    coprocessor.S.

    Signed-off-by: Max Filippov

    Max Filippov
     

17 Jul, 2019

4 commits

  • Merge more updates from Andrew Morton:
    "VM:
    - z3fold fixes and enhancements by Henry Burns and Vitaly Wool

    - more accurate reclaimed slab caches calculations by Yafang Shao

    - fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by
    Christoph Hellwig

    - !CONFIG_MMU fixes by Christoph Hellwig

    - new novmcoredd parameter to omit device dumps from vmcore, by
    Kairui Song

    - new test_meminit module for testing heap and pagealloc
    initialization, by Alexander Potapenko

    - ioremap improvements for huge mappings, by Anshuman Khandual

    - generalize kprobe page fault handling, by Anshuman Khandual

    - device-dax hotplug fixes and improvements, by Pavel Tatashin

    - enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V

    - add pte_devmap() support for arm64, by Robin Murphy

    - unify locked_vm accounting with a helper, by Daniel Jordan

    - several misc fixes

    core/lib:
    - new typeof_member() macro including some users, by Alexey Dobriyan

    - make BIT() and GENMASK() available in asm, by Masahiro Yamada

    - changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better
    code generation, by Alexey Dobriyan

    - rbtree code size optimizations, by Michel Lespinasse

    - convert struct pid count to refcount_t, by Joel Fernandes

    get_maintainer.pl:
    - add --no-moderated switch to skip moderated ML's, by Joe Perches

    misc:
    - ptrace PTRACE_GET_SYSCALL_INFO interface

    - coda updates

    - gdb scripts, various"

    [ Using merge message suggestion from Vlastimil Babka, with some editing - Linus ]

    * emailed patches from Andrew Morton : (100 commits)
    fs/select.c: use struct_size() in kmalloc()
    mm: add account_locked_vm utility function
    arm64: mm: implement pte_devmap support
    mm: introduce ARCH_HAS_PTE_DEVMAP
    mm: clean up is_device_*_page() definitions
    mm/mmap: move common defines to mman-common.h
    mm: move MAP_SYNC to asm-generic/mman-common.h
    device-dax: "Hotremove" persistent memory that is used like normal RAM
    mm/hotplug: make remove_memory() interface usable
    device-dax: fix memory and resource leak if hotplug fails
    include/linux/lz4.h: fix spelling and copy-paste errors in documentation
    ipc/mqueue.c: only perform resource calculation if user valid
    include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
    scripts/gdb: add helpers to find and list devices
    scripts/gdb: add lx-genpd-summary command
    drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
    kernel/pid.c: convert struct pid count to refcount_t
    drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
    select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
    select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
    ...

    Linus Torvalds
     
  • We can't expose UAPI symbols differently based on CONFIG_ symbols, as
    userspace won't have them available. Instead always define the flag,
    but only respect it based on the config option.
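    A sketch of the pattern, with hypothetical names: the UAPI constant is
    unconditional (0x4000000 is the asm-generic value of MAP_UNINITIALIZED),
    and the kernel side decides whether to honour it. The
    allow_uninitialized parameter stands in for a check such as
    IS_ENABLED(CONFIG_MMAP_ALLOW_UNINITIALIZED); the sketch simply masks the
    flag off when it is not allowed.

```c
/* UAPI side (sketch): always defined, independent of any CONFIG_ option,
 * so userspace can always build code that uses it. */
#define UAPI_MAP_UNINITIALIZED 0x4000000UL

/* Kernel side (hypothetical helper): honour the flag only when the
 * config option is enabled; otherwise drop it from the request. */
static unsigned long effective_mmap_flags(unsigned long flags,
                                          int allow_uninitialized)
{
    if (!allow_uninitialized)
        flags &= ~UAPI_MAP_UNINITIALIZED;
    return flags;
}
```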

    Link: http://lkml.kernel.org/r/20190703122359.18200-2-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Vladimir Murzin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Pull rst conversion of docs from Mauro Carvalho Chehab:
    "As agreed with Jon, I'm sending this big series directly to you, c/c
    him, as this series required a special care, in order to avoid
    conflicts with other trees"

    * tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (77 commits)
    docs: kbuild: fix build with pdf and fix some minor issues
    docs: block: fix pdf output
    docs: arm: fix a breakage with pdf output
    docs: don't use nested tables
    docs: gpio: add sysfs interface to the admin-guide
    docs: locking: add it to the main index
    docs: add some directories to the main documentation index
    docs: add SPDX tags to new index files
    docs: add a memory-devices subdir to driver-api
    docs: phy: place documentation under driver-api
    docs: serial: move it to the driver-api
    docs: driver-api: add remaining converted dirs to it
    docs: driver-api: add xilinx driver API documentation
    docs: driver-api: add a series of orphaned documents
    docs: admin-guide: add a series of orphaned documents
    docs: cgroup-v1: add it to the admin-guide book
    docs: aoe: add it to the driver-api book
    docs: add some documentation dirs to the driver-api book
    docs: driver-model: move it to the driver-api book
    docs: lp855x-driver.rst: add it to the driver-api book
    ...

    Linus Torvalds
     
  • Pull Xtensa updates from Max Filippov:

    - clean up PCI support code

    - add defconfig and DTS for the 'virt' board

    - abstract 'entry' and 'retw' uses in xtensa assembly in preparation
    for XEA3/NX pipeline support

    - random small cleanups

    * tag 'xtensa-20190715' of git://github.com/jcmvbkbc/linux-xtensa:
    xtensa: virt: add defconfig and DTS
    xtensa: abstract 'entry' and 'retw' in assembly code
    xtensa: One function call less in bootmem_init()
    xtensa: remove arch/xtensa/include/asm/types.h
    xtensa: use generic pcibios_set_master and pcibios_enable_device
    xtensa: drop dead PCI support code
    xtensa/PCI: Remove unused variable

    Linus Torvalds
     

15 Jul, 2019

1 commit

  • Rename the xtensa documentation files to ReST, add an
    index for them and adjust in order to produce a nice html
    output via the Sphinx build system.

    In its new index.rst, add an :orphan: marker while it is not linked to
    the main index.rst file, in order to avoid build warnings.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

13 Jul, 2019

1 commit

  • Pull dma-mapping updates from Christoph Hellwig:

    - move the USB special case that bounced DMA through a device bar into
    the USB code instead of handling it in the common DMA code (Laurentiu
    Tudor and Fredrik Noring)

    - don't dip into the global CMA pool for single page allocations
    (Nicolin Chen)

    - fix a crash when allocating memory for the atomic pool failed during
    boot (Florian Fainelli)

    - move support for MIPS-style uncached segments to the common code and
    use that for MIPS and nios2 (me)

    - make support for DMA_ATTR_NON_CONSISTENT and
    DMA_ATTR_NO_KERNEL_MAPPING generic (me)

    - convert nds32 to the generic remapping allocator (me)

    * tag 'dma-mapping-5.3' of git://git.infradead.org/users/hch/dma-mapping: (29 commits)
    dma-mapping: mark dma_alloc_need_uncached as __always_inline
    MIPS: only select ARCH_HAS_UNCACHED_SEGMENT for non-coherent platforms
    usb: host: Fix excessive alignment restriction for local memory allocations
    lib/genalloc.c: Add algorithm, align and zeroed family of DMA allocators
    nios2: use the generic uncached segment support in dma-direct
    nds32: use the generic remapping allocator for coherent DMA allocations
    arc: use the generic remapping allocator for coherent DMA allocations
    dma-direct: handle DMA_ATTR_NO_KERNEL_MAPPING in common code
    dma-direct: handle DMA_ATTR_NON_CONSISTENT in common code
    dma-mapping: add a dma_alloc_need_uncached helper
    openrisc: remove the partial DMA_ATTR_NON_CONSISTENT support
    arc: remove the partial DMA_ATTR_NON_CONSISTENT support
    arm-nommu: remove the partial DMA_ATTR_NON_CONSISTENT support
    ARM: dma-mapping: allow larger DMA mask than supported
    dma-mapping: truncate dma masks to what dma_addr_t can hold
    iommu/dma: Apply dma_{alloc,free}_contiguous functions
    dma-remap: Avoid de-referencing NULL atomic_pool
    MIPS: use the generic uncached segment support in dma-direct
    dma-direct: provide generic support for uncached kernel segments
    au1100fb: fix DMA API abuse
    ...

    Linus Torvalds
     

12 Jul, 2019

1 commit

  • Pull clone3 system call from Christian Brauner:
    "This adds the clone3 syscall which is an extensible successor to clone
    after we snagged the last flag with CLONE_PIDFD during the 5.2 merge
    window for clone(). It cleanly supports all of the flags from clone()
    and thus all legacy workloads.

    There are a few user-visible differences between clone3 and clone.
    First, CLONE_DETACHED will cause EINVAL with clone3 so we can reuse
    this flag. Second, the CSIGNAL flag is deprecated and will cause
    EINVAL to be reported. It is superseded by a dedicated "exit_signal"
    argument in struct clone_args, thus freeing up even more flags. And
    third, clone3 gives CLONE_PIDFD a dedicated return argument in struct
    clone_args instead of abusing CLONE_PARENT_SETTID's parent_tidptr
    argument.

    The clone3 uapi is designed to be easy to handle on 32- and 64-bit:

    /* uapi */
    struct clone_args {
            __aligned_u64 flags;
            __aligned_u64 pidfd;
            __aligned_u64 child_tid;
            __aligned_u64 parent_tid;
            __aligned_u64 exit_signal;
            __aligned_u64 stack;
            __aligned_u64 stack_size;
            __aligned_u64 tls;
    };

    and a separate kernel struct that uses proper kernel typing:

    /* kernel internal */
    struct kernel_clone_args {
            u64 flags;
            int __user *pidfd;
            int __user *child_tid;
            int __user *parent_tid;
            int exit_signal;
            unsigned long stack;
            unsigned long stack_size;
            unsigned long tls;
    };

    The system call comes with a size argument which enables the kernel to
    detect what version of clone_args userspace is passing in. clone3
    validates that any additional bytes a given kernel does not know about
    are set to zero and that the size never exceeds a page.

    A nice feature is that this patchset allowed us to clean up and
    simplify various core kernel code paths in kernel/fork.c by making the
    internal _do_fork() function take struct kernel_clone_args even for
    legacy clone().

    This patch also unblocks the time namespace patchset which wants to
    introduce a new CLONE_TIMENS flag.

    Note that clone3 has only been wired up for x86{_32,64}, arm{64}, and
    xtensa. These were the architectures that did not require special
    massaging.

    Other architectures treat fork-like system calls individually, and
    after some back and forth neither Arnd nor I felt confident enough to
    add clone3 unconditionally to all architectures. We agreed to
    leave this up to individual architecture maintainers. This is why
    there's an additional patch that introduces __ARCH_WANT_SYS_CLONE3
    which any architecture can set once it has implemented support for
    clone3. The patch also adds a cond_syscall(clone3) for architectures
    such as nios2 or h8300 that generate their syscall table by simply
    including asm-generic/unistd.h. The hope is to get rid of
    __ARCH_WANT_SYS_CLONE3 and cond_syscall() rather soon"
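The argument handling described in this pull message can be sketched as a user-space model in C. It is not the kernel's fork.c code: the struct names, the helper names (clone3_check_size, clone3_check_flags, clone3_to_kernel, legacy_clone_to_kernel) and the 4 KiB page size are assumptions for illustration, while the flag values are the ones from <linux/sched.h>. The error codes chosen here (E2BIG for an oversized buffer or non-zero unknown bytes, EINVAL for a too-small buffer or forbidden flags) are this sketch's assumption about the kernel's behavior, and the real kernel performs further checks that are omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined in <linux/sched.h>. */
#ifndef CSIGNAL
#define CSIGNAL        0x000000ff  /* legacy clone(): exit signal bits */
#endif
#ifndef CLONE_PIDFD
#define CLONE_PIDFD    0x00001000
#endif
#ifndef CLONE_DETACHED
#define CLONE_DETACHED 0x00400000
#endif

#define ASSUMED_PAGE_SIZE 4096  /* assumption: 4 KiB pages */

/* Mirror of the uapi struct: __aligned_u64 is a 64-bit integer with
 * 8-byte alignment, so 32- and 64-bit userspace share one layout. */
struct clone_args_v0 {
    uint64_t flags       __attribute__((aligned(8)));
    uint64_t pidfd       __attribute__((aligned(8)));
    uint64_t child_tid   __attribute__((aligned(8)));
    uint64_t parent_tid  __attribute__((aligned(8)));
    uint64_t exit_signal __attribute__((aligned(8)));
    uint64_t stack       __attribute__((aligned(8)));
    uint64_t stack_size  __attribute__((aligned(8)));
    uint64_t tls         __attribute__((aligned(8)));
};
_Static_assert(sizeof(struct clone_args_v0) == 64, "8 fields x 8 bytes");

/* Hypothetical stand-in for struct kernel_clone_args (no __user here). */
struct kernel_clone_args_model {
    uint64_t flags;
    int *pidfd;
    int *child_tid;
    int *parent_tid;
    int exit_signal;
    unsigned long stack;
    unsigned long stack_size;
    unsigned long tls;
};

/* 0 if a user buffer of `usize` bytes is acceptable, else -errno.
 * Bytes past the struct this "kernel" knows about must all be zero. */
static int clone3_check_size(const unsigned char *buf, size_t usize)
{
    if (usize > ASSUMED_PAGE_SIZE)
        return -E2BIG;
    if (usize < sizeof(struct clone_args_v0))
        return -EINVAL;
    for (size_t i = sizeof(struct clone_args_v0); i < usize; i++)
        if (buf[i] != 0)
            return -E2BIG;
    return 0;
}

/* clone3 rejects CLONE_DETACHED and the CSIGNAL bits; the exit signal
 * travels in its own exit_signal field instead. */
static int clone3_check_flags(const struct clone_args_v0 *args)
{
    if (args->flags & (CLONE_DETACHED | CSIGNAL))
        return -EINVAL;
    return 0;
}

/* Convert the fixed-width uapi fields into properly typed fields. */
static void clone3_to_kernel(const struct clone_args_v0 *in,
                             struct kernel_clone_args_model *out)
{
    *out = (struct kernel_clone_args_model){
        .flags       = in->flags,
        .pidfd       = (int *)(uintptr_t)in->pidfd,
        .child_tid   = (int *)(uintptr_t)in->child_tid,
        .parent_tid  = (int *)(uintptr_t)in->parent_tid,
        .exit_signal = (int)in->exit_signal,
        .stack       = (unsigned long)in->stack,
        .stack_size  = (unsigned long)in->stack_size,
        .tls         = (unsigned long)in->tls,
    };
}

/* Legacy clone() carries the exit signal in the low CSIGNAL bits of its
 * flags; splitting it out lets both entry points share one struct.
 * Simplified: pointer arguments are left NULL. */
static void legacy_clone_to_kernel(uint64_t clone_flags, unsigned long newsp,
                                   struct kernel_clone_args_model *out)
{
    *out = (struct kernel_clone_args_model){
        .flags       = clone_flags & ~(uint64_t)CSIGNAL,
        .exit_signal = (int)(clone_flags & CSIGNAL),
        .stack       = newsp,
    };
}
```

The fixed 8-byte fields are what make the struct identical for 32- and 64-bit callers; the conversion to native pointer and integer types happens once, on the kernel side.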

    * tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    arch: handle arches who do not yet define clone3
    arch: wire-up clone3() syscall
    fork: add clone3

    Linus Torvalds
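Since clone3 was only wired up on some architectures at this point, userspace code commonly probes for it before relying on it. A minimal sketch, assuming the common syscall number 435 when the libc headers lack __NR_clone3 (alpha uses a different number); clone3_available is a hypothetical helper name. A seccomp filter can also make the call fail, which this sketch reports as unavailable.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_clone3
#define __NR_clone3 435  /* assumption: common number; differs on alpha */
#endif

/* Returns 1 if the running kernel implements clone3, 0 otherwise.
 * A NULL pointer with size 0 is rejected before any process is created:
 * a kernel with clone3 answers EINVAL (struct too small), while one
 * without it answers ENOSYS. Any other outcome counts as "unusable". */
static int clone3_available(void)
{
    long ret = syscall(__NR_clone3, NULL, (size_t)0);
    return ret == -1 && errno == EINVAL;
}
```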
     

11 Jul, 2019

2 commits

  • Pull pidfd updates from Christian Brauner:
    "This adds two main features.

    - First, it adds polling support for pidfds. This allows process
    managers to know when a (non-parent) process dies in a race-free
    way.

    The notification mechanism used follows the same logic that is
    currently used when the parent of a task is notified of a child's
    death. With this patchset it is possible to put pidfds in an
    {e}poll loop and get reliable notifications for process (i.e.
    thread-group) exit.

    - The second feature complements the first one by making it possible
    to retrieve pollable pidfds for processes that were not created
    using CLONE_PIDFD.

    A lot of processes get created with traditional PID-based calls
    such as fork() or clone() (without CLONE_PIDFD). For these
    processes a caller currently cannot create a pollable pidfd. This
    is a problem for Android's low memory killer (LMK) and service
    managers such as systemd.

    Both patchsets are accompanied by selftests.

    It's perhaps worth noting that both the work done so far and the work
    done in this branch for pidfd_open() and polling support are already
    seeing some adoption:

    - Android is in the process of backporting this work to all their LTS
    kernels [1]

    - Service managers make use of pidfd_send_signal but will need to
    wait until we enable waiting on pidfds for full adoption.

    - And projects I maintain make use of both pidfd_send_signal and
    CLONE_PIDFD [2] and will use polling support and pidfd_open() too"

    [1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22

    [2] https://github.com/lxc/lxc/blob/aab6e3eb73c343231cdde775db938994fc6f2803/src/lxc/start.c#L1753

    * tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    tests: add pidfd_open() tests
    arch: wire-up pidfd_open()
    pid: add pidfd_open()
    pidfd: add polling selftests
    pidfd: add polling support

    Linus Torvalds
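The two features fit together as follows: pidfd_open() returns a pollable descriptor for a process started with plain fork(), and that descriptor becomes readable once the process exits. A minimal sketch, assuming the common syscall number 434 when the libc headers lack __NR_pidfd_open (alpha differs); wait_exit_pidfd and demo_fork_and_wait are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434  /* assumption: common number; differs on alpha */
#endif

/* Open a pidfd for an existing process and wait until it exits.
 * Returns 1 when the pidfd reports POLLIN (the process has exited) and
 * -1 on failure, e.g. ENOSYS from a kernel without pidfd_open(). */
static int wait_exit_pidfd(pid_t pid, int timeout_ms)
{
    int pidfd = (int)syscall(__NR_pidfd_open, pid, 0);
    if (pidfd < 0)
        return -1;

    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    close(pidfd);
    return (n == 1 && (pfd.revents & POLLIN)) ? 1 : -1;
}

/* Fork with plain fork() (no CLONE_PIDFD), then watch the child via a pidfd. */
static int demo_fork_and_wait(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);                    /* child: exit immediately */

    int r = wait_exit_pidfd(child, 5000);
    waitpid(child, NULL, 0);         /* the pidfd does not reap the child */
    return r;
}
```

The pidfd only reports the exit; the parent still has to reap the child with waitpid() or waitid().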
     
  • Pull m68knommu updates from Greg Ungerer:
    "A series of cleanups for the FLAT format binary loader, binfmt_flat,
    from Christoph.

    The end goal is to support no-MMU on RISC-V, and the last patch
    enables that"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
    riscv: add binfmt_flat support
    binfmt_flat: don't offset the data start
    binfmt_flat: move the MAX_SHARED_LIBS definition to binfmt_flat.c
    binfmt_flat: remove the persistent argument from flat_get_addr_from_rp
    binfmt_flat: provide an asm-generic/flat.h
    binfmt_flat: make support for old format binaries optional
    binfmt_flat: add a ARCH_HAS_BINFMT_FLAT option
    binfmt_flat: add endianess annotations
    binfmt_flat: use fixed size type for the on-disk format
    binfmt_flat: consolidate two version of flat_v2_reloc_t
    binfmt_flat: remove the unused OLD_FLAT_FLAG_RAM definition
    binfmt_flat: remove the uapi header
    binfmt_flat: replace flat_argvp_envp_on_stack with a Kconfig variable
    binfmt_flat: remove flat_old_ram_flag
    binfmt_flat: provide a default version of flat_get_relocate_addr
    binfmt_flat: remove flat_set_persistent
    binfmt_flat: remove flat_reloc_valid

    Linus Torvalds
     

09 Jul, 2019

3 commits

  • …iederm/user-namespace

    Pull force_sig() argument change from Eric Biederman:
    "A source of error over the years has been that force_sig has taken a
    task parameter when it is only safe to use force_sig with the current
    task.

    The force_sig function is built for delivering synchronous signals
    such as SIGSEGV where the userspace application caused a synchronous
    fault (such as a page fault) and the kernel responded with a signal.

    Because the name force_sig does not make this clear, and because
    force_sig takes a task parameter, the function has been abused for
    sending other kinds of signals over the years. Those uses have slowly
    been fixed as the resulting oopses were tracked down.

    This set of changes fixes the remaining abusers of force_sig and
    carefully rips out the task parameter from force_sig and friends
    making this kind of error almost impossible in the future"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
    signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
    signal: Remove the signal number and task parameters from force_sig_info
    signal: Factor force_sig_info_to_task out of force_sig_info
    signal: Generate the siginfo in force_sig
    signal: Move the computation of force into send_signal and correct it.
    signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
    signal: Remove the task parameter from force_sig_fault
    signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
    signal: Explicitly call force_sig_fault on current
    signal/unicore32: Remove tsk parameter from __do_user_fault
    signal/arm: Remove tsk parameter from __do_user_fault
    signal/arm: Remove tsk parameter from ptrace_break
    signal/nds32: Remove tsk parameter from send_sigtrap
    signal/riscv: Remove tsk parameter from do_trap
    signal/sh: Remove tsk parameter from force_sig_info_fault
    signal/um: Remove task parameter from send_sigtrap
    signal/x86: Remove task parameter from send_sigtrap
    signal: Remove task parameter from force_sig_mceerr
    signal: Remove task parameter from force_sig
    signal: Remove task parameter from force_sigsegv
    ...

    Linus Torvalds
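Why removing the parameter helps can be shown with a tiny user-space model. Everything in it is hypothetical (struct task, the thread-local current pointer, the pending-signal bitmask, and both function names); it only mirrors the shape of the change from force_sig(sig, task) to force_sig(sig) named in the shortlog.

```c
#include <signal.h>
#include <stdint.h>

/* Hypothetical stand-in for a task: only a pending-signal bitmask. */
struct task {
    uint64_t pending;
};

/* Hypothetical stand-in for the kernel's `current` (the running task). */
static _Thread_local struct task *current;

/* Old shape, force_sig(sig, task): nothing stops a caller from passing
 * a task other than current, which is the misuse described above. */
static void model_force_sig_old(int sig, struct task *t)
{
    t->pending |= UINT64_C(1) << (sig - 1);
}

/* New shape, force_sig(sig): a synchronous signal can only go to the
 * task that caused it, because no other target can be named. */
static void model_force_sig(int sig)
{
    current->pending |= UINT64_C(1) << (sig - 1);
}
```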
     
  • Add a defconfig and DTS for a virt board. The defconfig enables the PCIe
    host and a number of virtio devices. The DTS routes legacy PCI IRQs to
    the first four level-triggered external IRQ lines. CPU cores with
    edge-triggered IRQs among the first four lines may need a custom DTS to
    work correctly.

    Signed-off-by: Max Filippov

    Max Filippov
     
  • Provide abi_entry, abi_entry_default, abi_ret and abi_ret_default macros
    that allocate an aligned stack frame in the windowed and call0 ABIs.
    Provide the XTENSA_SPILL_STACK_RESERVE macro, which specifies the
    required stack frame size when register spilling is involved.
    Replace all uses of 'entry' and 'retw' with the above macros.
    This makes most of the xtensa assembly code ready for XEA3 and call0 ABI.

    Signed-off-by: Max Filippov

    Max Filippov
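The frame-size arithmetic behind such macros can be sketched in C. The constants are assumptions (a 16-byte stack alignment, and a 16-byte reserved area that only the windowed ABI adds on top of the rounded-up frame), as are the helper names; the real macros are assembler macros, not C functions.

```c
#include <stdbool.h>

#define STACK_ALIGNMENT    16  /* assumption: xtensa stack alignment */
#define FRAME_SIZE_RESERVE 16  /* assumption: windowed-ABI reserved area */

/* Round `frame_size` up to the next multiple of STACK_ALIGNMENT. */
static unsigned aligned_frame(unsigned frame_size)
{
    return (frame_size + STACK_ALIGNMENT - 1) & ~(unsigned)(STACK_ALIGNMENT - 1);
}

/* Bytes by which the stack pointer moves on function entry. */
static unsigned entry_frame_bytes(unsigned frame_size, bool windowed_abi)
{
    if (windowed_abi)
        return FRAME_SIZE_RESERVE + aligned_frame(frame_size);
    return aligned_frame(frame_size);
}
```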
     

06 Jul, 2019

1 commit


28 Jun, 2019

2 commits

  • This wires up the pidfd_open() syscall into all arches at once.

    Signed-off-by: Christian Brauner
    Reviewed-by: David Howells
    Reviewed-by: Oleg Nesterov
    Acked-by: Arnd Bergmann
    Cc: "Eric W. Biederman"
    Cc: Kees Cook
    Cc: Joel Fernandes (Google)
    Cc: Thomas Gleixner
    Cc: Jann Horn
    Cc: Andy Lutomirsky
    Cc: Andrew Morton
    Cc: Aleksa Sarai
    Cc: Linus Torvalds
    Cc: Al Viro
    Cc: linux-api@vger.kernel.org
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-ia64@vger.kernel.org
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: linux-mips@vger.kernel.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-sh@vger.kernel.org
    Cc: sparclinux@vger.kernel.org
    Cc: linux-xtensa@linux-xtensa.org
    Cc: linux-arch@vger.kernel.org
    Cc: x86@kernel.org

    Christian Brauner
     
  • Xtensa does not define CONFIG_64BIT. The generic definition of
    BITS_PER_LONG in include/asm-generic/bitsperlong.h should work.
    With that definition removed, arch/xtensa/include/asm/types.h does
    nothing but include arch/xtensa/include/uapi/asm/types.h, so remove
    the arch/xtensa/include/asm/types.h header.

    Signed-off-by: Max Filippov

    Max Filippov
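The generic fallback this commit relies on boils down to a preprocessor switch on CONFIG_64BIT. A sketch of that logic; the MODEL_ prefix is hypothetical and keeps it from clashing with real kernel or libc macros. Because xtensa never sets CONFIG_64BIT, the value resolves to 32.

```c
/* Same switch as the generic kernel header: 64 only when CONFIG_64BIT
 * is set, otherwise 32. An xtensa build never sets CONFIG_64BIT. */
#ifdef CONFIG_64BIT
#define MODEL_BITS_PER_LONG 64
#else
#define MODEL_BITS_PER_LONG 32
#endif

static int model_bits_per_long(void)
{
    return MODEL_BITS_PER_LONG;
}
```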
     

25 Jun, 2019

1 commit

  • DMA_ATTR_NO_KERNEL_MAPPING is generally implemented by allocating
    normal cacheable pages or CMA memory, and then returning the page
    pointer as the opaque handle. Lift that code from the xtensa and
    generic dma remapping implementations into the generic dma-direct
    code so that we don't even call arch_dma_alloc for these allocations.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
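The resulting control flow can be modeled in user space. All names here are hypothetical (model_dma_alloc, model_arch_dma_alloc, struct page_model), malloc() stands in for page or CMA allocation, and the attribute bit is assumed to match the kernel's (1UL << 4). When the attribute is set, the page pointer itself is returned as the opaque handle and the architecture hook is never called.

```c
#include <stdlib.h>

#define MODEL_DMA_ATTR_NO_KERNEL_MAPPING (1UL << 4)  /* assumed bit value */

/* Stand-in for struct page: a header followed by the buffer itself. */
struct page_model {
    size_t size;
    unsigned char data[];
};

static int model_arch_calls;  /* counts calls into the arch hook */

/* Stand-in for arch_dma_alloc(): would set up an uncached mapping. */
static void *model_arch_dma_alloc(size_t size)
{
    model_arch_calls++;
    return malloc(size);
}

/* Common allocator: handles NO_KERNEL_MAPPING before reaching the arch. */
static void *model_dma_alloc(size_t size, unsigned long attrs)
{
    if (attrs & MODEL_DMA_ATTR_NO_KERNEL_MAPPING) {
        /* Normal cacheable memory; the page pointer is the opaque
         * cookie and no kernel virtual mapping is set up. */
        struct page_model *page = malloc(sizeof(*page) + size);
        if (page)
            page->size = size;
        return page;
    }
    return model_arch_dma_alloc(size);
}
```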