15 Aug, 2020

3 commits

  • Merge more updates from Andrew Morton:
    "Subsystems affected by this patch series: mm/hotfixes, lz4, exec,
    mailmap, mm/thp, autofs, sysctl, mm/kmemleak, mm/misc and lib"

    * emailed patches from Andrew Morton : (35 commits)
    virtio: pci: constify ioreadX() iomem argument (as in generic implementation)
    ntb: intel: constify ioreadX() iomem argument (as in generic implementation)
    rtl818x: constify ioreadX() iomem argument (as in generic implementation)
    iomap: constify ioreadX() iomem argument (as in generic implementation)
    sh: use generic strncpy()
    sh: clkfwk: remove r8/r16/r32
    include/asm-generic/vmlinux.lds.h: align ro_after_init
    mm: annotate a data race in page_zonenum()
    mm/swap.c: annotate data races for lru_rotate_pvecs
    mm/rmap: annotate a data race at tlb_flush_batched
    mm/mempool: fix a data race in mempool_free()
    mm/list_lru: fix a data race in list_lru_count_one
    mm/memcontrol: fix a data race in scan count
    mm/page_counter: fix various data races at memsw
    mm/swapfile: fix and annotate various data races
    mm/filemap.c: fix a data race in filemap_fault()
    mm/swap_state: mark various intentional data races
    mm/page_io: mark various intentional data races
    mm/frontswap: mark various intentional data races
    mm/kmemleak: silence KCSAN splats in checksum
    ...

    Linus Torvalds
     
  • swap_info_struct si.highest_bit, si.swap_map[offset] and si.flags could
    be accessed concurrently separately as noticed by KCSAN,

    === si.highest_bit ===

    write to 0xffff8d5abccdc4d4 of 4 bytes by task 5353 on cpu 24:
    swap_range_alloc+0x81/0x130
    swap_range_alloc at mm/swapfile.c:681
    scan_swap_map_slots+0x371/0xb90
    get_swap_pages+0x39d/0x5c0
    get_swap_page+0xf2/0x524
    add_to_swap+0xe4/0x1c0
    shrink_page_list+0x1795/0x2870
    shrink_inactive_list+0x316/0x880
    shrink_lruvec+0x8dc/0x1380
    shrink_node+0x317/0xd80
    do_try_to_free_pages+0x1f7/0xa10
    try_to_free_pages+0x26c/0x5e0
    __alloc_pages_slowpath+0x458/0x1290

    read to 0xffff8d5abccdc4d4 of 4 bytes by task 6672 on cpu 70:
    scan_swap_map_slots+0x4a6/0xb90
    scan_swap_map_slots at mm/swapfile.c:892
    get_swap_pages+0x39d/0x5c0
    get_swap_page+0xf2/0x524
    add_to_swap+0xe4/0x1c0
    shrink_page_list+0x1795/0x2870
    shrink_inactive_list+0x316/0x880
    shrink_lruvec+0x8dc/0x1380
    shrink_node+0x317/0xd80
    do_try_to_free_pages+0x1f7/0xa10
    try_to_free_pages+0x26c/0x5e0
    __alloc_pages_slowpath+0x458/0x1290

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 70 PID: 6672 Comm: oom01 Tainted: G W L 5.5.0-next-20200205+ #3
    Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

    === si.swap_map[offset] ===

    write to 0xffffbc370c29a64c of 1 bytes by task 6856 on cpu 86:
    __swap_entry_free_locked+0x8c/0x100
    __swap_entry_free_locked at mm/swapfile.c:1209 (discriminator 4)
    __swap_entry_free.constprop.20+0x69/0xb0
    free_swap_and_cache+0x53/0xa0
    unmap_page_range+0x7f8/0x1d70
    unmap_single_vma+0xcd/0x170
    unmap_vmas+0x18b/0x220
    exit_mmap+0xee/0x220
    mmput+0x10e/0x270
    do_exit+0x59b/0xf40
    do_group_exit+0x8b/0x180

    read to 0xffffbc370c29a64c of 1 bytes by task 6855 on cpu 20:
    _swap_info_get+0x81/0xa0
    _swap_info_get at mm/swapfile.c:1140
    free_swap_and_cache+0x40/0xa0
    unmap_page_range+0x7f8/0x1d70
    unmap_single_vma+0xcd/0x170
    unmap_vmas+0x18b/0x220
    exit_mmap+0xee/0x220
    mmput+0x10e/0x270
    do_exit+0x59b/0xf40
    do_group_exit+0x8b/0x180

    === si.flags ===

    write to 0xffff956c8fc6c400 of 8 bytes by task 6087 on cpu 23:
    scan_swap_map_slots+0x6fe/0xb50
    scan_swap_map_slots at mm/swapfile.c:887
    get_swap_pages+0x39d/0x5c0
    get_swap_page+0x377/0x524
    add_to_swap+0xe4/0x1c0
    shrink_page_list+0x1795/0x2870
    shrink_inactive_list+0x316/0x880
    shrink_lruvec+0x8dc/0x1380
    shrink_node+0x317/0xd80
    do_try_to_free_pages+0x1f7/0xa10
    try_to_free_pages+0x26c/0x5e0
    __alloc_pages_slowpath+0x458/0x1290

    read to 0xffff956c8fc6c400 of 8 bytes by task 6207 on cpu 63:
    _swap_info_get+0x41/0xa0
    __swap_info_get at mm/swapfile.c:1114
    put_swap_page+0x84/0x490
    __remove_mapping+0x384/0x5f0
    shrink_page_list+0xff1/0x2870
    shrink_inactive_list+0x316/0x880
    shrink_lruvec+0x8dc/0x1380
    shrink_node+0x317/0xd80
    do_try_to_free_pages+0x1f7/0xa10
    try_to_free_pages+0x26c/0x5e0
    __alloc_pages_slowpath+0x458/0x1290

    The writes are under si->lock but the reads are not. For si.highest_bit
    and si.swap_map[offset], a data race could trigger logic bugs, so fix
    them by using WRITE_ONCE() for the writes and READ_ONCE() for the reads,
    except for those isolated reads that only compare against zero, where a
    data race does no harm. Annotate those as intentional data races using
    the data_race() macro.

    For si.flags, the readers are only interested in a single bit, so a data
    race there causes no issue.

    [cai@lca.pw: add a missing annotation for si->flags in memory.c]
    Link: http://lkml.kernel.org/r/1581612647-5958-1-git-send-email-cai@lca.pw

    Signed-off-by: Qian Cai
    Signed-off-by: Andrew Morton
    Cc: Marco Elver
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/1581095163-12198-1-git-send-email-cai@lca.pw
    Signed-off-by: Linus Torvalds

    Qian Cai
     
  • This removes the code in the COW path that calls debug_dma_assert_idle(),
    which was added many years ago.

    Google shows that it hasn't caught anything in the 6+ years we've had it
    apart from a false positive, and Hugh just noticed how it had a very
    unfortunate spinlock serialization in the COW path.

    He fixed that issue in the previous commit (a85ffd59bd36: "dma-debug: fix
    debug_dma_assert_idle(), use rcu_read_lock()"), but let's see if anybody
    even notices when we remove this function entirely.

    NOTE! We keep the dma tracking infrastructure that was added by the
    commit that introduced it. Partly to make it easier to resurrect this
    debug code if we ever decide to, and partly because that tracking by pfn
    and offset looks quite reasonable.

    The problem with this debug code was simply that it was expensive and
    didn't seem worth it, not that it was wrong per se.

    Acked-by: Dan Williams
    Acked-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

13 Aug, 2020

6 commits

  • After the cleanup of page fault accounting, gup does not need to pass
    task_struct around any more. Remove that parameter in the whole gup
    stack.

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Here're the last pieces of page fault accounting that were still done
    outside handle_mm_fault() where we still have regs==NULL when calling
    handle_mm_fault():

    arch/powerpc/mm/copro_fault.c: copro_handle_mm_fault
    arch/sparc/mm/fault_32.c: force_user_fault
    arch/um/kernel/trap.c: handle_page_fault
    mm/gup.c: faultin_page
    fixup_user_fault
    mm/hmm.c: hmm_vma_fault
    mm/ksm.c: break_ksm

    Some of them have the issue of duplicated accounting for page fault
    retries. Some of them didn't do the accounting at all.

    This patch cleans all these up by letting handle_mm_fault() do the
    per-task page fault accounting even if regs==NULL (though we'll still
    skip the perf event accounting); a hedged sketch of this pattern follows
    the list below. With that, we can safely remove all the outliers now.

    There's another functional change in that we now account the page faults
    to the caller of gup, rather than the task_struct that was passed into
    the gup code. More information on this can be found at [1].

    After this patch, the following should never be touched again outside
    handle_mm_fault():

    - task_struct.[maj|min]_flt
    - PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]

    [1] https://lore.kernel.org/lkml/CAHk-=wj_V2Tps2QrMn20_W0OJF9xqNh52XSGA42s-ZJ8Y+GyKw@mail.gmail.com/
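
    A hedged sketch of what that per-task accounting looks like after the
    series (the helper name is made up for illustration; the real logic
    lives inside handle_mm_fault() and its callees):

    static void account_fault_sketch(struct task_struct *tsk,
                                     struct pt_regs *regs, bool major)
    {
            /* per-task counters are always updated ... */
            if (major)
                    tsk->maj_flt++;
            else
                    tsk->min_flt++;

            /* ... but perf events are skipped when regs == NULL, i.e. for
             * gup/ksm/hmm style callers that have no #PF register state */
            if (!regs)
                    return;

            if (major)
                    perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, 0);
            else
                    perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, 0);
    }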

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Albert Ou
    Cc: Alexander Gordeev
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Geert Uytterhoeven
    Cc: Gerald Schaefer
    Cc: Greentime Hu
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: John Hubbard
    Cc: Jonas Bonn
    Cc: Ley Foon Tan
    Cc: "Luck, Tony"
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Richard Henderson
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200707225021.200906-25-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Patch series "mm: Page fault accounting cleanups", v5.

    This is v5 of the pf accounting cleanup series. It originates from
    Gerald Schaefer's report a week ago of an issue regarding incorrect page
    fault accounting for retried page faults after commit 4064b9827063 ("mm:
    allow VM_FAULT_RETRY for multiple times"):

    https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

    What this series did:

    - Correct page fault accounting: we do accounting for a page fault
    (no matter whether it's from #PF handling, or gup, or anything else)
    only with the one that completed the fault. For example, page fault
    retries should not be counted in the page fault counters. The same
    applies to the perf events.

    - Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
    event is used in an ad-hoc way across different archs.

    Case (1): for many archs it's done at the entry of a page fault
    handler, so that it will also cover e.g. erroneous faults.

    Case (2): for some other archs, it is only accounted when the page
    fault is resolved successfully.

    Case (3): there are still quite a few archs that have not enabled
    this perf event.

    Since this series touches nearly all the archs, we unify this
    perf event to always follow case (1), which is the one that makes the
    most sense. And since we moved the accounting into handle_mm_fault(),
    the other two MAJ/MIN perf events are naturally taken care of.

    - Unify definition of "major faults": the definition of "major
    fault" is slightly changed when used in accounting (not
    VM_FAULT_MAJOR). More information in patch 1.

    - Always account the page fault onto the one that triggered the page
    fault. This does not matter much for #PF handling, but mostly for
    gup. More information on this in patch 25.

    Patchset layout:

    Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
    Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
    Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
    Patch 25: Cleanup GUP task_struct pointer since it's not needed any more

    This patch (of 25):

    This is a preparation patch to move page fault accounting into the
    general code in handle_mm_fault(). This includes both the per-task
    flt_maj/flt_min counters and the major/minor page fault perf events.
    To do this, the pt_regs pointer is passed into handle_mm_fault();
    a sketch of the resulting prototype follows below.

    PERF_COUNT_SW_PAGE_FAULTS should still be kept in the per-arch page
    fault handlers.

    So far, all the pt_regs pointers passed into handle_mm_fault() are NULL,
    which means this patch should have no intended functional change.
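
    For reference, the resulting call shape looks roughly like this (a
    sketch of the prototype; #PF handlers pass their pt_regs, all other
    callers pass NULL):

    vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
                               unsigned long address,
                               unsigned int flags,
                               struct pt_regs *regs);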

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Albert Ou
    Cc: Alexander Gordeev
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Geert Uytterhoeven
    Cc: Gerald Schaefer
    Cc: Greentime Hu
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: John Hubbard
    Cc: Jonas Bonn
    Cc: Ley Foon Tan
    Cc: "Luck, Tony"
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Richard Henderson
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
    Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Drop the repeated word "to" in two places.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Zi Yan
    Link: http://lkml.kernel.org/r/20200801173822.14973-7-rdunlap@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • This patch implements workingset detection for anonymous LRU. All the
    infrastructure is implemented by the previous patches so this patch just
    activates the workingset detection by installing/retrieving the shadow
    entry and adding refault calculation.

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    Cc: Hugh Dickins
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Link: http://lkml.kernel.org/r/1595490560-15117-6-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • In the current implementation, newly created or swapped-in anonymous
    pages start on the active list. Growing the active list results in
    rebalancing the active/inactive lists, so old pages on the active list
    are demoted to the inactive list. Hence, pages on the active list aren't
    protected at all.

    Following is an example of this situation.

    Assume there are 50 hot pages on the active list. The numbers denote the
    number of pages on the active/inactive lists (active | inactive).

    1. 50 hot pages on active list
    50(h) | 0

    2. workload: 50 newly created (used-once) pages
    50(uo) | 50(h)

    3. workload: another 50 newly created (used-once) pages
    50(uo) | 50(uo), swap-out 50(h)

    This patch tries to fix this issue. Like the file LRU, newly created or
    swapped-in anonymous pages will be inserted into the inactive list. They
    are promoted to the active list if enough references happen. This simple
    modification changes the above example as follows.

    1. 50 hot pages on active list
    50(h) | 0

    2. workload: 50 newly created (used-once) pages
    50(h) | 50(uo)

    3. workload: another 50 newly created (used-once) pages
    50(h) | 50(uo), swap-out 50(uo)

    As you can see, the hot pages on the active list would be protected.

    Note that this implementation has a drawback: a page cannot be promoted
    and will be swapped out if its re-access interval is greater than the
    size of the inactive list but less than the size of the total
    (active + inactive). To solve this potential issue, a following patch
    will apply workingset detection similar to the one already applied to
    the file LRU.

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    Cc: Hugh Dickins
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Link: http://lkml.kernel.org/r/1595490560-15117-3-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

08 Aug, 2020

2 commits

  • This function implicitly assumes that the addr passed in is page
    aligned. A non-page-aligned addr could ultimately cause a kernel bug in
    remap_pte_range, as the exit condition in the loop may never be
    satisfied. This patch documents the requirement and also adds an
    explicit check for it.
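
    A minimal sketch of the kind of check being added (assuming the function
    in question is remap_pfn_range(); exact placement and error handling may
    differ):

    int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
                        unsigned long pfn, unsigned long size, pgprot_t prot)
    {
            /* reject non-page-aligned addresses up front instead of
             * looping forever further down in remap_pte_range() */
            if (WARN_ON_ONCE(!PAGE_ALIGNED(addr)))
                    return -EINVAL;

            /* ... existing mapping logic ... */
            return 0;
    }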

    Signed-off-by: Alex Zhang
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200617233512.177519-1-zhangalex@google.com
    Signed-off-by: Linus Torvalds

    Alex Zhang
     
  • In zap_pte_range(), the check for non_swap_entry() and
    is_device_private_entry() is unnecessary since the latter is sufficient to
    determine if the page is a device private page. Remove the test for
    non_swap_entry() to simplify the code and for clarity.
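
    A before/after sketch of the condition being simplified (illustrative
    fragment from the swap-entry branch, not the exact diff):

    /* before: non_swap_entry() is redundant here */
    if (non_swap_entry(entry) && is_device_private_entry(entry))
            page = device_private_entry_to_page(entry);

    /* after: is_device_private_entry() alone is sufficient, since a
     * device private entry is by definition a non-swap entry */
    if (is_device_private_entry(entry))
            page = device_private_entry_to_page(entry);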

    Signed-off-by: Ralph Campbell
    Signed-off-by: Andrew Morton
    Reviewed-by: Jason Gunthorpe
    Acked-by: David Hildenbrand
    Link: http://lkml.kernel.org/r/20200615175405.4613-1-rcampbell@nvidia.com
    Signed-off-by: Linus Torvalds

    Ralph Campbell
     

05 Aug, 2020

1 commit

  • Pull uninitialized_var() macro removal from Kees Cook:
    "This is long overdue, and has hidden too many bugs over the years. The
    series has several "by hand" fixes, and then a trivial treewide
    replacement.

    - Clean up non-trivial uses of uninitialized_var()

    - Update documentation and checkpatch for uninitialized_var() removal

    - Treewide removal of uninitialized_var()"

    * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    compiler: Remove uninitialized_var() macro
    treewide: Remove uninitialized_var() usage
    checkpatch: Remove awareness of uninitialized_var() macro
    mm/debug_vm_pgtable: Remove uninitialized_var() usage
    f2fs: Eliminate usage of uninitialized_var() macro
    media: sur40: Remove uninitialized_var() usage
    KVM: PPC: Book3S PR: Remove uninitialized_var() usage
    clk: spear: Remove uninitialized_var() usage
    clk: st: Remove uninitialized_var() usage
    spi: davinci: Remove uninitialized_var() usage
    ide: Remove uninitialized_var() usage
    rtlwifi: rtl8192cu: Remove uninitialized_var() usage
    b43: Remove uninitialized_var() usage
    drbd: Remove uninitialized_var() usage
    x86/mm/numa: Remove uninitialized_var() usage
    docs: deprecated.rst: Add uninitialized_var()

    Linus Torvalds
     

04 Aug, 2020

1 commit

  • Pull arm64 and cross-arch updates from Catalin Marinas:
    "Here's a slightly wider-spread set of updates for 5.9.

    Going outside the usual arch/arm64/ area is the removal of
    read_barrier_depends() series from Will and the MSI/IOMMU ID
    translation series from Lorenzo.

    The notable arm64 updates include ARMv8.4 TLBI range operations and
    translation level hint, time namespace support, and perf.

    Summary:

    - Removal of the tremendously unpopular read_barrier_depends()
    barrier, which is a NOP on all architectures apart from Alpha, in
    favour of allowing architectures to override READ_ONCE() and do
    whatever dance they need to do to ensure address dependencies
    provide LOAD -> LOAD/STORE ordering.

    This work also offers a potential solution if compilers are shown
    to convert LOAD -> LOAD address dependencies into control
    dependencies (e.g. under LTO), as weakly ordered architectures will
    effectively be able to upgrade READ_ONCE() to smp_load_acquire().
    The latter case is not used yet, but will be discussed further at
    LPC.

    - Make the MSI/IOMMU input/output ID translation PCI agnostic,
    augment the MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID
    bus-specific parameter and apply the resulting changes to the
    device ID space provided by the Freescale FSL bus.

    - arm64 support for TLBI range operations and translation table level
    hints (part of the ARMv8.4 architecture version).

    - Time namespace support for arm64.

    - Export the virtual and physical address sizes in vmcoreinfo for
    makedumpfile and crash utilities.

    - CPU feature handling cleanups and checks for programmer errors
    (overlapping bit-fields).

    - ACPI updates for arm64: disallow AML accesses to EFI code regions
    and kernel memory.

    - perf updates for arm64.

    - Miscellaneous fixes and cleanups, most notably PLT counting
    optimisation for module loading, recordmcount fix to ignore
    relocations other than R_AARCH64_CALL26, CMA areas reserved for
    gigantic pages on 16K and 64K configurations.

    - Trivial typos, duplicate words"

    Link: http://lkml.kernel.org/r/20200710165203.31284-1-will@kernel.org
    Link: http://lkml.kernel.org/r/20200619082013.13661-1-lorenzo.pieralisi@arm.com

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (82 commits)
    arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
    arm64/mm: save memory access in check_and_switch_context() fast switch path
    arm64: sigcontext.h: delete duplicated word
    arm64: ptrace.h: delete duplicated word
    arm64: pgtable-hwdef.h: delete duplicated words
    bus: fsl-mc: Add ACPI support for fsl-mc
    bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
    of/irq: Make of_msi_map_rid() PCI bus agnostic
    of/irq: make of_msi_map_get_device_domain() bus agnostic
    dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
    of/device: Add input id to of_dma_configure()
    of/iommu: Make of_map_rid() PCI agnostic
    ACPI/IORT: Add an input ID to acpi_dma_configure()
    ACPI/IORT: Remove useless PCI bus walk
    ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
    ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
    ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
    arm64: enable time namespace support
    arm64/vdso: Restrict splitting VVAR VMA
    arm64/vdso: Handle faults on timens page
    ...

    Linus Torvalds
     

25 Jul, 2020

1 commit

  • Clang static analysis reports a garbage return:

    In file included from mm/memory.c:84:
    mm/memory.c:1612:2: warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn]
    return err;
    ^~~~~~~~~~

    The setting of err depends on a loop executing. So initialize err.
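
    The fix itself is a one-liner of this shape (sketch of the change to the
    function flagged at mm/memory.c:1612):

    -       int err;
    +       int err = 0;    /* defined even when the loop body never runs */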

    Signed-off-by: Tom Rix
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200703155354.29132-1-trix@redhat.com
    Signed-off-by: Linus Torvalds

    Tom Rix
     

21 Jul, 2020

1 commit


17 Jul, 2020

1 commit

  • Using uninitialized_var() is dangerous as it papers over real bugs[1]
    (or can in the future), and suppresses unrelated compiler warnings
    (e.g. "unused variable"). If the compiler thinks it is uninitialized,
    either simply initialize the variable or make compiler changes.
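
    For context, the macro being removed was the classic self-assignment
    trick; a sketch of its definition and a typical use:

    /* old definition in the compiler headers */
    #define uninitialized_var(x) x = x

    /* typical call site: silences "may be used uninitialized" */
    unsigned long uninitialized_var(flags);

    /* after the treewide removal it is simply: */
    unsigned long flags;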

    In preparation for removing[2] the[3] macro[4], remove all remaining
    needless uses with the following script:

    git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
    xargs perl -pi -e \
    's/\buninitialized_var\(([^\)]+)\)/\1/g;
    s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

    drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
    pathological white-space.

    No outstanding warnings were found building allmodconfig with GCC 9.3.0
    for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
    alpha, and m68k.

    [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
    [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
    [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
    [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

    Reviewed-by: Leon Romanovsky # drivers/infiniband and mlx4/mlx5
    Acked-by: Jason Gunthorpe # IB
    Acked-by: Kalle Valo # wireless drivers
    Reviewed-by: Chao Yu # erofs
    Signed-off-by: Kees Cook

    Kees Cook
     

26 Jun, 2020

3 commits

  • With a synchronous IO swap device, swap-in is handled directly in the
    fault code. Since the IO cost is not noted there, LRU balancing could be
    wrongly biased for such devices. Fix it by counting the cost in the
    fault code.

    Link: http://lkml.kernel.org/r/1592288204-27734-4-git-send-email-iamjoonsoo.kim@lge.com
    Fixes: 314b57fb0460001 ("mm: balance LRU lists based on relative thrashing cache sizing")
    Signed-off-by: Joonsoo Kim
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Calls to pte_offset_map() in vm_insert_pages() are erroneously not
    matched with a call to pte_unmap(). This would cause problems on
    architectures where that is not a no-op.

    This patch does away with the non-traditional locking in the existing
    code and instead uses pte_offset_map_lock()/pte_unmap_unlock() as usual,
    incrementing the PTE pointer as necessary. The PTE pointer is kept
    within bounds since we clamp it with PTRS_PER_PTE.
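
    A condensed sketch of the locking pattern the code moves to (variable
    names and surrounding context are illustrative):

    spinlock_t *ptl;
    pte_t *start_pte, *pte;
    /* never walk past the end of the current PTE page */
    unsigned long batch = min_t(unsigned long, pages_left,
                                PTRS_PER_PTE - pte_index(addr));

    start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
    for (pte = start_pte; batch; batch--, pte++, addr += PAGE_SIZE) {
            /* insert_page()-style work for one page goes here */
    }
    pte_unmap_unlock(start_pte, ptl);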

    Link: http://lkml.kernel.org/r/20200618220446.20284-1-arjunroy.kdev@gmail.com
    Fixes: 8cd3984d81d5 ("mm/memory.c: add vm_insert_pages()")
    Signed-off-by: Arjun Roy
    Acked-by: David Rientjes
    Cc: Eric Dumazet
    Cc: Hugh Dickins
    Cc: Soheil Hassas Yeganeh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjun Roy
     
  • do_swap_page() returns error codes from the VM_FAULT* space. try_charge()
    might return -ENOMEM, though, and then do_swap_page() simply returns 0,
    which means success.

    We almost never return ENOMEM for a GFP_KERNEL single page charge,
    except for async OOM handling (oom_disabled v1). So this needs to be
    translated to VM_FAULT_OOM, otherwise the page fault path will not
    notify userspace and wait for an action.
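
    In do_swap_page() terms, the shape of the fix is roughly the following
    (the label and surrounding context are illustrative):

    if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL)) {
            ret = VM_FAULT_OOM;
            goto out_page;
    }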

    Link: http://lkml.kernel.org/r/20200617090238.GL9499@dhcp22.suse.cz
    Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation")
    Signed-off-by: Michal Hocko
    Acked-by: Johannes Weiner
    Cc: Alex Shi
    Cc: Joonsoo Kim
    Cc: Shakeel Butt
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Roman Gushchin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

10 Jun, 2020

6 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Convert comments that reference old mmap_sem APIs to reference
    corresponding new mmap locking APIs instead.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Daniel Jordan
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Rename the mmap_sem field to mmap_lock. Any new uses of this lock should
    now go through the new mmap locking api. The mmap_lock is still
    implemented as a rwsem, though this could change in the future.

    [akpm@linux-foundation.org: fix it for mm-gup-might_lock_readmmap_sem-in-get_user_pages_fast.patch]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Daniel Jordan
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-11-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Add new APIs to assert that mmap_sem is held.

    Using this instead of rwsem_is_locked and lockdep_assert_held[_write]
    makes the assertions more tolerant of future changes to the lock type.
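
    A sketch of what such helpers look like while the lock is still a rwsem
    (the field is shown under its later mmap_lock name; the exact bodies in
    the series may differ):

    static inline void mmap_assert_locked(struct mm_struct *mm)
    {
            lockdep_assert_held(&mm->mmap_lock);
            VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
    }

    static inline void mmap_assert_write_locked(struct mm_struct *mm)
    {
            lockdep_assert_held_write(&mm->mmap_lock);
            VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
    }

    Callers then write mmap_assert_locked(mm) instead of open-coding
    rwsem_is_locked(), so the assertion survives a future change of the
    underlying lock type.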

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Patch series "mm: consolidate definitions of page table accessors", v2.

    The low level page table accessors (pXY_index(), pXY_offset()) are
    duplicated across all architectures and sometimes more than once. For
    instance, we have 31 definitions of pgd_offset() for 25 supported
    architectures.

    Most of these definitions are actually identical and typically it boils
    down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
            return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
            return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }

    These definitions can be shared among 90% of the arches provided
    XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

    For architectures that really need a custom version there is always the
    possibility to override the generic version with the usual ifdef magic.

    These patches introduce include/linux/pgtable.h that replaces
    include/asm-generic/pgtable.h and add the definitions of the page table
    accessors to the new header.

    This patch (of 12):

    The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
    functions involving page table manipulations, e.g. pte_alloc() and
    pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
    in the files that include <linux/mm.h>.

    The include statements in such cases are removed with a simple loop:

    for f in $(git grep -l "include <asm/pgtable.h>") ; do
            sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mike Rapoport
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

05 Jun, 2020

2 commits

  • There is a typo in a comment; fix it.

    Signed-off-by: Ethon Paul
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200411004043.14686-1-ethp@qq.com
    Signed-off-by: Linus Torvalds

    Ethon Paul
     
  • There are no architectures that use include/asm-generic/5level-fixup.h,
    so it can be removed along with the __ARCH_HAS_5LEVEL_HACK define and
    the code it surrounds.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: James Morse
    Cc: Jonas Bonn
    Cc: Julien Thierry
    Cc: Ley Foon Tan
    Cc: Marc Zyngier
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Suzuki K Poulose
    Cc: Tony Luck
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200414153455.21744-15-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

04 Jun, 2020

8 commits

  • Merge more updates from Andrew Morton:
    "More mm/ work, plenty more to come

    Subsystems affected by this patch series: slub, memcg, gup, kasan,
    pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
    thp, mmap, kconfig"

    * akpm: (131 commits)
    arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
    x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
    riscv: support DEBUG_WX
    mm: add DEBUG_WX support
    drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
    mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
    powerpc/mm: drop platform defined pmd_mknotpresent()
    mm: thp: don't need to drain lru cache when splitting and mlocking THP
    hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
    sparc32: register memory occupied by kernel as memblock.memory
    include/linux/memblock.h: fix minor typo and unclear comment
    mm, mempolicy: fix up gup usage in lookup_node
    tools/vm/page_owner_sort.c: filter out unneeded line
    mm: swap: memcg: fix memcg stats for huge pages
    mm: swap: fix vmstats for huge pages
    mm: vmscan: limit the range of LRU type balancing
    mm: vmscan: reclaim writepage is IO cost
    mm: vmscan: determine anon/file pressure balance at the reclaim root
    mm: balance LRU lists based on relative thrashing
    mm: only count actual rotations as LRU reclaim cost
    ...

    Linus Torvalds
     
  • They're the same function, and for the purpose of all callers they are
    equivalent to lru_cache_add().

    [akpm@linux-foundation.org: fix it for local_lock changes]
    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Reviewed-by: Rik van Riel
    Acked-by: Michal Hocko
    Acked-by: Minchan Kim
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200520232525.798933-5-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Swapin faults were the last event to charge pages after they had already
    been put on the LRU list. Now that we charge directly on swapin, the
    lrucare portion of the charge code is unused.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Reviewed-by: Joonsoo Kim
    Cc: Alex Shi
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Roman Gushchin
    Cc: Balbir Singh
    Cc: Shakeel Butt
    Link: http://lkml.kernel.org/r/20200508183105.225460-19-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Right now, users that are otherwise memory controlled can easily escape
    their containment and allocate significant amounts of memory that they're
    not being charged for. That's because swap readahead pages are not being
    charged until somebody actually faults them into their page table. This
    can be exploited with MADV_WILLNEED, which triggers arbitrary readahead
    allocations without charging the pages.

    There are additional problems with the delayed charging of swap pages:

    1. To implement refault/workingset detection for anonymous pages, we
    need to have a target LRU available at swapin time, but the LRU is not
    determinable until the page has been charged.

    2. To implement per-cgroup LRU locking, we need page->mem_cgroup to be
    stable when the page is isolated from the LRU; otherwise, the locks
    change under us. But swapcache gets charged after it's already on the
    LRU, and even if we cannot isolate it ourselves (since charging is not
    exactly optional).

    The previous patch ensured we always maintain cgroup ownership records for
    swap pages. This patch moves the swapcache charging point from the fault
    handler to swapin time to fix all of the above problems.

    v2: simplify swapin error checking (Joonsoo)

    [hughd@google.com: fix livelock in __read_swap_cache_async()]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2005212246080.8458@eggly.anvils
    Signed-off-by: Johannes Weiner
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Reviewed-by: Alex Shi
    Cc: Hugh Dickins
    Cc: Joonsoo Kim
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Roman Gushchin
    Cc: Shakeel Butt
    Cc: Balbir Singh
    Cc: Rafael Aquini
    Cc: Alex Shi
    Link: http://lkml.kernel.org/r/20200508183105.225460-17-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • With the page->mapping requirement gone from memcg, we can charge anon and
    file-thp pages in one single step, right after they're allocated.

    This removes two out of three API calls - especially the tricky commit
    step that needed to happen at just the right time between when the page is
    "set up" and when it's "published" - somewhat vague and fluid concepts
    that varied by page type. All we need is a freshly allocated page and a
    memcg context to charge.

    v2: prevent double charges on pre-allocated hugepages in khugepaged

    [hannes@cmpxchg.org: Fix crash - *hpage could be ERR_PTR instead of NULL]
    Link: http://lkml.kernel.org/r/20200512215813.GA487759@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Reviewed-by: Joonsoo Kim
    Cc: Alex Shi
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Roman Gushchin
    Cc: Shakeel Butt
    Cc: Balbir Singh
    Cc: Qian Cai
    Link: http://lkml.kernel.org/r/20200508183105.225460-13-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Memcg maintains a private MEMCG_RSS counter. This divergence from the
    generic VM accounting means unnecessary code overhead, and creates a
    dependency for memcg that page->mapping is set up at the time of charging,
    so that page types can be told apart.

    Convert the generic accounting sites to mod_lruvec_page_state and friends
    to maintain the per-cgroup vmstat counter of NR_ANON_MAPPED. We use
    lock_page_memcg() to stabilize page->mem_cgroup during rmap changes, the
    same way we do for NR_FILE_MAPPED.

    With the previous patch removing MEMCG_CACHE and the private NR_SHMEM
    counter, this patch finally eliminates the need to have page->mapping set
    up at charge time. However, we need to have page->mem_cgroup set up by
    the time rmap runs and does the accounting, so switch the commit and the
    rmap callbacks around.
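
    A condensed sketch of the rmap-side accounting this converts to (the
    stabilization dance described above, not the full diff; nr_pages is
    illustrative):

    lock_page_memcg(page);
    __mod_lruvec_page_state(page, NR_ANON_MAPPED, nr_pages);
    unlock_page_memcg(page);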

    v2: fix temporary accounting bug by switching the rmap and commit order (Joonsoo)

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Cc: Alex Shi
    Cc: Hugh Dickins
    Cc: Joonsoo Kim
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Roman Gushchin
    Cc: Shakeel Butt
    Cc: Balbir Singh
    Link: http://lkml.kernel.org/r/20200508183105.225460-11-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The memcg charging API carries a boolean @compound parameter that tells
    whether the page we're dealing with is a hugepage.
    mem_cgroup_commit_charge() has another boolean @lrucare that indicates
    whether the page needs LRU locking or not while charging. The majority of
    callsites know those parameters at compile time, which results in a lot of
    naked "false, false" argument lists. This makes for cryptic code and is a
    breeding ground for subtle mistakes.

    Thankfully, the huge page state can be inferred from the page itself and
    doesn't need to be passed along. This is safe because charging completes
    before the page is published and somebody may split it.

    Simplify the callsites by removing @compound, and let memcg infer the
    state by using hpage_nr_pages() unconditionally. That function does
    PageTransHuge() to identify huge pages, which also helpfully asserts that
    nobody passes in tail pages by accident.

    The following patches will introduce a new charging API, best not to carry
    over unnecessary weight.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Reviewed-by: Alex Shi
    Reviewed-by: Joonsoo Kim
    Reviewed-by: Shakeel Butt
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Roman Gushchin
    Cc: Balbir Singh
    Link: http://lkml.kernel.org/r/20200508183105.225460-4-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Pull MIPS updates from Thomas Bogendoerfer:

    - added support for MIPSr5 and P5600 cores

    - converted Loongson PCI driver into a PCI host driver using the
    generic PCI framework

    - added emulation of the CPUCFG command for Loongson64 CPUs

    - removed LASAT, PMC MSP71xx and NEC MARKEINS/EMMA

    - ioremap cleanup

    - fix for a race between two threads faulting the same page

    - various cleanups and fixes

    * tag 'mips_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (143 commits)
    MIPS: ralink: drop ralink_clk_init for mt7621
    MIPS: ralink: bootrom: mark a function as __init to save some memory
    MIPS: Loongson64: Reorder CPUCFG model match arms
    MIPS: Expose Loongson CPUCFG availability via HWCAP
    MIPS: Loongson64: Guard against future cores without CPUCFG
    MIPS: Fix build warning about "PTR_STR" redefinition
    MIPS: Loongson64: Remove not used pci.c
    MIPS: Loongson64: Define PCI_IOBASE
    MIPS: CPU_LOONGSON2EF need software to maintain cache consistency
    MIPS: DTS: Fix build errors used with various configs
    MIPS: Loongson64: select NO_EXCEPT_FILL
    MIPS: Fix IRQ tracing when call handle_fpe() and handle_msa_fpe()
    MIPS: mm: add page valid judgement in function pte_modify
    mm/memory.c: Add memory read privilege on page fault handling
    mm/memory.c: Update local TLB if PTE entry exists
    MIPS: Do not flush tlb page when updating PTE entry
    MIPS: ingenic: Default to a generic board
    MIPS: ingenic: Add support for GCW Zero prototype
    MIPS: ingenic: DTS: Add memory info of GCW Zero
    MIPS: Loongson64: Switch to generic PCI driver
    ...

    Linus Torvalds
     

03 Jun, 2020

1 commit

  • Since commit 25b2995a35b6 ("mm: remove MEMORY_DEVICE_PUBLIC support"),
    the assignment to 'page' for pte_devmap case has been unnecessary.
    Let's remove it.

    [willy@infradead.org: changelog]
    Signed-off-by: chenqiwu
    Signed-off-by: Andrew Morton
    Reviewed-by: Matthew Wilcox
    Acked-by: Michal Hocko
    Link: http://lkml.kernel.org/r/1587349685-31712-1-git-send-email-qiwuchen55@gmail.com
    Signed-off-by: Linus Torvalds

    chenqiwu
     

27 May, 2020

2 commits

  • Add a pte_sw_mkyoung() function to make the page readable on the MIPS
    platform during page fault handling. This patch improves page fault
    latency by about 10% on my MIPS machine with the lmbench lat_pagefault
    case.

    It is a no-op on other arches, so there is no negative influence on
    those architectures.
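
    The generic fallback is essentially a no-op that MIPS can override
    (sketch of the asm-generic definition):

    #ifndef pte_sw_mkyoung
    static inline pte_t pte_sw_mkyoung(pte_t pte)
    {
            return pte;
    }
    #define pte_sw_mkyoung  pte_sw_mkyoung
    #endif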

    Signed-off-by: Bibo Mao
    Acked-by: Andrew Morton
    Signed-off-by: Thomas Bogendoerfer

    Bibo Mao
     
  • If two threads concurrently fault at the same page, the thread that won
    the race updates the PTE and its local TLB. For now, the other thread
    gives up, simply does nothing, and continues.

    It could happen that this second thread later triggers another fault, in
    which it only updates its local TLB while handling that fault. Instead
    of triggering another fault, let's directly update the local TLB of the
    second thread. The update_mmu_tlb() function is used here to update the
    local TLB on the second thread; it is defined as a no-op on other
    arches.
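
    As with pte_sw_mkyoung above, the generic definition is an empty stub
    that MIPS overrides (sketch):

    #ifndef __HAVE_ARCH_UPDATE_MMU_TLB
    static inline void update_mmu_tlb(struct vm_area_struct *vma,
                                      unsigned long address, pte_t *ptep)
    {
    }
    #define __HAVE_ARCH_UPDATE_MMU_TLB
    #endif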

    Signed-off-by: Bibo Mao
    Acked-by: Andrew Morton
    Signed-off-by: Thomas Bogendoerfer

    Bibo Mao
     

11 Apr, 2020

2 commits

  • Add the ability to insert multiple pages at once to a user VM with lower
    PTE spinlock operations.

    The intention of this patch-set is to reduce atomic ops for tcp zerocopy
    receives, which normally hits the same spinlock multiple times
    consecutively.
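
    A hedged usage sketch of the new batched API (error handling and page
    refcounting trimmed; vma, addr and the pages array are assumed context):

    struct page *pages[16];
    unsigned long num = ARRAY_SIZE(pages);
    int err;

    /* map `num` consecutive pages starting at addr under one PTE-lock
     * section instead of taking the lock once per page */
    err = vm_insert_pages(vma, addr, pages, &num);
    /* on return, `num` holds how many pages remain un-inserted */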

    [akpm@linux-foundation.org: pte_alloc() no longer takes the `addr' argument]
    [arjunroy@google.com: add missing page_count() check to vm_insert_pages()]
    Link: http://lkml.kernel.org/r/20200214005929.104481-1-arjunroy.kdev@gmail.com
    [arjunroy@google.com: vm_insert_pages() checks if pte_index defined]
    Link: http://lkml.kernel.org/r/20200228054714.204424-2-arjunroy.kdev@gmail.com
    Signed-off-by: Arjun Roy
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: Andrew Morton
    Cc: David Miller
    Cc: Matthew Wilcox
    Cc: Jason Gunthorpe
    Cc: Stephen Rothwell
    Link: http://lkml.kernel.org/r/20200128025958.43490-2-arjunroy.kdev@gmail.com
    Signed-off-by: Linus Torvalds

    Arjun Roy
     
  • Add helper methods for vm_insert_page()/insert_page() to prepare for
    vm_insert_pages(), which batch-inserts pages to reduce spinlock
    operations when inserting multiple consecutive pages into the user page
    table.

    The intention of this patch-set is to reduce atomic ops for tcp zerocopy
    receives, which normally hits the same spinlock multiple times
    consecutively.

    Signed-off-by: Arjun Roy
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: Andrew Morton
    Cc: David Miller
    Cc: Matthew Wilcox
    Cc: Jason Gunthorpe
    Cc: Stephen Rothwell
    Link: http://lkml.kernel.org/r/20200128025958.43490-1-arjunroy.kdev@gmail.com
    Signed-off-by: Linus Torvalds

    Arjun Roy