15 Oct, 2020

1 commit

  • The :c:type:`foo` markup only works properly with structs before
    Sphinx 3.x.

    On Sphinx 3.x, structs should instead be declared using the
    .. c:struct directive, and referenced via the :c:struct tag.

    As we now have the automarkup.py extension, which automatically
    converts:
    struct foo

    into cross-references, let's get rid of the explicit markup, solving
    several warnings when building docs with Sphinx 3.x.

    Reviewed-by: André Almeida # blk-mq.rst
    Reviewed-by: Takashi Iwai # sound
    Reviewed-by: Mike Rapoport
    Reviewed-by: Greg Kroah-Hartman
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

20 Sep, 2020

1 commit

  • Patch series "mm: fixes to past from future testing".

    Here's a set of independent fixes against 5.9-rc2: prompted by
    testing Alex Shi's "warning on !memcg" and lru_lock series, but
    I think fit for 5.9 - though maybe only the first for stable.

    This patch (of 5):

    In 5.8 some instances of memcg charging in do_swap_page() and unuse_pte()
    were removed, on the understanding that swap cache is now already charged
    at those points; but a case was missed, when ksm_might_need_to_copy() has
    decided it must allocate a substitute page: such pages were never charged.
    Fix it inside ksm_might_need_to_copy().

    This was discovered by Alex Shi's prospective commit "mm/memcg: warning on
    !memcg after readahead page charged".

    But there is another surprise: this also fixes some rarer uncharged
    PageAnon cases, when KSM is configured in, but has never been activated.
    ksm_might_need_to_copy()'s anon_vma->root and linear_page_index() check
    sometimes catches a case which would need to have been copied if KSM were
    turned on. Or that's my optimistic interpretation (of my own old code),
    but it leaves some doubt as to whether everything is working as intended
    there - might it hint at rare anon ptes which rmap cannot find? A
    question not easily answered: put in the fix for missed memcg charges.
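
    A minimal sketch of the kind of charge the fix adds, assuming the
    5.8-era mem_cgroup_charge(page, mm, gfp) interface (the exact placement
    inside ksm_might_need_to_copy() may differ from the upstream patch):

    struct page *new_page;

    new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
    if (new_page && mem_cgroup_charge(new_page, vma->vm_mm, GFP_KERNEL)) {
        /* the substitute page could not be charged: back out */
        put_page(new_page);
        new_page = NULL;
    }
    if (new_page)
        copy_user_highpage(new_page, page, address, vma);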

    Cc: Matthew Wilcox

    Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation")
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Reviewed-by: Shakeel Butt
    Acked-by: Johannes Weiner
    Cc: Alex Shi
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Qian Cai
    Cc: [5.8]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301343270.5954@eggly.anvils
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301358020.5954@eggly.anvils
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

05 Sep, 2020

2 commits

  • Merge emailed patches from Peter Xu:
    "This is a small series that I picked up from Linus's suggestion to
    simplify cow handling (and also make it more strict) by checking
    against page refcounts rather than mapcounts.

    This makes uffd-wp work again (verified by running upmapsort)"

    Note: this is horrendously bad timing, and making this kind of
    fundamental vm change after -rc3 is not at all how things should work.
    The saving grace is that it really is a nice simplification:

    8 files changed, 29 insertions(+), 120 deletions(-)

    The reason for the bad timing is that it turns out that commit
    17839856fd58 ("gup: document and work around 'COW can break either way'
    issue") broke not just UFFD functionality (as Peter noticed), but Mikulas
    Patocka also reports that it caused issues for strace when running in a
    DAX environment with ext4 on a persistent memory setup.

    And we can't just revert that commit without re-introducing the original
    issue that is a potential security hole, so making COW stricter (and in
    the process much simpler) is a step to then undoing the forced COW that
    broke other uses.

    Link: https://lore.kernel.org/lkml/alpine.LRH.2.02.2009031328040.6929@file01.intranet.prod.int.rdu2.redhat.com/

    * emailed patches from Peter Xu :
    mm: Add PGREUSE counter
    mm/gup: Remove enfornced COW mechanism
    mm/ksm: Remove reuse_ksm_page()
    mm: do_wp_page() simplification
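
    For illustration, the refcount-based reuse test at the heart of the
    do_wp_page() simplification looks roughly like this (a sketch; the
    exact upstream code may differ):

    if (PageAnon(vmf->page)) {
        struct page *page = vmf->page;

        /* PageKsm() doesn't necessarily raise the page refcount */
        if (PageKsm(page) || page_count(page) != 1)
            goto copy;
        if (!trylock_page(page))
            goto copy;
        if (PageKsm(page) || page_mapcount(page) != 1 ||
            page_count(page) != 1) {
            unlock_page(page);
            goto copy;
        }
        /* Sole user of the page: safe to reuse it for the write. */
        unlock_page(page);
        wp_page_reuse(vmf);
        return VM_FAULT_WRITE;
    }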

    Linus Torvalds
     
  • Remove the function as the last reference has gone away with the do_wp_page()
    changes.

    Signed-off-by: Peter Xu
    Signed-off-by: Linus Torvalds

    Peter Xu
     

24 Aug, 2020

1 commit

  • This reverts commit 5c9fa16e8abd342ce04dc830c1ebb2a03abf6c05.

    Since PROT_SAO can still be useful for certain classes of software,
    reintroduce it. Concerns about guest migration for LPARs using SAO
    will be addressed next.

    Signed-off-by: Shawn Anastasio
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200821185558.35561-2-shawn@anastas.io

    Shawn Anastasio
     

13 Aug, 2020

1 commit

  • Patch series "mm: Page fault accounting cleanups", v5.

    This is v5 of the pf accounting cleanup series. It originates from Gerald
    Schaefer's report a week ago regarding incorrect page fault accounting
    for retried page faults after commit 4064b9827063 ("mm: allow
    VM_FAULT_RETRY for multiple times"):

    https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

    What this series did:

    - Correct page fault accounting: we do accounting for a page fault
    (no matter whether it's from #PF handling, or gup, or anything else)
    only with the one that completed the fault. For example, page fault
    retries should not be counted in the page fault counters. The same
    applies to the perf events.

    - Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
    event is used in an ad-hoc way across different archs.

    Case (1): for many archs it's done at the entry of a page fault
    handler, so that it will also cover e.g. erroneous faults.

    Case (2): for some other archs, it is only accounted when the page
    fault is resolved successfully.

    Case (3): there are still quite a few archs that have not enabled
    this perf event.

    Since this series touches nearly all the archs, we unify this
    perf event to always follow case (1), which is the one that makes the
    most sense. And since we moved the accounting into handle_mm_fault(),
    the other two MAJ/MIN perf events are well taken care of naturally.

    - Unify definition of "major faults": the definition of "major
    fault" is slightly changed when used in accounting (not
    VM_FAULT_MAJOR). More information in patch 1.

    - Always account the page fault onto the one that triggered the page
    fault. This does not matter much for #PF handlings, but mostly for
    gup. More information on this in patch 25.

    Patchset layout:

    Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
    Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
    Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
    Patch 25: Cleanup GUP task_struct pointer since it's not needed any more

    This patch (of 25):

    This is a preparation patch to move page fault accountings into the
    general code in handle_mm_fault(). This includes both the per task
    flt_maj/flt_min counters, and the major/minor page fault perf events. To
    do this, the pt_regs pointer is passed into handle_mm_fault().

    PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
    handlers.

    So far, all the pt_regs pointers that are passed into handle_mm_fault()
    are NULL, which means this patch should have no intended functional change.
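
    A sketch of what the common accounting helper called from
    handle_mm_fault() can look like under the rules above (the helper name
    and exact conditions are assumptions based on this description):

    static inline void mm_account_fault(struct pt_regs *regs,
                                        unsigned long address,
                                        unsigned int flags, vm_fault_t ret)
    {
        bool major;

        /* Only account faults that actually completed. */
        if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
            return;

        /* "Major" if it did I/O, or if it completed after a retry. */
        major = (ret & VM_FAULT_MAJOR) || (flags & FAULT_FLAG_TRIED);

        if (major)
            current->maj_flt++;
        else
            current->min_flt++;

        /* No pt_regs (e.g. gup on behalf of another task): skip perf. */
        if (!regs)
            return;

        if (major)
            perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
        else
            perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
    }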

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Albert Ou
    Cc: Alexander Gordeev
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Geert Uytterhoeven
    Cc: Gerald Schaefer
    Cc: Greentime Hu
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: John Hubbard
    Cc: Jonas Bonn
    Cc: Ley Foon Tan
    Cc: "Luck, Tony"
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Richard Henderson
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
    Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     

08 Aug, 2020

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - Add support for (optionally) using queued spinlocks & rwlocks.

    - Support for a new faster system call ABI using the scv instruction on
    Power9 or later.

    - Drop support for the PROT_SAO mmap/mprotect flag as it will be
    unsupported on Power10 and future processors, leaving us with no way
    to implement the functionality it requests. This risks breaking
    userspace, though we believe it is unused in practice.

    - A bug fix for, and then the removal of, our custom stack expansion
    checking. We now allow stack expansion up to the rlimit, like other
    architectures.

    - Remove the remnants of our (previously disabled) topology update
    code, which tried to react to NUMA layout changes on virtualised
    systems, but was prone to crashes and other problems.

    - Add PMU support for Power10 CPUs.

    - A change to our signal trampoline so that we don't unbalance the link
    stack (branch return predictor) in the signal delivery path.

    - Lots of other cleanups, refactorings, smaller features and so on as
    usual.

    Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
    Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
    T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
    S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
    Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
    Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
    Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
    Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
    Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
    Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
    Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
    Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
    Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
    Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
    Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
    Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
    Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
    Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
    Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
    Wei Yongjun, Wen Xiong, YueHaibing.

    * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
    selftests/powerpc: Fix pkey syscall redefinitions
    powerpc: Fix circular dependency between percpu.h and mmu.h
    powerpc/powernv/sriov: Fix use of uninitialised variable
    selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
    powerpc/40x: Fix assembler warning about r0
    powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
    powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
    cpuidle: pseries: Fixup exit latency for CEDE(0)
    cpuidle: pseries: Add function to parse extended CEDE records
    cpuidle: pseries: Set the latency-hint before entering CEDE
    selftests/powerpc: Fix online CPU selection
    powerpc/perf: Consolidate perf_callchain_user_[64|32]()
    powerpc/pseries/hotplug-cpu: Remove double free in error path
    powerpc/pseries/mobility: Add pr_debug() for device tree changes
    powerpc/pseries/mobility: Set pr_fmt()
    powerpc/cacheinfo: Warn if cache object chain becomes unordered
    powerpc/cacheinfo: Improve diagnostics about malformed cache lists
    powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
    powerpc/cacheinfo: Set pr_fmt()
    powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
    ...

    Linus Torvalds
     

21 Jul, 2020

1 commit

  • ISA v3.1 does not support the SAO storage control attribute required to
    implement PROT_SAO. PROT_SAO was used by specialised system software
    (Lx86) that has been discontinued for about 7 years, and is not thought
    to be used elsewhere, so removal should not cause problems.

    We would rather remove it than keep support for older processors,
    because live migrating guest partitions to newer processors may not be
    possible if SAO is in use (or, worse, allowed with silent races).

    - PROT_SAO stays in the uapi header so code using it would still build.
    - arch_validate_prot() is removed; the generic version rejects PROT_SAO,
    so applications would get a failure at mmap() time (see the sketch
    below).
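
    For reference, a sketch of the generic fallback that powerpc now uses
    (approximate; see include/linux/mman.h for the real definition):

    #ifndef arch_validate_prot
    /*
     * Without an arch override only the standard protection bits are
     * accepted, so a request carrying PROT_SAO is rejected.
     */
    static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
    {
        return (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM)) == 0;
    }
    #define arch_validate_prot arch_validate_prot
    #endif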

    Signed-off-by: Nicholas Piggin
    [mpe: Drop KVM change for the time being]
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20200703011958.1166620-3-npiggin@gmail.com

    Nicholas Piggin
     

17 Jul, 2020

1 commit

  • Using uninitialized_var() is dangerous as it papers over real bugs[1]
    (or can in the future), and suppresses unrelated compiler warnings
    (e.g. "unused variable"). If the compiler thinks it is uninitialized,
    either simply initialize the variable or make compiler changes.

    In preparation for removing[2] the[3] macro[4], remove all remaining
    needless uses with the following script:

    git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
    xargs perl -pi -e \
    's/\buninitialized_var\(([^\)]+)\)/\1/g;
    s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

    drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
    pathological white-space.
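
    For illustration, the macro only silenced the warning by
    self-initializing the variable (a sketch; the exact definition varied
    by compiler):

    /* old: expands to a self-assignment, hiding "maybe-uninitialized" */
    #define uninitialized_var(x) x = x

    int uninitialized_var(flags);   /* becomes: int flags = flags; */

    /* new: a plain declaration, so the compiler can still warn if
     * 'flags' really is read before being set */
    int flags;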

    No outstanding warnings were found building allmodconfig with GCC 9.3.0
    for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
    alpha, and m68k.

    [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
    [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
    [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
    [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

    Reviewed-by: Leon Romanovsky # drivers/infiniband and mlx4/mlx5
    Acked-by: Jason Gunthorpe # IB
    Acked-by: Kalle Valo # wireless drivers
    Reviewed-by: Chao Yu # erofs
    Signed-off-by: Kees Cook

    Kees Cook
     

10 Jun, 2020

3 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Convert comments that reference old mmap_sem APIs to reference
    corresponding new mmap locking APIs instead.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Daniel Jordan
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)
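
    For example, a read-side critical section is rewritten by the rule
    above as follows (illustrative):

    /* before */
    down_read(&mm->mmap_sem);
    vma = find_vma(mm, addr);
    up_read(&mm->mmap_sem);

    /* after */
    mmap_read_lock(mm);
    vma = find_vma(mm, addr);
    mmap_read_unlock(mm);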

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

22 Apr, 2020

1 commit

  • find_mergeable_vma() can return NULL. In this case, it leads to a crash
    when we access vm_mm (its offset is 0x40) later in write_protect_page.
    And this case did happen on our server. The following call trace was
    captured in kernel 4.19 with the following patch applied and KSM zero
    page enabled on our server.

    commit e86c59b1b12d ("mm/ksm: improve deduplication of zero pages with colouring")

    So add a vma check to fix it.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
    Oops: 0000 [#1] SMP NOPTI
    CPU: 9 PID: 510 Comm: ksmd Kdump: loaded Tainted: G OE 4.19.36.bsk.9-amd64 #4.19.36.bsk.9
    RIP: try_to_merge_one_page+0xc7/0x760
    Code: 24 58 65 48 33 34 25 28 00 00 00 89 e8 0f 85 a3 06 00 00 48 83 c4
    60 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 46 08 a8 01 75 b8
    8b 44 24 40 4c 8d 7c 24 20 b9 07 00 00 00 4c 89 e6 4c 89 ff 48
    RSP: 0018:ffffadbdd9fffdb0 EFLAGS: 00010246
    RAX: ffffda83ffd4be08 RBX: ffffda83ffd4be40 RCX: 0000002c6e800000
    RDX: 0000000000000000 RSI: ffffda83ffd4be40 RDI: 0000000000000000
    RBP: ffffa11939f02ec0 R08: 0000000094e1a447 R09: 00000000abe76577
    R10: 0000000000000962 R11: 0000000000004e6a R12: 0000000000000000
    R13: ffffda83b1e06380 R14: ffffa18f31f072c0 R15: ffffda83ffd4be40
    FS: 0000000000000000(0000) GS:ffffa0da43b80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000040 CR3: 0000002c77c0a003 CR4: 00000000007626e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
    ksm_scan_thread+0x115e/0x1960
    kthread+0xf5/0x130
    ret_from_fork+0x1f/0x30
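
    A sketch of the vma check described above, in the zero-page merging
    path of cmp_and_merge_page() (based on this report; the exact upstream
    diff may differ):

    mmap_read_lock(mm);    /* down_read(&mm->mmap_sem) on 4.19 */
    vma = find_mergeable_vma(mm, rmap_item->address);
    if (vma) {
        err = try_to_merge_one_page(vma, page,
                                    ZERO_PAGE(rmap_item->address));
    } else {
        /* If the vma is out of date, just exit; nothing to merge. */
        err = 0;
    }
    mmap_read_unlock(mm);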

    [songmuchun@bytedance.com: if the vma is out of date, just exit]
    Link: http://lkml.kernel.org/r/20200416025034.29780-1-songmuchun@bytedance.com
    [akpm@linux-foundation.org: add the conventional braces, replace /** with /*]
    Fixes: e86c59b1b12d ("mm/ksm: improve deduplication of zero pages with colouring")
    Co-developed-by: Xiongchun Duan
    Signed-off-by: Muchun Song
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Reviewed-by: Kirill Tkhai
    Cc: Hugh Dickins
    Cc: Yang Shi
    Cc: Claudio Imbrenda
    Cc: Markus Elfring
    Cc:
    Link: http://lkml.kernel.org/r/20200416025034.29780-1-songmuchun@bytedance.com
    Link: http://lkml.kernel.org/r/20200414132905.83819-1-songmuchun@bytedance.com
    Signed-off-by: Linus Torvalds

    Muchun Song
     

08 Apr, 2020

2 commits

  • Convert the various /* fallthrough */ comments to the pseudo-keyword
    fallthrough;

    Done via script:
    https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/
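
    For example (prepare() and apply() are placeholders):

    /* before */
    case MADV_UNMERGEABLE:
        prepare();
        /* fallthrough */
    case MADV_MERGEABLE:
        apply();
        break;

    /* after */
    case MADV_UNMERGEABLE:
        prepare();
        fallthrough;
    case MADV_MERGEABLE:
        apply();
        break;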

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Reviewed-by: Gustavo A. R. Silva
    Link: http://lkml.kernel.org/r/f62fea5d10eb0ccfc05d87c242a620c261219b66.camel@perches.com
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • This updates get_user_pages()'s argument in ksm_test_exit()'s comment

    Signed-off-by: Li Chen
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/30ac2417-f1c7-f337-0beb-df561295298c@uniontech.com
    Signed-off-by: Linus Torvalds

    Li Chen
     

05 Dec, 2019

1 commit

  • Pull more KVM updates from Paolo Bonzini:

    - PPC secure guest support

    - small x86 cleanup

    - fix for an x86-specific out-of-bounds write on a ioctl (not guest
    triggerable, data not attacker-controlled)

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    kvm: vmx: Stop wasting a page for guest_msrs
    KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332)
    Documentation: kvm: Fix mention to number of ioctls classes
    powerpc: Ultravisor: Add PPC_UV config option
    KVM: PPC: Book3S HV: Support reset of secure guest
    KVM: PPC: Book3S HV: Handle memory plug/unplug to secure VM
    KVM: PPC: Book3S HV: Radix changes for secure guest
    KVM: PPC: Book3S HV: Shared pages support for secure guests
    KVM: PPC: Book3S HV: Support for running secure guests
    mm: ksm: Export ksm_madvise()
    KVM x86: Move kvm cpuid support out of svm

    Linus Torvalds
     

28 Nov, 2019

1 commit

  • On PEF-enabled POWER platforms that support running of secure guests,
    secure pages of the guest are represented by device private pages
    in the host. Such pages needn't participate in KSM merging. This is
    achieved by using the ksm_madvise() call, which needs to be exported
    since KVM PPC can be a kernel module.
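
    A sketch of what the export amounts to (illustrative; the exact export
    flavour used by the patch is an assumption here):

    /* mm/ksm.c */
    EXPORT_SYMBOL_GPL(ksm_madvise);

    /* the (possibly modular) KVM PPC code can then opt a secure guest's
     * memory out of merging: */
    ksm_madvise(vma, vma->vm_start, vma->vm_end,
                MADV_UNMERGEABLE, &vma->vm_flags);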

    Signed-off-by: Bharata B Rao
    Acked-by: Hugh Dickins
    Cc: Andrea Arcangeli
    Signed-off-by: Paul Mackerras

    Bharata B Rao
     

23 Nov, 2019

1 commit

  • It's possible to hit the WARN_ON_ONCE(page_mapped(page)) in
    remove_stable_node() when it races with __mmput() and squeezes in
    between ksm_exit() and exit_mmap().

    WARNING: CPU: 0 PID: 3295 at mm/ksm.c:888 remove_stable_node+0x10c/0x150

    Call Trace:
    remove_all_stable_nodes+0x12b/0x330
    run_store+0x4ef/0x7b0
    kernfs_fop_write+0x200/0x420
    vfs_write+0x154/0x450
    ksys_write+0xf9/0x1d0
    do_syscall_64+0x99/0x510
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Remove the warning as there is nothing scary going on.
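
    A sketch of the change in remove_stable_node(), modelled on the
    description above (the surrounding code is approximate):

    /* before: warned, then refused */
    if (WARN_ON_ONCE(page_mapped(page))) {
        err = -EBUSY;
    } else {
        set_page_stable_node(page, NULL);
        remove_node_from_stable_tree(stable_node);
        err = 0;
    }

    /* after: the race with __mmput() is legitimate, so refuse quietly */
    err = -EBUSY;
    if (!page_mapped(page)) {
        set_page_stable_node(page, NULL);
        remove_node_from_stable_tree(stable_node);
        err = 0;
    }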

    Link: http://lkml.kernel.org/r/20191119131850.5675-1-aryabinin@virtuozzo.com
    Fixes: cbf86cfe04a6 ("ksm: remove old stable nodes more thoroughly")
    Signed-off-by: Andrey Ryabinin
    Acked-by: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

25 Sep, 2019

1 commit

  • Patch series "THP aware uprobe", v13.

    This patchset makes uprobe aware of THPs.

    Currently, when uprobe is attached to text on THP, the page is split by
    FOLL_SPLIT. As a result, uprobe eliminates the performance benefit of
    THP.

    This set makes uprobe THP-aware. Instead of FOLL_SPLIT, we introduce
    FOLL_SPLIT_PMD, which only splits the PMD for uprobe.

    After all uprobes within the THP are removed, the PTE-mapped pages are
    regrouped as a huge PMD.

    This set (plus a few THP patches) is also available at

    https://github.com/liu-song-6/linux/tree/uprobe-thp

    This patch (of 6):

    Move memcmp_pages() to mm/util.c and pages_identical() to mm.h, so that we
    can use them in other files.
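
    The moved helpers are small; roughly (a sketch of the declarations and
    the mm/util.c implementation):

    /* include/linux/mm.h */
    int memcmp_pages(struct page *page1, struct page *page2);

    static inline int pages_identical(struct page *page1, struct page *page2)
    {
        return !memcmp_pages(page1, page2);
    }

    /* mm/util.c */
    int memcmp_pages(struct page *page1, struct page *page2)
    {
        char *addr1, *addr2;
        int ret;

        addr1 = kmap_atomic(page1);
        addr2 = kmap_atomic(page2);
        ret = memcmp(addr1, addr2, PAGE_SIZE);
        kunmap_atomic(addr2);
        kunmap_atomic(addr1);
        return ret;
    }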

    Link: http://lkml.kernel.org/r/20190815164525.1848545-2-songliubraving@fb.com
    Signed-off-by: Song Liu
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Oleg Nesterov
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: William Kucharski
    Cc: Srikar Dronamraju
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     

19 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this work is licensed under the terms of the gnu gpl version 2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 48 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Enrico Weigelt
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081204.624030236@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

2 commits

  • This updates each existing invalidation to use the correct mmu notifier
    event that represents what is happening to the CPU page table. See the
    patch which introduced the events for the rationale behind this.

    Link: http://lkml.kernel.org/r/20190326164747.24405-7-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Ralph Campbell
    Reviewed-by: Ira Weiny
    Cc: Christian König
    Cc: Joonas Lahtinen
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Jan Kara
    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Cc: Felix Kuehling
    Cc: Jason Gunthorpe
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Christian Koenig
    Cc: John Hubbard
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • CPU page table updates can happen for many reasons, not only as a result
    of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as
    a result of kernel activities (memory compression, reclaim, migration,
    ...).

    Users of the mmu notifier API track changes to the CPU page table and
    take specific action for them. The current API only provides the range
    of virtual addresses affected by the change, not why the change is
    happening.

    This patchset does the initial mechanical conversion of all the places
    that call mmu_notifier_range_init to also provide the default
    MMU_NOTIFY_UNMAP event as well as the vma if it is known (most
    invalidations happen against a given vma). Passing down the vma allows
    the users of mmu notifier to inspect the new vma page protection.

    The MMU_NOTIFY_UNMAP is always the safe default, as users of mmu notifier
    should assume that every mapping for the range is going away when that
    event happens. A later patch converts the mm call paths to use more
    appropriate events for each call.

    This is done as 2 patches so that no call site is forgotten, especially
    as it uses the following coccinelle patch:

    %<-------------------------------------------------------------------
    (The coccinelle semantic patch is only partially preserved in this
    rendering. Its rules match every call to mmu_notifier_range_init() --
    inside functions that take a struct vm_area_struct * parameter, that
    declare one locally, or that have none in scope -- and rewrite the call
    to also pass the MMU_NOTIFY_UNMAP event and the vma when one is known.)
    -------------------------------------------------------------------->%

    Applied with:
    spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
    spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
    spatch --sp-file mmu-notifier.spatch --dir mm --in-place

    Link: http://lkml.kernel.org/r/20190326164747.24405-6-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Ralph Campbell
    Reviewed-by: Ira Weiny
    Cc: Christian König
    Cc: Joonas Lahtinen
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Jan Kara
    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Cc: Felix Kuehling
    Cc: Jason Gunthorpe
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Christian Koenig
    Cc: John Hubbard
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

06 Mar, 2019

3 commits

  • ksmd needs to search the stable tree to look for a suitable KSM page,
    but the KSM page might be locked for a while due to, e.g., a KSM page
    rmap walk. Basically it is not a big deal since commit 2c653d0ee2ae
    ("ksm: introduce ksm_max_page_sharing per page deduplication limit"),
    since max_page_sharing limits the number of shared KSM pages.

    But it still does not sound worth waiting for the lock: the page can be
    skipped, and merging retried in the next scan, to avoid a potential
    stall if its content is still intact.

    Introduce a trylock mode to get_ksm_page() to not block on page lock,
    like what try_to_merge_one_page() does. And, define the three possible
    operations (nolock, lock and trylock) as an enum type to avoid stacking
    up bools and make the code more readable.

    Return -EBUSY if trylock fails, since NULL means no suitable KSM page
    was found, which is a valid case.

    With the default max_page_sharing setting (256), there is almost no
    observed change comparing lock vs trylock.

    However, with ksm02 of LTP, which sets max_page_sharing to 786432, a
    reduced ksmd full scan time can be observed. With the lock version,
    ksmd may take 10s - 11s to run two full scans; with the trylock version
    ksmd may take 8s - 11s to run two full scans. And the number of
    pages_sharing and pages_to_scan stays the same. Basically, this change
    does no harm.
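
    A sketch of the three lock modes and the -EBUSY trylock path described
    above (enum and flow modelled on the description; details may differ):

    enum get_ksm_page_flags {
        GET_KSM_PAGE_NOLOCK,
        GET_KSM_PAGE_LOCK,
        GET_KSM_PAGE_TRYLOCK
    };

    /* inside get_ksm_page(), once the page has been pinned: */
    if (flags == GET_KSM_PAGE_TRYLOCK) {
        if (!trylock_page(page)) {
            put_page(page);
            return ERR_PTR(-EBUSY);  /* busy: skip, retry on the next scan */
        }
    } else if (flags == GET_KSM_PAGE_LOCK) {
        lock_page(page);
    }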

    [hughd@google.com: fix BUG_ON()]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1902182122280.6914@eggly.anvils
    Link: http://lkml.kernel.org/r/1548793753-62377-1-git-send-email-yang.shi@linux.alibaba.com
    Signed-off-by: Yang Shi
    Signed-off-by: Hugh Dickins
    Suggested-by: John Hubbard
    Reviewed-by: Kirill Tkhai
    Cc: Hugh Dickins
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     
  • Add an optimization for KSM pages almost in the same way that we have
    for ordinary anonymous pages. If there is a write fault in a page
    which is mapped by only one pte, and it is not related to swap cache,
    the page may be reused without copying its content.

    [ Note that we do not consider PageSwapCache() pages at least for now,
    since we don't want to complicate __get_ksm_page(), which has a nice
    optimization based on this (for the migration case). Currently it is
    spinning on PageSwapCache() pages, waiting for when they have
    unfrozen counters (i.e., for the migration to finish). But we don't want
    to make it also spin on swap cache pages, which we try to reuse,
    since there is not a very high probability to reuse them. So, for now
    we do not consider PageSwapCache() pages at all. ]

    So in reuse_ksm_page() we check for 1) PageSwapCache() and 2)
    page_stable_node(), to skip a page which KSM is currently trying to
    link to the stable tree. Then we do page_ref_freeze() to prohibit KSM
    from merging one more page into the page we are reusing. After that,
    nobody can refer to the reused page: KSM skips !PageSwapCache() pages
    with zero refcount; and the protection against all other participants
    is the same as for reused ordinary anon pages: pte lock, page lock and
    mmap_sem.
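
    A sketch of the resulting check, modelled on the description above
    (details may differ from the upstream reuse_ksm_page()):

    bool reuse_ksm_page(struct page *page, struct vm_area_struct *vma,
                        unsigned long address)
    {
        if (PageSwapCache(page) || !page_stable_node(page))
            return false;
        /* Prohibit a parallel get_ksm_page(): freeze the sole reference. */
        if (!page_ref_freeze(page, 1))
            return false;

        page_move_anon_rmap(page, vma);
        page->index = linear_page_index(vma, address);
        page_ref_unfreeze(page, 1);

        return true;
    }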

    [akpm@linux-foundation.org: replace BUG_ON()s with WARN_ON()s]
    Link: http://lkml.kernel.org/r/154471491016.31352.1168978849911555609.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Reviewed-by: Yang Shi
    Cc: "Kirill A. Shutemov"
    Cc: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc: Christian Koenig
    Cc: Claudio Imbrenda
    Cc: Rik van Riel
    Cc: Huang Ying
    Cc: Minchan Kim
    Cc: Kirill Tkhai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     
  • Patch series "Replace all open encodings for NUMA_NO_NODE", v3.

    All these places for replacement were found by running the following
    grep patterns on the entire kernel code. Please let me know if this
    might have missed some instances. This might also have replaced some
    false positives. I will appreciate suggestions, inputs and review.

    1. git grep "nid == -1"
    2. git grep "node == -1"
    3. git grep "nid = -1"
    4. git grep "node = -1"

    This patch (of 2):

    At present there are multiple places where an invalid node number is
    encoded as -1. Even though implicitly understood, it is always better to
    have a macro for it. Replace these open encodings for an invalid node
    number with the global macro NUMA_NO_NODE. This helps remove NUMA
    related assumptions like 'invalid node' from various places redirecting
    them to a common definition.
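
    For example (illustrative):

    /* before: open-coded invalid node */
    int nid = -1;

    /* after: the intent is explicit */
    int nid = NUMA_NO_NODE;

    if (nid == NUMA_NO_NODE)
        nid = numa_mem_id();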

    Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual
    Reviewed-by: David Hildenbrand
    Acked-by: Jeff Kirsher [ixgbe]
    Acked-by: Jens Axboe [mtip32xx]
    Acked-by: Vinod Koul [dmaengine.c]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Doug Ledford [drivers/infiniband]
    Cc: Joseph Qi
    Cc: Hans Verkuil
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     

29 Dec, 2018

3 commits

  • ksm thread unconditionally sleeps in ksm_scan_thread() after each
    iteration:

    schedule_timeout_interruptible(
    msecs_to_jiffies(ksm_thread_sleep_millisecs))

    The timeout is configured in /sys/kernel/mm/ksm/sleep_millisecs.

    If a user writes a big value by mistake, and the thread enters
    schedule_timeout_interruptible(), it's not possible to cancel the
    sleep by writing a new smaller value; the thread just sleeps until the
    timeout expires.

    The patch fixes the problem by waking the thread each time after the value
    is updated.

    This also may be useful for debug purposes, and for userspace
    daemons which change the sleep_millisecs value depending on system load.
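
    A sketch of the wakeup, modelled on the description above (variable and
    helper names are assumptions):

    static DECLARE_WAIT_QUEUE_HEAD(ksm_iter_wait);

    static ssize_t sleep_millisecs_store(struct kobject *kobj,
                                         struct kobj_attribute *attr,
                                         const char *buf, size_t count)
    {
        unsigned long msecs;

        if (kstrtoul(buf, 10, &msecs) || msecs > UINT_MAX)
            return -EINVAL;

        ksm_thread_sleep_millisecs = msecs;
        wake_up_interruptible(&ksm_iter_wait);  /* cancel a long in-flight sleep */

        return count;
    }

    /* in ksm_scan_thread(): sleep for the timeout OR until the value changes */
    sleep_ms = READ_ONCE(ksm_thread_sleep_millisecs);
    wait_event_interruptible_timeout(ksm_iter_wait,
            sleep_ms != READ_ONCE(ksm_thread_sleep_millisecs),
            msecs_to_jiffies(sleep_ms));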

    Link: http://lkml.kernel.org/r/154454107680.3258.3558002210423531566.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Acked-by: Cyrill Gorcunov
    Reviewed-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     
  • To avoid having to change many call sites every time we want to add a
    parameter, use a structure to group all parameters for the mmu_notifier
    invalidate_range_start/end calls. No functional changes with this patch.
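
    Roughly, the callers now fill one structure and pass a single pointer
    around (a sketch; fields beyond mm/start/end are assumptions):

    struct mmu_notifier_range {
        struct mm_struct *mm;
        unsigned long start;
        unsigned long end;
        /* further fields (e.g. event, flags) were added by later patches */
    };

    struct mmu_notifier_range range;

    mmu_notifier_range_init(&range, mm, start, end);
    mmu_notifier_invalidate_range_start(&range);
    /* ... modify the page tables ... */
    mmu_notifier_invalidate_range_end(&range);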

    [akpm@linux-foundation.org: coding style fixes]
    Link: http://lkml.kernel.org/r/20181205053628.3210-3-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Acked-by: Christian König
    Acked-by: Jan Kara
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Felix Kuehling
    Cc: Ralph Campbell
    Cc: John Hubbard
    From: Jérôme Glisse
    Subject: mm/mmu_notifier: use structure for invalidate_range_start/end calls v3

    fix build warning in migrate.c when CONFIG_MMU_NOTIFIER=n

    Link: http://lkml.kernel.org/r/20181213171330.8489-3-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • Replace jhash2 with xxhash.

    Perf numbers:
    Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
    ksm: crc32c hash() 12081 MB/s
    ksm: xxh64 hash() 8770 MB/s
    ksm: xxh32 hash() 4529 MB/s
    ksm: jhash2 hash() 1569 MB/s

    Sioh Lee did some testing:

    crc32c_intel: 1084.10ns
    crc32c (no hardware acceleration): 7012.51ns
    xxhash32: 2227.75ns
    xxhash64: 1413.16ns
    jhash2: 5128.30ns

    As jhash2 will always be slower (for data sizes like PAGE_SIZE), don't
    use it in ksm at all.

    Use only xxhash for now, because for using crc32c, cryptoapi must be
    initialized first - that requires some tricky solution to work well in all
    situations.
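
    The resulting checksum helper is essentially a one-liner around the
    xxhash() wrapper (which picks xxh64 or xxh32 depending on the word
    size); a sketch:

    #include <linux/xxhash.h>

    static u32 calc_checksum(struct page *page)
    {
        u32 checksum;
        void *addr = kmap_atomic(page);

        checksum = xxhash(addr, PAGE_SIZE, 0);
        kunmap_atomic(addr);
        return checksum;
    }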

    Link: http://lkml.kernel.org/r/20181023182554.23464-3-nefelim4ag@gmail.com
    Signed-off-by: Timofey Titovets
    Signed-off-by: leesioh
    Reviewed-by: Pavel Tatashin
    Reviewed-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Timofey Titovets
     

23 Aug, 2018

2 commits

  • Commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6")
    recently exposed a brittle part of the build for supporting non-gcc
    compilers.

    Both Clang and ICC define __GNUC__, __GNUC_MINOR__, and
    __GNUC_PATCHLEVEL__ for quick compatibility with code bases that haven't
    added compiler specific checks for __clang__ or __INTEL_COMPILER.

    This is brittle, as they happened to get compatibility by posing as a
    certain version of GCC. This broke when upgrading the minimal version
    of GCC required to build the kernel, to a version above what ICC and
    Clang claim to be.

    Rather than always including compiler-gcc.h then undefining or
    redefining macros in compiler-intel.h or compiler-clang.h, let's
    separate out the compiler specific macro definitions into mutually
    exclusive headers, do more proper compiler detection, and keep shared
    definitions in compiler_types.h.
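
    The resulting detection in compiler_types.h then looks roughly like
    this (a sketch):

    #ifdef __clang__
    #include <linux/compiler-clang.h>
    #elif defined(__INTEL_COMPILER)
    #include <linux/compiler-intel.h>
    #elif defined(__GNUC__)
    /* The compilers above also define __GNUC__, so they must come first. */
    #include <linux/compiler-gcc.h>
    #else
    #error "Unknown compiler"
    #endif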

    Fixes: cafa0010cd51 ("Raise the minimum required gcc version to 4.6")
    Reported-by: Masahiro Yamada
    Suggested-by: Eli Friedman
    Suggested-by: Joe Perches
    Signed-off-by: Nick Desaulniers
    Signed-off-by: Linus Torvalds

    Nick Desaulniers
     
  • page_freeze_refs/page_unfreeze_refs have already been replaced by
    page_ref_freeze/page_ref_unfreeze, but the comments were not updated
    accordingly.

    Link: http://lkml.kernel.org/r/1532590226-106038-1-git-send-email-jiang.biao2@zte.com.cn
    Signed-off-by: Jiang Biao
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Biao
     

18 Aug, 2018

2 commits

  • Use new return type vm_fault_t for fault handler. For now, this is just
    documenting that the function returns a VM_FAULT value rather than an
    errno. Once all instances are converted, vm_fault_t will become a
    distinct type.

    Ref: commit 1c8f422059ae ("mm: change return type to vm_fault_t")

    In this patch all the callers of handle_mm_fault() are changed to use
    the vm_fault_t type.
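
    A sketch of a converted caller (illustrative):

    vm_fault_t fault;

    fault = handle_mm_fault(vma, address, flags);
    if (fault & VM_FAULT_ERROR) {
        if (fault & VM_FAULT_OOM)
            goto out_of_memory;
        goto do_sigbus;
    }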

    Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC
    Signed-off-by: Souptick Joarder
    Cc: Matthew Wilcox
    Cc: Richard Henderson
    Cc: Tony Luck
    Cc: Matt Turner
    Cc: Vineet Gupta
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Richard Kuo
    Cc: Geert Uytterhoeven
    Cc: Michal Simek
    Cc: James Hogan
    Cc: Ley Foon Tan
    Cc: Jonas Bonn
    Cc: James E.J. Bottomley
    Cc: Benjamin Herrenschmidt
    Cc: Palmer Dabbelt
    Cc: Yoshinori Sato
    Cc: David S. Miller
    Cc: Richard Weinberger
    Cc: Guan Xuetao
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: "Levin, Alexander (Sasha Levin)"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Souptick Joarder
     
  • This patch is reworked from an earlier patch that Dan has posted:
    https://patchwork.kernel.org/patch/10131727/

    VM_MIXEDMAP is used by dax to direct mm paths like vm_normal_page() that
    the memory page it is dealing with is not typical memory from the linear
    map. The get_user_pages_fast() path, since it does not resolve the vma,
    is already using {pte,pmd}_devmap() as a stand-in for VM_MIXEDMAP, so we
    use that as a VM_MIXEDMAP replacement in some locations. In the cases
    where there is no pte to consult we fall back to using vma_is_dax() to
    detect the VM_MIXEDMAP special case.

    Now that we have explicit driver pfn_t-flag opt-in/opt-out for
    get_user_pages() support for DAX we can stop setting VM_MIXEDMAP. This
    also means we no longer need to worry about safely manipulating vm_flags
    in a future where we support dynamically changing the dax mode of a
    file.

    DAX should also now be supported with madvise_behavior(), vma_merge(),
    and copy_page_range().

    This patch has been tested against ndctl unit test. It has also been
    tested against xfstests commit: 625515d using fake pmem created by
    memmap and no additional issues have been observed.

    Link: http://lkml.kernel.org/r/152847720311.55924.16999195879201817653.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Acked-by: Dan Williams
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     

15 Jun, 2018

1 commit

  • On our armv8a server (QDF2400), I noticed lots of WARN_ONs caused by
    rmap_item->address not being PAGE_SIZE aligned, under memory pressure
    tests (start 20 guests and run memhog in the host).

    WARNING: CPU: 4 PID: 4641 at virt/kvm/arm/mmu.c:1826 kvm_age_hva_handler+0xc0/0xc8
    CPU: 4 PID: 4641 Comm: memhog Tainted: G W 4.17.0-rc3+ #8
    Call trace:
    kvm_age_hva_handler+0xc0/0xc8
    handle_hva_to_gpa+0xa8/0xe0
    kvm_age_hva+0x4c/0xe8
    kvm_mmu_notifier_clear_flush_young+0x54/0x98
    __mmu_notifier_clear_flush_young+0x6c/0xa0
    page_referenced_one+0x154/0x1d8
    rmap_walk_ksm+0x12c/0x1d0
    rmap_walk+0x94/0xa0
    page_referenced+0x194/0x1b0
    shrink_page_list+0x674/0xc28
    shrink_inactive_list+0x26c/0x5b8
    shrink_node_memcg+0x35c/0x620
    shrink_node+0x100/0x430
    do_try_to_free_pages+0xe0/0x3a8
    try_to_free_pages+0xe4/0x230
    __alloc_pages_nodemask+0x564/0xdc0
    alloc_pages_vma+0x90/0x228
    do_anonymous_page+0xc8/0x4d0
    __handle_mm_fault+0x4a0/0x508
    handle_mm_fault+0xf8/0x1b0
    do_page_fault+0x218/0x4b8
    do_translation_fault+0x90/0xa0
    do_mem_abort+0x68/0xf0
    el0_da+0x24/0x28

    In rmap_walk_ksm, the rmap_item->address might still have the
    STABLE_FLAG set, so the start and end in handle_hva_to_gpa might not be
    PAGE_SIZE aligned. Thus it will cause exceptions in handle_hva_to_gpa
    on arm64.

    This patch fixes it by ignoring (not removing) the low bits of address
    when doing rmap_walk_ksm.
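
    A sketch of the fix, modelled on the description above (the mask name
    is an assumption):

    /* in rmap_walk_ksm(), for each vma of the stable node's anon_vmas: */
    unsigned long addr;

    /*
     * rmap_item->address keeps KSM metadata (seqnr, UNSTABLE_FLAG,
     * STABLE_FLAG) in its low bits; use a masked copy, do not clear them.
     */
    addr = rmap_item->address & ~KSM_FLAG_MASK;

    if (addr < vma->vm_start || addr >= vma->vm_end)
        continue;

    rwc->rmap_one(page, vma, addr, rwc->arg);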

    IMO, it should be backported to the stable tree. The storm of WARN_ONs is
    very easy for me to reproduce. More than that, I watched a panic (not
    reproducible) as follows:

    page:ffff7fe003742d80 count:-4871 mapcount:-2126053375 mapping: (null) index:0x0
    flags: 0x1fffc00000000000()
    raw: 1fffc00000000000 0000000000000000 0000000000000000 ffffecf981470000
    raw: dead000000000100 dead000000000200 ffff8017c001c000 0000000000000000
    page dumped because: nonzero _refcount
    CPU: 29 PID: 18323 Comm: qemu-kvm Tainted: G W 4.14.15-5.hxt.aarch64 #1
    Hardware name:
    Call trace:
    dump_backtrace+0x0/0x22c
    show_stack+0x24/0x2c
    dump_stack+0x8c/0xb0
    bad_page+0xf4/0x154
    free_pages_check_bad+0x90/0x9c
    free_pcppages_bulk+0x464/0x518
    free_hot_cold_page+0x22c/0x300
    __put_page+0x54/0x60
    unmap_stage2_range+0x170/0x2b4
    kvm_unmap_hva_handler+0x30/0x40
    handle_hva_to_gpa+0xb0/0xec
    kvm_unmap_hva_range+0x5c/0xd0

    I even injected a fault on purpose in kvm_unmap_hva_range by setting
    size = size - 0x200; the call trace is similar to the above. So I think
    the panic shares the same root cause as the WARN_ONs.

    Andrea said:

    : It looks a straightforward safe fix, on x86 hva_to_gfn_memslot would
    : zap those bits and hide the misalignment caused by the low metadata
    : bits being erroneously left set in the address, but the arm code
    : notices when that's the last page in the memslot and the hva_end is
    : getting aligned and the size is below one page.
    :
    : I think the problem triggers in the addr += PAGE_SIZE of
    : unmap_stage2_ptes that never matches end because end is aligned but
    : addr is not.
    :
    : } while (pte++, addr += PAGE_SIZE, addr != end);
    :
    : x86 again only works on hva_start/hva_end after converting it to
    : gfn_start/end and that being in pfn units the bits are zapped before
    : they risk to cause trouble.

    Jia He said:

    : I've tested this myself on an arm64 server (QDF2400, 46 cpus, 96G mem).
    : Without this patch, the WARN_ON is very easy to reproduce. After this
    : patch, I have run the same benchmark for a whole day without any WARN_ONs.

    Link: http://lkml.kernel.org/r/1525403506-6750-1-git-send-email-hejianet@gmail.com
    Signed-off-by: Jia He
    Reviewed-by: Andrea Arcangeli
    Tested-by: Jia He
    Cc: Suzuki K Poulose
    Cc: Minchan Kim
    Cc: Claudio Imbrenda
    Cc: Arvind Yadav
    Cc: Mike Rapoport
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jia He
     

08 Jun, 2018

1 commit

  • page_stable_node() and set_page_stable_node() are only used in mm/ksm.c,
    so there is no point in keeping them in include/linux/ksm.h.

    [akpm@linux-foundation.org: fix SYSFS=n build]
    Link: http://lkml.kernel.org/r/1524552106-7356-3-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

17 Apr, 2018

2 commits

  • Mike Rapoport says:

    These patches convert files in Documentation/vm to ReST format, add an
    initial index and link it to the top level documentation.

    There are no contents changes in the documentation, except few spelling
    fixes. The relatively large diffstat stems from the indentation and
    paragraph wrapping changes.

    I've tried to keep the formatting as consistent as possible, but I could
    miss some places that needed markup and add some markup where it was not
    necessary.

    [jc: significant conflicts in vm/hmm.rst]

    Jonathan Corbet
     
  • Signed-off-by: Mike Rapoport
    Signed-off-by: Jonathan Corbet

    Mike Rapoport
     

12 Apr, 2018

1 commit

  • When using KSM with use_zero_pages, we replace anonymous pages
    containing only zeroes with actual zero pages, which are not anonymous.
    We need to do proper accounting of the mm counters, otherwise we will
    get wrong values in /proc and a BUG message in dmesg when tearing down
    the mm.
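
    A sketch of the accounting fix, assumed to sit in the KSM page
    replacement path (helper names as in mm/ksm.c; details may differ):

    if (!is_zero_pfn(page_to_pfn(kpage))) {
        get_page(kpage);
        page_add_anon_rmap(kpage, vma, addr, false);
        newpte = mk_pte(kpage, vma->vm_page_prot);
    } else {
        newpte = pte_mkspecial(pfn_pte(page_to_pfn(kpage),
                                       vma->vm_page_prot));
        /*
         * We're replacing an anonymous page with a zero page, which is
         * not anonymous: account for the removed anonymous page here,
         * or /proc counters go wrong and mm teardown complains.
         */
        dec_mm_counter(mm, MM_ANONPAGES);
    }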

    Link: http://lkml.kernel.org/r/1522931274-15552-1-git-send-email-imbrenda@linux.vnet.ibm.com
    Fixes: e86c59b1b1 ("mm/ksm: improve deduplication of zero pages with colouring")
    Signed-off-by: Claudio Imbrenda
    Reviewed-by: Andrew Morton
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Kirill A. Shutemov
    Cc: Hugh Dickins
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Claudio Imbrenda
     

06 Apr, 2018

1 commit

  • This patch fixes a corner case for KSM. When two pages belong or
    belonged to the same transparent hugepage, and they should be merged,
    KSM fails to split the page, and therefore no merging happens.

    This bug can be reproduced by:
    * making sure ksm is running (if necessary, disabling ksmtuned)
    * enabling transparent hugepages
    * allocating a THP-aligned 1-THP-sized buffer
    e.g. on amd64: posix_memalign(&p, 1<<21, 1<<21)
    [...]
    Co-authored-by: Gerald Schaefer
    Reviewed-by: Andrew Morton
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Kirill A. Shutemov
    Cc: Hugh Dickins
    Cc: Christian Borntraeger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Claudio Imbrenda