25 Sep, 2019

2 commits

  • This patch is (hopefully) the first step to enable THP for non-shmem
    filesystems.

    This patch enables an application to put part of its text sections to THP
    via madvise, for example:

    madvise((void *)0x600000, 0x200000, MADV_HUGEPAGE);

    We tried to reuse the logic for THP on tmpfs.

    Currently, write is not supported for non-shmem THP. khugepaged will only
    process vma with VM_DENYWRITE. sys_mmap() ignores VM_DENYWRITE requests
    (see ksys_mmap_pgoff). The only way to create vma with VM_DENYWRITE is
    execve(). This requirement limits non-shmem THP to text sections.

    The next patch will handle writes, which would only happen when the all
    the vmas with VM_DENYWRITE are unmapped.

    An EXPERIMENTAL config, READ_ONLY_THP_FOR_FS, is added to gate this
    feature.

    [songliubraving@fb.com: fix build without CONFIG_SHMEM]
    Link: http://lkml.kernel.org/r/F53407FB-96CC-42E8-9862-105C92CC2B98@fb.com
    [songliubraving@fb.com: fix double unlock in collapse_file()]
    Link: http://lkml.kernel.org/r/B960CBFA-8EFC-4DA4-ABC5-1977FFF2CA57@fb.com
    Link: http://lkml.kernel.org/r/20190801184244.3169074-7-songliubraving@fb.com
    Signed-off-by: Song Liu
    Acked-by: Rik van Riel
    Acked-by: Kirill A. Shutemov
    Acked-by: Johannes Weiner
    Cc: Stephen Rothwell
    Cc: Dan Carpenter
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: William Kucharski
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
  • Patch series "mm: remove quicklist page table caches".

    A while ago Nicholas proposed to remove quicklist page table caches [1].

    I've rebased his patch on the curren upstream and switched ia64 and sh to
    use generic versions of PTE allocation.

    [1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com

    This patch (of 3):

    Remove page table allocator "quicklists". These have been around for a
    long time, but have not got much traction in the last decade and are only
    used on ia64 and sh architectures.

    The numbers in the initial commit look interesting but probably don't
    apply anymore. If anybody wants to resurrect this it's in the git
    history, but it's unhelpful to have this code and divergent allocator
    behaviour for minor archs.

    Also it might be better to instead make more general improvements to page
    allocator if this is still so slow.

    Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Nicholas Piggin
    Signed-off-by: Mike Rapoport
    Cc: Tony Luck
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     

20 Aug, 2019

1 commit

  • CONFIG_MIGRATE_VMA_HELPER guards helpers that are required for proper
    devic private memory support. Remove the option and just check for
    CONFIG_DEVICE_PRIVATE instead.

    Link: https://lore.kernel.org/r/20190814075928.23766-11-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Tested-by: Ralph Campbell
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     

08 Aug, 2019

2 commits


17 Jul, 2019

1 commit

  • ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
    with the long-out-of-date comment can lead to the impression than an
    architecture may just enable it (since __add_pages() now "comprehends
    device memory" for itself) and expect things to work.

    In practice, however, ZONE_DEVICE users have little chance of
    functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
    that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
    dependency so the real situation is clearer.

    Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com
    Signed-off-by: Robin Murphy
    Acked-by: Dan Williams
    Reviewed-by: Ira Weiny
    Acked-by: Oliver O'Halloran
    Reviewed-by: Anshuman Khandual
    Cc: Michael Ellerman
    Cc: Catalin Marinas
    Cc: David Hildenbrand
    Cc: Jerome Glisse
    Cc: Michal Hocko
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Murphy
     

15 Jul, 2019

1 commit

  • Pull HMM updates from Jason Gunthorpe:
    "Improvements and bug fixes for the hmm interface in the kernel:

    - Improve clarity, locking and APIs related to the 'hmm mirror'
    feature merged last cycle. In linux-next we now see AMDGPU and
    nouveau to be using this API.

    - Remove old or transitional hmm APIs. These are hold overs from the
    past with no users, or APIs that existed only to manage cross tree
    conflicts. There are still a few more of these cleanups that didn't
    make the merge window cut off.

    - Improve some core mm APIs:
    - export alloc_pages_vma() for driver use
    - refactor into devm_request_free_mem_region() to manage
    DEVICE_PRIVATE resource reservations
    - refactor duplicative driver code into the core dev_pagemap
    struct

    - Remove hmm wrappers of improved core mm APIs, instead have drivers
    use the simplified API directly

    - Remove DEVICE_PUBLIC

    - Simplify the kconfig flow for the hmm users and core code"

    * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits)
    mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
    mm: remove the HMM config option
    mm: sort out the DEVICE_PRIVATE Kconfig mess
    mm: simplify ZONE_DEVICE page private data
    mm: remove hmm_devmem_add
    mm: remove hmm_vma_alloc_locked_page
    nouveau: use devm_memremap_pages directly
    nouveau: use alloc_page_vma directly
    PCI/P2PDMA: use the dev_pagemap internal refcount
    device-dax: use the dev_pagemap internal refcount
    memremap: provide an optional internal refcount in struct dev_pagemap
    memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
    memremap: remove the data field in struct dev_pagemap
    memremap: add a migrate_to_ram method to struct dev_pagemap_ops
    memremap: lift the devmap_enable manipulation into devm_memremap_pages
    memremap: pass a struct dev_pagemap to ->kill and ->cleanup
    memremap: move dev_pagemap callbacks into a separate structure
    memremap: validate the pagemap type passed to devm_memremap_pages
    mm: factor out a devm_request_free_mem_region helper
    mm: export alloc_pages_vma
    ...

    Linus Torvalds
     

13 Jul, 2019

4 commits

  • While only powerpc supports the hugepd case, the code is pretty generic
    and I'd like to keep all GUP internals in one place.

    Link: http://lkml.kernel.org/r/20190625143715.1689-15-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Always build mm/gup.c so that we don't have to provide separate nommu
    stubs. Also merge the get_user_pages_fast and __get_user_pages_fast stubs
    when HAVE_FAST_GUP into the main implementations, which will never call
    the fast path if HAVE_FAST_GUP is not set.

    This also ensures the new put_user_pages* helpers are available for nommu,
    as those are currently missing, which would create a problem as soon as we
    actually grew users for it.

    Link: http://lkml.kernel.org/r/20190625143715.1689-13-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • We only support the generic GUP now, so rename the config option to
    be more clear, and always use the mm/Kconfig definition of the
    symbol and select it from the arch Kconfigs.

    Link: http://lkml.kernel.org/r/20190625143715.1689-11-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Khalid Aziz
    Reviewed-by: Jason Gunthorpe
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • The split low/high access is the only non-READ_ONCE version of gup_get_pte
    that did show up in the various arch implemenations. Lift it to common
    code and drop the ifdef based arch override.

    Link: http://lkml.kernel.org/r/20190625143715.1689-4-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

03 Jul, 2019

4 commits

  • The migrate_vma helper is only used by noveau to migrate device private
    pages around. Other HMM_MIRROR users like amdgpu or infiniband don't
    need it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • All the mm/hmm.c code is better keyed off HMM_MIRROR. Also let nouveau
    depend on it instead of the mix of a dummy dependency symbol plus the
    actually selected one. Drop various odd dependencies, as the code is
    pretty portable.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ira Weiny
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • The ZONE_DEVICE support doesn't depend on anything HMM related, just on
    various bits of arch support as indicated by the architecture. Also
    don't select the option from nouveau as it isn't present in many setups,
    and depend on it instead.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ira Weiny
    Reviewed-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • The code hasn't been used since it was added to the tree, and doesn't
    appear to actually be usable.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Acked-by: Michal Hocko
    Reviewed-by: Dan Williams
    Tested-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     

02 Jul, 2019

1 commit


15 Jun, 2019

1 commit


09 Jun, 2019

1 commit

  • Mostly due to x86 and acpi conversion, several documentation
    links are still pointing to the old file. Fix them.

    Signed-off-by: Mauro Carvalho Chehab
    Reviewed-by: Wolfram Sang
    Reviewed-by: Sven Van Asbroeck
    Reviewed-by: Bhupesh Sharma
    Acked-by: Mark Brown
    Signed-off-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

21 May, 2019

1 commit


15 May, 2019

6 commits

  • The help describing the memory model selection is outdated. It still says
    that SPARSEMEM is experimental and DISCONTIGMEM is a preferred over
    SPARSEMEM.

    Update the help text for the relevant options:
    * add a generic help for the "Memory Model" prompt
    * add description for FLATMEM
    * reduce the description of DISCONTIGMEM and add a deprecation note
    * prefer SPARSEMEM over DISCONTIGMEM

    Link: http://lkml.kernel.org/r/1556188531-20728-1-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Most architectures do not need the memblock memory after the page
    allocator is initialized, but only few enable ARCH_DISCARD_MEMBLOCK in the
    arch Kconfig.

    Replacing ARCH_DISCARD_MEMBLOCK with ARCH_KEEP_MEMBLOCK and inverting the
    logic makes it clear which architectures actually use memblock after
    system initialization and skips the necessity to add ARCH_DISCARD_MEMBLOCK
    to the architectures that are still missing that option.

    Link: http://lkml.kernel.org/r/1556102150-32517-1-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michael Ellerman (powerpc)
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Richard Kuo
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: James Hogan
    Cc: Ley Foon Tan
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Eric Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Add 2 new Kconfig variables that are not used by anyone. I check that
    various make ARCH=somearch allmodconfig do work and do not complain. This
    new Kconfig needs to be added first so that device drivers that depend on
    HMM can be updated.

    Once drivers are updated then I can update the HMM Kconfig to depend on
    this new Kconfig in a followup patch.

    This is about solving Kconfig for HMM given that device driver are
    going through their own tree we want to avoid changing them from the mm
    tree. So plan is:

    1 - Kernel release N add the new Kconfig to mm/Kconfig (this patch)
    2 - Kernel release N+1 update driver to depend on new Kconfig ie
    stop using ARCH_HASH_HMM and start using ARCH_HAS_HMM_MIRROR
    and ARCH_HAS_HMM_DEVICE (one or the other or both depending
    on the driver)
    3 - Kernel release N+2 remove ARCH_HASH_HMM and do final Kconfig
    update in mm/Kconfig

    Link: http://lkml.kernel.org/r/20190417211141.17580-1-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Cc: Guenter Roeck
    Cc: Leon Romanovsky
    Cc: Jason Gunthorpe
    Cc: Ralph Campbell
    Cc: John Hubbard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • 'default n' is the default value for any bool or tristate Kconfig
    setting so there is no need to write it explicitly.

    Also since commit f467c5640c29 ("kconfig: only write '# CONFIG_FOO
    is not set' for visible symbols") the Kconfig behavior is the same
    regardless of 'default n' being present or not:

    ...
    One side effect of (and the main motivation for) this change is making
    the following two definitions behave exactly the same:

    config FOO
    bool

    config FOO
    bool
    default n

    With this change, neither of these will generate a
    '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
    That might make it clearer to people that a bare 'default n' is
    redundant.
    ...

    Link: http://lkml.kernel.org/r/c3385916-e4d4-37d3-b330-e6b7dff83a52@samsung.com
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bartlomiej Zolnierkiewicz
     
  • To avoid random config build issue, select mmu notifier when HMM is
    selected. In any cases when HMM get selected it will be by users that
    will also wants the mmu notifier.

    Link: http://lkml.kernel.org/r/20190403193318.16478-2-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Acked-by: Balbir Singh
    Cc: Ralph Campbell
    Cc: John Hubbard
    Cc: Dan Williams
    Cc: Arnd Bergmann
    Cc: Dan Carpenter
    Cc: Ira Weiny
    Cc: Matthew Wilcox
    Cc: Souptick Joarder
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • This condition allows to define alloc_contig_range, so simplify it into a
    more accurate naming.

    Link: http://lkml.kernel.org/r/20190327063626.18421-4-alex@ghiti.fr
    Signed-off-by: Alexandre Ghiti
    Suggested-by: Vlastimil Babka
    Acked-by: Vlastimil Babka
    Cc: Andy Lutomirsky
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Heiko Carstens
    Cc: "H . Peter Anvin"
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Mike Kravetz
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Ghiti
     

29 Dec, 2018

1 commit

  • Replace jhash2 with xxhash.

    Perf numbers:
    Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
    ksm: crc32c hash() 12081 MB/s
    ksm: xxh64 hash() 8770 MB/s
    ksm: xxh32 hash() 4529 MB/s
    ksm: jhash2 hash() 1569 MB/s

    Sioh Lee did some testing:

    crc32c_intel: 1084.10ns
    crc32c (no hardware acceleration): 7012.51ns
    xxhash32: 2227.75ns
    xxhash64: 1413.16ns
    jhash2: 5128.30ns

    As jhash2 always will be slower (for data size like PAGE_SIZE). Don't use
    it in ksm at all.

    Use only xxhash for now, because for using crc32c, cryptoapi must be
    initialized first - that requires some tricky solution to work well in all
    situations.

    Link: http://lkml.kernel.org/r/20181023182554.23464-3-nefelim4ag@gmail.com
    Signed-off-by: Timofey Titovets
    Signed-off-by: leesioh
    Reviewed-by: Pavel Tatashin
    Reviewed-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Timofey Titovets
     

31 Oct, 2018

2 commits

  • All architecures use memblock for early memory management. There is no need
    for the CONFIG_HAVE_MEMBLOCK configuration option.

    [rppt@linux.vnet.ibm.com: of/fdt: fixup #ifdefs]
    Link: http://lkml.kernel.org/r/20180919103457.GA20545@rapoport-lnx
    [rppt@linux.vnet.ibm.com: csky: fixups after bootmem removal]
    Link: http://lkml.kernel.org/r/20180926112744.GC4628@rapoport-lnx
    [rppt@linux.vnet.ibm.com: remove stale #else and the code it protects]
    Link: http://lkml.kernel.org/r/1538067825-24835-1-git-send-email-rppt@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1536927045-23536-4-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michal Hocko
    Tested-by: Jonathan Cameron
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • All achitectures select NO_BOOTMEM which essentially becomes 'Y' for any
    kernel configuration and therefore it can be removed.

    [alexander.h.duyck@linux.intel.com: remove now defunct NO_BOOTMEM from depends list for deferred init]
    Link: http://lkml.kernel.org/r/20180925201814.3576.15105.stgit@localhost.localdomain
    Link: http://lkml.kernel.org/r/1536927045-23536-3-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Alexander Duyck
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

21 Oct, 2018

1 commit


21 Sep, 2018

1 commit

  • Deferred struct page init is needed only on systems with large amount of
    physical memory to improve boot performance. 32-bit systems do not
    benefit from this feature.

    Jiri reported a problem where deferred struct pages do not work well with
    x86-32:

    [ 0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
    [ 0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
    [ 0.036269] Initializing CPU#0
    [ 0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0)
    [ 0.038459] page:f6780000 is uninitialized and poisoned
    [ 0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
    [ 0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
    [ 0.040038] ------------[ cut here ]------------
    [ 0.040399] kernel BUG at include/linux/page-flags.h:293!
    [ 0.040823] invalid opcode: 0000 [#1] SMP PTI
    [ 0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9
    [ 0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
    [ 0.042496] EIP: free_highmem_page+0x64/0x80
    [ 0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71
    [ 0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8
    [ 0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc
    [ 0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086
    [ 0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690
    [ 0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 0.046913] DR6: fffe0ff0 DR7: 00000400
    [ 0.047220] Call Trace:
    [ 0.047419] add_highpages_with_active_regions+0xbd/0x10d
    [ 0.047854] set_highmem_pages_init+0x5b/0x71
    [ 0.048202] mem_init+0x2b/0x1e8
    [ 0.048460] start_kernel+0x1d2/0x425
    [ 0.048757] i386_start_kernel+0x93/0x97
    [ 0.049073] startup_32_smp+0x164/0x168
    [ 0.049379] Modules linked in:
    [ 0.049626] ---[ end trace 337949378db0abbb ]---

    We free highmem pages before their struct pages are initialized:

    mem_init()
    set_highmem_pages_init()
    add_highpages_with_active_regions()
    free_highmem_page()
    .. Access uninitialized struct page here..

    Because there is no reason to have this feature on 32-bit systems, just
    disable it.

    Link: http://lkml.kernel.org/r/20180831150506.31246-1-pavel.tatashin@microsoft.com
    Fixes: 2e3ca40f03bb ("mm: relax deferred struct page requirements")
    Signed-off-by: Pavel Tatashin
    Reported-by: Jiri Slaby
    Acked-by: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Pasha Tatashin
     

18 Aug, 2018

3 commits

  • CONFIG_THP_SWAP should depend on CONFIG_SWAP, because it's unreasonable
    to optimize swapping for THP (Transparent Huge Page) without basic
    swapping support.

    In original code, when CONFIG_SWAP=n and CONFIG_THP_SWAP=y,
    split_swap_cluster() will not be built because it is in swapfile.c, but
    it will be called in huge_memory.c. This doesn't trigger a build error
    in practice because the call site is enclosed by PageSwapCache(), which
    is defined to be constant 0 when CONFIG_SWAP=n. But this is fragile and
    should be fixed.

    The comments are fixed too to reflect the latest progress.

    Link: http://lkml.kernel.org/r/20180713021228.439-1-ying.huang@intel.com
    Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
    Signed-off-by: "Huang, Ying"
    Reviewed-by: Dan Williams
    Reviewed-by: Naoya Horiguchi
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Shaohua Li
    Cc: Hugh Dickins
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Dave Hansen
    Cc: Zi Yan
    Cc: Daniel Jordan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • Rename new_sparse_init() to sparse_init() which enables it. Delete old
    sparse_init() and all the code that became obsolete with.

    [pasha.tatashin@oracle.com: remove unused sparse_mem_maps_populate_node()]
    Link: http://lkml.kernel.org/r/20180716174447.14529-6-pasha.tatashin@oracle.com
    Link: http://lkml.kernel.org/r/20180712203730.8703-6-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Tested-by: Michael Ellerman [powerpc]
    Tested-by: Oscar Salvador
    Reviewed-by: Oscar Salvador
    Cc: Pasha Tatashin
    Cc: Abdul Haleem
    Cc: Baoquan He
    Cc: Daniel Jordan
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Souptick Joarder
    Cc: Steven Sistare
    Cc: Vlastimil Babka
    Cc: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • The deferred memory initialization relies on section definitions, e.g
    PAGES_PER_SECTION, that are only available when CONFIG_SPARSEMEM=y on
    most architectures.

    Initially DEFERRED_STRUCT_PAGE_INIT depended on explicit
    ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT configuration option, but since
    the commit 2e3ca40f03bb13709df4 ("mm: relax deferred struct page
    requirements") this requirement was relaxed and now it is possible to
    enable DEFERRED_STRUCT_PAGE_INIT on architectures that support
    DISCONTINGMEM and NO_BOOTMEM which causes build failures.

    For instance, setting SMP=y and DEFERRED_STRUCT_PAGE_INIT=y on arc
    causes the following build failure:

    CC mm/page_alloc.o
    mm/page_alloc.c: In function 'update_defer_init':
    mm/page_alloc.c:321:14: error: 'PAGES_PER_SECTION'
    undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
    ^~~~~~~~~~~~~~~~~
    USEC_PER_SEC
    mm/page_alloc.c:321:14: note: each undeclared identifier is reported only once for each function it appears in
    In file included from include/linux/cache.h:5:0,
    from include/linux/printk.h:9,
    from include/linux/kernel.h:14,
    from include/asm-generic/bug.h:18,
    from arch/arc/include/asm/bug.h:32,
    from include/linux/bug.h:5,
    from include/linux/mmdebug.h:5,
    from include/linux/mm.h:9,
    from mm/page_alloc.c:18:
    mm/page_alloc.c: In function 'deferred_grow_zone':
    mm/page_alloc.c:1624:52: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
    unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
    ^
    include/uapi/linux/kernel.h:11:47: note: in definition of macro '__ALIGN_KERNEL_MASK'
    #define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
    ^~~~
    include/linux/kernel.h:58:22: note: in expansion of macro '__ALIGN_KERNEL'
    #define ALIGN(x, a) __ALIGN_KERNEL((x), (a))
    ^~~~~~~~~~~~~~
    mm/page_alloc.c:1624:34: note: in expansion of macro 'ALIGN'
    unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
    ^~~~~
    In file included from include/asm-generic/bug.h:18:0,
    from arch/arc/include/asm/bug.h:32,
    from include/linux/bug.h:5,
    from include/linux/mmdebug.h:5,
    from include/linux/mm.h:9,
    from mm/page_alloc.c:18:
    mm/page_alloc.c: In function 'free_area_init_node':
    mm/page_alloc.c:6379:50: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
    pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
    ^
    include/linux/kernel.h:812:22: note: in definition of macro '__typecheck'
    (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
    ^
    include/linux/kernel.h:836:24: note: in expansion of macro '__safe_cmp'
    __builtin_choose_expr(__safe_cmp(x, y), \
    ^~~~~~~~~~
    include/linux/kernel.h:904:27: note: in expansion of macro '__careful_cmp'
    #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
    ^~~~~
    include/linux/kernel.h:836:2: error: first argument to '__builtin_choose_expr' not a constant
    __builtin_choose_expr(__safe_cmp(x, y), \
    ^
    include/linux/kernel.h:904:27: note: in expansion of macro '__careful_cmp'
    #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
    ^~~~~
    scripts/Makefile.build:317: recipe for target 'mm/page_alloc.o' failed

    Let's make the DEFERRED_STRUCT_PAGE_INIT explicitly depend on SPARSEMEM
    as the systems that support DISCONTIGMEM do not seem to have that huge
    amounts of memory that would make DEFERRED_STRUCT_PAGE_INIT relevant.

    Link: http://lkml.kernel.org/r/1530279308-24988-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michal Hocko
    Reviewed-by: Pavel Tatashin
    Tested-by: Randy Dunlap
    Cc: Pasha Tatashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

02 Aug, 2018

1 commit


09 Jun, 2018

2 commits

  • Pull libnvdimm updates from Dan Williams:
    "This adds a user for the new 'bytes-remaining' updates to
    memcpy_mcsafe() that you already received through Ingo via the
    x86-dax- for-linus pull.

    Not included here, but still targeting this cycle, is support for
    handling memory media errors (poison) consumed via userspace dax
    mappings.

    Summary:

    - DAX broke a fundamental assumption of truncate of file mapped
    pages. The truncate path assumed that it is safe to disconnect a
    pinned page from a file and let the filesystem reclaim the physical
    block. With DAX the page is equivalent to the filesystem block.
    Introduce dax_layout_busy_page() to enable filesystems to wait for
    pinned DAX pages to be released. Without this wait a filesystem
    could allocate blocks under active device-DMA to a new file.

    - DAX arranges for the block layer to be bypassed and uses
    dax_direct_access() + copy_to_iter() to satisfy read(2) calls.
    However, the memcpy_mcsafe() facility is available through the pmem
    block driver. In order to safely handle media errors, via the DAX
    block-layer bypass, introduce copy_to_iter_mcsafe().

    - Fix cache management policy relative to the ACPI NFIT Platform
    Capabilities Structure to properly elide cache flushes when they
    are not necessary. The table indicates whether CPU caches are
    power-fail protected. Clarify that a deep flush is always performed
    on REQ_{FUA,PREFLUSH} requests"

    * tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
    dax: Use dax_write_cache* helpers
    libnvdimm, pmem: Do not flush power-fail protected CPU caches
    libnvdimm, pmem: Unconditionally deep flush on *sync
    libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH
    acpi, nfit: Remove ecc_unit_size
    dax: dax_insert_mapping_entry always succeeds
    libnvdimm, e820: Register all pmem resources
    libnvdimm: Debug probe times
    linvdimm, pmem: Preserve read-only setting for pmem devices
    x86, nfit_test: Add unit test for memcpy_mcsafe()
    pmem: Switch to copy_to_iter_mcsafe()
    dax: Report bytes remaining in dax_iomap_actor()
    dax: Introduce a ->copy_to_iter dax operation
    uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation
    xfs, dax: introduce xfs_break_dax_layouts()
    xfs: prepare xfs_break_layouts() for another layout type
    xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
    mm, fs, dax: handle layout changes to pinned dax mappings
    mm: fix __gup_device_huge vs unmap
    mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS
    ...

    Linus Torvalds
     
  • Dan Williams
     

08 Jun, 2018

1 commit

  • Currently the PTE special supports is turned on in per architecture
    header files. Most of the time, it is defined in
    arch/*/include/asm/pgtable.h depending or not on some other per
    architecture static definition.

    This patch introduce a new configuration variable to manage this
    directly in the Kconfig files. It would later replace
    __HAVE_ARCH_PTE_SPECIAL.

    Here notes for some architecture where the definition of
    __HAVE_ARCH_PTE_SPECIAL is not obvious:

    arm
    __HAVE_ARCH_PTE_SPECIAL which is currently defined in
    arch/arm/include/asm/pgtable-3level.h which is included by
    arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
    So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.

    powerpc
    __HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
    - arch/powerpc/include/asm/book3s/64/pgtable.h
    - arch/powerpc/include/asm/pte-common.h
    The first one is included if (PPC_BOOK3S & PPC64) while the second is
    included in all the other cases.
    So select ARCH_HAS_PTE_SPECIAL all the time.

    sparc:
    __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) &&
    defined(__arch64__) which are defined through the compiler in
    sparc/Makefile if !SPARC32 which I assume to be if SPARC64.
    So select ARCH_HAS_PTE_SPECIAL if SPARC64

    There is no functional change introduced by this patch.

    Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com
    Signed-off-by: Laurent Dufour
    Suggested-by: Jerome Glisse
    Reviewed-by: Jerome Glisse
    Acked-by: David Rientjes
    Cc: Michal Hocko
    Cc: "Aneesh Kumar K . V"
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Jonathan Corbet
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Cc: David S. Miller
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Vineet Gupta
    Cc: Palmer Dabbelt
    Cc: Albert Ou
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David Rientjes
    Cc: Robin Murphy
    Cc: Christophe LEROY
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laurent Dufour
     

05 Jun, 2018

2 commits

  • Pull documentation updates from Jonathan Corbet:
    "There's been a fair amount of work in the docs tree this time around,
    including:

    - Extensive RST conversions and organizational work in the
    memory-management docs thanks to Mike Rapoport.

    - An update of Documentation/features from Andrea Parri and a script
    to keep it updated.

    - Various LICENSES updates from Thomas, along with a script to check
    SPDX tags.

    - Work to fix dangling references to documentation files; this
    involved a fair number of one-liner comment changes outside of
    Documentation/

    ... and the usual list of documentation improvements, typo fixes, etc"

    * tag 'docs-4.18' of git://git.lwn.net/linux: (103 commits)
    Documentation: document hung_task_panic kernel parameter
    docs/admin-guide/mm: add high level concepts overview
    docs/vm: move ksm and transhuge from "user" to "internals" section.
    docs: Use the kerneldoc comments for memalloc_no*()
    doc: document scope NOFS, NOIO APIs
    docs: update kernel versions and dates in tables
    docs/vm: transhuge: split userspace bits to admin-guide/mm/transhuge
    docs/vm: transhuge: minor updates
    docs/vm: transhuge: change sections order
    Documentation: arm: clean up Marvell Berlin family info
    Documentation: gpio: driver: Fix a typo and some odd grammar
    docs: ranoops.rst: fix location of ramoops.txt
    scripts/documentation-file-ref-check: rewrite it in perl with auto-fix mode
    docs: uio-howto.rst: use a code block to solve a warning
    mm, THP, doc: Add document for thp_swpout/thp_swpout_fallback
    w1: w1_io.c: fix a kernel-doc warning
    Documentation/process/posting: wrap text at 80 cols
    docs: admin-guide: add cgroup-v2 documentation
    Revert "Documentation/features/vm: Remove arch support status file for 'pte_special'"
    Documentation: refcount-vs-atomic: Update reference to LKMM doc.
    ...

    Linus Torvalds
     
  • Pull dma-mapping updates from Christoph Hellwig:

    - replace the force_dma flag with a dma_configure bus method. (Nipun
    Gupta, although one patch is іncorrectly attributed to me due to a
    git rebase bug)

    - use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai)

    - remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the
    right thing for bounce buffering.

    - move dma-debug initialization to common code, and apply a few
    cleanups to the dma-debug code.

    - cleanup the Kconfig mess around swiotlb selection

    - swiotlb comment fixup (Yisheng Xie)

    - a trivial swiotlb fix. (Dan Carpenter)

    - support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt)

    - add a new generic dma-noncoherent dma_map_ops implementation and use
    it for arc, c6x and nds32.

    - improve scatterlist validity checking in dma-debug. (Robin Murphy)

    - add a struct device quirk to limit the dma-mask to 32-bit due to
    bridge/system issues, and switch x86 to use it instead of a local
    hack for VIA bridges.

    - handle devices without a dma_mask more gracefully in the dma-direct
    code.

    * tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping: (48 commits)
    dma-direct: don't crash on device without dma_mask
    nds32: use generic dma_noncoherent_ops
    nds32: implement the unmap_sg DMA operation
    nds32: consolidate DMA cache maintainance routines
    x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag
    x86/pci-dma: remove the explicit nodac and allowdac option
    x86/pci-dma: remove the experimental forcesac boot option
    Documentation/x86: remove a stray reference to pci-nommu.c
    core, dma-direct: add a flag 32-bit dma limits
    dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs
    dma-debug: check scatterlist segments
    c6x: use generic dma_noncoherent_ops
    arc: use generic dma_noncoherent_ops
    arc: fix arc_dma_{map,unmap}_page
    arc: fix arc_dma_sync_sg_for_{cpu,device}
    arc: simplify arc_dma_sync_single_for_{cpu,device}
    dma-mapping: provide a generic dma-noncoherent implementation
    dma-mapping: simplify Kconfig dependencies
    riscv: add swiotlb support
    riscv: only enable ZONE_DMA32 for 64-bit
    ...

    Linus Torvalds
     

22 May, 2018

1 commit

  • In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
    be able to rely on the fact that they will get wakeups on dev_pagemap
    page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
    generic_dax_page_free() as common indicator / infrastructure for dax
    filesytems to require. With this change there are no users of the
    MEMORY_DEVICE_HOST designation, so remove it.

    The HMM sub-system extended dev_pagemap to arrange a callback when a
    dev_pagemap managed page is freed. Since a dev_pagemap page is free /
    idle when its reference count is 1 it requires an additional branch to
    check the page-type at put_page() time. Given put_page() is a hot-path
    we do not want to incur that check if HMM is not in use, so a static
    branch is used to avoid that overhead when not necessary.

    Now, the FS_DAX implementation wants to reuse this mechanism for
    receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
    static-key into a generic mechanism that either HMM or FS_DAX code paths
    can enable.

    For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
    care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
    However, we still need to support FS_DAX in the FS_DAX_LIMITED case
    implemented by the s390/dcssblk driver.

    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Michal Hocko
    Reported-by: kbuild test robot
    Reported-by: Thomas Meyer
    Reported-by: Dave Jiang
    Cc: "Jérôme Glisse"
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams