02 Dec, 2019

2 commits

  • End a Kconfig help text sentence with a period (aka full stop).

    Link: http://lkml.kernel.org/r/c17f2c75-dc2a-42a4-2229-bb6b489addf2@infradead.org
    Signed-off-by: Randy Dunlap
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Adjust indentation from spaces to tab (+optional two spaces), as in the
    coding style, with a command like:

    $ sed -e 's/^        /\t/' -i */Kconfig

    Link: http://lkml.kernel.org/r/1574306437-28837-1-git-send-email-krzk@kernel.org
    Signed-off-by: Krzysztof Kozlowski
    Reviewed-by: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: Jiri Kosina
    Cc: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Krzysztof Kozlowski
     

01 Dec, 2019

1 commit

  • Pull hmm updates from Jason Gunthorpe:
    "This is another round of bug fixing and cleanup. This time the focus
    is on the driver pattern to use mmu notifiers to monitor a VA range.
    This code is lifted out of many drivers and hmm_mirror directly into
    the mmu_notifier core and written using the best ideas from all the
    driver implementations.

    This removes many bugs from the drivers and has a very pleasing
    diffstat. More drivers can still be converted, but that is for another
    cycle.

    - A shared branch with RDMA reworking the RDMA ODP implementation

    - New mmu_interval_notifier API. This is focused on the use case of
    monitoring a VA and simplifies the process for drivers

    - A common seq-count locking scheme built into the
    mmu_interval_notifier API usable by drivers that call
    get_user_pages() or hmm_range_fault() with the VA range

    - Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen
    GntDev drivers to the new API. This deletes a lot of wonky driver
    code.

    - Two improvements for hmm_range_fault(), from testing done by Ralph"

    * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
    mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap
    mm/hmm: make full use of walk_page_range()
    xen/gntdev: use mmu_interval_notifier_insert
    mm/hmm: remove hmm_mirror and related
    drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirror
    drm/amdgpu: Use mmu_interval_insert instead of hmm_mirror
    drm/amdgpu: Call find_vma under mmap_sem
    nouveau: use mmu_interval_notifier instead of hmm_mirror
    nouveau: use mmu_notifier directly for invalidate_range_start
    drm/radeon: use mmu_interval_notifier_insert
    RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv
    RDMA/odp: Use mmu_interval_notifier_insert()
    mm/hmm: define the pre-processor related parts of hmm.h even if disabled
    mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror
    mm/mmu_notifier: add an interval tree notifier
    mm/mmu_notifier: define the header pre-processor parts even if disabled
    mm/hmm: allow snapshot of the special zero page

    Linus Torvalds
     

24 Nov, 2019

2 commits

  • The only two users of this are now converted to use mmu_interval_notifier,
    so delete all the code and update hmm.rst.

    Link: https://lore.kernel.org/r/20191112202231.3856-14-jgg@ziepe.ca
    Reviewed-by: Jérôme Glisse
    Tested-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • Of the 13 users of mmu_notifiers, 8 of them use only
    invalidate_range_start/end() and immediately intersect the
    mmu_notifier_range with some kind of internal list of VAs. 4 use an
    interval tree (i915_gem, radeon_mn, umem_odp, hfi1). 4 use a linked list
    of some kind (scif_dma, vhost, gntdev, hmm)

    And the remaining 5 either don't use invalidate_range_start() or do some
    special thing with it.

    It turns out that building a correct scheme with an interval tree is
    pretty complicated, particularly if the use case is synchronizing against
    another thread doing get_user_pages(). Many of these implementations have
    various subtle and difficult to fix races.

    This approach puts the interval tree as common code at the top of the mmu
    notifier call tree and implements a shareable locking scheme.

    It includes:
    - An interval tree tracking VA ranges, with per-range callbacks
    - A read/write locking scheme for the interval tree that avoids
    sleeping in the notifier path (for OOM killer)
    - A sequence counter based collision-retry locking scheme to tell
    device page fault that a VA range is being concurrently invalidated.

    This is based on various ideas:
    - hmm accumulates invalidated VA ranges and releases them when all
    invalidates are done, via active_invalidate_ranges count.
    This approach avoids having to intersect the interval tree twice (as
    umem_odp does) at the potential cost of a longer device page fault.

    - kvm/umem_odp use a sequence counter to drive the collision retry,
    via invalidate_seq

    - a deferred work todo list on unlock scheme like RTNL, via deferred_list.
    This makes adding/removing interval tree members more deterministic

    - seqlock, except this version makes the seqlock idea multi-holder on the
    write side by protecting it with active_invalidate_ranges and a spinlock

    To minimize MM overhead when only the interval tree is being used, the
    entire SRCU and hlist overheads are dropped using some simple
    branches. Similarly the interval tree overhead is dropped when in hlist
    mode.

    The overhead from the mandatory spinlock is broadly the same as most of
    existing users which already had a lock (or two) of some sort on the
    invalidation path.

    Link: https://lore.kernel.org/r/20191112202231.3856-3-jgg@ziepe.ca
    Acked-by: Christian König
    Tested-by: Philip Yang
    Tested-by: Ralph Campbell
    Reviewed-by: John Hubbard
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

06 Nov, 2019

1 commit

  • Add two utilities to 1) write-protect and 2) clean all ptes pointing into
    a range of an address space.
    The utilities are intended to aid in tracking dirty pages (either
    driver-allocated system memory or pci device memory).
    The write-protect utility should be used in conjunction with
    page_mkwrite() and pfn_mkwrite() to trigger write page-faults on page
    accesses. Typically one would want to use this on sparse accesses into
    large memory regions. The clean utility should be used to utilize
    hardware dirtying functionality and avoid the overhead of page-faults,
    typically on large accesses into small memory regions.

    Cc: Andrew Morton
    Cc: Matthew Wilcox
    Cc: Will Deacon
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: Huang Ying
    Cc: Jérôme Glisse
    Cc: Kirill A. Shutemov
    Signed-off-by: Thomas Hellstrom
    Acked-by: Andrew Morton

    Thomas Hellstrom
     

25 Sep, 2019

2 commits

  • This patch is (hopefully) the first step to enable THP for non-shmem
    filesystems.

    This patch enables an application to back part of its text sections with
    THP via madvise, for example:

    madvise((void *)0x600000, 0x200000, MADV_HUGEPAGE);

    We tried to reuse the logic for THP on tmpfs.

    Currently, write is not supported for non-shmem THP. khugepaged will only
    process vma with VM_DENYWRITE. sys_mmap() ignores VM_DENYWRITE requests
    (see ksys_mmap_pgoff). The only way to create vma with VM_DENYWRITE is
    execve(). This requirement limits non-shmem THP to text sections.

    The next patch will handle writes, which would only happen when all the
    vmas with VM_DENYWRITE are unmapped.

    An EXPERIMENTAL config, READ_ONLY_THP_FOR_FS, is added to gate this
    feature.

    [songliubraving@fb.com: fix build without CONFIG_SHMEM]
    Link: http://lkml.kernel.org/r/F53407FB-96CC-42E8-9862-105C92CC2B98@fb.com
    [songliubraving@fb.com: fix double unlock in collapse_file()]
    Link: http://lkml.kernel.org/r/B960CBFA-8EFC-4DA4-ABC5-1977FFF2CA57@fb.com
    Link: http://lkml.kernel.org/r/20190801184244.3169074-7-songliubraving@fb.com
    Signed-off-by: Song Liu
    Acked-by: Rik van Riel
    Acked-by: Kirill A. Shutemov
    Acked-by: Johannes Weiner
    Cc: Stephen Rothwell
    Cc: Dan Carpenter
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: William Kucharski
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
  • Patch series "mm: remove quicklist page table caches".

    A while ago Nicholas proposed to remove quicklist page table caches [1].

    I've rebased his patch on the current upstream and switched ia64 and sh to
    use generic versions of PTE allocation.

    [1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com

    This patch (of 3):

    Remove page table allocator "quicklists". These have been around for a
    long time, but have not got much traction in the last decade and are only
    used on ia64 and sh architectures.

    The numbers in the initial commit look interesting but probably don't
    apply anymore. If anybody wants to resurrect this it's in the git
    history, but it's unhelpful to have this code and divergent allocator
    behaviour for minor archs.

    Also it might be better to instead make more general improvements to the
    page allocator if this is still so slow.

    Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Nicholas Piggin
    Signed-off-by: Mike Rapoport
    Cc: Tony Luck
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     

20 Aug, 2019

1 commit

  • CONFIG_MIGRATE_VMA_HELPER guards helpers that are required for proper
    device private memory support. Remove the option and just check for
    CONFIG_DEVICE_PRIVATE instead.

    Link: https://lore.kernel.org/r/20190814075928.23766-11-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Tested-by: Ralph Campbell
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     

08 Aug, 2019

2 commits


17 Jul, 2019

1 commit

  • ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
    with the long-out-of-date comment can lead to the impression that an
    architecture may just enable it (since __add_pages() now "comprehends
    device memory" for itself) and expect things to work.

    In practice, however, ZONE_DEVICE users have little chance of
    functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
    that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
    dependency so the real situation is clearer.

    Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com
    Signed-off-by: Robin Murphy
    Acked-by: Dan Williams
    Reviewed-by: Ira Weiny
    Acked-by: Oliver O'Halloran
    Reviewed-by: Anshuman Khandual
    Cc: Michael Ellerman
    Cc: Catalin Marinas
    Cc: David Hildenbrand
    Cc: Jerome Glisse
    Cc: Michal Hocko
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Murphy
     

15 Jul, 2019

1 commit

  • Pull HMM updates from Jason Gunthorpe:
    "Improvements and bug fixes for the hmm interface in the kernel:

    - Improve clarity, locking and APIs related to the 'hmm mirror'
    feature merged last cycle. In linux-next we now see AMDGPU and
    nouveau using this API.

    - Remove old or transitional hmm APIs. These are holdovers from the
    past with no users, or APIs that existed only to manage cross-tree
    conflicts. There are still a few more of these cleanups that didn't
    make the merge window cutoff.

    - Improve some core mm APIs:
    - export alloc_pages_vma() for driver use
    - refactor into devm_request_free_mem_region() to manage
    DEVICE_PRIVATE resource reservations
    - refactor duplicative driver code into the core dev_pagemap
    struct

    - Remove hmm wrappers of improved core mm APIs, instead have drivers
    use the simplified API directly

    - Remove DEVICE_PUBLIC

    - Simplify the kconfig flow for the hmm users and core code"

    * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits)
    mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
    mm: remove the HMM config option
    mm: sort out the DEVICE_PRIVATE Kconfig mess
    mm: simplify ZONE_DEVICE page private data
    mm: remove hmm_devmem_add
    mm: remove hmm_vma_alloc_locked_page
    nouveau: use devm_memremap_pages directly
    nouveau: use alloc_page_vma directly
    PCI/P2PDMA: use the dev_pagemap internal refcount
    device-dax: use the dev_pagemap internal refcount
    memremap: provide an optional internal refcount in struct dev_pagemap
    memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
    memremap: remove the data field in struct dev_pagemap
    memremap: add a migrate_to_ram method to struct dev_pagemap_ops
    memremap: lift the devmap_enable manipulation into devm_memremap_pages
    memremap: pass a struct dev_pagemap to ->kill and ->cleanup
    memremap: move dev_pagemap callbacks into a separate structure
    memremap: validate the pagemap type passed to devm_memremap_pages
    mm: factor out a devm_request_free_mem_region helper
    mm: export alloc_pages_vma
    ...

    Linus Torvalds
     

13 Jul, 2019

4 commits

  • While only powerpc supports the hugepd case, the code is pretty generic
    and I'd like to keep all GUP internals in one place.

    Link: http://lkml.kernel.org/r/20190625143715.1689-15-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Always build mm/gup.c so that we don't have to provide separate nommu
    stubs. Also merge the get_user_pages_fast and __get_user_pages_fast stubs
    when HAVE_FAST_GUP into the main implementations, which will never call
    the fast path if HAVE_FAST_GUP is not set.

    This also ensures the new put_user_pages* helpers are available for nommu,
    as those are currently missing, which would create a problem as soon as we
    actually grew users for it.

    Link: http://lkml.kernel.org/r/20190625143715.1689-13-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • We only support the generic GUP now, so rename the config option to
    be more clear, and always use the mm/Kconfig definition of the
    symbol and select it from the arch Kconfigs.

    Link: http://lkml.kernel.org/r/20190625143715.1689-11-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Khalid Aziz
    Reviewed-by: Jason Gunthorpe
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • The split low/high access is the only non-READ_ONCE version of gup_get_pte
    that did show up in the various arch implementations. Lift it to common
    code and drop the ifdef-based arch override.

    Link: http://lkml.kernel.org/r/20190625143715.1689-4-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

03 Jul, 2019

4 commits

  • The migrate_vma helper is only used by nouveau to migrate device private
    pages around. Other HMM_MIRROR users like amdgpu or infiniband don't
    need it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • All the mm/hmm.c code is better keyed off HMM_MIRROR. Also let nouveau
    depend on it instead of the mix of a dummy dependency symbol plus the
    actually selected one. Drop various odd dependencies, as the code is
    pretty portable.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ira Weiny
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • The ZONE_DEVICE support doesn't depend on anything HMM related, just on
    various bits of arch support as indicated by the architecture. Also
    don't select the option from nouveau as it isn't present in many setups,
    and depend on it instead.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ira Weiny
    Reviewed-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • The code hasn't been used since it was added to the tree, and doesn't
    appear to actually be usable.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Acked-by: Michal Hocko
    Reviewed-by: Dan Williams
    Tested-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     

02 Jul, 2019

1 commit


15 Jun, 2019

1 commit


09 Jun, 2019

1 commit

  • Mostly due to x86 and acpi conversion, several documentation
    links are still pointing to the old file. Fix them.

    Signed-off-by: Mauro Carvalho Chehab
    Reviewed-by: Wolfram Sang
    Reviewed-by: Sven Van Asbroeck
    Reviewed-by: Bhupesh Sharma
    Acked-by: Mark Brown
    Signed-off-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

21 May, 2019

1 commit


15 May, 2019

6 commits

  • The help describing the memory model selection is outdated. It still says
    that SPARSEMEM is experimental and that DISCONTIGMEM is preferred over
    SPARSEMEM.

    Update the help text for the relevant options:
    * add a generic help for the "Memory Model" prompt
    * add description for FLATMEM
    * reduce the description of DISCONTIGMEM and add a deprecation note
    * prefer SPARSEMEM over DISCONTIGMEM

    Link: http://lkml.kernel.org/r/1556188531-20728-1-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Most architectures do not need the memblock memory after the page
    allocator is initialized, but only a few enable ARCH_DISCARD_MEMBLOCK in
    the arch Kconfig.

    Replacing ARCH_DISCARD_MEMBLOCK with ARCH_KEEP_MEMBLOCK and inverting the
    logic makes it clear which architectures actually use memblock after
    system initialization, and avoids the need to add ARCH_DISCARD_MEMBLOCK
    to the architectures that are still missing that option.

    Link: http://lkml.kernel.org/r/1556102150-32517-1-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michael Ellerman (powerpc)
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Richard Kuo
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: James Hogan
    Cc: Ley Foon Tan
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Eric Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Add 2 new Kconfig variables that are not used by anyone. I checked that
    various make ARCH=somearch allmodconfig builds work and do not complain.
    This new Kconfig needs to be added first so that device drivers that
    depend on HMM can be updated.

    Once drivers are updated then I can update the HMM Kconfig to depend on
    this new Kconfig in a followup patch.

    This is about solving Kconfig for HMM: given that device drivers go
    through their own trees, we want to avoid changing them from the mm
    tree. So the plan is:

    1 - Kernel release N add the new Kconfig to mm/Kconfig (this patch)
    2 - Kernel release N+1 update drivers to depend on the new Kconfig, ie
    stop using ARCH_HAS_HMM and start using ARCH_HAS_HMM_MIRROR
    and ARCH_HAS_HMM_DEVICE (one or the other or both depending
    on the driver)
    3 - Kernel release N+2 remove ARCH_HAS_HMM and do the final Kconfig
    update in mm/Kconfig

    Link: http://lkml.kernel.org/r/20190417211141.17580-1-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Cc: Guenter Roeck
    Cc: Leon Romanovsky
    Cc: Jason Gunthorpe
    Cc: Ralph Campbell
    Cc: John Hubbard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • 'default n' is the default value for any bool or tristate Kconfig
    setting so there is no need to write it explicitly.

    Also since commit f467c5640c29 ("kconfig: only write '# CONFIG_FOO
    is not set' for visible symbols") the Kconfig behavior is the same
    regardless of 'default n' being present or not:

    ...
    One side effect of (and the main motivation for) this change is making
    the following two definitions behave exactly the same:

    config FOO
    bool

    config FOO
    bool
    default n

    With this change, neither of these will generate a
    '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
    That might make it clearer to people that a bare 'default n' is
    redundant.
    ...

    Link: http://lkml.kernel.org/r/c3385916-e4d4-37d3-b330-e6b7dff83a52@samsung.com
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bartlomiej Zolnierkiewicz
     
  • To avoid random config build issues, select the mmu notifier when HMM is
    selected. In any case, when HMM gets selected it will be by users that
    will also want the mmu notifier.

    Link: http://lkml.kernel.org/r/20190403193318.16478-2-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Acked-by: Balbir Singh
    Cc: Ralph Campbell
    Cc: John Hubbard
    Cc: Dan Williams
    Cc: Arnd Bergmann
    Cc: Dan Carpenter
    Cc: Ira Weiny
    Cc: Matthew Wilcox
    Cc: Souptick Joarder
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • This condition allows alloc_contig_range() to be defined, so rename the
    condition to something more accurate.

    Link: http://lkml.kernel.org/r/20190327063626.18421-4-alex@ghiti.fr
    Signed-off-by: Alexandre Ghiti
    Suggested-by: Vlastimil Babka
    Acked-by: Vlastimil Babka
    Cc: Andy Lutomirsky
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Heiko Carstens
    Cc: "H . Peter Anvin"
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Mike Kravetz
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Ghiti
     

29 Dec, 2018

1 commit

  • Replace jhash2 with xxhash.

    Perf numbers:
    Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
    ksm: crc32c hash() 12081 MB/s
    ksm: xxh64 hash() 8770 MB/s
    ksm: xxh32 hash() 4529 MB/s
    ksm: jhash2 hash() 1569 MB/s

    Sioh Lee did some testing:

    crc32c_intel: 1084.10ns
    crc32c (no hardware acceleration): 7012.51ns
    xxhash32: 2227.75ns
    xxhash64: 1413.16ns
    jhash2: 5128.30ns

    As jhash2 will always be slower (for data sizes like PAGE_SIZE), don't use
    it in ksm at all.

    Use only xxhash for now, because for using crc32c, cryptoapi must be
    initialized first - that requires some tricky solution to work well in all
    situations.

    Link: http://lkml.kernel.org/r/20181023182554.23464-3-nefelim4ag@gmail.com
    Signed-off-by: Timofey Titovets
    Signed-off-by: leesioh
    Reviewed-by: Pavel Tatashin
    Reviewed-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Timofey Titovets
     

31 Oct, 2018

2 commits

  • All architectures use memblock for early memory management. There is no
    need for the CONFIG_HAVE_MEMBLOCK configuration option.

    [rppt@linux.vnet.ibm.com: of/fdt: fixup #ifdefs]
    Link: http://lkml.kernel.org/r/20180919103457.GA20545@rapoport-lnx
    [rppt@linux.vnet.ibm.com: csky: fixups after bootmem removal]
    Link: http://lkml.kernel.org/r/20180926112744.GC4628@rapoport-lnx
    [rppt@linux.vnet.ibm.com: remove stale #else and the code it protects]
    Link: http://lkml.kernel.org/r/1538067825-24835-1-git-send-email-rppt@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1536927045-23536-4-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michal Hocko
    Tested-by: Jonathan Cameron
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • All architectures select NO_BOOTMEM, which essentially becomes 'Y' for any
    kernel configuration, and therefore it can be removed.

    [alexander.h.duyck@linux.intel.com: remove now defunct NO_BOOTMEM from depends list for deferred init]
    Link: http://lkml.kernel.org/r/20180925201814.3576.15105.stgit@localhost.localdomain
    Link: http://lkml.kernel.org/r/1536927045-23536-3-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Alexander Duyck
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

21 Oct, 2018

1 commit


21 Sep, 2018

1 commit

  • Deferred struct page init is needed only on systems with large amount of
    physical memory to improve boot performance. 32-bit systems do not
    benefit from this feature.

    Jiri reported a problem where deferred struct pages do not work well with
    x86-32:

    [ 0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
    [ 0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
    [ 0.036269] Initializing CPU#0
    [ 0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0)
    [ 0.038459] page:f6780000 is uninitialized and poisoned
    [ 0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
    [ 0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
    [ 0.040038] ------------[ cut here ]------------
    [ 0.040399] kernel BUG at include/linux/page-flags.h:293!
    [ 0.040823] invalid opcode: 0000 [#1] SMP PTI
    [ 0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9
    [ 0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
    [ 0.042496] EIP: free_highmem_page+0x64/0x80
    [ 0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71
    [ 0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8
    [ 0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc
    [ 0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086
    [ 0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690
    [ 0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [ 0.046913] DR6: fffe0ff0 DR7: 00000400
    [ 0.047220] Call Trace:
    [ 0.047419] add_highpages_with_active_regions+0xbd/0x10d
    [ 0.047854] set_highmem_pages_init+0x5b/0x71
    [ 0.048202] mem_init+0x2b/0x1e8
    [ 0.048460] start_kernel+0x1d2/0x425
    [ 0.048757] i386_start_kernel+0x93/0x97
    [ 0.049073] startup_32_smp+0x164/0x168
    [ 0.049379] Modules linked in:
    [ 0.049626] ---[ end trace 337949378db0abbb ]---

    We free highmem pages before their struct pages are initialized:

    mem_init()
    set_highmem_pages_init()
    add_highpages_with_active_regions()
    free_highmem_page()
    .. Access uninitialized struct page here..

    Because there is no reason to have this feature on 32-bit systems, just
    disable it.

    Link: http://lkml.kernel.org/r/20180831150506.31246-1-pavel.tatashin@microsoft.com
    Fixes: 2e3ca40f03bb ("mm: relax deferred struct page requirements")
    Signed-off-by: Pavel Tatashin
    Reported-by: Jiri Slaby
    Acked-by: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Pasha Tatashin
     

18 Aug, 2018

3 commits

  • CONFIG_THP_SWAP should depend on CONFIG_SWAP, because it's unreasonable
    to optimize swapping for THP (Transparent Huge Page) without basic
    swapping support.

    In original code, when CONFIG_SWAP=n and CONFIG_THP_SWAP=y,
    split_swap_cluster() will not be built because it is in swapfile.c, but
    it will be called in huge_memory.c. This doesn't trigger a build error
    in practice because the call site is enclosed by PageSwapCache(), which
    is defined to be constant 0 when CONFIG_SWAP=n. But this is fragile and
    should be fixed.

    The comments are fixed too to reflect the latest progress.

    Link: http://lkml.kernel.org/r/20180713021228.439-1-ying.huang@intel.com
    Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
    Signed-off-by: "Huang, Ying"
    Reviewed-by: Dan Williams
    Reviewed-by: Naoya Horiguchi
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Shaohua Li
    Cc: Hugh Dickins
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Dave Hansen
    Cc: Zi Yan
    Cc: Daniel Jordan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • Rename new_sparse_init() to sparse_init(), which enables it. Delete the
    old sparse_init() and all the code that became obsolete with it.

    [pasha.tatashin@oracle.com: remove unused sparse_mem_maps_populate_node()]
    Link: http://lkml.kernel.org/r/20180716174447.14529-6-pasha.tatashin@oracle.com
    Link: http://lkml.kernel.org/r/20180712203730.8703-6-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Tested-by: Michael Ellerman [powerpc]
    Tested-by: Oscar Salvador
    Reviewed-by: Oscar Salvador
    Cc: Pasha Tatashin
    Cc: Abdul Haleem
    Cc: Baoquan He
    Cc: Daniel Jordan
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Souptick Joarder
    Cc: Steven Sistare
    Cc: Vlastimil Babka
    Cc: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • The deferred memory initialization relies on section definitions, e.g
    PAGES_PER_SECTION, that are only available when CONFIG_SPARSEMEM=y on
    most architectures.

    Initially DEFERRED_STRUCT_PAGE_INIT depended on the explicit
    ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT configuration option, but since
    commit 2e3ca40f03bb13709df4 ("mm: relax deferred struct page
    requirements") this requirement has been relaxed, and it is now
    possible to enable DEFERRED_STRUCT_PAGE_INIT on architectures that
    support DISCONTIGMEM and NO_BOOTMEM, which causes build failures.

    For instance, setting SMP=y and DEFERRED_STRUCT_PAGE_INIT=y on arc
    causes the following build failure:

    CC mm/page_alloc.o
    mm/page_alloc.c: In function 'update_defer_init':
    mm/page_alloc.c:321:14: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
    ^~~~~~~~~~~~~~~~~
    USEC_PER_SEC
    mm/page_alloc.c:321:14: note: each undeclared identifier is reported only once for each function it appears in
    In file included from include/linux/cache.h:5:0,
    from include/linux/printk.h:9,
    from include/linux/kernel.h:14,
    from include/asm-generic/bug.h:18,
    from arch/arc/include/asm/bug.h:32,
    from include/linux/bug.h:5,
    from include/linux/mmdebug.h:5,
    from include/linux/mm.h:9,
    from mm/page_alloc.c:18:
    mm/page_alloc.c: In function 'deferred_grow_zone':
    mm/page_alloc.c:1624:52: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
    unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
    ^
    include/uapi/linux/kernel.h:11:47: note: in definition of macro '__ALIGN_KERNEL_MASK'
    #define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
    ^~~~
    include/linux/kernel.h:58:22: note: in expansion of macro '__ALIGN_KERNEL'
    #define ALIGN(x, a) __ALIGN_KERNEL((x), (a))
    ^~~~~~~~~~~~~~
    mm/page_alloc.c:1624:34: note: in expansion of macro 'ALIGN'
    unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
    ^~~~~
    In file included from include/asm-generic/bug.h:18:0,
    from arch/arc/include/asm/bug.h:32,
    from include/linux/bug.h:5,
    from include/linux/mmdebug.h:5,
    from include/linux/mm.h:9,
    from mm/page_alloc.c:18:
    mm/page_alloc.c: In function 'free_area_init_node':
    mm/page_alloc.c:6379:50: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
    pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
    ^
    include/linux/kernel.h:812:22: note: in definition of macro '__typecheck'
    (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
    ^
    include/linux/kernel.h:836:24: note: in expansion of macro '__safe_cmp'
    __builtin_choose_expr(__safe_cmp(x, y), \
    ^~~~~~~~~~
    include/linux/kernel.h:904:27: note: in expansion of macro '__careful_cmp'
    #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
    ^~~~~~~~~~~~~
    include/linux/kernel.h:836:2: error: first argument to '__builtin_choose_expr' not a constant
    __builtin_choose_expr(__safe_cmp(x, y), \
    ^
    include/linux/kernel.h:904:27: note: in expansion of macro '__careful_cmp'
    #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
    ^~~~~~~~~~~~~
    scripts/Makefile.build:317: recipe for target 'mm/page_alloc.o' failed

    Let's make DEFERRED_STRUCT_PAGE_INIT explicitly depend on SPARSEMEM,
    as the systems that support DISCONTIGMEM do not seem to have amounts
    of memory large enough to make DEFERRED_STRUCT_PAGE_INIT relevant.
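
    As a sketch, the dependency change described above would look something like this in mm/Kconfig (illustrative; the option's other dependencies are elided):

    ```
    config DEFERRED_STRUCT_PAGE_INIT
    	bool "Defer initialisation of struct pages to kthreads"
    	depends on SPARSEMEM
    ```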

    Link: http://lkml.kernel.org/r/1530279308-24988-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michal Hocko
    Reviewed-by: Pavel Tatashin
    Tested-by: Randy Dunlap
    Cc: Pasha Tatashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

02 Aug, 2018

1 commit