13 Jan, 2019

2 commits

  • commit 02917e9f8676207a4c577d4d94eae12bf348e9d7 upstream.

    At Maintainer Summit, Greg brought up a topic I proposed around
    EXPORT_SYMBOL_GPL usage. The motivation was considerations for when
    EXPORT_SYMBOL_GPL is warranted and the criteria for taking the exceptional
    step of reclassifying an existing export. Specifically, I wanted to make
    the case that although the line is fuzzy and hard to specify in abstract
    terms, it is nonetheless clear that devm_memremap_pages() and HMM
    (Heterogeneous Memory Management) have crossed it. The
    devm_memremap_pages() facility should have been EXPORT_SYMBOL_GPL from the
    beginning, and HMM as a derivative of that functionality should have
    naturally picked up that designation as well.

    Contrary to typical rules, the HMM infrastructure was merged upstream with
    zero in-tree consumers. There was a promise at the time that those users
    would be merged "soon", but it has been over a year with no drivers
    arriving. While the Nouveau driver is about to belatedly make good on
    that promise, it is clear that HMM was targeted first and foremost at an
    out-of-tree consumer.

    HMM is derived from devm_memremap_pages(), a facility Christoph and I
    spearheaded to support persistent memory. It combines a device lifetime
    model with a dynamically created 'struct page' / memmap array for any
    physical address range. It enables coordination and control of the many
    code paths in the kernel built to interact with memory via 'struct page'
    objects. With HMM the integration goes even deeper by allowing device
    drivers to hook and manipulate page fault and page free events.

    One interpretation of when EXPORT_SYMBOL is suitable is when it is
    exporting stable and generic leaf functionality. The
    devm_memremap_pages() facility continues to see expanding use cases,
    peer-to-peer DMA being the most recent, with no clear end date when it
    will stop attracting reworks and semantic changes. It is not suitable to
    export devm_memremap_pages() as a stable 3rd party driver API due to the
    fact that it is still changing and manipulates core behavior. Moreover,
    it is not in the best interest of the long term development of the core
    memory management subsystem to permit any external driver to effectively
    define its own system-wide memory management policies with no
    encouragement to engage with upstream.

    I am also concerned that HMM was designed in a way to minimize further
    engagement with the core-MM. With these hooks in place, device drivers
    are free to implement their own policies without much consideration for
    whether and how the core-MM could grow to meet that need. Going forward,
    not only should HMM be EXPORT_SYMBOL_GPL, but the core-MM should be
    allowed the opportunity and stimulus to change and address these new use
    cases as first-class functionality.

    Original changelog:

    hmm_devmem_add() and hmm_devmem_add_resource() duplicated
    devm_memremap_pages() and are now simple wrappers around the core
    facility to inject a dev_pagemap instance into the global pgmap_radix and
    hook page-idle events. The devm_memremap_pages() interface is base
    infrastructure for HMM. HMM has more and deeper ties into the kernel
    memory management implementation than base ZONE_DEVICE, which is itself
    an EXPORT_SYMBOL_GPL facility.

    Originally, the HMM page structure creation routines copied the
    devm_memremap_pages() code and reused ZONE_DEVICE. A cleanup to unify the
    implementations was discussed during the initial review:
    http://lkml.iu.edu/hypermail/linux/kernel/1701.2/00812.html
    Recent work to extend devm_memremap_pages() for the peer-to-peer-DMA
    facility enabled this cleanup to move forward.

    In addition to the integration with devm_memremap_pages() HMM depends on
    other GPL-only symbols:

    mmu_notifier_unregister_no_release
    percpu_ref
    region_intersects
    __class_create

    It goes further to consume / indirectly expose functionality that is not
    exported to any other driver:

    alloc_pages_vma
    walk_page_range

    HMM is derived from devm_memremap_pages(), and extends deep core-kernel
    fundamentals. Similar to devm_memremap_pages(), mark its entry points
    EXPORT_SYMBOL_GPL().

    [logang@deltatee.com: PCI/P2PDMA: match interface changes to devm_memremap_pages()]
    Link: http://lkml.kernel.org/r/20181130225911.2900-1-logang@deltatee.com
    Link: http://lkml.kernel.org/r/154275560565.76910.15919297436557795278.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Signed-off-by: Logan Gunthorpe
    Reviewed-by: Christoph Hellwig
    Cc: Logan Gunthorpe
    Cc: "Jérôme Glisse"
    Cc: Balbir Singh
    Cc: Michal Hocko
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     
  • commit 58ef15b765af0d2cbe6799ec564f1dc485010ab8 upstream.

    devm semantics arrange for resources to be torn down when
    device-driver-probe fails or when device-driver-release completes.
    Similar to devm_memremap_pages() there is no need to support an explicit
    remove operation when the users properly adhere to devm semantics.

    Note that devm_kzalloc() automatically handles allocating node-local
    memory.
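
The devm teardown idea described above can be modelled in userspace as follows. This is a minimal sketch, not the kernel's devres implementation: `toy_device`, `toy_devm_add_action()`, `toy_devm_release_all()` and `toy_mark_released()` are hypothetical names invented for illustration. The point it shows is that once release actions are attached to the device, no explicit remove call is needed; a single teardown pass undoes everything in reverse registration order.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ACTIONS 8

typedef void (*toy_release_fn)(void *data);

/* Hypothetical stand-in for a device that tracks its managed resources. */
struct toy_device {
    toy_release_fn actions[TOY_MAX_ACTIONS];
    void *data[TOY_MAX_ACTIONS];
    size_t nr_actions;
};

/* Register a release action; returns 0 on success, -1 if the list is full. */
int toy_devm_add_action(struct toy_device *dev, toy_release_fn fn, void *data)
{
    assert(dev != NULL && fn != NULL);
    if (dev->nr_actions == TOY_MAX_ACTIONS)
        return -1;
    dev->actions[dev->nr_actions] = fn;
    dev->data[dev->nr_actions] = data;
    dev->nr_actions++;
    return 0;
}

/*
 * Run every registered action, newest first. In this model the same call
 * serves both the probe-failure path and the driver-release path.
 */
void toy_devm_release_all(struct toy_device *dev)
{
    while (dev->nr_actions > 0) {
        dev->nr_actions--;
        dev->actions[dev->nr_actions](dev->data[dev->nr_actions]);
    }
}

/* Example release action: clears a flag that stands for a live resource. */
void toy_mark_released(void *data)
{
    *(int *)data = 0;
}
```

A caller would register one action per acquired resource during probe and never call a matching remove function; `toy_devm_release_all()` is the only teardown entry point.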

    Link: http://lkml.kernel.org/r/154275559545.76910.9186690723515469051.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jérôme Glisse
    Cc: "Jérôme Glisse"
    Cc: Logan Gunthorpe
    Cc: Balbir Singh
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     

24 Apr, 2018

1 commit

  • commit c719547f032d4610c7a20900baacae26d0b1ff3e upstream.

    The private field of struct mm_walk points to an hmm_vma_walk struct, not
    to the hmm_range struct the code expected. Fix the code to fetch the
    proper struct pointer.
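
The class of bug described here can be illustrated with simplified stand-ins for the structures involved. This is a sketch under assumptions: the field sets below are reduced for illustration and do not reproduce the kernel definitions, and both helper functions are hypothetical. It contrasts casting the opaque pointer straight to the range type with going through the wrapper struct first.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structs carry more fields. */
struct hmm_range {
    unsigned long start;
    unsigned long end;
};

struct hmm_vma_walk {
    struct hmm_range *range;  /* the range being walked */
    unsigned long last;       /* last address handled */
};

struct mm_walk {
    void *private;            /* opaque pointer set by the caller */
};

/* Buggy pattern: treats walk->private as if it were the range itself. */
struct hmm_range *range_from_walk_buggy(struct mm_walk *walk)
{
    return (struct hmm_range *)walk->private;
}

/* Fixed pattern: walk->private is the wrapper, which holds the range. */
struct hmm_range *range_from_walk(struct mm_walk *walk)
{
    struct hmm_vma_walk *hmm_vma_walk = walk->private;

    assert(hmm_vma_walk != NULL);
    return hmm_vma_walk->range;
}
```

Because `private` is a `void *`, the compiler accepts either form without a warning, which is why this kind of mistake tends to surface only at run time.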

    Link: http://lkml.kernel.org/r/20180323005527.758-6-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Cc: Evgeny Baskakov
    Cc: Ralph Campbell
    Cc: Mark Hairgrove
    Cc: John Hubbard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jérôme Glisse
     

09 Sep, 2017

10 commits

  • This moves all new code, including the new page migration helper, behind
    a kernel Kconfig option so that there is no code bloat for architectures
    or users that do not want to use HMM or any of its associated features.
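
One common way to get this effect is to pair a real implementation with an inline stub behind a preprocessor guard. The sketch below assumes a made-up symbol `CONFIG_TOY_HMM` and a hypothetical function `toy_hmm_mm_init()`; neither is taken from the patch. With the option unset, callers still compile but only the stub is emitted.

```c
#include <assert.h>

struct toy_mm {
    int mirrors;  /* hypothetical per-mm state */
};

/* CONFIG_TOY_HMM is a stand-in for a real Kconfig symbol. */
#ifdef CONFIG_TOY_HMM
/* Full implementation, compiled only when the option is enabled. */
int toy_hmm_mm_init(struct toy_mm *mm)
{
    assert(mm != 0);
    mm->mirrors = 0;
    return 1;
}
#else
/* Stub: keeps call sites building while emitting no feature code. */
static inline int toy_hmm_mm_init(struct toy_mm *mm)
{
    (void)mm;
    return 0;
}
#endif
```

Building with `-DCONFIG_TOY_HMM` selects the full version; without it the stub leaves the caller's state untouched.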

    arm allyesconfig (without the patchset, then with the patchset and this patch):
        text     data      bss       dec     hex filename
    83721896 46511131 27582964 157815991 96814b7 ../without/vmlinux
    83722364 46511131 27582964 157816459 968168b vmlinux

    [jglisse@redhat.com: struct hmm is only used by HMM mirror functionality]
    Link: http://lkml.kernel.org/r/20170825213133.27286-1-jglisse@redhat.com
    [sfr@canb.auug.org.au: fix build (arm multi_v7_defconfig)]
    Link: http://lkml.kernel.org/r/20170828181849.323ab81b@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20170818032858.7447-1-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Stephen Rothwell
    Cc: Dan Williams
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • Unlike unaddressable memory, coherent device memory has a real resource
    associated with it on the system (as the CPU can address it). Add a new
    helper to hotplug such memory within the HMM framework.

    Link: http://lkml.kernel.org/r/20170817000548.32038-20-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Balbir Singh
    Cc: Aneesh Kumar
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Evgeny Baskakov
    Cc: Johannes Weiner
    Cc: John Hubbard
    Cc: Kirill A. Shutemov
    Cc: Mark Hairgrove
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Sherry Cheung
    Cc: Subhash Gutti
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • Platforms with an advanced system bus (like CAPI or CCIX) allow device
    memory to be accessible from the CPU in a cache-coherent fashion. Add a
    new type of ZONE_DEVICE to represent such memory. The use cases are the
    same as for the un-addressable device memory, but without all the corner
    cases.

    Link: http://lkml.kernel.org/r/20170817000548.32038-19-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Cc: Aneesh Kumar
    Cc: Paul E. McKenney
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: Ross Zwisler
    Cc: Balbir Singh
    Cc: David Nellans
    Cc: Evgeny Baskakov
    Cc: Johannes Weiner
    Cc: John Hubbard
    Cc: Kirill A. Shutemov
    Cc: Mark Hairgrove
    Cc: Michal Hocko
    Cc: Sherry Cheung
    Cc: Subhash Gutti
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • This introduces a dummy HMM device class so that a device driver can use
    it to create an hmm_device for the sole purpose of registering device
    memory. It is useful to device drivers that want to manage multiple
    physical device memories under the same struct device umbrella.

    Link: http://lkml.kernel.org/r/20170817000548.32038-13-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Evgeny Baskakov
    Signed-off-by: John Hubbard
    Signed-off-by: Mark Hairgrove
    Signed-off-by: Sherry Cheung
    Signed-off-by: Subhash Gutti
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • This introduces a simple struct and associated helpers for device drivers
    to use when hotplugging un-addressable device memory as ZONE_DEVICE. It
    will find an unused physical address range and trigger memory hotplug for
    it, which allocates and initializes struct page for the device memory.
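
Finding an unused physical address range can be sketched as a scan over a list of busy ranges. The code below is an illustrative userspace model, not the helper from the patch: `toy_range`, `toy_intersects()` and `toy_find_free_range()` are hypothetical names, and the choice to scan downward from the top of the address space in steps of the requested size is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

struct toy_range {
    unsigned long start;  /* inclusive */
    unsigned long end;    /* inclusive */
};

/* Return 1 if [start, start + size - 1] overlaps any busy range. */
int toy_intersects(const struct toy_range *busy, size_t n,
                   unsigned long start, unsigned long size)
{
    unsigned long end = start + size - 1;

    for (size_t i = 0; i < n; i++)
        if (start <= busy[i].end && busy[i].start <= end)
            return 1;
    return 0;
}

/*
 * Scan downward from the highest valid address (limit) in steps of size.
 * Returns the start of the first window that overlaps nothing, or 0 if
 * no window is free. Address 0 is never handed out.
 */
unsigned long toy_find_free_range(const struct toy_range *busy, size_t n,
                                  unsigned long limit, unsigned long size)
{
    unsigned long addr;

    if (size == 0 || size > limit)
        return 0;
    for (addr = limit - size + 1; addr >= size; addr -= size)
        if (!toy_intersects(busy, n, addr, size))
            return addr;
    return 0;
}
```

Searching from the top keeps the chosen window away from low addresses, where real RAM and firmware regions usually sit; that rationale is a design assumption of this sketch.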

    A device driver should use this helper during device initialization to
    hotplug the device memory. It should only need to remove the memory once
    the device is going offline (shutdown or hotremove). There should not be
    any userspace API to hotplug memory, except maybe for a host device
    driver to allow adding more memory to a guest device driver.

    The device's memory is managed by the device driver, and HMM only
    provides helpers to that effect.

    Link: http://lkml.kernel.org/r/20170817000548.32038-12-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Evgeny Baskakov
    Signed-off-by: John Hubbard
    Signed-off-by: Mark Hairgrove
    Signed-off-by: Sherry Cheung
    Signed-off-by: Subhash Gutti
    Signed-off-by: Balbir Singh
    Cc: Aneesh Kumar
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • A ZONE_DEVICE page that reaches a refcount of 1 is free, i.e. it no
    longer has any user. For device private pages this is important to catch,
    and thus we need to special-case put_page() for this.
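
The refcount rule above can be modelled with a small userspace sketch. All names here (`toy_page`, `toy_put_page()`, `toy_driver_page_free()`) are hypothetical and the struct layout is invented for illustration; it is not the kernel's put_page(). It shows the special case: a device private page is handed back to its driver when the count drops to 1, while an ordinary page is freed when the count drops to 0.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page;
typedef void (*toy_page_free_fn)(struct toy_page *page);

struct toy_page {
    int refcount;
    int is_device_private;       /* hypothetical type flag */
    toy_page_free_fn page_free;  /* hypothetical driver callback */
    int returned_to_driver;
    int returned_to_allocator;
};

/* Example driver callback: records that the page came back to the driver. */
void toy_driver_page_free(struct toy_page *page)
{
    page->returned_to_driver = 1;
}

/* Drop one reference, applying the device private special case. */
void toy_put_page(struct toy_page *page)
{
    assert(page->refcount > 0);
    page->refcount--;

    if (page->is_device_private) {
        /* In this model a device page is idle at a refcount of 1, not 0. */
        if (page->refcount == 1 && page->page_free != NULL)
            page->page_free(page);
        return;
    }

    /* Ordinary pages are freed when the last reference goes away. */
    if (page->refcount == 0)
        page->returned_to_allocator = 1;
}
```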

    Link: http://lkml.kernel.org/r/20170817000548.32038-9-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Cc: Kirill A. Shutemov
    Cc: Dan Williams
    Cc: Ross Zwisler
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: David Nellans
    Cc: Evgeny Baskakov
    Cc: Johannes Weiner
    Cc: John Hubbard
    Cc: Mark Hairgrove
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Sherry Cheung
    Cc: Subhash Gutti
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • This handles page faults on behalf of a device driver. Unlike
    handle_mm_fault(), it does not trigger migration back to system memory
    for device memory.

    Link: http://lkml.kernel.org/r/20170817000548.32038-6-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Evgeny Baskakov
    Signed-off-by: John Hubbard
    Signed-off-by: Mark Hairgrove
    Signed-off-by: Sherry Cheung
    Signed-off-by: Subhash Gutti
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • This does not use the existing page table walker because we want to share
    the same code with our page fault handler.

    Link: http://lkml.kernel.org/r/20170817000548.32038-5-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Evgeny Baskakov
    Signed-off-by: John Hubbard
    Signed-off-by: Mark Hairgrove
    Signed-off-by: Sherry Cheung
    Signed-off-by: Subhash Gutti
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • This is heterogeneous memory management (HMM) process address space
    mirroring. In a nutshell, this provides an API to mirror a process
    address space on a device. This boils down to keeping the CPU and device
    page tables synchronized (we assume that both device and CPU are cache
    coherent, as a PCIe device can be).

    This patch provides a simple API for device drivers to achieve address
    space mirroring, so that each device driver does not have to grow its own
    CPU page table walker and its own CPU page table synchronization
    mechanism.
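
The synchronization idea can be pictured with a toy model in which two flat arrays stand in for the CPU and device page tables. This is only a sketch under assumptions: `toy_mirror` and its three functions are hypothetical, and real mirroring relies on notifier callbacks and locking that are left out here. It shows the invariant being maintained: the device copy is invalidated whenever the CPU entry changes, and is refilled from the CPU entry when the device faults.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPAGES 8

struct toy_mirror {
    unsigned long cpu_pte[TOY_NPAGES];     /* 0 means not mapped */
    unsigned long device_pte[TOY_NPAGES];  /* 0 means not mirrored */
};

/* Drop device entries for pages in [first, last). */
void toy_mirror_invalidate(struct toy_mirror *m, size_t first, size_t last)
{
    for (size_t i = first; i < last && i < TOY_NPAGES; i++)
        m->device_pte[i] = 0;
}

/* CPU-side change: invalidate the mirror first, then update the entry. */
void toy_cpu_update(struct toy_mirror *m, size_t idx, unsigned long pte)
{
    assert(idx < TOY_NPAGES);
    toy_mirror_invalidate(m, idx, idx + 1);
    m->cpu_pte[idx] = pte;
}

/*
 * Device fault: copy the CPU entry into the device table.
 * Returns 0 on success, -1 if the CPU has no mapping for the page.
 */
int toy_device_fault(struct toy_mirror *m, size_t idx)
{
    assert(idx < TOY_NPAGES);
    if (m->cpu_pte[idx] == 0)
        return -1;
    m->device_pte[idx] = m->cpu_pte[idx];
    return 0;
}
```

Centralizing these steps in one place is what lets each driver avoid writing its own walker and synchronization logic.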

    This is useful for NVidia GPU >= Pascal, Mellanox IB >= mlx5 and more
    hardware in the future.

    [jglisse@redhat.com: fix hmm for "mmu_notifier kill invalidate_page callback"]
    Link: http://lkml.kernel.org/r/20170830231955.GD9445@redhat.com
    Link: http://lkml.kernel.org/r/20170817000548.32038-4-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Evgeny Baskakov
    Signed-off-by: John Hubbard
    Signed-off-by: Mark Hairgrove
    Signed-off-by: Sherry Cheung
    Signed-off-by: Subhash Gutti
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • HMM provides 3 separate types of functionality:
    - Mirroring: synchronize CPU page table and device page table
    - Device memory: allocating struct page for device memory
    - Migration: migrating regular memory to device memory

    This patch introduces some common helpers and definitions shared by all
    3 types of functionality.

    Link: http://lkml.kernel.org/r/20170817000548.32038-3-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Evgeny Baskakov
    Signed-off-by: John Hubbard
    Signed-off-by: Mark Hairgrove
    Signed-off-by: Sherry Cheung
    Signed-off-by: Subhash Gutti
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse