05 Oct, 2020

2 commits

  • In 2019, we introduced pin_user_pages*() and now we are converting
    get_user_pages*() to the new API as appropriate. See [1] and [2]
    for more information. This is case 5 as per document [1].

    [1] Documentation/core-api/pin_user_pages.rst

    [2] "Explicit pinning of user-space pages":
    https://lwn.net/Articles/807108/
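
    For reference, the "case 5" pattern from [1] looks roughly like this
    (a minimal sketch, not the actual gntdev code; the helper name is
    hypothetical):

    #include <linux/mm.h>

    static int access_user_page(unsigned long addr)
    {
        struct page *page;
        int ret;

        ret = pin_user_pages_fast(addr, 1, FOLL_WRITE, &page);
        if (ret < 1)
            return ret < 0 ? ret : -EFAULT;

        /* ... write to the page contents ... */

        /* Mark the page dirty and release the pin in one call. */
        unpin_user_pages_dirty_lock(&page, 1, true);
        return 0;
    }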

    Signed-off-by: Souptick Joarder
    Cc: John Hubbard
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: David Vrabel
    Link: https://lore.kernel.org/r/1599375114-32360-2-git-send-email-jrdr.linux@gmail.com
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Souptick Joarder
     
  • There seems to be a bug in the original code: when gntdev_get_page()
    is called with writeable=true, the page needs to be marked dirty
    before being put.

    To address this, add a bool writeable to struct gntdev_copy_batch, set
    it in gntdev_grant_copy_seg() (and drop the writeable argument to
    gntdev_get_page()), and then, based on batch->writeable, use
    set_page_dirty_lock(). The release path is sketched below.
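
    A sketch of that release path, assuming the upstream struct layout
    (simplified):

    static void gntdev_put_pages(struct gntdev_copy_batch *batch)
    {
        unsigned int i;

        for (i = 0; i < batch->nr_pages; i++) {
            /* Pages that were written to must be dirtied before put. */
            if (batch->writeable && !PageDirty(batch->pages[i]))
                set_page_dirty_lock(batch->pages[i]);
            put_page(batch->pages[i]);
        }
        batch->nr_pages = 0;
        batch->writeable = false;
    }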

    Fixes: a4cdb556cae0 (xen/gntdev: add ioctl for grant copy)
    Suggested-by: Boris Ostrovsky
    Signed-off-by: Souptick Joarder
    Cc: John Hubbard
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: David Vrabel
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/1599375114-32360-1-git-send-email-jrdr.linux@gmail.com
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Souptick Joarder
     

10 Jun, 2020

2 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)
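
    A typical call site changes like this (an illustrative fragment, not
    taken from a specific file):

    static void walk_vmas(struct mm_struct *mm)
    {
        /* before: down_read(&mm->mmap_sem); */
        mmap_read_lock(mm);

        /* ... inspect VMAs under the read lock ... */

        /* before: up_read(&mm->mmap_sem); */
        mmap_read_unlock(mm);
    }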

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

29 Jan, 2020

1 commit

  • Commit d3eeb1d77c5d ("xen/gntdev: use mmu_interval_notifier_insert")
    missed a test for use_ptemod when calling mmu_interval_read_begin(). Fix
    that.
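
    The shape of the fix, as a simplified sketch (the real check sits in
    gntdev_mmap(); helper name hypothetical):

    static void gntdev_read_seq(struct gntdev_grant_map *map)
    {
        /* The interval notifier is only registered when use_ptemod is
         * true, so it must only be consulted in that case. */
        if (use_ptemod)
            map->notifier_seq = mmu_interval_read_begin(&map->notifier);
    }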

    Fixes: d3eeb1d77c5d ("xen/gntdev: use mmu_interval_notifier_insert")
    CC: stable@vger.kernel.org # 5.5
    Reported-by: Ilpo Järvinen
    Tested-by: Ilpo Järvinen
    Signed-off-by: Boris Ostrovsky
    Reviewed-by: Jason Gunthorpe
    Acked-by: Juergen Gross

    Boris Ostrovsky
     

02 Dec, 2019

3 commits

  • With sufficiently many pages to map, gntdev can reach order-9
    allocation sizes. As there is no need for physically contiguous
    buffers, switch to kvcalloc() in order to avoid failing allocations.
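
    The allocation pattern, as a sketch (hypothetical helper name):

    static struct page **gntdev_alloc_page_array(unsigned int count)
    {
        /* No need for physical contiguity: kvcalloc() falls back to
         * vmalloc instead of failing a high-order request. */
        return kvcalloc(count, sizeof(struct page *), GFP_KERNEL);
    }

    /* ... freed later with kvfree(), which handles both backing kinds. */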

    Signed-off-by: Juergen Gross
    Reviewed-by: Oleksandr Andrushchenko
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross

    Juergen Gross
     
  • Today there is a global limit of pages mapped via /dev/xen/gntdev, set
    to 1 million pages by default. There is no reason for that limit to
    exist, as the total number of grant mappings is limited by the
    hypervisor anyway and preferring kernel mappings over userspace ones
    doesn't make sense. It should be noted that the gntdev device is
    usable by root only.

    Additionally, checking of that limit is fragile, as the number of pages
    to map via one call is specified in a 32-bit unsigned variable which
    isn't tested to stay within reasonable limits (the only test is the
    value to be […]).

    Reviewed-by: Oleksandr Andrushchenko
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross

    Juergen Gross
     
  • The non-zero check on ret is always going to be false because
    ret was initialized as zero and the only place it is set to
    non-zero contains a return path before the non-zero check. Hence
    the check is redundant and can be removed.

    [ jgross@suse.com: limit scope of ret ]

    Addresses-Coverity: ("Logically dead code")
    Signed-off-by: Colin Ian King
    Reviewed-by: Juergen Gross
    Signed-off-by: Juergen Gross

    Colin Ian King
     

24 Nov, 2019

1 commit

  • gntdev simply wants to monitor a specific VMA for any notifier events,
    this can be done straightforwardly using mmu_interval_notifier_insert()
    over the VMA's VA range.

    The notifier should be attached until the original VMA is destroyed.

    It is unclear if any of this is even sane, but at least a lot of duplicate
    code is removed.
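
    Attaching the notifier amounts to the following sketch (the
    gntdev_invalidate callback is assumed; error handling abbreviated):

    static const struct mmu_interval_notifier_ops gntdev_mmu_ops = {
        .invalidate = gntdev_invalidate,
    };

    static int gntdev_attach_notifier(struct gntdev_grant_map *map,
                                      struct vm_area_struct *vma)
    {
        /* Cover exactly the VMA's VA range; mmap_sem is held. */
        return mmu_interval_notifier_insert_locked(&map->notifier,
                                                   vma->vm_mm,
                                                   vma->vm_start,
                                                   vma->vm_end - vma->vm_start,
                                                   &gntdev_mmu_ops);
    }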

    Link: https://lore.kernel.org/r/20191112202231.3856-15-jgg@ziepe.ca
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

10 Oct, 2019

1 commit

  • As the removed comments say, these aren't DT based devices.
    of_dma_configure() is going to stop allowing a NULL DT node and calling
    it will no longer work.

    The comment is also now out of date as of commit 9ab91e7c5c51 ("arm64:
    default to the direct mapping in get_arch_dma_ops"). Direct mapping
    is now the default rather than dma_dummy_ops.

    According to Stefano and Oleksandr, the only other part needed is
    setting the DMA masks and there's no reason to restrict the masks to
    32-bits. So set the masks to 64 bits.
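
    Setting the masks amounts to something like this sketch
    (dma_set_mask_and_coherent() is shown for illustration; the actual
    patch may wire the masks up differently):

    #include <linux/dma-mapping.h>

    static void gntdev_setup_dma(struct device *dev)
    {
        dev->dma_mask = &dev->coherent_dma_mask;
        if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
            dev_warn(dev, "Unable to set 64-bit DMA mask\n");
    }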

    Cc: Robin Murphy
    Cc: Julien Grall
    Cc: Nicolas Saenz Julienne
    Cc: Oleksandr Andrushchenko
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: Christoph Hellwig
    Cc: xen-devel@lists.xenproject.org
    Signed-off-by: Rob Herring
    Acked-by: Oleksandr Andrushchenko
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Rob Herring
     

31 Jul, 2019

1 commit

  • Commit df9bde015a72 ("xen/gntdev.c: convert to use vm_map_pages()")
    breaks the gntdev driver. If vma->vm_pgoff > 0, vm_map_pages()
    will:
    - use map->pages starting at vma->vm_pgoff instead of 0
    - verify map->count against vma_pages()+vma->vm_pgoff instead of just
    vma_pages().

    In practice, this breaks using a single gntdev FD for mapping multiple
    grants.

    Relevant strace output:
    [pid 857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
    [pid 857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0) = 0x777f1211b000
    [pid 857] ioctl(7, IOCTL_GNTDEV_SET_UNMAP_NOTIFY, 0x7ffd3407b710) = 0
    [pid 857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
    [pid 857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0x1000) = -1 ENXIO (No such device or address)

    details here:
    https://github.com/QubesOS/qubes-issues/issues/5199

    The reason (copying Marek's words from the discussion):

    vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's
    basically using this parameter for "which grant reference to map".
    map struct returned by gntdev_find_map_index() describes just the pages
    to be mapped. Specifically map->pages[0] should be mapped at
    vma->vm_start, not vma->vm_start+vma->vm_pgoff*PAGE_SIZE.

    When trying to map a grant with index (aka vma->vm_pgoff) > 1,
    __vm_map_pages() will refuse to map it because it expects map->count
    to be at least vma_pages(vma)+vma->vm_pgoff, while it is exactly
    vma_pages(vma).

    Converting vm_map_pages() to use vm_map_pages_zero() will fix the
    problem.

    Marek has tested and confirmed the same.
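
    The fix in sketch form (vm_map_pages_zero() ignores vma->vm_pgoff and
    always maps map->pages[0] at vma->vm_start, matching gntdev's use of
    vm_pgoff as a grant index; wrapper name hypothetical):

    static int gntdev_map_pages(struct gntdev_grant_map *map,
                                struct vm_area_struct *vma)
    {
        /* before: return vm_map_pages(vma, map->pages, map->count); */
        return vm_map_pages_zero(vma, map->pages, map->count);
    }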

    Cc: stable@vger.kernel.org # v5.2+
    Fixes: df9bde015a72 ("xen/gntdev.c: convert to use vm_map_pages()")

    Reported-by: Marek Marczykowski-Górecki
    Signed-off-by: Souptick Joarder
    Tested-by: Marek Marczykowski-Górecki
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross

    Souptick Joarder
     

13 Jul, 2019

1 commit

  • Drop the pgtable_t variable from all implementations of pte_fn_t, as
    none of them use it. apply_to_pte_range() should stop computing it as
    well. This should help us save some cycles.
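
    The callback type changes like this (sketch):

    /* before:
     * typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token,
     *                         unsigned long addr, void *data);
     * after: the unused token is gone
     */
    typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data);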

    Link: http://lkml.kernel.org/r/1556803126-26596-1-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual
    Acked-by: Matthew Wilcox
    Cc: Ard Biesheuvel
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Michal Hocko
    Cc: Logan Gunthorpe
    Cc: "Kirill A. Shutemov"
    Cc: Dan Williams
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     

15 May, 2019

3 commits

  • Convert to use vm_map_pages() to map a range of kernel memory to a
    user vma.

    map->count is passed to vm_map_pages(), and the internal API verifies
    map->count against count (count = vma_pages(vma)) to detect page
    array boundary overruns.
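
    In gntdev the conversion boils down to the following sketch
    (hypothetical wrapper name):

    static int gntdev_mmap_pages(struct gntdev_grant_map *map,
                                 struct vm_area_struct *vma)
    {
        /* Replaces a hand-rolled loop of vm_insert_page() calls; the
         * boundary check against vma_pages(vma) happens inside. */
        return vm_map_pages(vma, map->pages, map->count);
    }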

    Link: http://lkml.kernel.org/r/88e56e82d2db98705c2d842e9c9806c00b366d67.1552921225.git.jrdr.linux@gmail.com
    Signed-off-by: Souptick Joarder
    Reviewed-by: Boris Ostrovsky
    Cc: David Airlie
    Cc: Heiko Stuebner
    Cc: Joerg Roedel
    Cc: Joonsoo Kim
    Cc: Juergen Gross
    Cc: Kees Cook
    Cc: "Kirill A. Shutemov"
    Cc: Kyungmin Park
    Cc: Marek Szyprowski
    Cc: Matthew Wilcox
    Cc: Mauro Carvalho Chehab
    Cc: Michal Hocko
    Cc: Mike Rapoport
    Cc: Oleksandr Andrushchenko
    Cc: Pawel Osciak
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Murphy
    Cc: Russell King
    Cc: Sandy Huang
    Cc: Stefan Richter
    Cc: Stephen Rothwell
    Cc: Thierry Reding
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Souptick Joarder
     
  • Use the mmu_notifier_range_blockable() helper function instead of directly
    dereferencing the range->blockable field. This is done to make it easier
    to change the mmu_notifier range field.

    This patch is the outcome of the following coccinelle patch:

    %<-------------------------------------------------------------------
    @@
    identifier I1, FN;
    @@
    FN(..., struct mmu_notifier_range *I1, ...) {
    <...
    -I1->blockable
    +mmu_notifier_range_blockable(I1)
    ...>
    }
    ------------------------------------------------------------------->%

    spatch --in-place --sp-file blockable.spatch --dir .
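
    A converted call site reads like this (illustrative fragment modeled
    on gntdev's notifier; locking details simplified):

    static int mn_invl_range_start(struct mmu_notifier *mn,
                                   const struct mmu_notifier_range *range)
    {
        struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);

        if (mmu_notifier_range_blockable(range))
            mutex_lock(&priv->lock);
        else if (!mutex_trylock(&priv->lock))
            return -EAGAIN;
        /* ... unmap grants overlapping [range->start, range->end) ... */
        mutex_unlock(&priv->lock);
        return 0;
    }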

    Link: http://lkml.kernel.org/r/20190326164747.24405-3-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Ralph Campbell
    Reviewed-by: Ira Weiny
    Cc: Christian König
    Cc: Joonas Lahtinen
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Jan Kara
    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Cc: Felix Kuehling
    Cc: Jason Gunthorpe
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Christian Koenig
    Cc: John Hubbard
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • To facilitate additional options to get_user_pages_fast(), change
    the singular write parameter to be gup_flags.

    This patch does not change any functionality. New functionality will
    follow in subsequent patches.

    Some of the get_user_pages_fast() call sites were unchanged because they
    already passed FOLL_WRITE or 0 for the write parameter.

    NOTE: It was suggested to change the ordering of the get_user_pages_fast()
    arguments to ensure that callers were converted. This breaks the current
    GUP call site convention of having the returned pages be the final
    parameter. So the suggestion was rejected.
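
    A call site changes like this (sketch):

    /* before: the third argument was a boolean write flag:
     *     ret = get_user_pages_fast(addr, 1, 1, &page);
     * after: it is a gup_flags bitmask:
     */
    static int pin_one_for_write(unsigned long addr, struct page **page)
    {
        return get_user_pages_fast(addr, 1, FOLL_WRITE, page);
    }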

    Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Mike Marshall
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     

18 Feb, 2019

1 commit

  • If there are exported DMA buffers which are still in use and the
    grant device is closed, either by a normal user-space close or by
    a signal, the grant device context is destroyed. This makes it
    impossible to correctly destroy those exported buffers when they
    are returned back to gntdev, and the module crashes:

    [ 339.617540] dmabuf_exp_ops_release+0x40/0xa8
    [ 339.617560] dma_buf_release+0x60/0x190
    [ 339.617577] __fput+0x88/0x1d0
    [ 339.617589] ____fput+0xc/0x18
    [ 339.617607] task_work_run+0x9c/0xc0
    [ 339.617622] do_notify_resume+0xfc/0x108

    Fix this by referencing gntdev on each DMA buffer export and
    unreferencing on buffer release.
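
    The idea, as a minimal sketch with a hypothetical kref (the actual
    patch takes its reference on the gntdev context by other means):

    #include <linux/kref.h>

    struct gntdev_priv_sketch {
        struct kref ref;
        /* ... */
    };

    static void gntdev_ctx_free(struct kref *ref)
    {
        /* tear the context down only after the last dma-buf is gone */
    }

    static void on_dmabuf_export(struct gntdev_priv_sketch *priv)
    {
        kref_get(&priv->ref);   /* one reference per exported buffer */
    }

    static void on_dmabuf_release(struct gntdev_priv_sketch *priv)
    {
        kref_put(&priv->ref, gntdev_ctx_free);
    }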

    Signed-off-by: Oleksandr Andrushchenko
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross

    Oleksandr Andrushchenko
     

29 Dec, 2018

1 commit

  • Patch series "mmu notifier contextual informations", v2.

    This patchset adds contextual information, why an invalidation is
    happening, to the mmu notifier callbacks. This is necessary for users
    of mmu notifiers that wish to maintain their own data structures
    without having to add new fields to struct vm_area_struct (vma).

    For instance, a device can have its own page table that mirrors the
    process address space. When a vma is unmapped (munmap() syscall) the
    device driver can free the device page table for the range.

    Today we do not have any information on why an mmu notifier callback
    is happening, and thus a device driver has to assume that it is always
    an munmap(). This is inefficient, as it means it needs to re-allocate
    the device page table on the next page fault and rebuild the whole
    device driver data structure for the range.

    Other use cases besides munmap() also exist; for instance, it is
    pointless for a device driver to invalidate the device page table when
    the invalidation is for soft-dirtiness tracking. Or the device driver
    can optimize away an mprotect() that changes the page table permission
    access for the range.

    This patchset enables all these optimizations for device drivers. I do
    not include any of those in this series, but another patchset I am
    posting will leverage this.

    The patchset is pretty simple from a code point of view. The first two
    patches consolidate all mmu notifier arguments into a struct so that
    it is easier to add/change arguments. The last patch adds the
    contextual information (munmap, protection, soft dirty, clear, ...).

    This patch (of 3):

    To avoid having to change many callback definitions every time we want
    to add a parameter, use a structure to group all parameters for the
    mmu_notifier invalidate_range_start/end callbacks. No functional
    changes with this patch.
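
    The consolidated structure introduced here looks roughly like this
    (abridged sketch; the contextual event field arrives in the last
    patch of the series):

    struct mmu_notifier_range {
        struct mm_struct *mm;
        unsigned long start;
        unsigned long end;
        bool blockable;
    };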

    [akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc]
    Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Acked-by: Jan Kara
    Acked-by: Jason Gunthorpe [infiniband]
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Christian Koenig
    Cc: Felix Kuehling
    Cc: Ralph Campbell
    Cc: John Hubbard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

14 Sep, 2018

1 commit

  • Patch series "mmu_notifiers follow ups".

    Tetsuo has noticed some fallout from 93065ac753e4 ("mm, oom:
    distinguish blockable mode for mmu notifiers"). One instance has been
    fixed and picked up by the AMD/DRM maintainer [1]. The XEN issue is
    fixed by patch 1. I have also clarified expectations about the
    blockable semantics of invalidate_range_end. Finally, the last patch
    removes MMU_INVALIDATE_DOES_NOT_BLOCK, which is no longer used nor
    needed.

    [1] http://lkml.kernel.org/r/20180824135257.GU29735@dhcp22.suse.cz

    This patch (of 3):

    93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") has
    introduced blockable parameter to all mmu_notifiers and the notifier has
    to back off when called in !blockable case and it could block down the
    road.

    The above commit implemented that for mn_invl_range_start but both
    in_range checks are done unconditionally regardless of the blockable mode
    and as such they would fail all the time for regular calls. Fix this by
    checking blockable parameter as well.

    Once we are there we can remove the stale TODO. The lock has to be
    sleepable because we wait for completion down in gnttab_unmap_refs_sync.
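
    The fixed notifier then follows the usual pattern (an illustrative
    sketch with that era's callback signature; details simplified):

    static int mn_invl_range_start(struct mmu_notifier *mn,
                                   struct mm_struct *mm,
                                   unsigned long start, unsigned long end,
                                   bool blockable)
    {
        struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);

        if (blockable)
            mutex_lock(&priv->lock);
        else if (!mutex_trylock(&priv->lock))
            return -EAGAIN;
        /* ... in_range checks and unmapping happen under the lock ... */
        mutex_unlock(&priv->lock);
        return 0;
    }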

    Link: http://lkml.kernel.org/r/20180827112623.8992-2-mhocko@kernel.org
    Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
    Signed-off-by: Michal Hocko
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: David Rientjes
    Cc: Jerome Glisse
    Cc: Tetsuo Handa
    Reviewed-by: Juergen Gross
    Signed-off-by: Boris Ostrovsky

    Michal Hocko
     

23 Aug, 2018

1 commit

  • There are several blockable mmu notifiers which might sleep in
    mmu_notifier_invalidate_range_start and that is a problem for the
    oom_reaper because it needs to guarantee a forward progress so it cannot
    depend on any sleepable locks.

    Currently we simply back off and mark an oom victim with blockable mmu
    notifiers as done after a short sleep. That can result in selecting a new
    oom victim prematurely because the previous one still hasn't torn its
    memory down yet.

    We can do much better though. Even if mmu notifiers use sleepable
    locks, there is no reason to automatically assume those locks are
    held. Moreover, the majority of notifiers only care about a portion of
    the address space, and there is absolutely zero reason to fail when we
    are unmapping an unrelated range. Some notifiers do really block and
    wait for HW, though, which is harder to handle, and there we have to
    bail out.

    This patch handles the low hanging fruit.
    __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
    are not allowed to sleep if the flag is set to false. This is achieved by
    using trylock instead of the sleepable lock for most callbacks and
    continue as long as we do not block down the call chain.

    I think we can improve that even further because there is a common pattern
    to do a range lookup first and then do something about that. The first
    part can be done without a sleeping lock in most cases AFAICS.

    The oom_reaper end then simply retries if there is at least one notifier
    which couldn't make any progress in !blockable mode. A retry loop is
    already implemented to wait for the mmap_sem and this is basically the
    same thing.

    The simplest way for driver developers to test this code path is to
    wrap userspace code which uses these notifiers into a memcg and set
    the hard limit to hit the oom. This can be done e.g. after the test
    faults in all the mmu-notifier-managed memory and sets the hard limit
    to something really small. Then we are looking for a proper process
    teardown.

    [akpm@linux-foundation.org: coding style fixes]
    [akpm@linux-foundation.org: minor code simplification]
    Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Christian König # AMD notifiers
    Acked-by: Leon Romanovsky # mlx and umem_odp
    Reported-by: David Rientjes
    Cc: "David (ChunMing) Zhou"
    Cc: Paolo Bonzini
    Cc: Alex Deucher
    Cc: David Airlie
    Cc: Jani Nikula
    Cc: Joonas Lahtinen
    Cc: Rodrigo Vivi
    Cc: Doug Ledford
    Cc: Jason Gunthorpe
    Cc: Mike Marciniszyn
    Cc: Dennis Dalessandro
    Cc: Sudeep Dutt
    Cc: Ashutosh Dixit
    Cc: Dimitri Sivanich
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: "Jérôme Glisse"
    Cc: Andrea Arcangeli
    Cc: Felix Kuehling
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

27 Jul, 2018

3 commits

  • Add UAPI and IOCTLs for the dma-buf grant device driver extension:
    the extension allows userspace processes and kernel modules to
    use a Xen-backed dma-buf implementation. With this extension, grant
    references to the pages of an imported dma-buf can be exported
    for use by another domain, and grant references coming from a foreign
    domain can be converted into a local dma-buf for local export.
    Implement basic initialization and stubs for Xen DMA buffer
    support.
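
    A user-space sketch of the export path (ioctl and struct names follow
    the UAPI added by this series; the exact field layout is an assumption
    and should be checked against include/uapi/xen/gntdev.h):

    #include <sys/ioctl.h>
    #include <xen/gntdev.h>

    static int export_refs_as_dmabuf(int gntdev_fd,
                                     struct ioctl_gntdev_dmabuf_exp_from_refs *req)
    {
        /* req->domid, req->count and req->refs[] filled by the caller;
         * on success the dma-buf file descriptor comes back in req->fd. */
        if (ioctl(gntdev_fd, IOCTL_GNTDEV_DMABUF_EXP_FROM_REFS, req))
            return -1;
        return req->fd;
    }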

    Signed-off-by: Oleksandr Andrushchenko
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Oleksandr Andrushchenko
     
  • This is in preparation for adding DMA buffer functionality: make the
    map/unmap related code and structures, used privately by gntdev,
    ready for the dma-buf extension, which will re-use them. Rename the
    corresponding structures, as they now become non-private to gntdev.

    Signed-off-by: Oleksandr Andrushchenko
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Oleksandr Andrushchenko
     
  • Allow mappings for DMA-backed buffers if the grant table module
    supports them: this extends the grant device to map not only buffers
    made of balloon pages, but also buffers allocated with
    dma_alloc_xxx.

    Signed-off-by: Oleksandr Andrushchenko
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Oleksandr Andrushchenko
     

10 Jan, 2018

2 commits

  • When cleaning up after a partially successful gntdev_mmap(), unmap the
    successfully mapped grant pages otherwise Xen will kill the domain if
    in debug mode (Attempt to implicitly unmap a granted PTE) or Linux will
    kill the process and emit "BUG: Bad page map in process" if Xen is in
    release mode.

    This is only needed when use_ptemod is true because gntdev_put_map()
    will unmap grant pages itself when use_ptemod is false.
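
    The shape of the cleanup, as a sketch with a hypothetical helper (the
    real change lives in gntdev_mmap()'s error path):

    static int gntdev_mmap_cleanup(struct gntdev_priv *priv,
                                   struct gntdev_grant_map *map, int err)
    {
        if (use_ptemod)
            /* undo the grant mappings already established */
            unmap_grant_pages(map, 0, map->count);
        /* gntdev_put_map() unmaps by itself when !use_ptemod */
        gntdev_put_map(priv, map);
        return err;
    }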

    Signed-off-by: Ross Lagerwall
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Ross Lagerwall
     
  • If the requested range has a hole, the calculation of the number of
    pages to unmap is off by one. Fix it.

    Signed-off-by: Ross Lagerwall
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Ross Lagerwall
     

26 Oct, 2017

1 commit

  • In case gntdev_mmap() succeeds only partially in mapping grant pages
    it will leave some vital information uninitialized needed later for
    cleanup. This will lead to an out of bounds array access when unmapping
    the already mapped pages.

    So just initialize the data needed for unmapping the pages a little bit
    earlier.

    Reported-by: Arthur Borsboom
    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Juergen Gross
     

01 Sep, 2017

1 commit

  • Calls to mmu_notifier_invalidate_page() were replaced by calls to
    mmu_notifier_invalidate_range() and are now bracketed by calls to
    mmu_notifier_invalidate_range_start()/end()

    Remove now useless invalidate_page callback.

    Signed-off-by: Jérôme Glisse
    Reviewed-by: Boris Ostrovsky
    Cc: Konrad Rzeszutek Wilk
    Cc: Roger Pau Monné
    Cc: xen-devel@lists.xenproject.org (moderated for non-subscribers)
    Cc: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Andrea Arcangeli
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

14 Mar, 2017

1 commit

  • The refcount_t type and corresponding API should be used instead of
    atomic_t when the variable is used as a reference counter. This
    allows us to avoid accidental reference counter overflows that might
    lead to use-after-free situations.
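
    The conversion pattern, as a sketch (gntdev's map->users is the
    converted counter; helper names hypothetical):

    #include <linux/refcount.h>

    struct gntdev_grant_map_sketch {
        refcount_t users;       /* previously atomic_t */
        /* ... */
    };

    static void map_get(struct gntdev_grant_map_sketch *map)
    {
        refcount_inc(&map->users);  /* saturates instead of overflowing */
    }

    static void map_put(struct gntdev_grant_map_sketch *map)
    {
        if (refcount_dec_and_test(&map->users))
            /* free the map */;
    }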

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: Boris Ostrovsky

    Elena Reshetova
     

02 Mar, 2017

1 commit

  • We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder <linux/sched/mm.h> file that just
    maps to <linux/sched.h> to make this patch obviously correct and
    bisectable.

    The APIs that are going to be moved first are:

    mm_alloc()
    __mmdrop()
    mmdrop()
    mmdrop_async_fn()
    mmdrop_async()
    mmget_not_zero()
    mmput()
    mmput_async()
    get_task_mm()
    mm_access()
    mm_release()

    Include the new header in the files that are going to need it.
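
    The placeholder header is trivial (sketch matching the description
    above):

    /* include/linux/sched/mm.h (initial placeholder) */
    #ifndef _LINUX_SCHED_MM_H
    #define _LINUX_SCHED_MM_H

    #include <linux/sched.h>

    #endif /* _LINUX_SCHED_MM_H */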

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

28 Nov, 2016

1 commit

  • Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to
    NUMA balancing") set VM_IO flag to prevent grant maps from being
    subjected to NUMA balancing.

    It was discovered recently that this flag causes get_user_pages() to
    always fail with -EFAULT.

    check_vma_flags
    __get_user_pages
    __get_user_pages_locked
    __get_user_pages_unlocked
    get_user_pages_fast
    iov_iter_get_pages
    dio_refill_pages
    do_direct_IO
    do_blockdev_direct_IO
    do_blockdev_direct_IO
    ext4_direct_IO_read
    generic_file_read_iter
    aio_run_iocb

    (which can happen if guest's vdisk has direct-io-safe option).

    To avoid this let's use VM_MIXEDMAP flag instead --- it prevents
    NUMA balancing just as VM_IO does and has no effect on
    check_vma_flags().
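
    In gntdev the change is essentially one flag (sketch):

    static void gntdev_set_vma_flags(struct vm_area_struct *vma)
    {
        /* VM_MIXEDMAP opts out of NUMA balancing like VM_IO did,
         * but does not make check_vma_flags() reject GUP. */
        vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_MIXEDMAP;
    }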

    Cc: stable@vger.kernel.org

    Reported-by: Olaf Hering
    Suggested-by: Hugh Dickins
    Signed-off-by: Boris Ostrovsky
    Acked-by: Hugh Dickins
    Tested-by: Olaf Hering
    Signed-off-by: Juergen Gross

    Boris Ostrovsky
     

24 May, 2016

1 commit

  • IOCTL_GNTDEV_GRANT_COPY batches copy operations to reduce the number
    of hypercalls. The stack is used to avoid a memory allocation in a
    hot path. However, a batch size of 24 requires more than 1024 bytes of
    stack which in some configurations causes a compiler warning.

    xen/gntdev.c: In function ‘gntdev_ioctl_grant_copy’:
    xen/gntdev.c:949:1: warning: the frame size of 1248 bytes is
    larger than 1024 bytes [-Wframe-larger-than=]

    This is a harmless warning as there is still plenty of stack spare,
    but people keep trying to "fix" it. Reduce the batch size to 16 to
    reduce stack usage to less than 1024 bytes. This should have minimal
    impact on performance.
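
    The change is essentially one constant (sketch; the batch struct is
    abridged):

    #define GNTDEV_COPY_BATCH 16    /* was 24; keeps the frame under 1 KiB */

    struct gntdev_copy_batch {
        struct gnttab_copy ops[GNTDEV_COPY_BATCH];
        struct page *pages[GNTDEV_COPY_BATCH];
        /* ... */
    };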

    Signed-off-by: David Vrabel

    David Vrabel
     

07 Jan, 2016

1 commit

  • Add IOCTL_GNTDEV_GRANT_COPY to allow applications to copy between user
    space buffers and grant references.

    This interface is similar to the GNTTABOP_copy hypercall ABI except
    that the local buffers are provided using a virtual address (instead
    of a GFN and offset). To avoid userspace having to page-align its
    buffers, the driver will use two or more ops if required.

    If the ioctl returns 0, the application must check the status of each
    segment with the segment's status field. If the ioctl returns a
    negative error code (EINVAL or EFAULT), the status of individual ops
    is undefined.
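
    A hypothetical user-space usage sketch (names follow the UAPI this
    patch adds; field layout abridged):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <xen/gntdev.h>

    static int copy_to_gref(int fd, void *src, uint32_t ref,
                            uint16_t domid, uint32_t len)
    {
        struct gntdev_grant_copy_segment seg = {
            .source.virt  = src,
            .dest.foreign = { .ref = ref, .domid = domid, .offset = 0 },
            .len          = len,
            .flags        = GNTCOPY_dest_gref,
        };
        struct ioctl_gntdev_grant_copy copy = {
            .count    = 1,
            .segments = &seg,
        };

        if (ioctl(fd, IOCTL_GNTDEV_GRANT_COPY, &copy))
            return -1;      /* EINVAL/EFAULT: segment statuses undefined */
        return seg.status;  /* per-segment status, 0 on success */
    }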

    Signed-off-by: David Vrabel
    Reviewed-by: Boris Ostrovsky

    David Vrabel
     

27 Nov, 2015

1 commit

  • Doing so will cause the grant to be unmapped and then, during
    fault handling, the fault to be mistakenly treated as a NUMA hint
    fault.

    In addition, even if those maps could participate in NUMA
    balancing, it wouldn't provide any benefit since we are unable
    to determine the physical page's node (even if/when VNUMA is
    implemented).

    Marking grant maps' VMAs as VM_IO will exclude them from
    NUMA balancing.

    Signed-off-by: Boris Ostrovsky
    Cc: stable@vger.kernel.org
    Signed-off-by: David Vrabel

    Boris Ostrovsky
     

11 Sep, 2015

1 commit

  • With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct
    structs should be constant.
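
    In gntdev this is the usual one-word change (sketch; callback names
    as in the driver):

    static const struct vm_operations_struct gntdev_vmops = {
        .open  = gntdev_vma_open,
        .close = gntdev_vma_close,
        .find_special_page = gntdev_vma_find_special_page,
    };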

    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: Oleg Nesterov
    Cc: "H. Peter Anvin"
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Ingo Molnar
    Cc: Minchan Kim
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

30 Jun, 2015

1 commit

  • While gntdev_release() is called, the MMU notifier is still registered
    and can traverse the priv->maps list even if no pages are mapped
    (which is the case -- gntdev_release() is called after all). But
    gntdev_release() will clear that list, so make sure that only one of
    those things happens at the same time.
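
    A sketch of the resulting gntdev_release() (simplified; struct names
    of that era may differ):

    static int gntdev_release(struct inode *inode, struct file *flip)
    {
        struct gntdev_priv *priv = flip->private_data;
        struct grant_map *map;

        /* Serialize against the still-registered MMU notifier. */
        mutex_lock(&priv->lock);
        while (!list_empty(&priv->maps)) {
            map = list_entry(priv->maps.next, struct grant_map, next);
            list_del(&map->next);
            gntdev_put_map(NULL /* already removed */, map);
        }
        mutex_unlock(&priv->lock);
        /* ... unregister the notifier, free priv ... */
        return 0;
    }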

    Signed-off-by: Marek Marczykowski-Górecki
    Signed-off-by: David Vrabel

    Marek Marczykowski-Górecki
     

17 Jun, 2015

1 commit

  • Using xen/page.h will be necessary later for using common Xen page
    helpers.

    As xen/page.h already includes asm/xen/page.h, always use the former.

    Signed-off-by: Julien Grall
    Reviewed-by: David Vrabel
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Wei Liu
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Cc: netdev@vger.kernel.org
    Signed-off-by: David Vrabel

    Julien Grall
     
