11 Sep, 2020

2 commits

  • For the calls linked to mlx4_ib_umem_calc_optimal_mtt_size(), use
    ib_umem_num_dma_blocks() inside the function; it is just some weird
    static default.

    All other places are just using it with PAGE_SIZE, switch to
    ib_umem_num_dma_blocks().

    As this is the last call site, remove ib_umem_num_count().

    Link: https://lore.kernel.org/r/15-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
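
    A rough sketch of what these PAGE_SIZE conversions look like in a driver
    (illustrative pattern only, not the exact hunks from this patch):

        /* Before: counting CPU pages in the SGL */
        n = ib_umem_num_pages(umem);

        /* After: counting PAGE_SIZE-sized DMA blocks, which is what the
         * device mapping code actually consumes.
         */
        n = ib_umem_num_dma_blocks(umem, PAGE_SIZE);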
     
  • ib_umem_num_pages() should only be used by things working with the SGL in
    CPU pages directly.

    Drivers building DMA lists should use the new ib_umem_num_dma_blocks(),
    which returns the number of blocks rdma_umem_for_each_block() will return.

    Making this general for DMA drivers requires a different implementation.
    Computing DMA block count based on umem->address only works if the
    requested page size is < PAGE_SIZE and/or the IOVA == umem->address.

    Instead the number of DMA pages should be computed in the IOVA address
    space, not umem->address. Thus the IOVA has to be stored inside the umem
    so it can be used for these calculations.

    For now set it to umem->address by default and fix it up if
    ib_umem_find_best_pgsz() was called. This allows drivers to be converted
    to ib_umem_num_dma_blocks() safely.

    Link: https://lore.kernel.org/r/6-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
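
    A minimal sketch of the idea, assuming the umem now carries an iova field
    next to address and length; the block count is taken over the aligned IOVA
    range rather than over umem->address:

        static inline size_t ib_umem_num_dma_blocks(struct ib_umem *umem,
                                                    unsigned long pgsz)
        {
                /* round the IOVA range out to pgsz and count whole blocks */
                return (size_t)(ALIGN(umem->iova + umem->length, pgsz) -
                                ALIGN_DOWN(umem->iova, pgsz)) / pgsz;
        }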
     

31 Aug, 2020

1 commit

  • The original function returns unsigned long and 0 on failure; fix the
    compiled-out stub of ib_umem_find_best_pgsz() to match.

    Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR")
    Link: https://lore.kernel.org/r/0-v1-982a13cc5c6d+501ae-fix_best_pgsz_stub_jgg@nvidia.com
    Reviewed-by: Gal Pressman
    Acked-by: Shiraz Saleem
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
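
    The compiled-out stub then presumably ends up along these lines (a sketch
    matching the described return type and failure value; parameter names are
    assumptions):

        static inline unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
                                                            unsigned long pgsz_bitmap,
                                                            unsigned long virt)
        {
                return 0;       /* 0 = no supported page size found */
        }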
     

30 Jul, 2020

1 commit

  • The header files in the RDMA subsystem are dual licensed and can be
    described by a simple SPDX tag, so replace all of them at once and,
    while at it, make them use the same coding style for header guard
    defines.

    Link: https://lore.kernel.org/r/20200719072521.135260-1-leon@kernel.org
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe

    Leon Romanovsky
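
    For illustration, the resulting boilerplate looks roughly like this, using
    the common RDMA dual-license expression and the ib_umem.h guard as an
    example (the exact tag per file is whatever its old license text mapped to):

        /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
        #ifndef IB_UMEM_H
        #define IB_UMEM_H

        /* ... declarations ... */

        #endif /* IB_UMEM_H */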
     

16 Jan, 2020

1 commit

  • So far the assumption was that ib_umem_get() and ib_umem_odp_get()
    are called from flows that start in UVERBS and therefore have a user
    context. This assumption restricts flows that are initiated by ULPs
    and need the service that ib_umem_get() provides.

    This patch changes ib_umem_get() and ib_umem_odp_get() to take the IB
    device directly, relying on the fact that both UVERBS and ULPs set that
    field correctly.

    Reviewed-by: Guy Levi
    Signed-off-by: Moni Shoua
    Signed-off-by: Leon Romanovsky

    Moni Shoua
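
    The caller-side effect is roughly the following (a sketch; argument names
    are illustrative):

        /* Before: only callable where a uverbs udata/user context exists */
        umem = ib_umem_get(udata, start, length, access_flags);

        /* After: kernel ULPs can call it too, passing the ib_device directly */
        umem = ib_umem_get(ibdev, start, length, access_flags);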
     

22 Aug, 2019

1 commit

  • At this point the ucontext is only being stored to access the ib_device,
    so just store the ib_device directly instead. This is more natural and
    logical as the umem has nothing to do with the ucontext.

    Link: https://lore.kernel.org/r/20190806231548.25242-8-jgg@ziepe.ca
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

22 May, 2019

1 commit

  • This value has always been set to PAGE_SHIFT in the core code; the only
    place that did things differently was the ODP path. Move the value into the ODP
    struct and still use it for ODP, but change all the non-ODP things to just
    use PAGE_SHIFT/PAGE_SIZE/PAGE_MASK directly.

    Reviewed-by: Shiraz Saleem
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Leon Romanovsky

    Jason Gunthorpe
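
    Concretely, the non-ODP conversions amount to something like this
    (illustrative pattern, not a literal hunk):

        /* Before: pretending the core SGL could use a per-umem shift */
        npages = sg_dma_len(sg) >> umem->page_shift;

        /* After: the core SGL is always built from CPU pages */
        npages = sg_dma_len(sg) >> PAGE_SHIFT;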
     

09 Apr, 2019

1 commit

  • Combine contiguous regions of PAGE_SIZE pages into single scatter list
    entry while building the scatter table for a umem. This minimizes the
    number of the entries in the scatter list and reduces the DMA mapping
    overhead, particularly with the IOMMU.

    Set default max_seg_size in core for IB devices to 2G and do not combine
    if we exceed this limit.

    Also, purge npages from struct ib_umem: since we now DMA map the umem SGL
    with sg_nents, the npages computation is not needed. Drivers should now be
    using ib_umem_num_pages(), so fix the last stragglers.

    Move npages tracking to ib_umem_odp as ODP drivers still need it.

    Suggested-by: Jason Gunthorpe
    Reviewed-by: Michael J. Ruhl
    Reviewed-by: Ira Weiny
    Acked-by: Adit Ranadive
    Signed-off-by: Shiraz Saleem
    Tested-by: Gal Pressman
    Tested-by: Selvin Xavier
    Signed-off-by: Jason Gunthorpe

    Shiraz Saleem
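
    In spirit the coalescing works like the sketch below: contiguous pages
    returned by get_user_pages are merged into a single scatterlist entry as
    long as the segment stays within the device's max_seg_size. This is a
    simplification under assumed variable names, not the helper the patch
    actually adds:

        struct scatterlist *sg = sgl;
        unsigned long run_pfn = page_to_pfn(pages[0]);
        unsigned long run_len = 1;
        unsigned long i;

        for (i = 1; i < npages; i++) {
                bool contig = page_to_pfn(pages[i]) == run_pfn + run_len;

                if (contig &&
                    (run_len + 1) * PAGE_SIZE <= dma_get_max_seg_size(dev)) {
                        run_len++;              /* extend the current segment */
                        continue;
                }
                sg_set_page(sg, pfn_to_page(run_pfn), run_len * PAGE_SIZE, 0);
                sg = sg_next(sg);
                run_pfn = page_to_pfn(pages[i]);
                run_len = 1;
        }
        sg_set_page(sg, pfn_to_page(run_pfn), run_len * PAGE_SIZE, 0);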
     

21 Sep, 2018

2 commits

  • This no longer has any use, we can use container_of to get to the
    umem_odp, and a simple flag to indicate if this is an odp MR. Remove the
    few remaining references to it.

    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Jason Gunthorpe
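
    The container_of accessor referred to is presumably along these lines
    (assuming the ib_umem is embedded in ib_umem_odp as a field named umem):

        static inline struct ib_umem_odp *to_ib_umem_odp(struct ib_umem *umem)
        {
                return container_of(umem, struct ib_umem_odp, umem);
        }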
     
  • This is just wrong, the process that calls into the reg_mr is the process
    associated with the umem, and that does not have to be the same process
    that created the context.

    When this code was first written mmgrab() didn't exist, however these days
    we can just directly hold the mm_struct pointer in the umem and have no
    ambiguity when it comes to releasing the umem as to which mm it was
    associated with.

    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Jason Gunthorpe
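
    A minimal sketch of the ownership scheme, assuming the pointer lives in a
    field called owning_mm (the field name is illustrative):

        /* at registration, in the context of the process doing reg_mr */
        umem->owning_mm = current->mm;
        mmgrab(umem->owning_mm);        /* keep the mm_struct itself alive */

        /* at release, from whichever process ends up freeing the umem */
        mmdrop(umem->owning_mm);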
     

16 May, 2018

1 commit

  • User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads.

    If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has
    exited, get_pid_task will return NULL and ib_umem_release will not
    decrease mm->pinned_vm.

    Instead of using threads to locate the mm, use the overall tgid from the
    ib_ucontext struct. This matches the behavior of ODP and disassociate in
    handling the mm of the process that called ibv_reg_mr.

    Cc:
    Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
    Signed-off-by: Lidong Chen
    Signed-off-by: Jason Gunthorpe

    Lidong Chen
     

26 Apr, 2017

2 commits

  • Currently ODP supports only regular MMU pages.
    Add ODP support for regions consisting of physically contiguous chunks
    of arbitrary order (huge pages for instance) to improve performance.

    Signed-off-by: Artemy Kovalyov
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford

    Artemy Kovalyov
     
  • The page size is held by struct ib_umem in the page_size field.

    It is better to store it as an exponent, because a page size is by nature
    always a power of two and is used as a factor, divisor or ilog2's argument.

    Converting page_size into page_shift makes the code portable and avoids
    the following error while compiling on ARM:

    ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!

    CC: Selvin Xavier
    CC: Steve Wise
    CC: Lijun Ou
    CC: Shiraz Saleem
    CC: Adit Ranadive
    CC: Dennis Dalessandro
    CC: Ram Amrani
    Signed-off-by: Artemy Kovalyov
    Signed-off-by: Leon Romanovsky
    Acked-by: Ram Amrani
    Acked-by: Shiraz Saleem
    Acked-by: Selvin Xavier
    Acked-by: Adit Ranadive
    Signed-off-by: Doug Ledford

    Artemy Kovalyov
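
    The point about the exponent, in before/after form (illustrative; a 64-bit
    divide needs a libgcc helper on 32-bit ARM, a shift does not):

        /* Before: 64-bit division, pulls in __aeabi_uldivmod on 32-bit ARM */
        npages = umem->length / umem->page_size;

        /* After: plain shift; the size itself is BIT(umem->page_shift) */
        npages = umem->length >> umem->page_shift;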
     

16 Dec, 2014

4 commits

  • * Extend the umem struct to keep the ODP related data.
    * Allocate and initialize the ODP related information in the umem
    (page_list, dma_list) and free it as needed at the end of the run.
    * Store a reference to the process PID struct in the ucontext. Used to
    safely obtain the task_struct and the mm during fault handling,
    without preventing the task destruction if needed.
    * Add 2 helper functions: ib_umem_odp_map_dma_pages and
    ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses
    of specific pages of the umem (and, currently, pin them).
    * Support for page faults only - IB core will keep the reference on
    the pages used and call put_page when freeing an ODP umem
    area. Invalidations support will be added in a later patch.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Shachar Raindel
    Signed-off-by: Haggai Eran
    Signed-off-by: Majd Dibbiny
    Signed-off-by: Roland Dreier

    Shachar Raindel
     
  • Add a helper function mlx5_ib_read_user_wqe to read information from
    user-space owned work queues. The function will be used in a later
    patch by the page-fault handling code in mlx5_ib.

    Signed-off-by: Haggai Eran

    [ Add stub for ib_umem_copy_from() for CONFIG_INFINIBAND_USER_MEM=n
    - Roland ]

    Signed-off-by: Roland Dreier

    Haggai Eran
     
  • In some drivers there's a need to read data from a user space area
    that was pinned using ib_umem when running from a different process
    context.

    The ib_umem_copy_from function allows reading data from the physical
    pages pinned in the ib_umem struct.

    Signed-off-by: Haggai Eran
    Signed-off-by: Roland Dreier

    Haggai Eran
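
    Usage is along these lines (a sketch; the destination buffer must be large
    enough and offset/length must stay within the umem):

        char buf[64];
        size_t offset = 0;
        int ret;

        /* read sizeof(buf) bytes starting 'offset' bytes into the pinned area */
        ret = ib_umem_copy_from(buf, umem, offset, sizeof(buf));
        if (ret)
                pr_err("umem copy failed: %d\n", ret);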
     
  • In order to allow umems that do not pin memory, we need the umem to
    keep track of its region's address.

    This makes the offset field redundant, and so this patch removes it.

    Signed-off-by: Haggai Eran
    Signed-off-by: Roland Dreier

    Haggai Eran
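
    With the full address stored, the old offset is trivially recomputed; a
    sketch of the kind of helper that replaces the field (shown with the CPU
    page size for simplicity):

        /* offset of the start of the region within its first page */
        static inline int ib_umem_offset(struct ib_umem *umem)
        {
                return umem->address & (PAGE_SIZE - 1);
        }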
     

20 Sep, 2014

1 commit

  • In debugging an application that receives -ENOMEM from ib_reg_mr(), I
    found that ib_umem_get() can fail because the pinned_vm count has
    wrapped causing it to always be larger than the lock limit even with
    RLIMIT_MEMLOCK set to RLIM_INFINITY.

    The wrapping of pinned_vm occurs because the process that calls
    ib_reg_mr() will have its mm->pinned_vm count incremented. Later a
    different process with a different mm_struct than the one that
    allocated the ib_umem struct ends up releasing it, which results in
    decrementing the new process's mm->pinned_vm count past zero and
    wrapping.

    I'm not entirely sure what circumstances cause a different process to
    release the ib_umem than the one that allocated it but the kernel
    stack trace of the freeing process from my situation looks like the
    following:

    Call Trace:
    [] dump_stack+0x19/0x1b
    [] ib_umem_release+0x1f5/0x200 [ib_core]
    [] mlx4_ib_destroy_qp+0x241/0x440 [mlx4_ib]
    [] ib_destroy_qp+0x12c/0x170 [ib_core]
    [] ib_uverbs_close+0x259/0x4e0 [ib_uverbs]
    [] __fput+0xba/0x240
    [] ____fput+0xe/0x10
    [] task_work_run+0xc4/0xe0
    [] do_notify_resume+0x95/0xa0
    [] int_signal+0x12/0x17

    The following patch fixes the issue by storing the pid struct of the
    process that calls ib_umem_get() so that ib_umem_release and/or
    ib_umem_account() can properly decrement the pinned_vm count of the
    correct mm_struct.

    Signed-off-by: Shawn Bohrer
    Reviewed-by: Shachar Raindel
    Signed-off-by: Roland Dreier

    Shawn Bohrer
     

05 Mar, 2014

1 commit

  • This patch refactors the IB core umem code and vendor drivers to use a
    linear (chained) SG table instead of a chunk list. With this change the
    relevant code becomes clearer: there is no need for nested loops to build
    and use the umem.

    Signed-off-by: Shachar Raindel
    Signed-off-by: Yishai Hadas
    Signed-off-by: Roland Dreier

    Yishai Hadas
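
    After the refactor, walking the mapping is one flat loop; a sketch of the
    consumer side, assuming sg_head/nmap fields on the umem:

        struct scatterlist *sg;
        unsigned int i;

        /* one walk over the DMA-mapped entries, no per-chunk inner loop */
        for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i) {
                dma_addr_t addr = sg_dma_address(sg);
                unsigned int len = sg_dma_len(sg);

                /* program addr/len into the device's translation table */
        }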
     

29 Apr, 2008

1 commit

  • Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1
    when mapping user-allocated CQs with ib_umem_get().

    Signed-off-by: Arthur Kepner
    Cc: Tony Luck
    Cc: Jesse Barnes
    Cc: Jes Sorensen
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Cc: James Bottomley
    Cc: David Miller
    Cc: Benjamin Herrenschmidt
    Cc: Grant Grundler
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Kepner
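
    The prototype of that era then presumably reads as follows, with dmasync as
    the new trailing flag (a sketch; the CQ registration line is illustrative):

        struct ib_umem *ib_umem_get(struct ib_ucontext *context,
                                    unsigned long addr, size_t size,
                                    int access, int dmasync);

        /* e.g. when mapping a user-allocated CQ buffer */
        cq->umem = ib_umem_get(context, buf_addr, buf_size,
                               IB_ACCESS_LOCAL_WRITE, 1);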
     

22 May, 2007

1 commit

  • The first thing mm.h does is include sched.h, solely for the can_do_mlock()
    inline function, which dereferences "current" inside. By dealing with
    can_do_mlock(), mm.h can be detached from sched.h, which is good. See
    below for why.

    This patch
    a) removes unconditional inclusion of sched.h from mm.h
    b) makes can_do_mlock() a normal function in mm/mlock.c
    c) exports can_do_mlock() to not break compilation
    d) adds sched.h inclusions back to files that were getting it indirectly.
    e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
    getting them indirectly

    Net result is:
    a) mm.h users would get less code to open, read, preprocess, parse, ... if
    they don't need sched.h
    b) sched.h stops being dependency for significant number of files:
    on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
    after patch it's only 3744 (-8.3%).

    Cross-compile tested on

    all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
    alpha alpha-up
    arm
    i386 i386-up i386-defconfig i386-allnoconfig
    ia64 ia64-up
    m68k
    mips
    parisc parisc-up
    powerpc powerpc-up
    s390 s390-up
    sparc sparc-up
    sparc64 sparc64-up
    um-x86_64
    x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

    as well as my two usual configs.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

09 May, 2007

2 commits

  • When memory pinned with ib_umem_get() is released, ib_umem_release()
    needs to subtract the amount of memory being unpinned from
    mm->locked_vm. However, ib_umem_release() may be called with
    mm->mmap_sem already held for writing if the memory is being released
    as part of an munmap() call, so it is sometimes necessary to defer
    this accounting into a workqueue.

    However, the work struct used to defer this accounting is dynamically
    allocated before it is queued, so there is the possibility of failing
    that allocation. If the allocation fails, then ib_umem_release has no
    choice except to bail out and leave the process with a permanently
    elevated locked_vm.

    Fix this by allocating the structure to defer accounting as part of
    the original struct ib_umem, so there's no possibility of failing a
    later allocation if creating the struct ib_umem and pinning memory
    succeeds.

    Signed-off-by: Roland Dreier

    Roland Dreier
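
    The fix amounts to embedding the deferred-accounting state in the umem
    itself; a sketch of the shape, with field and function names as they
    plausibly were at the time:

        struct ib_umem {
                /* ... pinned-pages bookkeeping ... */
                struct work_struct      work;   /* pre-allocated: deferral cannot fail */
                struct mm_struct       *mm;     /* whose locked_vm to decrement */
                unsigned long           diff;   /* number of pages to subtract */
        };

        /* in ib_umem_release(), when mmap_sem may already be held for writing */
        umem->mm   = mm;                /* mm captured when the umem was created */
        umem->diff = npages;            /* illustrative: pages being unpinned */
        INIT_WORK(&umem->work, ib_umem_account);
        schedule_work(&umem->work);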
     
  • Export ib_umem_get()/ib_umem_release() and put low-level drivers in
    control of when to call ib_umem_get() to pin and DMA map userspace,
    rather than always calling it in ib_uverbs_reg_mr() before calling the
    low-level driver's reg_user_mr method.

    Also move these functions to be in the ib_core module instead of
    ib_uverbs, so that driver modules using them do not depend on
    ib_uverbs.

    This has a number of advantages:
    - It is better design from the standpoint of making generic code a
    library that can be used or overridden by device-specific code as
    the details of specific devices dictate.
    - Drivers that do not need to pin userspace memory regions do not
    need to take the performance hit of calling ib_umem_get(). For
    example, although I have not tried to implement it in this patch,
    the ipath driver should be able to avoid pinning memory and just
    use copy_{to,from}_user() to access userspace memory regions.
    - Buffers that need special mapping treatment can be identified by
    the low-level driver. For example, it may be possible to solve
    some Altix-specific memory ordering issues with mthca CQs in
    userspace by mapping CQ buffers with extra flags.
    - Drivers that need to pin and DMA map userspace memory for things
    other than memory regions can use ib_umem_get() directly, instead
    of hacks using extra parameters to their reg_phys_mr method. For
    example, the mlx4 driver that is pending being merged needs to pin
    and DMA map QP and CQ buffers, but it does not need to create a
    memory key for these buffers. So the cleanest solution is for mlx4
    to call ib_umem_get() in the create_qp and create_cq methods.

    Signed-off-by: Roland Dreier

    Roland Dreier
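
    The resulting driver-side pattern looks roughly like the sketch below: the
    low-level driver's reg_user_mr (or create_cq/create_qp) pins the memory
    itself, when and if it needs to. The foo_* names are hypothetical and the
    ib_umem_get() signature is the one from before the later dmasync addition:

        struct foo_mr {
                struct ib_mr    ibmr;
                struct ib_umem *umem;
        };

        static struct ib_mr *foo_reg_user_mr(struct ib_pd *pd, u64 start,
                                             u64 length, u64 virt_addr,
                                             int access_flags,
                                             struct ib_udata *udata)
        {
                struct foo_mr *mr = kzalloc(sizeof(*mr), GFP_KERNEL);

                if (!mr)
                        return ERR_PTR(-ENOMEM);

                /* the driver, not ib_uverbs, decides to pin and DMA map here */
                mr->umem = ib_umem_get(pd->uobject->context, start, length,
                                       access_flags);
                if (IS_ERR(mr->umem)) {
                        int err = PTR_ERR(mr->umem);

                        kfree(mr);
                        return ERR_PTR(err);
                }

                /* ... build the HW translation table from mr->umem ... */
                return &mr->ibmr;
        }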