16 Dec, 2014

4 commits

  • * Extend the umem struct to keep the ODP-related data.
    * Allocate and initialize the ODP-related information in the umem
    (page_list, dma_list), and free it as needed at the end of the
    umem's lifetime.
    * Store a reference to the process PID struct in the ucontext. This
    is used to safely obtain the task_struct and the mm during fault
    handling, without preventing task destruction if needed.
    * Add two helper functions: ib_umem_odp_map_dma_pages and
    ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses
    of specific pages of the umem (and, currently, pin them).
    * Support page faults only: the IB core will keep a reference on
    the pages used and call put_page() when freeing an ODP umem
    area. Invalidation support will be added in a later patch.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Shachar Raindel
    Signed-off-by: Haggai Eran
    Signed-off-by: Majd Dibbiny
    Signed-off-by: Roland Dreier

    Shachar Raindel
     
  • Add a helper function mlx5_ib_read_user_wqe to read information from
    user-space owned work queues. The function will be used in a later
    patch by the page-fault handling code in mlx5_ib.

    Signed-off-by: Haggai Eran

    [ Add stub for ib_umem_copy_from() for CONFIG_INFINIBAND_USER_MEM=n
    - Roland ]

    Signed-off-by: Roland Dreier

    Haggai Eran
     
  • In some drivers there's a need to read data from a user space area
    that was pinned using ib_umem when running from a different process
    context.

    The ib_umem_copy_from function allows reading data from the physical
    pages pinned in the ib_umem struct.

    Signed-off-by: Haggai Eran
    Signed-off-by: Roland Dreier

    Haggai Eran
     
  • In order to allow umems that do not pin memory, we need the umem to
    keep track of its region's address.

    This makes the offset field redundant, and so this patch removes it.

    Signed-off-by: Haggai Eran
    Signed-off-by: Roland Dreier

    Haggai Eran
     

20 Sep, 2014

1 commit

  • In debugging an application that receives -ENOMEM from ib_reg_mr(), I
    found that ib_umem_get() can fail because the pinned_vm count has
    wrapped causing it to always be larger than the lock limit even with
    RLIMIT_MEMLOCK set to RLIM_INFINITY.

    The wrapping of pinned_vm occurs because the process that calls
    ib_reg_mr() has its mm->pinned_vm count incremented. Later, a
    different process, with a different mm_struct than the one that
    allocated the ib_umem struct, ends up releasing it, which decrements
    the new process's mm->pinned_vm count past zero, wrapping it.

    I'm not entirely sure what circumstances cause a different process to
    release the ib_umem than the one that allocated it, but the kernel
    stack trace of the freeing process in my situation looks like the
    following:

    Call Trace:
    [] dump_stack+0x19/0x1b
    [] ib_umem_release+0x1f5/0x200 [ib_core]
    [] mlx4_ib_destroy_qp+0x241/0x440 [mlx4_ib]
    [] ib_destroy_qp+0x12c/0x170 [ib_core]
    [] ib_uverbs_close+0x259/0x4e0 [ib_uverbs]
    [] __fput+0xba/0x240
    [] ____fput+0xe/0x10
    [] task_work_run+0xc4/0xe0
    [] do_notify_resume+0x95/0xa0
    [] int_signal+0x12/0x17

    The following patch fixes the issue by storing the pid struct of the
    process that calls ib_umem_get() so that ib_umem_release and/or
    ib_umem_account() can properly decrement the pinned_vm count of the
    correct mm_struct.

    Signed-off-by: Shawn Bohrer
    Reviewed-by: Shachar Raindel
    Signed-off-by: Roland Dreier

    Shawn Bohrer
     

05 Mar, 2014

1 commit

  • This patch refactors the IB core umem code and vendor drivers to use
    a linear (chained) SG table instead of a chunk list. With this change
    the relevant code becomes clearer: there is no need for nested loops
    to build and use the umem.

    Signed-off-by: Shachar Raindel
    Signed-off-by: Yishai Hadas
    Signed-off-by: Roland Dreier

    Yishai Hadas
     

29 Apr, 2008

1 commit

  • Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1
    when mapping user-allocated CQs with ib_umem_get().

    Signed-off-by: Arthur Kepner
    Cc: Tony Luck
    Cc: Jesse Barnes
    Cc: Jes Sorensen
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Cc: James Bottomley
    Cc: David Miller
    Cc: Benjamin Herrenschmidt
    Cc: Grant Grundler
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Kepner
     

10 Oct, 2007

1 commit


22 May, 2007

1 commit

  • The first thing mm.h does is include sched.h, solely for the
    can_do_mlock() inline function, which dereferences "current". By
    dealing with can_do_mlock(), mm.h can be detached from sched.h, which
    is good. See below for why.

    This patch
    a) removes the unconditional inclusion of sched.h from mm.h
    b) makes can_do_mlock() a normal function in mm/mlock.c
    c) exports can_do_mlock() so as not to break compilation
    d) adds sched.h inclusions back to files that were getting it
    indirectly
    e) adds less bloated headers (asm/signal.h, jiffies.h) to some files
    that were getting them indirectly

    The net result is:
    a) mm.h users get less code to open, read, preprocess, parse, ... if
    they don't need sched.h
    b) sched.h stops being a dependency for a significant number of
    files: on x86_64 allmodconfig, touching sched.h results in a
    recompile of 4083 files; after the patch it's only 3744 (-8.3%).

    Cross-compile tested on

    all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
    alpha alpha-up
    arm
    i386 i386-up i386-defconfig i386-allnoconfig
    ia64 ia64-up
    m68k
    mips
    parisc parisc-up
    powerpc powerpc-up
    s390 s390-up
    sparc sparc-up
    sparc64 sparc64-up
    um-x86_64
    x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

    as well as my two usual configs.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

09 May, 2007

2 commits

  • When memory pinned with ib_umem_get() is released, ib_umem_release()
    needs to subtract the amount of memory being unpinned from
    mm->locked_vm. However, ib_umem_release() may be called with
    mm->mmap_sem already held for writing if the memory is being released
    as part of an munmap() call, so it is sometimes necessary to defer
    this accounting into a workqueue.

    However, the work struct used to defer this accounting is dynamically
    allocated before it is queued, so that allocation can fail. If it
    does, ib_umem_release() has no choice but to bail out, leaving the
    process with a permanently elevated locked_vm.

    Fix this by allocating the structure to defer accounting as part of
    the original struct ib_umem, so there's no possibility of failing a
    later allocation if creating the struct ib_umem and pinning memory
    succeeds.

    Signed-off-by: Roland Dreier

    Roland Dreier
     
  • Export ib_umem_get()/ib_umem_release() and put low-level drivers in
    control of when to call ib_umem_get() to pin and DMA map userspace
    memory, rather than always calling it in ib_uverbs_reg_mr() before
    calling the low-level driver's reg_user_mr method.

    Also move these functions to be in the ib_core module instead of
    ib_uverbs, so that driver modules using them do not depend on
    ib_uverbs.

    This has a number of advantages:
    - It is better design from the standpoint of making generic code a
    library that can be used or overridden by device-specific code as
    the details of specific devices dictate.
    - Drivers that do not need to pin userspace memory regions do not
    need to take the performance hit of calling ib_umem_get(). For
    example, although I have not tried to implement it in this patch,
    the ipath driver should be able to avoid pinning memory and just
    use copy_{to,from}_user() to access userspace memory regions.
    - Buffers that need special mapping treatment can be identified by
    the low-level driver. For example, it may be possible to solve
    some Altix-specific memory ordering issues with mthca CQs in
    userspace by mapping CQ buffers with extra flags.
    - Drivers that need to pin and DMA map userspace memory for things
    other than memory regions can use ib_umem_get() directly, instead
    of hacks using extra parameters to their reg_phys_mr method. For
    example, the mlx4 driver that is pending being merged needs to pin
    and DMA map QP and CQ buffers, but it does not need to create a
    memory key for these buffers. So the cleanest solution is for mlx4
    to call ib_umem_get() in the create_qp and create_cq methods.

    Signed-off-by: Roland Dreier

    Roland Dreier