13 Oct, 2018

1 commit

  • commit 5fe23f262e0548ca7f19fb79f89059a60d087d22 upstream.

    There is a race condition between ucma_close() and ucma_resolve_ip():

    CPU0                                CPU1
    ucma_resolve_ip():                  ucma_close():

    ctx = ucma_get_ctx(file, cmd.id);

                                        list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
                                        mutex_lock(&mut);
                                        idr_remove(&ctx_idr, ctx->id);
                                        mutex_unlock(&mut);
                                        ...
                                        mutex_lock(&mut);
                                        if (!ctx->closing) {
                                        mutex_unlock(&mut);
                                        rdma_destroy_id(ctx->cm_id);
                                        ...
                                        ucma_free_ctx(ctx);

    ret = rdma_resolve_addr();
    ucma_put_ctx(ctx);

    Before idr_remove(), ucma_get_ctx() can still find the ctx, and after
    rdma_destroy_id(), rdma_resolve_addr() may still access the id_priv
    pointer. Likewise, ucma_put_ctx() may use ctx after ucma_free_ctx().

    ucma_close() should also call ucma_put_ctx(), which tests the refcnt
    and waits for the last holder to release it. A similar pattern is
    already used by ucma_destroy_id().

    Reported-and-tested-by: syzbot+da2591e115d57a9cbb8b@syzkaller.appspotmail.com
    Reported-by: syzbot+cfe3c1e8ef634ba8964b@syzkaller.appspotmail.com
    Cc: Jason Gunthorpe
    Cc: Doug Ledford
    Cc: Leon Romanovsky
    Signed-off-by: Cong Wang
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
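
    The ucma_close() fix above relies on a common refcount pattern: the
    closing path drops its own reference instead of freeing the context
    directly, and teardown happens only once the last reference is gone.
    The following is a minimal single-threaded userspace sketch with
    hypothetical names (sketch_ctx, ctx_get, ctx_put, ctx_close). The commit
    describes close waiting for the last holder; because a single thread
    cannot block, this sketch instead tears the object down in the final put.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical context object; 'ref' starts at 1 for the owner. */
struct sketch_ctx {
    atomic_int ref;
    bool destroyed;   /* stands in for rdma_destroy_id() + free */
};

static int destroy_count;  /* counts teardowns, for illustration */

static void ctx_teardown(struct sketch_ctx *ctx)
{
    ctx->destroyed = true;
    destroy_count++;
}

/* Take an extra reference, as a lookup path does after finding ctx. */
static void ctx_get(struct sketch_ctx *ctx)
{
    atomic_fetch_add(&ctx->ref, 1);
}

/* Drop a reference; whoever drops the last one tears the object down. */
static void ctx_put(struct sketch_ctx *ctx)
{
    if (atomic_fetch_sub(&ctx->ref, 1) == 1)
        ctx_teardown(ctx);
}

/* Close path: drop the owner's reference instead of destroying directly,
 * so a concurrent user that still holds a reference keeps ctx alive. */
static void ctx_close(struct sketch_ctx *ctx)
{
    ctx_put(ctx);
}
```

    Here the resolve path would call ctx_get() before using ctx and
    ctx_put() afterwards; teardown then happens on whichever side finishes
    last.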
     

10 Oct, 2018

1 commit

  • [ Upstream commit 0d23ba6034b9cf48b8918404367506da3e4b3ee5 ]

    The current code grabs the private_data of whatever file descriptor
    userspace has supplied and implicitly casts it to a `struct ucma_file *`,
    potentially causing a type confusion.

    This is probably fine in practice because the pointer is only used for
    comparisons, it is never actually dereferenced; and even in the
    comparisons, it is unlikely that a file from another filesystem would have
    a ->private_data pointer that happens to also be valid in this context.
    But ->private_data is not always guaranteed to be a valid pointer to an
    object owned by the file's filesystem; for example, some filesystems just
    cram numbers in there.

    Check the type of the supplied file descriptor to be safe, analogous to how
    other places in the kernel do it.

    Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()")
    Signed-off-by: Jann Horn
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
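
    The type check described in this commit can be sketched as follows:
    only interpret private_data as our own type if the file's operations
    table is ours. This is a userspace model; the struct and variable names
    are hypothetical, and the idea that the check compares f_op against the
    ucma file operations is an assumption based on the "analogous to how
    other places in the kernel do it" remark.

```c
#include <stddef.h>

/* Hypothetical minimal model of struct file / file_operations. */
struct sketch_fops { const char *name; };
struct sketch_file {
    const struct sketch_fops *f_op;
    void *private_data;
};
struct sketch_ucma_file { int id; };

static const struct sketch_fops ucma_like_fops = { "ucma" };
static const struct sketch_fops other_fops = { "other" };

/* Trust private_data only when the file belongs to us; a foreign file may
 * store anything there, including a plain number. */
static struct sketch_ucma_file *to_ucma_file(struct sketch_file *f)
{
    if (f->f_op != &ucma_like_fops)
        return NULL;
    return f->private_data;
}
```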
     

04 Oct, 2018

2 commits

  • commit 67e3816842fe6414d629c7515b955952ec40c7d7 upstream.

    Currently a uverbs completion event queue is flushed of events in
    ib_uverbs_comp_event_close() with the queue spinlock held and then
    released. Yet ev_queue->is_closed is not set until later in
    uverbs_hot_unplug_completion_event_file().

    In between the time ib_uverbs_comp_event_close() releases the lock and
    uverbs_hot_unplug_completion_event_file() acquires the lock, a completion
    event can arrive and be inserted into the event queue by
    ib_uverbs_comp_handler().

    This can cause a "double add" list_add warning or crash depending on the
    kernel configuration, or a memory leak because the event is never dequeued
    since the queue is already closed down.

    So also set ev_queue->is_closed = 1 in ib_uverbs_comp_event_close().

    Cc: stable@vger.kernel.org
    Fixes: 1e7710f3f656 ("IB/core: Change completion channel to use the reworked objects schema")
    Signed-off-by: Steve Wise
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Steve Wise
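
    The fix above amounts to flushing the queue and marking it closed in the
    same critical section, so the event handler can never add to a queue
    that has already been flushed. A minimal sketch with hypothetical names;
    the kernel uses a spinlock, modelled here with a pthread mutex.

```c
#include <pthread.h>
#include <stdbool.h>

#define QLEN 8

struct sketch_ev_queue {
    pthread_mutex_t lock;
    int events[QLEN];
    int count;
    bool is_closed;
};

/* Event handler: refuse new events once the queue is closed. */
static bool ev_queue_add(struct sketch_ev_queue *q, int ev)
{
    bool added = false;

    pthread_mutex_lock(&q->lock);
    if (!q->is_closed && q->count < QLEN) {
        q->events[q->count++] = ev;
        added = true;
    }
    pthread_mutex_unlock(&q->lock);
    return added;
}

/* Close: mark closed and flush under one lock hold, leaving no window in
 * which an event can be queued after the flush. */
static void ev_queue_close(struct sketch_ev_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->is_closed = true;
    q->count = 0;   /* flush pending events */
    pthread_mutex_unlock(&q->lock);
}
```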
     
  • [ Upstream commit c2d7c8ff89b22ddefb1ac2986c0d48444a667689 ]

    "nents" is an unsigned int, so if ib_map_mr_sg() returns a negative
    error code, it is promoted to a large unsigned int, which is
    treated as success.

    Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
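
    The signed/unsigned pitfall in this commit can be shown in isolation.
    In the sketch below, ret stands in for the return value of
    ib_map_mr_sg(); the function names are hypothetical. The conversion to
    unsigned is written out as a cast to make it visible; C performs it
    implicitly when a signed int is compared with an unsigned int.

```c
#include <errno.h>

/* Buggy check: ret is converted to unsigned before the comparison, so
 * -EINVAL becomes a huge value that is not < nents, i.e. "success". */
static int check_mapped_buggy(int ret, unsigned int nents)
{
    if ((unsigned int)ret < nents)
        return -EINVAL;   /* partial mapping */
    return 0;
}

/* Fixed check: handle a negative error code before the comparison. */
static int check_mapped_fixed(int ret, unsigned int nents)
{
    if (ret < 0)
        return ret;
    if ((unsigned int)ret < nents)
        return -EINVAL;
    return 0;
}
```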
     

26 Sep, 2018

1 commit

  • commit 954a8e3aea87e896e320cf648c1a5bbe47de443e upstream.

    When AF_IB addresses are used during rdma_resolve_addr(), no lock is
    held. A cma device can be removed while the list traversal is in
    progress, which may lead to a crash, i.e.:

    CPU0                                CPU1
    ====                                ====
    rdma_resolve_addr()
    cma_resolve_ib_dev()
    list_for_each()                     cma_remove_one()
    cur_dev->device                     mutex_lock(&lock)
                                        list_del();
                                        mutex_unlock(&lock);
                                        cma_process_remove();

    Therefore, hold the lock while traversing the list to avoid this
    situation.

    Cc: # 3.10
    Fixes: f17df3b0dede ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()")
    Signed-off-by: Parav Pandit
    Reviewed-by: Daniel Jurgens
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
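
    The fix above is the general rule that traversal must take the same lock
    as removal. A minimal userspace sketch with a hypothetical device list;
    note that it copies the needed field out while still holding the lock,
    since a pointer returned after unlocking could already be stale.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical device list entry. */
struct sketch_dev {
    int id;
    int port;
    struct sketch_dev *next;
};

static pthread_mutex_t dev_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_dev *dev_list;

/* Lookup: walk the list and read the entry under the removal lock. */
static int lookup_dev_port(int id, int *port)
{
    int ret = -ENODEV;

    pthread_mutex_lock(&dev_list_lock);
    for (struct sketch_dev *cur = dev_list; cur; cur = cur->next) {
        if (cur->id == id) {
            *port = cur->port;   /* copy out while still locked */
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&dev_list_lock);
    return ret;
}

/* Removal: unlink under the same lock. */
static void remove_dev(struct sketch_dev *dev)
{
    pthread_mutex_lock(&dev_list_lock);
    for (struct sketch_dev **pp = &dev_list; *pp; pp = &(*pp)->next) {
        if (*pp == dev) {
            *pp = dev->next;
            break;
        }
    }
    pthread_mutex_unlock(&dev_list_lock);
}
```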
     

20 Sep, 2018

1 commit

  • [ Upstream commit 643d213a9a034fa04f5575a40dfc8548e33ce04f ]

    Currently, if the cm_id is not bound to any netdevice, the net namespace
    is ignored for that cm_id, which is incorrect.

    Regardless of whether the cm_id is bound to a netdevice, the net
    namespace must match. When a cm_id is bound to a netdevice, both the
    net namespace and the netdevice must match.

    Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM")
    Signed-off-by: Parav Pandit
    Reviewed-by: Daniel Jurgens
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
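
    The matching rule stated in this commit can be written as a small
    predicate. The struct and field names below are hypothetical, and an
    ifindex of 0 is assumed to mean "not bound to any netdevice".

```c
#include <stdbool.h>

/* Hypothetical view of a listening cm_id. */
struct sketch_cm_id {
    int net_ns;          /* namespace the cm_id belongs to */
    int bound_ifindex;   /* 0 means not bound to a netdevice */
};

/* The namespace must always match; the netdevice must also match
 * only when the cm_id is bound to one. */
static bool cm_id_matches(const struct sketch_cm_id *id, int req_net_ns,
                          int req_ifindex)
{
    if (id->net_ns != req_net_ns)
        return false;
    if (id->bound_ifindex && id->bound_ifindex != req_ifindex)
        return false;
    return true;
}
```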
     

06 Aug, 2018

1 commit

  • commit addb8a6559f0f8b5a37582b7ca698358445a55bf upstream.

    The commit cited below checked that the port numbers provided in the
    primary and alt AVs are legal.

    That is sufficient to prevent a kernel panic. However, it is not
    sufficient for correct operation.

    In Linux, AVs (both primary and alt) must be completely self-described.
    We do not accept an AV from userspace without an embedded port number.
    (This has been the case since kernel 3.14 commit dbf727de7440
    ("IB/core: Use GID table in AH creation and dmac resolution")).

    For the primary AV, this embedded port number must match the port number
    specified with IB_QP_PORT.

    We also expect the port number embedded in the alt AV to match the
    alt_port_num value passed by the userspace driver in the modify_qp command
    base structure.

    Add these checks to modify_qp.

    Cc: # 4.16
    Fixes: 5d4c05c3ee36 ("RDMA/uverbs: Sanitize user entered port numbers prior to access it")
    Signed-off-by: Jack Morgenstein
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Jack Morgenstein
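
    The two consistency checks added to modify_qp can be sketched as below.
    The mask bits, struct layout and names are hypothetical; for simplicity
    the primary check only covers the case where both the AV and an
    explicit port are supplied in the same command.

```c
#include <errno.h>

#define SK_QP_PORT      (1u << 0)
#define SK_QP_AV        (1u << 1)
#define SK_QP_ALT_PATH  (1u << 2)

struct sketch_av { unsigned int port_num; };

struct sketch_modify_qp {
    unsigned int attr_mask;
    unsigned int port_num;      /* used with SK_QP_PORT */
    unsigned int alt_port_num;  /* used with SK_QP_ALT_PATH */
    struct sketch_av dest;      /* primary AV */
    struct sketch_av alt_dest;  /* alternate AV */
};

static int check_av_ports(const struct sketch_modify_qp *cmd)
{
    /* The primary AV must embed the same port given with QP_PORT. */
    if ((cmd->attr_mask & SK_QP_AV) && (cmd->attr_mask & SK_QP_PORT) &&
        cmd->dest.port_num != cmd->port_num)
        return -EINVAL;

    /* The alt AV must embed the same port as alt_port_num. */
    if ((cmd->attr_mask & SK_QP_ALT_PATH) &&
        cmd->alt_dest.port_num != cmd->alt_port_num)
        return -EINVAL;

    return 0;
}
```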
     

03 Aug, 2018

5 commits

  • commit 940efcc8889f0d15567eb07fc9fd69b06e366aa5 upstream.

    Flows can be created on UD and RAW_PACKET QP types. Attempts to provide
    other QP types as input cause various unpredictable failures.

    The reason is that, in order to support all the various types (e.g.
    XRC), we would have to use the real_qp handle rather than the qp handle
    and expect the driver/FW to fail such (XRC) flows. The simpler and safer
    option is to ban all QP types except UD and RAW_PACKET, instead of
    relying on the driver/FW.

    Cc: # 3.11
    Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
    Cc: syzkaller
    Reported-by: Noa Osherovich
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
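
    The restriction in the commit above is an allow-list on the QP type.
    A minimal sketch; the enum is hypothetical and the error code is chosen
    for illustration.

```c
#include <errno.h>

/* Hypothetical QP type values for illustration. */
enum sketch_qp_type {
    SK_QPT_RC, SK_QPT_UC, SK_QPT_UD, SK_QPT_RAW_PACKET, SK_QPT_XRC_INI
};

/* Accept flow creation only on UD and RAW_PACKET QPs; reject every other
 * type up front instead of relying on the driver/FW to fail it. */
static int flow_qp_type_ok(enum sketch_qp_type t)
{
    switch (t) {
    case SK_QPT_UD:
    case SK_QPT_RAW_PACKET:
        return 0;
    default:
        return -EINVAL;
    }
}
```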
     
  • [ Upstream commit 2468b82d69e3a53d024f28d79ba0fdb8bf43dfbf ]

    Let's perform checks in-place instead of BUG_ONs.

    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • [ Upstream commit cb2595c1393b4a5211534e6f0a0fbad369e21ad8 ]

    ucma_process_join() will free the newly allocated "mc" struct
    if there is any error after allocation, notably in copy_to_user().

    But in parallel, ucma_leave_multicast() could find this "mc"
    through idr_find() before ucma_process_join() frees it, since it
    is already published.

    So "mc" could be used in ucma_leave_multicast() after it has been
    allocated and freed in ucma_process_join(), since it is not
    refcounted.

    Fix this by separating "publish" from ID allocation, so that we
    can get an ID first and publish it later after copy_to_user().

    Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support")
    Reported-by: Noam Rathaus
    Signed-off-by: Cong Wang
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
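
    The "allocate the ID first, publish later" idea from the commit above
    can be sketched with a tiny table standing in for the IDR: a slot is
    reserved with a NULL pointer, so lookups cannot see the object until it
    is published after the last step that can fail. All names here are
    hypothetical; with a real IDR this is commonly done by allocating with a
    NULL pointer and filling the slot in later.

```c
#include <errno.h>
#include <stddef.h>

#define SK_MAX_IDS 4

static void *slots[SK_MAX_IDS];   /* what lookups see */
static int reserved[SK_MAX_IDS];  /* which IDs are allocated */

static int id_reserve(void)
{
    for (int i = 0; i < SK_MAX_IDS; i++) {
        if (!reserved[i]) {
            reserved[i] = 1;
            slots[i] = NULL;   /* allocated, but lookups still see NULL */
            return i;
        }
    }
    return -ENOSPC;
}

static void *id_lookup(int id)
{
    return (id >= 0 && id < SK_MAX_IDS) ? slots[id] : NULL;
}

static void id_publish(int id, void *obj) { slots[id] = obj; }
static void id_release(int id) { slots[id] = NULL; reserved[id] = 0; }

/* Join path: reserve an ID, do the step that may fail (copy_to_user in
 * the kernel), and publish the object only once nothing can fail. */
static int join(void *mc, int copy_fails)
{
    int id = id_reserve();

    if (id < 0)
        return id;
    if (copy_fails) {
        id_release(id);   /* mc was never visible, so freeing it is safe */
        return -EFAULT;
    }
    id_publish(id, mc);
    return id;
}
```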
     
  • commit 6ee687735e745eafae9e6b93d1ea70bc52e7ad07 upstream.

    gcc-4.4.4 has issues with initialization of anonymous unions.

    drivers/infiniband/core/verbs.c: In function '__ib_drain_sq':
    drivers/infiniband/core/verbs.c:2204: error: unknown field 'wr_cqe' specified in initializer
    drivers/infiniband/core/verbs.c:2204: warning: initialization makes integer from pointer without a cast

    Work around this.

    Fixes: a1ae7d0345edd5 ("RDMA/core: Avoid that ib_drain_qp() triggers an out-of-bounds stack access")
    Cc: Bart Van Assche
    Cc: Steve Wise
    Cc: Sagi Grimberg
    Cc: Jason Gunthorpe
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Doug Ledford
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Andrew Morton
     
  • commit a1ae7d0345edd593d6725d3218434d903a0af95d upstream.

    This patch fixes the following KASAN complaint:

    ==================================================================
    BUG: KASAN: stack-out-of-bounds in rxe_post_send+0x77d/0x9b0 [rdma_rxe]
    Read of size 8 at addr ffff880061aef860 by task 01/1080

    CPU: 2 PID: 1080 Comm: 01 Not tainted 4.16.0-rc3-dbg+ #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
    Call Trace:
    dump_stack+0x85/0xc7
    print_address_description+0x65/0x270
    kasan_report+0x231/0x350
    rxe_post_send+0x77d/0x9b0 [rdma_rxe]
    __ib_drain_sq+0x1ad/0x250 [ib_core]
    ib_drain_qp+0x9/0x30 [ib_core]
    srp_destroy_qp+0x51/0x70 [ib_srp]
    srp_free_ch_ib+0xfc/0x380 [ib_srp]
    srp_create_target+0x1071/0x19e0 [ib_srp]
    kernfs_fop_write+0x180/0x210
    __vfs_write+0xb1/0x2e0
    vfs_write+0xf6/0x250
    SyS_write+0x99/0x110
    do_syscall_64+0xee/0x2b0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    The buggy address belongs to the page:
    page:ffffea000186bbc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
    flags: 0x4000000000000000()
    raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
    raw: 0000000000000000 ffffea000186bbe0 0000000000000000 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff880061aef700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffff880061aef780: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00
    >ffff880061aef800: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 f2 f2 f2 f2
    ^
    ffff880061aef880: f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 f2 f2
    ffff880061aef900: f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 00
    ==================================================================

    Fixes: 765d67748bcf ("IB: new common API for draining queues")
    Signed-off-by: Bart Van Assche
    Cc: Steve Wise
    Cc: Sagi Grimberg
    Cc: stable@vger.kernel.org
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     

17 Jul, 2018

1 commit

  • commit 7a8690ed6f5346f6738971892205e91d39b6b901 upstream.

    In commit 357d23c811a7 ("Remove the obsolete libibcm library")
    in rdma-core [1], we removed the obsolete library which used the
    /dev/infiniband/ucmX interface.

    Following multiple syzkaller reports about non-sanitized
    user input in the UCMA module, a short audit revealed the same
    issues in the UCM module too.

    It is better to disable this interface in the kernel
    before the syzkaller team invests time and energy in hardening
    this unused interface.

    [1] https://github.com/linux-rdma/rdma-core/pull/279

    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     

03 Jul, 2018

1 commit


21 Jun, 2018

5 commits

  • [ Upstream commit 9aa169213d1166d30ae357a44abbeae93459339d ]

    When commit [1] was added, SGID was queried to derive the SMAC address.
    Then, later on during a refactor [2], SMAC was no longer needed. However,
    the now useless GID query remained. Then during additional code changes
    later on, the GID query was being done in such a way that it caused iWARP
    queries to start breaking. Remove the useless GID query and resolve the
    iWARP breakage at the same time.

    This is discussed in [3].

    [1] commit dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
    [2] commit 5c266b2304fb ("IB/cm: Remove the usage of smac and vid of qp_attr and cm_av")
    [3] https://www.spinics.net/lists/linux-rdma/msg63951.html

    Suggested-by: Shiraz Saleem
    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     
  • [ Upstream commit db82476f37413eaeff5f836a9d8b022d6544accf ]

    Currently, the kernel protects access to the agent ID allocator on a per
    port basis using a spinlock, so it is impossible for two apps/threads on
    the same port to get the same TID, but it is entirely possible for two
    threads on different ports to end up with the same TID.

    As this can be confusing (regardless of it being legal according to the
    IB Spec 1.3, C13-18.1.1, in section 13.4.6.4 - TransactionID usage),
    and as the rdma-core user space API for /dev/umad devices implies unique
    TIDs even across ports, make the TID an atomic type so that no two
    allocations, regardless of port number, will be the same.

    Signed-off-by: Håkon Bugge
    Reviewed-by: Jack Morgenstein
    Reviewed-by: Ira Weiny
    Reviewed-by: Zhu Yanjun
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Håkon Bugge
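
    Making the TID unique across ports, as described above, comes down to
    one atomic counter shared by every allocator instead of per-port
    counters. A minimal C11 sketch with hypothetical names; the kernel uses
    its own atomic_t type rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One counter shared by all ports; atomic_fetch_add hands out a distinct
 * value to every caller without needing a lock. */
static atomic_uint_fast32_t next_tid = 1;

static uint32_t alloc_tid(int port)
{
    (void)port;   /* the port no longer affects uniqueness */
    return (uint32_t)atomic_fetch_add(&next_tid, 1);
}
```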
     
  • [ Upstream commit f96416cea7bce9afe619c15e87fced70f93f9098 ]

    In the cases where iwpm_hash_bucket is NULL, or where
    get_mapinfo_hash_bucket() returns NULL, the map_info is never added
    to hash_bucket_head, and hence map_info leaks. Fix this by
    initializing hash_bucket_head to NULL; if it is still NULL, we know
    that map_info was not added to hash_bucket_head and must be freed.

    Detected by CoverityScan, CID#1222481 ("Resource Leak")

    Fixes: 30dc5e63d6a5 ("RDMA/core: Add support for iWARP Port Mapper user space service")
    Signed-off-by: Colin Ian King
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Colin Ian King
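
    The leak fix above follows a simple rule: start the list-head pointer as
    NULL, and if it is still NULL after the lookup, the entry was never
    linked anywhere and must be freed on the spot. A minimal sketch with
    hypothetical types; the error code is chosen for illustration.

```c
#include <errno.h>
#include <stdlib.h>

struct sketch_map_info { struct sketch_map_info *next; };
struct sketch_bucket { struct sketch_map_info *head; };

static int freed_count;   /* counts frees, for illustration */

static int add_mapinfo(struct sketch_bucket *bucket, struct sketch_map_info *mi)
{
    /* Stays NULL if no bucket can be found. */
    struct sketch_map_info **hash_bucket_head = NULL;

    if (bucket)
        hash_bucket_head = &bucket->head;

    if (!hash_bucket_head) {
        /* Never linked into any list, so nobody else will free it. */
        free(mi);
        freed_count++;
        return -EINVAL;
    }

    mi->next = *hash_bucket_head;
    *hash_bucket_head = mi;
    return 0;
}
```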
     
  • [ Upstream commit 2918c1a900252b4a0c730715ec205437c7daf79d ]

    There are a few issues with the validation of the netdevice and the
    listen id lookup for IB (IPoIB) while processing an incoming CM request,
    as described below.

    1. While performing the lookup of bind_list in cma_ps_find(), the net
    namespace of the netdevice can get deleted in cma_exit_net(), resulting
    in use-after-free access of idr and/or net namespace structures.
    This lookup occurs from workqueue context (and not from userspace
    context, where the net namespace is always valid).

    CPU0                                CPU1
    ====                                ====

    bind_list = cma_ps_find();
                                        move netdevice to new namespace
                                        delete net namespace
                                        cma_exit_net()
                                        idr_destroy(idr);

    [..]
    cma_find_listener(bind_list, ..);

    2. While the netdevice is being validated for an IP address in a given
    net namespace, the netdevice's net namespace and/or ifindex can change
    in cma_get_net_dev() and cma_match_net_dev().

    These issues are overcome by using the RCU lock together with the
    netdevice UP/DOWN state, as described below.
    When a net namespace is being deleted, the netdevice is closed and
    shut down before it is moved back to the init_net namespace.
    change_net_namespace() synchronizes with any existing use of the
    netdevice before changing netdev properties such as net or ifindex.
    Once the netdevice's IFF_UP flag is cleared, such fields are not
    guaranteed to be valid.
    Therefore, holding the RCU lock along with checking the netdevice state
    ensures that, while the route lookup and cm_id lookup are in progress,
    the netdevice of interest won't migrate to any other net namespace.
    This ensures that the net namespace associated with the netdevice won't
    be deleted while the RCU lock is held for a netdevice in the IFF_UP
    state.

    Fixes: fa20105e09e9 ("IB/cma: Add support for network namespaces")
    Fixes: 4be74b42a6d0 ("IB/cma: Separate port allocation to network namespaces")
    Fixes: f887f2ac87c2 ("IB/cma: Validate routing of incoming requests")
    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     
  • [ Upstream commit f604db645a66b7ba4f21c426fe73253928dada41 ]

    Previously, if a method contained mandatory attributes in a namespace
    that wasn't given by the user, these attributes weren't validated.
    Fix this by iterating over all specification namespaces.

    Fixes: fac9658cabb9 ("IB/core: Add new ioctl interface")
    Signed-off-by: Matan Barak
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Matan Barak
     

05 Jun, 2018

1 commit

  • commit a840c93ca7582bb6c88df2345a33f979b7a67874 upstream.

    When a GID entry is invalid, EAGAIN is returned. This is an incorrect
    error code: nothing will make this GID entry valid again in bounded
    time.

    Some user space tools fail incorrectly if EAGAIN is returned here, and
    this represents a small ABI change from earlier kernels.

    The first patch in the Fixes list made previously valid entries become
    invalid, allowing this code to trigger, while the second patch in the
    Fixes list introduced the wrong EAGAIN.

    Therefore revert the return result to EINVAL which matches the historical
    expectations of the ibv_query_gid_type() API of the libibverbs user space
    library.

    Cc:
    Fixes: 598ff6bae689 ("IB/core: Refactor GID modify code for RoCE")
    Fixes: 03db3a2d81e6 ("IB/core: Add RoCE GID table management")
    Reviewed-by: Daniel Jurgens
    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     

30 May, 2018

7 commits

  • [ Upstream commit 563c4ba3bd2b8b0b21c65669ec2226b1cfa1138b ]

    ah_attr contains the port number to which cm_id is bound. However, while
    searching the GID table for a matching GID entry, the port number is
    ignored.

    This could cause the wrong GID to be used when the ah_attr is converted to
    an AH.

    Reviewed-by: Daniel Jurgens
    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     
  • [ Upstream commit 5f3e3b85cc0a5eae1c46d72e47d3de7bf208d9e2 ]

    The option size check is using optval instead of optlen
    causing the set option call to fail. Use the correct
    field, optlen, for size check.

    Fixes: 6a21dfc0d0db ("RDMA/ucma: Limit possible option size")
    Signed-off-by: Chien Tin Tung
    Signed-off-by: Shiraz Saleem
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Chien Tin Tung
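
    The bug above compared an address with a size limit. A minimal sketch
    of the corrected check; the struct, its field types and the limit are
    hypothetical stand-ins for the ucma set-option command.

```c
#include <errno.h>
#include <stdint.h>

#define SK_MAX_OPTLEN 4096u   /* hypothetical size limit */

/* optval carries a userspace address; optlen is the buffer length. */
struct sketch_set_option {
    uint64_t optval;
    uint32_t optlen;
};

/* Bound the length, not the address. Comparing optval against the limit
 * tests a pointer value, which for a real user buffer is normally far
 * larger than any size limit, so valid calls were rejected. */
static int check_option_size(const struct sketch_set_option *cmd)
{
    if (cmd->optlen > SK_MAX_OPTLEN)
        return -EINVAL;
    return 0;
}
```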
     
  • [ Upstream commit bb7f8f199c354c4cf155b1d6d55f86eaaed7fa5a ]

    The returned resolved_dev might be NULL because ifindex is a transient
    number. Skipping the NULL check on resolved_dev might crash the kernel,
    so perform a NULL check before accessing resolved_dev.

    Additionally, rdma_resolve_ip_route() invokes addr_resolve(), which
    already checks and translates the loopback ifindex, so checking it again
    in rdma_resolve_ip_route() does not help. The code is therefore
    simplified to drop the IFF_LOOPBACK check.

    Fixes: 200298326b27 ("IB/core: Validate route when we init ah")
    Reviewed-by: Daniel Jurgens
    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     
  • [ Upstream commit ec6f8401c48a86809237e86878a6fac6b281118f ]

    If remove_commit fails then the lock is left locked while the uobj still
    exists. Eventually the kernel will deadlock.

    lockdep detects this and says:

    test/4221 is leaving the kernel with locks still held!
    1 lock held by test/4221:
    #0: (&ucontext->cleanup_rwsem){.+.+}, at: [] rdma_explicit_destroy+0x37/0x120 [ib_uverbs]

    Fixes: 4da70da23e9b ("IB/core: Explicitly destroy an object while keeping uobject")
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jason Gunthorpe
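
    The deadlock above comes from an error path that returns with the lock
    held. The usual kernel cure is a single unlock label that every exit
    passes through, sketched below. The lock is modelled by a hypothetical
    depth counter so a leaked lock is easy to observe; in the real code it
    is ucontext->cleanup_rwsem.

```c
#include <errno.h>

/* Hypothetical lock model: non-zero depth after a call means a leak. */
static int lock_depth;
static void sk_lock(void)   { lock_depth++; }
static void sk_unlock(void) { lock_depth--; }

/* Hypothetical destroy step that may fail. */
static int sk_remove_commit(int fail) { return fail ? -EBUSY : 0; }

/* Every exit after taking the lock goes through out_unlock, so a failing
 * sk_remove_commit() no longer returns with the lock still held. */
static int sk_explicit_destroy(int fail)
{
    int ret;

    sk_lock();
    ret = sk_remove_commit(fail);
    if (ret)
        goto out_unlock;
    /* success-only work would go here */
out_unlock:
    sk_unlock();
    return ret;
}
```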
     
  • [ Upstream commit 4d39a959bc1f3d164b5a54147fdeb19f84b1ed58 ]

    If the same attribute is listed twice by the user in the ioctl attribute
    list then error unwind can cause the kernel to deref garbage.

    This happens when an object with WRITE access is sent twice. The second
    parse properly fails but corrupts the state required for the error unwind
    it triggers.

    Fix this by making duplicates in the attribute list invalid. This is
    not something we need to support.

    The ioctl interface is currently recommended to be disabled in Kconfig.

    Signed-off-by: Matan Barak
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Matan Barak
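
    Rejecting duplicates, as the commit above does, is best done in a pass
    over the attribute IDs before any per-attribute parsing changes state.
    A minimal sketch using a "seen" table; the ID bound and names are
    hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_MAX_ATTR_ID 64   /* hypothetical upper bound on attribute IDs */

/* Fail if any attribute ID appears more than once (or is out of range). */
static int check_no_duplicate_attrs(const unsigned int *ids, size_t n)
{
    bool seen[SK_MAX_ATTR_ID] = { false };

    for (size_t i = 0; i < n; i++) {
        if (ids[i] >= SK_MAX_ATTR_ID)
            return -EINVAL;
        if (seen[ids[i]])
            return -EINVAL;
        seen[ids[i]] = true;
    }
    return 0;
}
```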
     
  • [ Upstream commit 3d89459e2ef92cc0e5a50dde868780ccda9786c1 ]

    Fix a bug in uverbs_ioctl_merge that looked at the object's iterator
    number instead of the method's iterator number when merging methods.

    While we're at it, make the uverbs_ioctl_merge code a bit more clear
    and faster.

    Fixes: 118620d3686b ('IB/core: Add uverbs merge trees functionality')
    Signed-off-by: Matan Barak
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Matan Barak
     
  • commit 8e907ed4882714fd13cfe670681fc6cb5284c780 upstream.

    User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads.

    If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has
    exited, get_pid_task will return NULL and ib_umem_release will not
    decrease mm->pinned_vm.

    Instead of using threads to locate the mm, use the overall tgid from the
    ib_ucontext struct instead. This matches the behavior of ODP and
    disassociate in handling the mm of the process that called ibv_reg_mr.

    Cc:
    Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
    Signed-off-by: Lidong Chen
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Lidong Chen
     

09 May, 2018

1 commit

  • commit 09abfe7b5b2f442a85f4c4d59ecf582ad76088d7 upstream.

    The RDMA CM will select a source device and address by consulting
    the routing table if no source address is passed into
    rdma_resolve_address(). Userspace will ask for this by passing an
    all-zero source address in the RESOLVE_IP command. Unfortunately
    the new check for non-zero address size rejects this with EINVAL,
    which breaks valid userspace applications.

    Fix this by explicitly allowing a zero address family for the source.

    Fixes: 2975d5de6428 ("RDMA/ucma: Check AF family prior resolving address")
    Cc:
    Signed-off-by: Roland Dreier
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Roland Dreier
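
    The rule restored by the commit above can be sketched as: the
    destination must have a known address family, while a source whose
    family is 0 (all-zero address) is accepted and means "let the routing
    table choose". The helper names are hypothetical, and only AF_INET and
    AF_INET6 are modelled.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical size lookup: 0 for unknown families. */
static int sk_addr_size(sa_family_t family)
{
    switch (family) {
    case AF_INET:  return sizeof(struct sockaddr_in);
    case AF_INET6: return sizeof(struct sockaddr_in6);
    default:       return 0;
    }
}

static int check_resolve_ip(const struct sockaddr_in6 *src,
                            const struct sockaddr_in6 *dst)
{
    /* Family 0 on the source is allowed; any other family must be known. */
    if (src->sin6_family && !sk_addr_size(src->sin6_family))
        return -EINVAL;
    if (!sk_addr_size(dst->sin6_family))
        return -EINVAL;
    return 0;
}
```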
     

26 Apr, 2018

4 commits

  • [ Upstream commit d3b9e8ad425cfd5b9116732e057f1b48e4d3bcb8 ]

    Fix warning limit for kernel stack consumption:

    drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
    drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes
    is larger than 1024 bytes [-Werror=frame-larger-than=]

    Using a smaller ib_wc array on the stack brings us comfortably below that
    limit again.

    Fixes: 246d8b184c10 ("IB/cq: Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct")
    Reported-by: Arnd Bergmann
    Reviewed-by: Sergey Gorenko
    Signed-off-by: Max Gurtovoy
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Bart Van Assche
    Acked-by: Arnd Bergmann
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Max Gurtovoy
     
  • [ Upstream commit 3624a8f02568f08aef299d3b117f2226f621177d ]

    Returning EOPNOTSUPP is problematic because it can also be
    returned by the method function, and we use it in quite a few
    places in drivers these days.

    Instead, dedicate EPROTONOSUPPORT to indicate that the ioctl framework
    is enabled but the requested object and method are not supported by
    the kernel. No other case will return this code, and it lets userspace
    know to fall back to write().

    grep says we do not use it today in the drivers/infiniband subsystem.

    Signed-off-by: Jason Gunthorpe
    Reviewed-by: Matan Barak
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jason Gunthorpe
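
    On the userspace side, the convention described above suggests a simple
    dispatch rule: fall back to write() only when the ioctl path returns
    EPROTONOSUPPORT, and pass every other result through. A minimal sketch;
    the function names are hypothetical and not part of rdma-core.

```c
#include <errno.h>

/* Hypothetical legacy write() path; returns 0 on success. */
static int sk_legacy_write_cmd(void) { return 0; }

/* EPROTONOSUPPORT means "object/method unknown to this kernel", so use
 * write(); any other result, including EOPNOTSUPP from a driver method,
 * is returned unchanged. */
static int sk_dispatch(int ioctl_ret, int *used_write)
{
    *used_write = 0;
    if (ioctl_ret == -EPROTONOSUPPORT) {
        *used_write = 1;
        return sk_legacy_write_cmd();
    }
    return ioctl_ret;
}
```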
     
  • [ Upstream commit 00db63c128dd3daf38f481371976c24d32678142 ]

    If a valid netdevice is not found for RoCE, the GID table should not be
    searched with a NULL netdevice.

    Doing so causes the search routines to ignore the netdev argument and may
    match the wrong GID table entry if the netdev is deleted.

    Fixes: abae1b71dd37 ("IB/cma: cma_validate_port should verify the port and netdevice")
    Signed-off-by: Parav Pandit
    Reviewed-by: Mark Bloch
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     
  • [ Upstream commit 246d8b184c100e8eb6b4e8c88f232c2ed2a4e672 ]

    Polling the completion queue directly does not interfere
    with the existing polling logic, hence drop the requirement.
    Be aware that running ib_process_cq_direct with a non-IB_POLL_DIRECT
    CQ may trigger concurrent CQ processing.

    This can be used for polling mode ULPs.

    Cc: Bart Van Assche
    Reported-by: Steve Wise
    Signed-off-by: Sagi Grimberg
    [maxg: added wcs array argument to __ib_process_cq]
    Signed-off-by: Max Gurtovoy
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Sagi Grimberg
     

24 Apr, 2018

1 commit


12 Apr, 2018

2 commits

  • [ Upstream commit 89838118a515847d3e5c904d2e022779a7173bec ]

    The 'if' logic in ucma_query_path was broken when OPA was introduced
    and started to treat RoCE paths as OPA paths. Invert the logic
    of the 'if' so only OPA paths are treated as OPA paths.

    Otherwise the path records returned to rdma_cma users are mangled
    when in RoCE mode.

    Fixes: 57520751445b ("IB/SA: Add OPA path record type")
    Signed-off-by: Parav Pandit
    Reviewed-by: Mark Bloch
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     
  • [ Upstream commit e48e5e198fb6ec77c91047a694022f0fefa45292 ]

    The commit 1a1c116f3dcf ("RDMA/netlink: Simplify the put_msg and put_attr")
    removed the nlmsg_len calculation in ibnl_put_attr, causing netlink
    messages to miss the source and destination addresses.

    Fixes: 1a1c116f3dcf ("RDMA/netlink: Simplify the put_msg and put_attr")
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     

08 Apr, 2018

5 commits

  • commit 84652aefb347297aa08e91e283adf7b18f77c2d5 upstream.

    There are several places in the ucma ABI where userspace can pass in a
    sockaddr but set the address family to AF_IB. When that happens,
    rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6,
    and the ucma kernel code might end up copying past the end of a buffer
    not sized for a struct sockaddr_ib.

    Fix this by introducing new variants

    int rdma_addr_size_in6(struct sockaddr_in6 *addr);
    int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr);

    that are type-safe for the types used in the ucma ABI and return 0 if the
    size computed is bigger than the size of the type passed in. We can use
    these new variants to check what size userspace has passed in before
    copying any addresses.

    Reported-by:
    Signed-off-by: Roland Dreier
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Roland Dreier
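
    The type-safe size helpers introduced above reject any family whose
    address would not fit in the buffer actually passed in. A minimal
    sketch of rdma_addr_size_in6()-style logic; the AF_IB value and the
    sockaddr_ib size below are assumptions, and all that matters is that the
    AF_IB size exceeds sizeof(struct sockaddr_in6).

```c
#include <netinet/in.h>
#include <sys/socket.h>

#define SK_AF_IB          27   /* assumed AF_IB value */
#define SK_SOCKADDR_IB_SZ 48   /* assumed sizeof(struct sockaddr_ib) */

/* Generic size lookup by family, like rdma_addr_size(). */
static int sk_rdma_addr_size(sa_family_t family)
{
    switch (family) {
    case AF_INET:  return sizeof(struct sockaddr_in);
    case AF_INET6: return sizeof(struct sockaddr_in6);
    case SK_AF_IB: return SK_SOCKADDR_IB_SZ;
    default:       return 0;
    }
}

/* Type-safe variant: 0 if the family implies an address larger than the
 * sockaddr_in6 buffer we hold, so callers never copy past its end. */
static int sk_addr_size_in6(const struct sockaddr_in6 *addr)
{
    int size = sk_rdma_addr_size(addr->sin6_family);

    return size <= (int)sizeof(*addr) ? size : 0;
}
```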
     
  • commit c8d3bcbfc5eab3f01cf373d039af725f3b488813 upstream.

    Ensure that the device exists prior to accessing its properties.

    Reported-by:
    Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • commit 4b658d1bbc16605330694bb3ef2570c465ef383d upstream.

    Add the missing check that the device is connected prior to accessing it.

    [ 55.358652] BUG: KASAN: null-ptr-deref in rdma_init_qp_attr+0x4a/0x2c0
    [ 55.359389] Read of size 8 at addr 00000000000000b0 by task qp/618
    [ 55.360255]
    [ 55.360432] CPU: 1 PID: 618 Comm: qp Not tainted 4.16.0-rc1-00071-gcaf61b1b8b88 #91
    [ 55.361693] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
    [ 55.363264] Call Trace:
    [ 55.363833] dump_stack+0x5c/0x77
    [ 55.364215] kasan_report+0x163/0x380
    [ 55.364610] ? rdma_init_qp_attr+0x4a/0x2c0
    [ 55.365238] rdma_init_qp_attr+0x4a/0x2c0
    [ 55.366410] ucma_init_qp_attr+0x111/0x200
    [ 55.366846] ? ucma_notify+0xf0/0xf0
    [ 55.367405] ? _get_random_bytes+0xea/0x1b0
    [ 55.367846] ? urandom_read+0x2f0/0x2f0
    [ 55.368436] ? kmem_cache_alloc_trace+0xd2/0x1e0
    [ 55.369104] ? refcount_inc_not_zero+0x9/0x60
    [ 55.369583] ? refcount_inc+0x5/0x30
    [ 55.370155] ? rdma_create_id+0x215/0x240
    [ 55.370937] ? _copy_to_user+0x4f/0x60
    [ 55.371620] ? mem_cgroup_commit_charge+0x1f5/0x290
    [ 55.372127] ? _copy_from_user+0x5e/0x90
    [ 55.372720] ucma_write+0x174/0x1f0
    [ 55.373090] ? ucma_close_id+0x40/0x40
    [ 55.373805] ? __lru_cache_add+0xa8/0xd0
    [ 55.374403] __vfs_write+0xc4/0x350
    [ 55.374774] ? kernel_read+0xa0/0xa0
    [ 55.375173] ? fsnotify+0x899/0x8f0
    [ 55.375544] ? fsnotify_unmount_inodes+0x170/0x170
    [ 55.376689] ? __fsnotify_update_child_dentry_flags+0x30/0x30
    [ 55.377522] ? handle_mm_fault+0x174/0x320
    [ 55.378169] vfs_write+0xf7/0x280
    [ 55.378864] SyS_write+0xa1/0x120
    [ 55.379270] ? SyS_read+0x120/0x120
    [ 55.379643] ? mm_fault_error+0x180/0x180
    [ 55.380071] ? task_work_run+0x7d/0xd0
    [ 55.380910] ? __task_pid_nr_ns+0x120/0x140
    [ 55.381366] ? SyS_read+0x120/0x120
    [ 55.381739] do_syscall_64+0xeb/0x250
    [ 55.382143] entry_SYSCALL_64_after_hwframe+0x21/0x86
    [ 55.382841] RIP: 0033:0x7fc2ef803e99
    [ 55.383227] RSP: 002b:00007fffcc5f3be8 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
    [ 55.384173] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc2ef803e99
    [ 55.386145] RDX: 0000000000000057 RSI: 0000000020000080 RDI: 0000000000000003
    [ 55.388418] RBP: 00007fffcc5f3c00 R08: 0000000000000000 R09: 0000000000000000
    [ 55.390542] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400480
    [ 55.392916] R13: 00007fffcc5f3cf0 R14: 0000000000000000 R15: 0000000000000000
    [ 55.521088] Code: e5 4d 1e ff 48 89 df 44 0f b6 b3 b8 01 00 00 e8 65 50 1e ff 4c 8b 2b 49
    8d bd b0 00 00 00 e8 56 50 1e ff 41 0f b6 c6 48 c1 e0 04 03 85 b0 00 00 00 48 8d 78 08
    48 89 04 24 e8 3a 4f 1e ff 48
    [ 55.525980] RIP: rdma_init_qp_attr+0x52/0x2c0 RSP: ffff8801e2c2f9d8
    [ 55.532648] CR2: 00000000000000b0
    [ 55.534396] ---[ end trace 70cee64090251c0b ]---
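    A minimal userspace sketch of the kind of guard this commit adds. It uses heavily simplified stand-in types: model_device, model_cm_id and model_ucma_ctx are hypothetical and are not the kernel's structures. model_ucma_init_qp_attr() only mirrors the idea of checking that a device is attached before calling into a routine such as rdma_init_qp_attr(). That routine would otherwise dereference a NULL device pointer, as in the trace above. Returning -EINVAL here is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins (not the kernel structures). */
struct model_device {
	int port_count;
};

struct model_cm_id {
	struct model_device *device; /* stays NULL until bound to a device */
};

struct model_ucma_ctx {
	struct model_cm_id *cm_id;
};

/*
 * Stand-in for a callee like rdma_init_qp_attr() that assumes a device
 * is attached and dereferences it unconditionally.
 */
static int model_init_qp_attr(struct model_cm_id *id, int *attr_out)
{
	assert(id->device != NULL); /* precondition the caller must ensure */
	*attr_out = id->device->port_count;
	return 0;
}

/* Command handler with the added guard: reject IDs with no device yet. */
static int model_ucma_init_qp_attr(struct model_ucma_ctx *ctx, int *attr_out)
{
	if (!ctx->cm_id->device)
		return -EINVAL;
	return model_init_qp_attr(ctx->cm_id, attr_out);
}
```

    With no device attached, the handler returns -EINVAL before it reaches the dereference. Once a device pointer is set, the call goes through.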

    Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
    Fixes: d541e45500bd ("IB/core: Convert ah_attr from OPA to IB when copying to user")
    Reported-by:
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • commit 9137108cc3d64ade13e753108ec611a0daed16a0 upstream.

    process_one_req() can race with rdma_addr_cancel():

    CPU0                            CPU1
    ====                            ====
    process_one_work()
     debug_work_deactivate(work);
     process_one_req()
                                    rdma_addr_cancel()
                                     mutex_lock(&lock);
                                     set_timeout(&req->work,..);
                                      __queue_work()
                                       debug_work_activate(work);
                                     mutex_unlock(&lock);

     mutex_lock(&lock);
     [..]
     list_del(&req->list);
     mutex_unlock(&lock);
     [..]

     // ODEBUG explodes since the work is still queued.
     kfree(req);

    This causes ODEBUG to detect the use-after-free:

    ODEBUG: free active (active state 0) object type: work_struct hint: process_one_req+0x0/0x6c0 include/net/dst.h:165
    WARNING: CPU: 0 PID: 79 at lib/debugobjects.c:291 debug_print_object+0x166/0x220 lib/debugobjects.c:288
    kvm: emulating exchange as write
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 0 PID: 79 Comm: kworker/u4:3 Not tainted 4.16.0-rc6+ #361
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: ib_addr process_one_req
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x24d lib/dump_stack.c:53
    panic+0x1e4/0x41c kernel/panic.c:183
    __warn+0x1dc/0x200 kernel/panic.c:547
    report_bug+0x1f4/0x2b0 lib/bug.c:186
    fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
    fixup_bug arch/x86/kernel/traps.c:247 [inline]
    do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
    do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
    invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
    RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
    RSP: 0000:ffff8801d966f210 EFLAGS: 00010086
    RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff815acd6e
    RDX: 0000000000000000 RSI: 1ffff1003b2cddf2 RDI: 0000000000000000
    RBP: ffff8801d966f250 R08: 0000000000000000 R09: 1ffff1003b2cddc8
    R10: ffffed003b2cde71 R11: ffffffff86f39a98 R12: 0000000000000001
    R13: ffffffff86f15540 R14: ffffffff86408700 R15: ffffffff8147c0a0
    __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
    debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
    kfree+0xc7/0x260 mm/slab.c:3799
    process_one_req+0x2e7/0x6c0 drivers/infiniband/core/addr.c:592
    process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
    worker_thread+0x223/0x1990 kernel/workqueue.c:2247
    kthread+0x33c/0x400 kernel/kthread.c:238
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
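    One common way to close this kind of race is to give exactly one side ownership of the request. Whoever removes it from the shared list under the lock is the only party allowed to free it. The following is a userspace model of that rule under stated assumptions, not the upstream patch. model_req, model_take_ownership(), model_worker_finish(), model_cancel() and the release counter are all hypothetical. The on_list flag stands in for membership in the request list. The comment about cancel_delayed_work_sync() describes what a kernel version would additionally need before freeing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical model of a resolution request. "on_list" stands in for
 * membership in the shared request list protected by model_lock.
 */
struct model_req {
	bool on_list;
};

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts releases instead of freeing, so the outcome can be checked. */
static int model_release_count;

static void model_release(struct model_req *req)
{
	assert(req != NULL); /* a kernel version would kfree(req) here */
	model_release_count++;
}

/*
 * Remove the request from the list under the lock. Only the caller that
 * actually removed it (the "winner") owns it and may release it.
 */
static bool model_take_ownership(struct model_req *req)
{
	bool won;

	pthread_mutex_lock(&model_lock);
	won = req->on_list;
	req->on_list = false;
	pthread_mutex_unlock(&model_lock);
	return won;
}

/* Worker side (compare process_one_req()): release only if it won. */
static void model_worker_finish(struct model_req *req)
{
	if (model_take_ownership(req))
		model_release(req);
}

/* Cancel side (compare rdma_addr_cancel()): release only if it won. */
static void model_cancel(struct model_req *req)
{
	if (model_take_ownership(req)) {
		/*
		 * In the kernel the winner would also have to guarantee that the
		 * work item is neither queued nor running, for example with
		 * cancel_delayed_work_sync(), before freeing the request.
		 */
		model_release(req);
	}
}
```

    Calling model_worker_finish() and model_cancel() in either order releases the request exactly once.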

    Fixes: 5fff41e1f89d ("IB/core: Fix race condition in resolving IP to MAC")
    Reported-by:
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Jason Gunthorpe
     
  • commit e8980d67d6017c8eee8f9c35f782c4bd68e004c9 upstream.

    Before UCMA commands are accessed, the context should be initialized
    and connected to a CM_ID with ucma_create_id(). If the user skips
    this step, they can supply an invalid ctx without a CM_ID and cause
    multiple NULL pointer dereferences.

    There are also situations where create_id can race with other user
    accesses. To avoid these races, ensure that the context is shared
    with other threads only once it is fully initialized.

    [ 109.088108] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
    [ 109.090315] IP: ucma_connect+0x138/0x1d0
    [ 109.092595] PGD 80000001dc02d067 P4D 80000001dc02d067 PUD 1da9ef067 PMD 0
    [ 109.095384] Oops: 0000 [#1] SMP KASAN PTI
    [ 109.097834] CPU: 0 PID: 663 Comm: uclose Tainted: G B 4.16.0-rc1-00062-g2975d5de6428 #45
    [ 109.100816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
    [ 109.105943] RIP: 0010:ucma_connect+0x138/0x1d0
    [ 109.108850] RSP: 0018:ffff8801c8567a80 EFLAGS: 00010246
    [ 109.111484] RAX: 0000000000000000 RBX: 1ffff100390acf50 RCX: ffffffff9d7812e2
    [ 109.114496] RDX: 1ffffffff3f507a5 RSI: 0000000000000297 RDI: 0000000000000297
    [ 109.117490] RBP: ffff8801daa15600 R08: 0000000000000000 R09: ffffed00390aceeb
    [ 109.120429] R10: 0000000000000001 R11: ffffed00390aceea R12: 0000000000000000
    [ 109.123318] R13: 0000000000000120 R14: ffff8801de6459c0 R15: 0000000000000118
    [ 109.126221] FS: 00007fabb68d6700(0000) GS:ffff8801e5c00000(0000) knlGS:0000000000000000
    [ 109.129468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 109.132523] CR2: 0000000000000020 CR3: 00000001d45d8003 CR4: 00000000003606b0
    [ 109.135573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 109.138716] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 109.142057] Call Trace:
    [ 109.144160] ? ucma_listen+0x110/0x110
    [ 109.146386] ? wake_up_q+0x59/0x90
    [ 109.148853] ? futex_wake+0x10b/0x2a0
    [ 109.151297] ? save_stack+0x89/0xb0
    [ 109.153489] ? _copy_from_user+0x5e/0x90
    [ 109.155500] ucma_write+0x174/0x1f0
    [ 109.157933] ? ucma_resolve_route+0xf0/0xf0
    [ 109.160389] ? __mod_node_page_state+0x1d/0x80
    [ 109.162706] __vfs_write+0xc4/0x350
    [ 109.164911] ? kernel_read+0xa0/0xa0
    [ 109.167121] ? path_openat+0x1b10/0x1b10
    [ 109.169355] ? fsnotify+0x899/0x8f0
    [ 109.171567] ? fsnotify_unmount_inodes+0x170/0x170
    [ 109.174145] ? __fget+0xa8/0xf0
    [ 109.177110] vfs_write+0xf7/0x280
    [ 109.179532] SyS_write+0xa1/0x120
    [ 109.181885] ? SyS_read+0x120/0x120
    [ 109.184482] ? compat_start_thread+0x60/0x60
    [ 109.187124] ? SyS_read+0x120/0x120
    [ 109.189548] do_syscall_64+0xeb/0x250
    [ 109.192178] entry_SYSCALL_64_after_hwframe+0x21/0x86
    [ 109.194725] RIP: 0033:0x7fabb61ebe99
    [ 109.197040] RSP: 002b:00007fabb68d5e98 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
    [ 109.200294] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fabb61ebe99
    [ 109.203399] RDX: 0000000000000120 RSI: 00000000200001c0 RDI: 0000000000000004
    [ 109.206548] RBP: 00007fabb68d5ec0 R08: 0000000000000000 R09: 0000000000000000
    [ 109.209902] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fabb68d5fc0
    [ 109.213327] R13: 0000000000000000 R14: 00007fff40ab2430 R15: 00007fabb68d69c0
    [ 109.216613] Code: 88 44 24 2c 0f b6 84 24 6e 01 00 00 88 44 24 2d 0f
    b6 84 24 69 01 00 00 88 44 24 2e 8b 44 24 60 89 44 24 30 e8 da f6 06 ff
    31 c0 41 83 7c 24 20 1b 75 04 8b 44 24 64 48 8d 74 24 20 4c 89 e7
    [ 109.223602] RIP: ucma_connect+0x138/0x1d0 RSP: ffff8801c8567a80
    [ 109.226256] CR2: 0000000000000020
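    A minimal userspace sketch of the two rules described above. It replaces the kernel's context IDR with a fixed-size table, which is an assumption of the sketch. First, the context is fully initialized before it is published under the lock. Second, lookups reject a context that has no CM ID. model_ctx, model_cm_id, model_create() and model_find() are hypothetical names, and the -ENOENT/-EINVAL/-EBUSY return codes are choices made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MODEL_MAX_CTX 8

struct model_cm_id {
	int dummy;
};

struct model_ctx {
	int id;
	struct model_cm_id *cm_id; /* NULL if no CM ID was ever attached */
};

/* Hypothetical lookup table standing in for the context IDR. */
static struct model_ctx *model_table[MODEL_MAX_CTX];
static pthread_mutex_t model_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Finish initializing ctx first, and only then make it visible. */
static int model_create(struct model_ctx *ctx, struct model_cm_id *cm_id,
			int slot)
{
	if (slot < 0 || slot >= MODEL_MAX_CTX)
		return -EINVAL;

	ctx->id = slot;
	ctx->cm_id = cm_id; /* initialize while still private ... */

	pthread_mutex_lock(&model_table_lock);
	if (model_table[slot]) {
		pthread_mutex_unlock(&model_table_lock);
		return -EBUSY;
	}
	model_table[slot] = ctx; /* ... then publish under the lock */
	pthread_mutex_unlock(&model_table_lock);
	return 0;
}

/* Unknown IDs give -ENOENT; contexts without a CM ID give -EINVAL. */
static int model_find(int slot, struct model_ctx **out)
{
	struct model_ctx *ctx;
	int ret = 0;

	assert(out != NULL);
	if (slot < 0 || slot >= MODEL_MAX_CTX)
		return -ENOENT;

	pthread_mutex_lock(&model_table_lock);
	ctx = model_table[slot];
	if (!ctx)
		ret = -ENOENT;
	else if (!ctx->cm_id)
		ret = -EINVAL;
	else
		*out = ctx;
	pthread_mutex_unlock(&model_table_lock);
	return ret;
}
```

    Publishing and lookup take the same mutex, so a thread that finds a context also sees the cm_id that was stored before the context was published.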

    Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
    Reported-by:
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky