07 Feb, 2019

1 commit

  • commit 7709b0dc265f28695487712c45f02bbd1f98415d upstream.

    Applications that use the stack for execution purposes cause userspace PSM
    jobs to fail during mmap().

    Both Fortran (non-standard format parsing) and C (callback functions
    located in the stack) applications can be written such that stack
    execution is required. The linker notes this via the gnu_stack ELF flag.

    This causes READ_IMPLIES_EXEC to be set which forces all PROT_READ mmaps
    to have PROT_EXEC for the process.

    Checking for VM_EXEC bit and failing the request with EPERM is overly
    conservative and will break any PSM application using executable stacks.
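
    A hedged sketch of the kind of relaxation described above: drop the VM_EXEC
    test in the driver's mmap handler and keep rejecting only writable mappings.
    The function name and exact flag handling here are illustrative, not the
    upstream diff:

        static int mmap_flags_ok(struct vm_area_struct *vma)
        {
                /* READ_IMPLIES_EXEC makes every PROT_READ mapping PROT_EXEC,
                 * so VM_EXEC alone is not evidence of misuse; only refuse
                 * mappings that could be written. */
                if (vma->vm_flags & VM_WRITE)
                        return -EPERM;
                vma->vm_flags &= ~VM_MAYWRITE;  /* forbid later mprotect(PROT_WRITE) */
                return 0;
        }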

    Cc: #v4.14+
    Fixes: 12220267645c ("IB/hfi: Protect against writable mmap")
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Ira Weiny
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

26 Jan, 2019

2 commits

  • [ Upstream commit 8036e90f92aae2784b855a0007ae2d8154d28b3c ]

    Acquiring the rtnl lock while holding usdev_lock could result in a
    deadlock.

    For example:

    usnic_ib_query_port()
    | mutex_lock(&us_ibdev->usdev_lock)
    | ib_get_eth_speed()
    | rtnl_lock()

    rtnl_lock()
    | usnic_ib_netdevice_event()
    | mutex_lock(&us_ibdev->usdev_lock)

    This commit moves the usdev_lock acquisition after the rtnl lock has been
    released.

    This is safe to do because usdev_lock is not protecting anything being
    accessed in ib_get_eth_speed(). Hence, the correct order of holding locks
    (rtnl -> usdev_lock) is not violated.
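
    A minimal sketch of the reordering, using the helper and lock names quoted
    in the diagram above:

        /* before: ib_get_eth_speed() (which takes rtnl) ran inside usdev_lock */
        ret = ib_get_eth_speed(ibdev, port, &props->active_speed,
                               &props->active_width);  /* rtnl taken and dropped here */
        if (ret)
                return ret;

        mutex_lock(&us_ibdev->usdev_lock);              /* safe: rtnl already released */
        /* fill the remaining attributes that usdev_lock protects */
        mutex_unlock(&us_ibdev->usdev_lock);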

    Signed-off-by: Parvi Kaustubhi
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Parvi Kaustubhi
     
  • [ Upstream commit b024dd0eba6e6d568f69d63c5e3153aba94c23e3 ]

    FRWR memory registration is done with a series of calls and WRs.
    1. ULP invokes ib_dma_map_sg()
    2. ULP invokes ib_map_mr_sg()
    3. ULP posts an IB_WR_REG_MR on the Send queue

    Step 2 generates an iova. It is permissible for ULPs to change this
    iova (with certain restrictions) between steps 2 and 3.

    rxe_map_mr_sg captures the MR's iova but later when rxe processes the
    REG_MR WR, it ignores the MR's iova field. If a ULP alters the MR's iova
    after step 2 but before step 3, rxe never captures that change.

    When the remote sends an RDMA Read targeting that MR, rxe looks up the
    R_key, but the altered iova does not match the iova stored in the MR,
    causing the RDMA Read request to fail.
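
    A sketch of the idea (reg_wr() and struct ib_mr come from rdma/ib_verbs.h;
    the rxe-internal "mem" field names are assumptions): when executing the
    IB_WR_REG_MR work request, re-read the MR's iova instead of trusting the
    value captured in rxe_map_mr_sg():

        const struct ib_reg_wr *reg = reg_wr(wr);

        mem->iova   = reg->mr->iova;    /* honor a ULP change made after ib_map_mr_sg() */
        mem->length = reg->mr->length;
        mem->access = reg->access;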

    Reported-by: Anna Schumaker
    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Chuck Lever
     

13 Jan, 2019

1 commit

  • commit e48d8ed9c6193502d849b35767fd18e20bbd7ba2 upstream.

    Error completions must still contain a valid wr_id and
    qp_num that the consumer can rely on. Correctly
    fill these fields in receive error completions.
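
    A sketch of the kind of fill-in described (ib_wc fields are from
    rdma/ib_verbs.h; the receive wqe name is illustrative, and the status shown
    is just one of the possible error statuses):

        memset(wc, 0, sizeof(*wc));
        wc->wr_id  = wqe->wr_id;        /* consumer must be able to match the WR */
        wc->qp     = &qp->ibqp;         /* and identify the QP it was posted on */
        wc->status = IB_WC_WR_FLUSH_ERR;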

    Reported-by: Walker Benjamin
    Cc: stable@vger.kernel.org
    Signed-off-by: Sagi Grimberg
    Reviewed-by: Zhu Yanjun
    Tested-by: Zhu Yanjun
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Sagi Grimberg
     

10 Jan, 2019

1 commit

  • commit dbc2970caef74e8ff41923d302aa6fb5a4812d0e upstream.

    Incorrect sge sizing in the HFI PIO path will cause an Oops similar to
    this:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] hfi1_verbs_send_pio+0x3d8/0x530 [hfi1]
    PGD 0
    Oops: 0000 1 SMP
    Call Trace:
    ? hfi1_verbs_send_dma+0xad0/0xad0 [hfi1]
    hfi1_verbs_send+0xdf/0x250 [hfi1]
    ? make_rc_ack+0xa80/0xa80 [hfi1]
    hfi1_do_send+0x192/0x430 [hfi1]
    hfi1_do_send_from_rvt+0x10/0x20 [hfi1]
    rvt_post_send+0x369/0x820 [rdmavt]
    ib_uverbs_post_send+0x317/0x570 [ib_uverbs]
    ib_uverbs_write+0x26f/0x420 [ib_uverbs]
    ? security_file_permission+0x21/0xa0
    vfs_write+0xbd/0x1e0
    ? mntput+0x24/0x40
    SyS_write+0x7f/0xe0
    system_call_fastpath+0x16/0x1b

    Fix by adding the missing sizing check to correctly determine the sge
    length.
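
    A sketch of such a clamp (rvt_sge field names are assumptions from rdmavt):
    the per-copy length must respect both the remaining payload and the current
    sge:

        u32 len = ss->sge.length;

        if (len > length)
                len = length;               /* remaining payload */
        if (len > ss->sge.sge_length)
                len = ss->sge.sge_length;   /* the missing check: bound by the sge itself */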

    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

29 Dec, 2018

1 commit

  • commit 14d15c2b278011056482eb015dff89f9cbf2b841 upstream

    BUG: KASAN: use-after-free in srpt_set_enabled+0x1a9/0x1e0 [ib_srpt]
    Read of size 4 at addr ffff8801269d23f8 by task check/29726

    CPU: 4 PID: 29726 Comm: check Not tainted 4.18.0-rc2-dbg+ #4
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
    Call Trace:
    dump_stack+0xa4/0xf5
    print_address_description+0x6f/0x270
    kasan_report+0x241/0x360
    __asan_load4+0x78/0x80
    srpt_set_enabled+0x1a9/0x1e0 [ib_srpt]
    srpt_tpg_enable_store+0xb8/0x120 [ib_srpt]
    configfs_write_file+0x14e/0x1d0 [configfs]
    __vfs_write+0xd2/0x3b0
    vfs_write+0x101/0x270
    ksys_write+0xab/0x120
    __x64_sys_write+0x43/0x50
    do_syscall_64+0x77/0x230
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f235cfe6154

    Fixes: aaf45bd83eba ("IB/srpt: Detect session shutdown reliably")
    Signed-off-by: Bart Van Assche
    Cc:
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Sasha Levin

    Bart Van Assche
     

21 Dec, 2018

1 commit

  • commit 28a9a9e83ceae2cee25b9af9ad20d53aaa9ab951 upstream

    Packet queue state is overused to determine SDMA descriptor
    availability and packet queue request state.

    cpu 0 ret = user_sdma_send_pkts(req, pcount);
    cpu 0 if (atomic_read(&pq->n_reqs))
    cpu 1 IRQ user_sdma_txreq_cb calls pq_update() (state to _INACTIVE)
    cpu 0 xchg(&pq->state, SDMA_PKT_Q_ACTIVE);

    At this point pq->n_reqs == 0 and pq->state is incorrectly
    SDMA_PKT_Q_ACTIVE. The close path will hang waiting for the state
    to return to _INACTIVE.

    This can also change the state from _DEFERRED to _ACTIVE. However,
    this is a mostly benign race.

    Remove the racy code path.

    Use n_reqs to determine if a packet queue is active or not.
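
    A sketch of the simplified test, with names taken from the commit text:

        /* a packet queue is active exactly when it has outstanding requests */
        static inline bool hfi1_pkt_q_active(struct hfi1_user_sdma_pkt_q *pq)
        {
                return atomic_read(&pq->n_reqs) != 0;
        }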

    Cc: # 4.14.0>
    Reviewed-by: Mitko Haralanov
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Sasha Levin

    Michael J. Ruhl
     

17 Dec, 2018

4 commits

  • commit 36d842194a57f1b21fbc6a6875f2fa2f9a7f8679 upstream.

    When running with KASAN, the following trace is produced:

    [ 62.535888]

    ==================================================================
    [ 62.544930] BUG: KASAN: slab-out-of-bounds in
    gut_hw_stats+0x122/0x230 [hfi1]
    [ 62.553856] Write of size 8 at addr ffff88080e8d6330 by task
    kworker/0:1/14

    [ 62.565333] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted
    4.19.0-test-build-kasan+ #8
    [ 62.575087] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
    SE5C610.86B.01.01.0019.101220160604 10/12/2016
    [ 62.587951] Workqueue: events work_for_cpu_fn
    [ 62.594050] Call Trace:
    [ 62.598023] dump_stack+0xc6/0x14c
    [ 62.603089] ? dump_stack_print_info.cold.1+0x2f/0x2f
    [ 62.610041] ? kmsg_dump_rewind_nolock+0x59/0x59
    [ 62.616615] ? get_hw_stats+0x122/0x230 [hfi1]
    [ 62.622985] print_address_description+0x6c/0x23c
    [ 62.629744] ? get_hw_stats+0x122/0x230 [hfi1]
    [ 62.636108] kasan_report.cold.6+0x241/0x308
    [ 62.642365] get_hw_stats+0x122/0x230 [hfi1]
    [ 62.648703] ? hfi1_alloc_rn+0x40/0x40 [hfi1]
    [ 62.655088] ? __kmalloc+0x110/0x240
    [ 62.660695] ? hfi1_alloc_rn+0x40/0x40 [hfi1]
    [ 62.667142] setup_hw_stats+0xd8/0x430 [ib_core]
    [ 62.673972] ? show_hfi+0x50/0x50 [hfi1]
    [ 62.680026] ib_device_register_sysfs+0x165/0x180 [ib_core]
    [ 62.687995] ib_register_device+0x5a2/0xa10 [ib_core]
    [ 62.695340] ? show_hfi+0x50/0x50 [hfi1]
    [ 62.701421] ? ib_unregister_device+0x2e0/0x2e0 [ib_core]
    [ 62.709222] ? __vmalloc_node_range+0x2d0/0x380
    [ 62.716131] ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt]
    [ 62.723735] ? vmalloc_node+0x5c/0x70
    [ 62.729697] ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt]
    [ 62.737347] ? rvt_driver_mr_init+0x1f5/0x2d0 [rdmavt]
    [ 62.744998] ? __rvt_alloc_mr+0x110/0x110 [rdmavt]
    [ 62.752315] ? rvt_rc_error+0x140/0x140 [rdmavt]
    [ 62.759434] ? rvt_vma_open+0x30/0x30 [rdmavt]
    [ 62.766364] ? mutex_unlock+0x1d/0x40
    [ 62.772445] ? kmem_cache_create_usercopy+0x15d/0x230
    [ 62.780115] rvt_register_device+0x1f6/0x360 [rdmavt]
    [ 62.787823] ? rvt_get_port_immutable+0x180/0x180 [rdmavt]
    [ 62.796058] ? __get_txreq+0x400/0x400 [hfi1]
    [ 62.802969] ? memcpy+0x34/0x50
    [ 62.808611] hfi1_register_ib_device+0xde6/0xeb0 [hfi1]
    [ 62.816601] ? hfi1_get_npkeys+0x10/0x10 [hfi1]
    [ 62.823760] ? hfi1_init+0x89f/0x9a0 [hfi1]
    [ 62.830469] ? hfi1_setup_eagerbufs+0xad0/0xad0 [hfi1]
    [ 62.838204] ? pcie_capability_clear_and_set_word+0xcd/0xe0
    [ 62.846429] ? pcie_capability_read_word+0xd0/0xd0
    [ 62.853791] ? hfi1_pcie_init+0x187/0x4b0 [hfi1]
    [ 62.860958] init_one+0x67f/0xae0 [hfi1]
    [ 62.867301] ? hfi1_init+0x9a0/0x9a0 [hfi1]
    [ 62.873876] ? wait_woken+0x130/0x130
    [ 62.879860] ? read_word_at_a_time+0xe/0x20
    [ 62.886329] ? strscpy+0x14b/0x280
    [ 62.891998] ? hfi1_init+0x9a0/0x9a0 [hfi1]
    [ 62.898405] local_pci_probe+0x70/0xd0
    [ 62.904295] ? pci_device_shutdown+0x90/0x90
    [ 62.910833] work_for_cpu_fn+0x29/0x40
    [ 62.916750] process_one_work+0x584/0x960
    [ 62.922974] ? rcu_work_rcufn+0x40/0x40
    [ 62.928991] ? __schedule+0x396/0xdc0
    [ 62.934806] ? __sched_text_start+0x8/0x8
    [ 62.941020] ? pick_next_task_fair+0x68b/0xc60
    [ 62.947674] ? run_rebalance_domains+0x260/0x260
    [ 62.954471] ? __list_add_valid+0x29/0xa0
    [ 62.960607] ? move_linked_works+0x1c7/0x230
    [ 62.967077] ?
    trace_event_raw_event_workqueue_execute_start+0x140/0x140
    [ 62.976248] ? mutex_lock+0xa6/0x100
    [ 62.982029] ? __mutex_lock_slowpath+0x10/0x10
    [ 62.988795] ? __switch_to+0x37a/0x710
    [ 62.994731] worker_thread+0x62e/0x9d0
    [ 63.000602] ? max_active_store+0xf0/0xf0
    [ 63.006828] ? __switch_to_asm+0x40/0x70
    [ 63.012932] ? __switch_to_asm+0x34/0x70
    [ 63.019013] ? __switch_to_asm+0x40/0x70
    [ 63.025042] ? __switch_to_asm+0x34/0x70
    [ 63.031030] ? __switch_to_asm+0x40/0x70
    [ 63.037006] ? __schedule+0x396/0xdc0
    [ 63.042660] ? kmem_cache_alloc_trace+0xf3/0x1f0
    [ 63.049323] ? kthread+0x59/0x1d0
    [ 63.054594] ? ret_from_fork+0x35/0x40
    [ 63.060257] ? __sched_text_start+0x8/0x8
    [ 63.066212] ? schedule+0xcf/0x250
    [ 63.071529] ? __wake_up_common+0x110/0x350
    [ 63.077794] ? __schedule+0xdc0/0xdc0
    [ 63.083348] ? wait_woken+0x130/0x130
    [ 63.088963] ? finish_task_switch+0x1f1/0x520
    [ 63.095258] ? kasan_unpoison_shadow+0x30/0x40
    [ 63.101792] ? __init_waitqueue_head+0xa0/0xd0
    [ 63.108183] ? replenish_dl_entity.cold.60+0x18/0x18
    [ 63.115151] ? _raw_spin_lock_irqsave+0x25/0x50
    [ 63.121754] ? max_active_store+0xf0/0xf0
    [ 63.127753] kthread+0x1ae/0x1d0
    [ 63.132894] ? kthread_bind+0x30/0x30
    [ 63.138422] ret_from_fork+0x35/0x40

    [ 63.146973] Allocated by task 14:
    [ 63.152077] kasan_kmalloc+0xbf/0xe0
    [ 63.157471] __kmalloc+0x110/0x240
    [ 63.162804] init_cntrs+0x34d/0xdf0 [hfi1]
    [ 63.168883] hfi1_init_dd+0x29a3/0x2f90 [hfi1]
    [ 63.175244] init_one+0x551/0xae0 [hfi1]
    [ 63.181065] local_pci_probe+0x70/0xd0
    [ 63.186759] work_for_cpu_fn+0x29/0x40
    [ 63.192310] process_one_work+0x584/0x960
    [ 63.198163] worker_thread+0x62e/0x9d0
    [ 63.203843] kthread+0x1ae/0x1d0
    [ 63.208874] ret_from_fork+0x35/0x40

    [ 63.217203] Freed by task 1:
    [ 63.221844] __kasan_slab_free+0x12e/0x180
    [ 63.227844] kfree+0x92/0x1a0
    [ 63.232570] single_release+0x3a/0x60
    [ 63.238024] __fput+0x1d9/0x480
    [ 63.242911] task_work_run+0x139/0x190
    [ 63.248440] exit_to_usermode_loop+0x191/0x1a0
    [ 63.254814] do_syscall_64+0x301/0x330
    [ 63.260283] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    [ 63.270199] The buggy address belongs to the object at
    ffff88080e8d5500
    which belongs to the cache kmalloc-4096 of size 4096
    [ 63.287247] The buggy address is located 3632 bytes inside of
    4096-byte region [ffff88080e8d5500, ffff88080e8d6500)
    [ 63.303564] The buggy address belongs to the page:
    [ 63.310447] page:ffffea00203a3400 count:1 mapcount:0
    mapping:ffff88081380e840 index:0x0 compound_mapcount: 0
    [ 63.323102] flags: 0x2fffff80008100(slab|head)
    [ 63.329775] raw: 002fffff80008100 0000000000000000 0000000100000001
    ffff88081380e840
    [ 63.340175] raw: 0000000000000000 0000000000070007 00000001ffffffff
    0000000000000000
    [ 63.350564] page dumped because: kasan: bad access detected

    [ 63.361974] Memory state around the buggy address:
    [ 63.369137] ffff88080e8d6200: 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00
    [ 63.379082] ffff88080e8d6280: 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00
    [ 63.389032] >ffff88080e8d6300: 00 00 00 00 00 00 fc fc fc fc fc fc fc
    fc fc fc
    [ 63.398944] ^
    [ 63.406141] ffff88080e8d6380: fc fc fc fc fc fc fc fc fc fc fc fc fc
    fc fc fc
    [ 63.416109] ffff88080e8d6400: fc fc fc fc fc fc fc fc fc fc fc fc fc
    fc fc fc
    [ 63.426099]
    ==================================================================

    The trace happens because get_hw_stats() assumes there is room in the
    memory allocated in init_cntrs() to accommodate the driver counters.
    Unfortunately, that routine only allocated space for the device
    counters.

    Fix by ensuring the allocation has room for the additional driver
    counters.
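
    A sketch of the sizing change (the counter-count variable names are
    illustrative):

        /* reserve room for the driver counters that get_hw_stats() appends
         * after the device counters */
        dd->cntrs = kcalloc(num_dev_cntrs + num_driver_cntrs,
                            sizeof(u64), GFP_KERNEL);
        if (!dd->cntrs)
                return -ENOMEM;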

    Cc: # v4.14+
    Fixes: b7481944b06e9 ("IB/hfi1: Show statistics counters under IB stats interface")
    Reviewed-by: Mike Marciniczyn
    Reviewed-by: Mike Ruhl
    Signed-off-by: Piotr Stankiewicz
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Piotr Stankiewicz
     
  • [ Upstream commit 75b7b86bdb0df37e08e44b6c1f99010967f81944 ]

    Memory windows are implemented with an indirect MKey. When a page fault
    event comes for a MW MKey, we need to find the MR at the end of the list of
    the indirect MKeys by iterating over all items from the first to the last.

    The offset calculated during this process has to be zeroed after the first
    iteration or the next iteration will start from a wrong address, resulting
    in incorrect ODP faulting behavior.
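
    A sketch of the loop shape implied above; the helper names are placeholders,
    not mlx5 internals:

        do {
                mkey = lookup_mkey(dev, key);        /* placeholder helper */
                addr = mkey_start(mkey) + offset;    /* apply the offset once */
                key  = next_indirect_key(mkey);      /* follow the indirection */
                offset = 0;   /* the fix: later iterations start at the MKey base */
        } while (mkey_is_indirect(mkey));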

    Fixes: db570d7deafb ("IB/mlx5: Add ODP support to MW")
    Signed-off-by: Artemy Kovalyov
    Signed-off-by: Moni Shoua
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Artemy Kovalyov
     
  • [ Upstream commit 4f32fb921b153ae9ea280e02a3e91509fffc03d3 ]

    rdmavt uses a crazy system that loses the type checking when assigning
    functions to struct ib_device function pointers. Because of this the
    signature of this function was not changed when the below commit revised
    things.

    Fix the signature so we are not calling a function pointer with a
    mismatched signature.

    Fixes: 477864c8fcd9 ("IB/core: Let create_ah return extended response to user")
    Signed-off-by: Kamal Heib
    Reviewed-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Kamal Heib
     
  • [ Upstream commit 074fca3a18e7e1e0d4d7dcc9d7badc43b90232f4 ]

    Currently, for IB_WR_LOCAL_INV WR, when the next fence is None, the
    current fence will be SMALL instead of Normal Fence.

    Without this patch krping doesn't work on CX-5 devices and throws the
    following error messages (from the CX5 driver, server side):
    [ 710.434014] mlx5_0:dump_cqe:278:(pid 2712): dump error cqe
    [ 710.434016] 00000000 00000000 00000000 00000000
    [ 710.434016] 00000000 00000000 00000000 00000000
    [ 710.434017] 00000000 00000000 00000000 00000000
    [ 710.434018] 00000000 93003204 100000b8 000524d2
    [ 710.434019] krping: cq completion failed with wr_id 0 status 4 opcode 128 vender_err 32

    Fix the logic to set the correct fence type.

    Fixes: 6e8484c5cf07 ("RDMA/mlx5: set UMR wqe fence according to HCA cap")
    Signed-off-by: Majd Dibbiny
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Majd Dibbiny
     

08 Dec, 2018

2 commits

  • commit db7a691a1551a748cb92d9c89c6b190ea87e28d5 upstream.

    If the firmware reports a connection width that is not 1x, 4x, 8x or 12x
    it causes the driver to fail during initialization.

    To prevent this failure every time a new width is introduced to the RDMA
    stack, we set a default width of 4x for widths which are unknown to
    the driver.

    This is needed to allow running old kernels with new firmware.
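
    A sketch of the translation with the fallback; the firmware width encoding
    shown is illustrative, while IB_WIDTH_* come from rdma/ib_verbs.h:

        switch (fw_width) {
        case 1:  *ib_width = IB_WIDTH_1X;  break;
        case 4:  *ib_width = IB_WIDTH_4X;  break;
        case 8:  *ib_width = IB_WIDTH_8X;  break;
        case 12: *ib_width = IB_WIDTH_12X; break;
        default: *ib_width = IB_WIDTH_4X;  break;  /* unknown width: don't fail init */
        }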

    Cc: # 4.1
    Fixes: 1b5daf11b015 ("IB/mlx5: Avoid using the MAD_IFC command under ISSI > 0 mode")
    Signed-off-by: Michael Guralnik
    Reviewed-by: Majd Dibbiny
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael Guralnik
     
  • commit 24c3456c8d5ee6fc1933ca40f7b4406130682668 upstream.

    If for some reason we failed to query the mr status, we need to make sure
    to provide sufficient information for an ambiguous error (guard error on
    sector 0).
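
    A sketch of the fallback (ib_check_mr_status() and IB_MR_CHECK_SIG_STATUS
    are core verbs API; the 0x1 guard-check code is illustrative):

        ret = ib_check_mr_status(mr, IB_MR_CHECK_SIG_STATUS, &mr_status);
        if (ret) {
                /* the query itself failed: report an ambiguous guard error
                 * on sector 0 rather than returning garbage */
                *sector = 0;
                return 0x1;
        }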

    Fixes: 0a7a08ad6f5f ("IB/iser: Implement check_protection")
    Cc:
    Reported-by: Dan Carpenter
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Sagi Grimberg
     

01 Dec, 2018

3 commits

  • commit 5a7189d529cd146cd5838af97b32fcac4122b471 upstream.

    If i40iw_allocate_dma_mem fails when creating a QP, the
    memory allocated for the QP structure using kzalloc is not
    freed because iwqp->allocated_buffer is used to free the
    memory and it is not set up until later. Fix this by setting
    iwqp->allocated_buffer before allocating the DMA memory.
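
    A sketch of the ordering fix; the structure and helper names approximate
    the commit text rather than quoting the driver:

        mem = kzalloc(sizeof(*iwqp), GFP_KERNEL);
        if (!mem)
                return ERR_PTR(-ENOMEM);

        iwqp = mem;
        iwqp->allocated_buffer = mem;    /* set first, so the error path can free it */

        if (i40iw_allocate_dma_mem(hw, &iwqp->q2_ctx_mem, size, 256))
                goto error;              /* error path frees iwqp->allocated_buffer */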

    Fixes: d37498417947 ("i40iw: add files for iwarp interface")
    Signed-off-by: Mustafa Ismail
    Signed-off-by: Shiraz Saleem
    Signed-off-by: Doug Ledford
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Mustafa Ismail
     
  • commit a0e0cb82804a6a21d9067022c2dfdf80d11da429 upstream.

    pq_update() can only be called in two places: from the completion
    function when the complete (npkts) sequence of packets has been
    submitted and processed, or from the setup function if a subset of the
    packets were submitted (i.e. the error path).

    Currently both paths can call pq_update() if an error occurs. This
    race will cause the n_req value to go negative, hanging file_close(),
    or cause a crash by freeing the txlist more than once.

    Several variables are used to determine SDMA send state. Most of
    these are unnecessary, and have code-inspectable races between the
    setup function and the completion function, in both the send path and
    the error path.

    The request 'status' value can be set by the setup or by the
    completion function. This is code-inspectably racy. Since the status
    is not needed in the completion code or by the caller, it has been
    removed.

    The request 'done' value races between usage by the setup and the
    completion function. The completion function does not need this.
    When the number of processed packets matches npkts, it is done.

    The 'has_error' value races between usage of the setup and the
    completion function. This can cause incorrect error handling and leave
    the n_req in an incorrect value (i.e. negative).

    Simplify the code by removing all of the unneeded state checks and
    variables.

    Clean up iovs node when it is freed.

    Eliminate race conditions in the error path:

    If all packets are submitted, the completion handler will set the
    completion status correctly (ok or aborted).

    If all packets are not submitted, the caller must wait until the
    submitted packets have completed, and then set the completion status.

    These two changes eliminate the race condition in the error path.

    Reviewed-by: Mitko Haralanov
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     
  • commit b2bedfb39541a7e14798d066b6f8685d84c8fcf5 upstream.

    Currently qp->port stores the port number whenever IB_QP_PORT
    QP attribute mask is set (during QP state transition to INIT state).
    This port number should be stored for the real QP when XRC target QP
    is used.

    Follow the ib_modify_qp() implementation and hide the access to ->real_qp.
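
    A sketch of the change (IB_QP_PORT and ->real_qp are core verbs concepts;
    following ib_modify_qp(), the attribute is applied to the real QP):

        if (attr_mask & IB_QP_PORT)
                qp->real_qp->port = attr->port_num;  /* not the XRC target QP's field */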

    Fixes: a512c2fbef9c ("IB/core: Introduce modify QP operation with udata")
    Signed-off-by: Parav Pandit
    Reviewed-by: Daniel Jurgens
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     

14 Nov, 2018

5 commits

  • commit 013c2403bf32e48119aeb13126929f81352cc7ac upstream.

    Schedule MR cache work only after the bucket has been initialized.

    Cc: # 4.10
    Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib")
    Signed-off-by: Artemy Kovalyov
    Reviewed-by: Majd Dibbiny
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Artemy Kovalyov
     
  • [ Upstream commit b97db58557f4aa6d9903f8e1deea6b3d1ed0ba43 ]

    Don't reset the resp opcode for a replayed read response.
    The resp opcode could be in the middle of a write or send
    sequence, when the duplicate read request was received.
    An example sequence is as follows:
    - Receive read request for 12KB PSN 20. Transmit read response
    first, middle and last with PSNs 20,21,22.
    - Receive write first PSN 23.
    At this point the resp psn is 24 and resp opcode is write first.
    - The sender notices that PSN 20 is dropped and retransmits.
    Receive read request for 12KB PSN 20. Transmit read response
    first, middle and last with PSNs 20,21,22. The resp opcode is
    set to -1, the resp psn remains 24.
    - Receive write first PSN 23. This is processed by duplicate_request().
    The resp opcode remains -1 and resp psn remains 24.
    - Receive write middle PSN 24. check_op_seq() reports a missing
    first error since the resp opcode is -1.

    When sending an ack for a duplicate send or write request,
    use the psn of the previous ack sent. Do not use the psn
    of a read response for the ack.
    An example sequence is as follows:
    - Receive write PSN 30. Transmit ACK for PSN 30.
    - Receive read request 4KB PSN 31. Transmit read response with
    PSN 31. The resp psn is now 32.
    - The sender notices that PSN 30 is dropped and retransmits.
    Receive write PSN 30. duplicate_request() sends an ACK with
    PSN 31. That is incorrect since PSN 31 was a read request.

    Signed-off-by: Vijay Immanuel
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Vijay Immanuel
     
  • [ Upstream commit d455f29f6d76a5f94881ca1289aaa1e90617ff5d ]

    Fix a possible recursive lock warning. It is a false warning, as the locks
    are part of two different HW queue data structures - cmdq and creq. A debug
    kernel throws the following warning and stack trace.

    [ 783.914967] ============================================
    [ 783.914970] WARNING: possible recursive locking detected
    [ 783.914973] 4.19.0-rc2+ #33 Not tainted
    [ 783.914976] --------------------------------------------
    [ 783.914979] swapper/2/0 is trying to acquire lock:
    [ 783.914982] 000000002aa3949d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
    [ 783.914999]
    but task is already holding lock:
    [ 783.915002] 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re]
    [ 783.915013]
    other info that might help us debug this:
    [ 783.915016] Possible unsafe locking scenario:

    [ 783.915019] CPU0
    [ 783.915021] ----
    [ 783.915034] lock(&(&hwq->lock)->rlock);
    [ 783.915035] lock(&(&hwq->lock)->rlock);
    [ 783.915037]
    *** DEADLOCK ***

    [ 783.915038] May be due to missing lock nesting notation

    [ 783.915039] 1 lock held by swapper/2/0:
    [ 783.915040] #0: 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re]
    [ 783.915044]
    stack backtrace:
    [ 783.915046] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.0-rc2+ #33
    [ 783.915047] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014
    [ 783.915048] Call Trace:
    [ 783.915049]
    [ 783.915054] dump_stack+0x90/0xe3
    [ 783.915058] __lock_acquire+0x106c/0x1080
    [ 783.915061] ? sched_clock+0x5/0x10
    [ 783.915063] lock_acquire+0xbd/0x1a0
    [ 783.915065] ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
    [ 783.915069] _raw_spin_lock_irqsave+0x4a/0x90
    [ 783.915071] ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
    [ 783.915073] bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
    [ 783.915078] tasklet_action_common.isra.17+0x197/0x1b0
    [ 783.915081] __do_softirq+0xcb/0x3a6
    [ 783.915084] irq_exit+0xe9/0x100
    [ 783.915085] do_IRQ+0x6a/0x120
    [ 783.915087] common_interrupt+0xf/0xf
    [ 783.915088]

    Use nested notation for the spin_lock to avoid this warning.
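
    A sketch of the annotation (spin_lock_irqsave_nested() and
    SINGLE_DEPTH_NESTING are standard lockdep facilities; the hwq naming
    follows the trace above and is illustrative):

        /* the creq handler legitimately takes the cmdq hwq lock while already
         * holding the creq hwq lock; tell lockdep these are different levels */
        spin_lock_irqsave_nested(&cmdq->lock, flags, SINGLE_DEPTH_NESTING);
        /* complete the firmware command */
        spin_unlock_irqrestore(&cmdq->lock, flags);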

    Signed-off-by: Selvin Xavier
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Selvin Xavier
     
  • [ Upstream commit 4d6e4d12da2c308f8f976d3955c45ee62539ac98 ]

    IPCB should be cleared before icmp_send, since it may contain data from
    previous layers, and that data could be misinterpreted as ip header options,
    which later causes the ihl to be set to an invalid value and results in
    the following stack corruption:

    [ 1083.031512] ib0: packet len 57824 (> 2048) too long to send, dropping
    [ 1083.031843] ib0: packet len 37904 (> 2048) too long to send, dropping
    [ 1083.032004] ib0: packet len 4040 (> 2048) too long to send, dropping
    [ 1083.032253] ib0: packet len 63800 (> 2048) too long to send, dropping
    [ 1083.032481] ib0: packet len 23960 (> 2048) too long to send, dropping
    [ 1083.033149] ib0: packet len 63800 (> 2048) too long to send, dropping
    [ 1083.033439] ib0: packet len 63800 (> 2048) too long to send, dropping
    [ 1083.033700] ib0: packet len 63800 (> 2048) too long to send, dropping
    [ 1083.034124] ib0: packet len 63800 (> 2048) too long to send, dropping
    [ 1083.034387] ==================================================================
    [ 1083.034602] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xf08/0x1310
    [ 1083.034798] Write of size 4 at addr ffff880353457c5f by task kworker/u16:0/7
    [ 1083.034990]
    [ 1083.035104] CPU: 7 PID: 7 Comm: kworker/u16:0 Tainted: G O 4.19.0-rc5+ #1
    [ 1083.035316] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
    [ 1083.035573] Workqueue: ipoib_wq ipoib_cm_skb_reap [ib_ipoib]
    [ 1083.035750] Call Trace:
    [ 1083.035888] dump_stack+0x9a/0xeb
    [ 1083.036031] print_address_description+0xe3/0x2e0
    [ 1083.036213] kasan_report+0x18a/0x2e0
    [ 1083.036356] ? __ip_options_echo+0xf08/0x1310
    [ 1083.036522] __ip_options_echo+0xf08/0x1310
    [ 1083.036688] icmp_send+0x7b9/0x1cd0
    [ 1083.036843] ? icmp_route_lookup.constprop.9+0x1070/0x1070
    [ 1083.037018] ? netif_schedule_queue+0x5/0x200
    [ 1083.037180] ? debug_show_all_locks+0x310/0x310
    [ 1083.037341] ? rcu_dynticks_curr_cpu_in_eqs+0x85/0x120
    [ 1083.037519] ? debug_locks_off+0x11/0x80
    [ 1083.037673] ? debug_check_no_obj_freed+0x207/0x4c6
    [ 1083.037841] ? check_flags.part.27+0x450/0x450
    [ 1083.037995] ? debug_check_no_obj_freed+0xc3/0x4c6
    [ 1083.038169] ? debug_locks_off+0x11/0x80
    [ 1083.038318] ? skb_dequeue+0x10e/0x1a0
    [ 1083.038476] ? ipoib_cm_skb_reap+0x2b5/0x650 [ib_ipoib]
    [ 1083.038642] ? netif_schedule_queue+0xa8/0x200
    [ 1083.038820] ? ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib]
    [ 1083.038996] ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib]
    [ 1083.039174] process_one_work+0x912/0x1830
    [ 1083.039336] ? wq_pool_ids_show+0x310/0x310
    [ 1083.039491] ? lock_acquire+0x145/0x3a0
    [ 1083.042312] worker_thread+0x87/0xbb0
    [ 1083.045099] ? process_one_work+0x1830/0x1830
    [ 1083.047865] kthread+0x322/0x3e0
    [ 1083.050624] ? kthread_create_worker_on_cpu+0xc0/0xc0
    [ 1083.053354] ret_from_fork+0x3a/0x50

    For instance, __ip_options_echo fails to proceed with the invalid srr and
    optlen passed from another layer via IPCB:

    [ 762.139568] IPv4: __ip_options_echo rr=0 ts=0 srr=43 cipso=0
    [ 762.139720] IPv4: ip_options_build: IPCB 00000000f3cd969e opt 000000002ccb3533
    [ 762.139838] IPv4: __ip_options_echo in srr: optlen 197 soffset 84
    [ 762.139852] IPv4: ip_options_build srr=0 is_frag=0 rr_needaddr=0 ts_needaddr=0 ts_needtime=0 rr=0 ts=0
    [ 762.140269] ==================================================================
    [ 762.140713] IPv4: __ip_options_echo rr=0 ts=0 srr=0 cipso=0
    [ 762.141078] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x12ec/0x1680
    [ 762.141087] Write of size 4 at addr ffff880353457c7f by task kworker/u16:0/7
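
    A sketch of the fix (IPCB() and icmp_send() are core networking API; the
    surrounding call corresponds to ipoib's packet-too-long path shown in the
    log above):

        /* wipe whatever the previous layer left in skb->cb so icmp_send()
         * does not read it back as IP options */
        memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
        icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));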

    Signed-off-by: Denis Drozdov
    Reviewed-by: Erez Shitrit
    Reviewed-by: Feras Daoud
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Denis Drozdov
     
  • [ Upstream commit 0f6ef65d1c6ec8deb5d0f11f86631ec4cfe8f22e ]

    If the provider driver (such as rdma_rxe) doesn't support pma counters,
    avoid exposing its directory similar to optional hw_counters directory.
    If core fails to read the PMA counter, return an error so that user can
    retry later if needed.

    Fixes: 35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c")
    Reported-by: Holger Hoffstätte
    Tested-by: Holger Hoffstätte
    Signed-off-by: Parav Pandit
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     

10 Nov, 2018

2 commits

  • commit 0295e39595e1146522f2722715dba7f7fba42217 upstream.

    hdr.cmd can be indirectly controlled by user-space, hence leading to
    a potential exploitation of the Spectre variant 1 vulnerability.

    This issue was detected with the help of Smatch:

    drivers/infiniband/core/ucm.c:1127 ib_ucm_write() warn: potential
    spectre issue 'ucm_cmd_table' [r] (local cap)

    Fix this by sanitizing hdr.cmd before using it to index
    ucm_cmd_table.
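
    A sketch of the standard Spectre-v1 mitigation for a user-controlled table
    index (array_index_nospec() comes from linux/nospec.h):

        if (hdr.cmd >= ARRAY_SIZE(ucm_cmd_table))
                return -EINVAL;
        /* clamp the index under speculation before using it */
        hdr.cmd = array_index_nospec(hdr.cmd, ARRAY_SIZE(ucm_cmd_table));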

    Notice that given that speculation windows are large, the policy is
    to kill the speculation on the first load and not worry if it can be
    completed with a dependent load/store [1].

    [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

    Cc: stable@vger.kernel.org
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Gustavo A. R. Silva
     
  • commit a3671a4f973ee9d9621d60166cc3b037c397d604 upstream.

    hdr.cmd can be indirectly controlled by user-space, hence leading to
    a potential exploitation of the Spectre variant 1 vulnerability.

    This issue was detected with the help of Smatch:

    drivers/infiniband/core/ucma.c:1686 ucma_write() warn: potential
    spectre issue 'ucma_cmd_table' [r] (local cap)

    Fix this by sanitizing hdr.cmd before using it to index
    ucm_cmd_table.

    Notice that given that speculation windows are large, the policy is
    to kill the speculation on the first load and not worry if it can be
    completed with a dependent load/store [1].

    [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

    Cc: stable@vger.kernel.org
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Gustavo A. R. Silva
     

04 Nov, 2018

5 commits

  • [ Upstream commit 43cbd64b1fdc1da89abdad88a022d9e87a98e9c6 ]

    usnic has a modified version of the core code's ib_umem_get() and
    related, and the copy misses many of the bug fixes done over the years:

    Commit bc3e53f682d9 ("mm: distinguish between mlocked and pinned pages")
    Commit 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm
    from ib_umem_get")
    Commit 8494057ab5e4 ("IB/uverbs: Prevent integer overflow in ib_umem_get
    address arithmetic")
    Commit 8abaae62f3fd ("IB/core: disallow registering 0-sized memory region")
    Commit 66578b0b2f69 ("IB/core: don't disallow registering region starting
    at 0x0")
    Commit 53376fedb9da ("RDMA/core: not to set page dirty bit if it's already
    set.")
    Commit 8e907ed48827 ("IB/umem: Use the correct mm during ib_umem_release")

    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Jason Gunthorpe
     
  • [ Upstream commit e7b169f34403becd3c9fd3b6e46614ab788f2187 ]

    During QP creation, the mlx5 driver translates the QP type to an
    internal value which is passed on to FW. There was no check to make
    sure that the translated value is valid, and -EINVAL was coerced into
    the mailbox command.

    Current firmware refuses this as an invalid QP type, but future/past
    firmware may do something else.
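
    A sketch of the missing check; the translation helper name is illustrative:

        int mlx5_st = to_mlx5_st(init_attr->qp_type);

        if (mlx5_st < 0)
                return -EINVAL;   /* never let -EINVAL reach the mailbox as a "type" */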

    Fixes: 09a7d9eca1a6c ('{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc')
    Reviewed-by: Ilya Lesokhin
    Signed-off-by: Noa Osherovich
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Noa Osherovich
     
  • [ Upstream commit 6082d9c9c94a408d7409b5f2e4e42ac9e8b16d0d ]

    Adding the vector offset when calling mlx5_vector2eqn() is wrong, because
    mlx5_vector2eqn() checks whether the EQ index is equal to the vector number,
    and the internal completion vectors that mlx5 allocates don't get an EQ
    index.

    The second problem here is that using effective_affinity_mask gives the same
    CPU for different vectors.
    This leads to unmapped queues when calling it from blk_mq_rdma_map_queues().
    This doesn't happen when using affinity_hint mask.

    Fixes: 2572cf57d75a ("mlx5: fix mlx5_get_vector_affinity to start from completion vector 0")
    Fixes: 05e0cc84e00c ("net/mlx5: Fix get vector affinity helper function")
    Signed-off-by: Israel Rukshin
    Reviewed-by: Max Gurtovoy
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Sasha Levin

    Israel Rukshin
     
  • [ Upstream commit 6b9f8970cd30929cb6b372fa44fa66da9e59c650 ]

    If the allocation of elem fails, it is not sufficient to simply check
    for NULL and return. We need to also put our reference on the pool or
    else we will leave the pool with a permanent ref count and we will never
    be able to free it.
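
    A sketch of the balanced error path; the pool helper names are assumptions
    based on the commit text:

        elem = kmem_cache_zalloc(pool_cache(pool), GFP_KERNEL);
        if (!elem) {
                atomic_dec(&pool->num_elem);  /* undo the element count taken earlier */
                rxe_pool_put(pool);           /* and drop our reference on the pool */
                return NULL;
        }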

    Fixes: 4831ca9e4a8e ("IB/rxe: check for allocation failure on elem")
    Suggested-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin

    Doug Ledford
     
  • [ Upstream commit 1f80bd6a6cc8358b81194e1f5fc16449947396ec ]

    The locking order of vlan_rwsem (LOCK A) and then rtnl (LOCK B)
    contradicts other flows such as ipoib_open, possibly causing a deadlock.
    To prevent this deadlock, the heavy flush is called with RTNL locked and
    only then tries to acquire vlan_rwsem.
    This deadlock is possible only when there are child interfaces.
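
    A sketch of the corrected ordering in the heavy-flush path, matching
    ipoib_open(), which runs with rtnl held before taking vlan_rwsem:

        rtnl_lock();                    /* LOCK B first, as in the ipoib_open path */
        down_read(&priv->vlan_rwsem);   /* LOCK A second */
        /* flush the parent and its child interfaces */
        up_read(&priv->vlan_rwsem);
        rtnl_unlock();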

    [ 140.941758] ======================================================
    [ 140.946276] WARNING: possible circular locking dependency detected
    [ 140.950950] 4.15.0-rc1+ #9 Tainted: G O
    [ 140.954797] ------------------------------------------------------
    [ 140.959424] kworker/u32:1/146 is trying to acquire lock:
    [ 140.963450] (rtnl_mutex){+.+.}, at: [] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
    [ 140.970006]
    but task is already holding lock:
    [ 140.975141] (&priv->vlan_rwsem){++++}, at: [] __ipoib_ib_dev_flush+0x51/0x4e0 [ib_ipoib]
    [ 140.982105]
    which lock already depends on the new lock.
    [ 140.990023]
    the existing dependency chain (in reverse order) is:
    [ 140.998650]
    -> #1 (&priv->vlan_rwsem){++++}:
    [ 141.005276] down_read+0x4d/0xb0
    [ 141.009560] ipoib_open+0xad/0x120 [ib_ipoib]
    [ 141.014400] __dev_open+0xcb/0x140
    [ 141.017919] __dev_change_flags+0x1a4/0x1e0
    [ 141.022133] dev_change_flags+0x23/0x60
    [ 141.025695] devinet_ioctl+0x704/0x7d0
    [ 141.029156] sock_do_ioctl+0x20/0x50
    [ 141.032526] sock_ioctl+0x221/0x300
    [ 141.036079] do_vfs_ioctl+0xa6/0x6d0
    [ 141.039656] SyS_ioctl+0x74/0x80
    [ 141.042811] entry_SYSCALL_64_fastpath+0x1f/0x96
    [ 141.046891]
    -> #0 (rtnl_mutex){+.+.}:
    [ 141.051701] lock_acquire+0xd4/0x220
    [ 141.055212] __mutex_lock+0x88/0x970
    [ 141.058631] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
    [ 141.063160] __ipoib_ib_dev_flush+0x71/0x4e0 [ib_ipoib]
    [ 141.067648] process_one_work+0x1f5/0x610
    [ 141.071429] worker_thread+0x4a/0x3f0
    [ 141.074890] kthread+0x141/0x180
    [ 141.078085] ret_from_fork+0x24/0x30
    [ 141.081559]

    other info that might help us debug this:
    [ 141.088967] Possible unsafe locking scenario:
    [ 141.094280]        CPU0                         CPU1
    [ 141.097953]        ----                         ----
    [ 141.101640]   lock(&priv->vlan_rwsem);
    [ 141.104771]                                lock(rtnl_mutex);
    [ 141.109207]                                lock(&priv->vlan_rwsem);
    [ 141.114032]   lock(rtnl_mutex);
    [ 141.116800]
    *** DEADLOCK ***

    Fixes: b4b678b06f6e ("IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop")
    Signed-off-by: Alex Vesker
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin

    Alex Vesker
     

20 Oct, 2018

1 commit

  • commit b4a4957d3d1c328b733fce783b7264996f866ad2 upstream.

    rvt_destroy_qp() cannot complete until all in process packets have
    been released from the underlying hardware. If a link down event
    occurs, an application can hang with a kernel stack similar to:

    cat /proc//stack
    quiesce_qp+0x178/0x250 [hfi1]
    rvt_reset_qp+0x23d/0x400 [rdmavt]
    rvt_destroy_qp+0x69/0x210 [rdmavt]
    ib_destroy_qp+0xba/0x1c0 [ib_core]
    nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
    nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
    nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
    nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
    process_one_work+0x17a/0x440
    worker_thread+0x126/0x3c0
    kthread+0xcf/0xe0
    ret_from_fork+0x58/0x90
    0xffffffffffffffff

    quiesce_qp() waits until all outstanding packets have been freed.
    This wait should be momentary. During a link down event, the cleanup
    handling does not ensure that all packets caught by the link down are
    flushed properly.

    This is caused by the fact that the freeze path and the link down
    event are handled the same way. This is not correct. The freeze path
    waits until the HFI is unfrozen and then restarts PIO. A link down
    is not a freeze event. The link down path cannot restart the PIO
    until link is restored. If the PIO path is restarted before the link
    comes up, the application (QP) using the PIO path will hang (until
    link is restored).

    Fix by separating the linkdown path from the freeze path and use the
    link down path for link down events.

    Close a race condition in sc_disable() by acquiring both the progress
    and release locks.

    Close a race condition in sc_stop() by moving the setting of the flag
    bits under the alloc lock.

    Cc: # 4.9.x+
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     

13 Oct, 2018

1 commit

  • commit 5fe23f262e0548ca7f19fb79f89059a60d087d22 upstream.

    There is a race condition between ucma_close() and ucma_resolve_ip():

    CPU0                                  CPU1
    ucma_resolve_ip():                    ucma_close():

    ctx = ucma_get_ctx(file, cmd.id);

                                          list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
                                              mutex_lock(&mut);
                                              idr_remove(&ctx_idr, ctx->id);
                                              mutex_unlock(&mut);
                                              ...
                                              mutex_lock(&mut);
                                              if (!ctx->closing) {
                                                  mutex_unlock(&mut);
                                                  rdma_destroy_id(ctx->cm_id);
                                              ...
                                              ucma_free_ctx(ctx);

    ret = rdma_resolve_addr();
    ucma_put_ctx(ctx);

    Before idr_remove(), ucma_get_ctx() could still find the ctx
    and after rdma_destroy_id(), rdma_resolve_addr() may still
    access id_priv pointer. Also, ucma_put_ctx() may use ctx after
    ucma_free_ctx() too.

    ucma_close() should call ucma_put_ctx() too, which tests the
    refcnt and waits for the last user to release it. A similar
    pattern is already used by ucma_destroy_id().
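
    A sketch of the close path after the fix (ctx field names follow the commit
    text; the completion-based wait mirrors what ucma_destroy_id() already does):

        mutex_lock(&mut);
        idr_remove(&ctx_idr, ctx->id);   /* no new lookups can find the ctx */
        mutex_unlock(&mut);

        ucma_put_ctx(ctx);               /* drop the table's reference */
        wait_for_completion(&ctx->comp); /* wait for users like ucma_resolve_ip() */

        rdma_destroy_id(ctx->cm_id);
        ucma_free_ctx(ctx);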

    Reported-and-tested-by: syzbot+da2591e115d57a9cbb8b@syzkaller.appspotmail.com
    Reported-by: syzbot+cfe3c1e8ef634ba8964b@syzkaller.appspotmail.com
    Cc: Jason Gunthorpe
    Cc: Doug Ledford
    Cc: Leon Romanovsky
    Signed-off-by: Cong Wang
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     

10 Oct, 2018

1 commit

  • [ Upstream commit 0d23ba6034b9cf48b8918404367506da3e4b3ee5 ]

    The current code grabs the private_data of whatever file descriptor
    userspace has supplied and implicitly casts it to a `struct ucma_file *`,
    potentially causing a type confusion.

    This is probably fine in practice because the pointer is only used for
    comparisons, it is never actually dereferenced; and even in the
    comparisons, it is unlikely that a file from another filesystem would have
    a ->private_data pointer that happens to also be valid in this context.
    But ->private_data is not always guaranteed to be a valid pointer to an
    object owned by the file's filesystem; for example, some filesystems just
    cram numbers in there.

    Check the type of the supplied file descriptor to be safe, analogous to how
    other places in the kernel do it.
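
    A sketch of such a type check; comparing f_op against the owning driver's
    file_operations is the usual pattern, and "ucma_fops" is the assumed name
    here:

        f = fdget(cmd.fd);
        if (!f.file)
                return -ENOENT;
        if (f.file->f_op != &ucma_fops) {  /* not our file: don't trust ->private_data */
                ret = -EINVAL;
                goto file_put;
        }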

    Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()")
    Signed-off-by: Jann Horn
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

04 Oct, 2018

9 commits

  • commit 67e3816842fe6414d629c7515b955952ec40c7d7 upstream.

    Currently a uverbs completion event queue is flushed of events in
    ib_uverbs_comp_event_close() with the queue spinlock held and then
    released. Yet ev_queue->is_closed is not set until later, in
    uverbs_hot_unplug_completion_event_file().

    In between the time ib_uverbs_comp_event_close() releases the lock and
    uverbs_hot_unplug_completion_event_file() acquires the lock, a completion
    event can arrive and be inserted into the event queue by
    ib_uverbs_comp_handler().

    This can cause a "double add" list_add warning or crash depending on the
    kernel configuration, or a memory leak because the event is never dequeued
    since the queue is already closed down.

    So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close().
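
    A sketch of the close path with the added flag (field names taken from the
    commit text):

        spin_lock_irq(&ev_queue->lock);
        ev_queue->is_closed = 1;   /* added: ib_uverbs_comp_handler() now drops new events */
        /* flush the already-queued events as before */
        spin_unlock_irq(&ev_queue->lock);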

    Cc: stable@vger.kernel.org
    Fixes: 1e7710f3f656 ("IB/core: Change completion channel to use the reworked objects schema")
    Signed-off-by: Steve Wise
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Steve Wise
     
  • commit d623500b3c4efd8d4e945ac9003c6b87b469a9ab upstream.

    If a packet stream uses an UnsupportedVL (virtual lane), the send
    engine will not send the packet, and it will not indicate that an
    error has occurred. This will cause the packet stream to block.

    HFI has 8 virtual lanes available for packet streams. Each lane can
    be enabled or disabled using the UnsupportedVL mask. If a lane is
    disabled, adding a packet to the send context must be disallowed.

    The current mask for determining unsupported VLs defaults to 0 (allow
    all). This is incorrect. Only the VLs that are defined should be
    allowed.

    Determine which VLs are disabled (mtu == 0), and set the appropriate
    unsupported bit in the mask. The correct mask will allow the send
    engine to error on the invalid VL, and error recovery will work
    correctly.
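
    A sketch of building the mask; the per-VL MTU storage names are assumptions:

        u64 unsupported_vls = 0;
        int vl;

        for (vl = 0; vl < num_vls; vl++)
                if (dd->vld[vl].mtu == 0)          /* VL not configured */
                        unsupported_vls |= 1ULL << vl;
        /* write unsupported_vls into the send context's UnsupportedVL mask */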

    Cc: # 4.9.x+
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Lukasz Odzioba
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     
  • commit 94694d18cf27a6faad91487a38ce516c2b16e7d9 upstream.

    If the number of packets in a user sdma request does not match
    the actual iovectors being sent, sdma_cleanup can be called on
    an uninitialized request structure, resulting in a crash similar
    to this:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: [] __sdma_txclean+0x57/0x1e0 [hfi1]
    PGD 8000001044f61067 PUD 1052706067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 30 PID: 69912 Comm: upsm Kdump: loaded Tainted: G OE
    ------------ 3.10.0-862.el7.x86_64 #1
    Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
    SE5C610.86B.01.01.0019.101220160604 10/12/2016
    task: ffff8b331c890000 ti: ffff8b2ed1f98000 task.ti: ffff8b2ed1f98000
    RIP: 0010:[] [] __sdma_txclean+0x57/0x1e0
    [hfi1]
    RSP: 0018:ffff8b2ed1f9bab0 EFLAGS: 00010286
    RAX: 0000000000008b2b RBX: ffff8b2adf6e0000 RCX: 0000000000000000
    RDX: 00000000000000a0 RSI: ffff8b2e9eedc540 RDI: ffff8b2adf6e0000
    RBP: ffff8b2ed1f9bad8 R08: 0000000000000000 R09: ffffffffc0b04a06
    R10: ffff8b331c890190 R11: ffffe6ed00bf1840 R12: ffff8b3315480000
    R13: ffff8b33154800f0 R14: 00000000fffffff2 R15: ffff8b2e9eedc540
    FS: 00007f035ac47740(0000) GS:ffff8b331e100000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 0000000c03fe6000 CR4: 00000000001607e0
    Call Trace:
    [] user_sdma_send_pkts+0xdcd/0x1990 [hfi1]
    [] ? gup_pud_range+0x140/0x290
    [] ? hfi1_mmu_rb_insert+0x155/0x1b0 [hfi1]
    [] hfi1_user_sdma_process_request+0xc5b/0x11b0 [hfi1]
    [] hfi1_aio_write+0xba/0x110 [hfi1]
    [] do_sync_readv_writev+0x7b/0xd0
    [] do_readv_writev+0xce/0x260
    [] ? tty_ldisc_deref+0x19/0x20
    [] ? n_tty_ioctl+0xe0/0xe0
    [] vfs_writev+0x35/0x60
    [] SyS_writev+0x7f/0x110
    [] system_call_fastpath+0x1c/0x21
    Code: 06 49 c7 47 18 00 00 00 00 0f 87 89 01 00 00 5b 41 5c 41 5d 41 5e 41 5f
    5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 4e 10 48 89 fb 8b 51 08 49 89 d4
    83 e2 0c 41 81 e4 00 e0 00 00 48 c1 ea 02
    RIP [] __sdma_txclean+0x57/0x1e0 [hfi1]
    RSP
    CR2: 0000000000000008

    There are two exit points from user_sdma_send_pkts(). One (free_tx)
    merely frees the slab entry and one (free_txreq) cleans the sdma_txreq
    prior to freeing the slab entry. The free_txreq variation can only be
    called after one of the sdma_init*() variations has been called.

    In the panic case, the slab entry had been allocated but not inited.

    Fix the issue by exiting through free_tx thus avoiding sdma_clean().

    Cc: # 4.9.x+
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn
    Reviewed-by: Lukasz Odzioba
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Greg Kroah-Hartman

    Signed-off-by: Jason Gunthorpe

    Michael J. Ruhl
     
  • commit 0dbfaa9f2813787679e296eb5476e40938ab48c8 upstream.

    The SL specified by a user needs to be a valid SL.

    Add a range check to the user specified SL value which protects from
    running off the end of the SL to SC table.
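
    A sketch of the range check; the table name is assumed from the commit text:

        if (sl >= ARRAY_SIZE(ibp->sl_to_sc))
                return -EINVAL;          /* reject before indexing the SL-to-SC table */
        sc = ibp->sl_to_sc[sl];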

    CC: stable@vger.kernel.org
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Signed-off-by: Ira Weiny
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Ira Weiny
     
  • commit ee92efe41cf358f4b99e73509f2bfd4733609f26 upstream.

    Use different loop variables for the inner and outer loop. This avoids
    an infinite loop if there are more RDMA channels than
    target->req_ring_size.
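
    A sketch of the intended shape, with illustrative variable names: the inner
    loop must not reuse the channel index:

        for (i = 0; i < target->ch_count; i++) {         /* outer: RDMA channels */
                ch = &target->ch[i];
                for (j = 0; j < target->req_ring_size; j++)
                        /* per-request work on ch->req_ring[j] */;
        }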

    Fixes: d92c0da71a35 ("IB/srp: Add multichannel support")
    Cc:
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     
  • [ Upstream commit f1228867adaf8890826f2b59e4caddb1c5cc2df7 ]

    rdma_ah_find_type() can reach into ib_device->port_immutable with a
    potentially out-of-bounds port number, so check that the port number is
    valid first.
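
    A sketch of the guard (rdma_is_port_valid() is the core helper for this
    kind of check; returning the undefined AH type as the fallback is an
    assumption):

        if (!rdma_is_port_valid(dev, port_num))
                return RDMA_AH_ATTR_TYPE_UNDEFINED;   /* don't index port_immutable[] */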

    Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
    Signed-off-by: Tarick Bedeir
    Reviewed-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Tarick Bedeir
     
  • [ Upstream commit c2d7c8ff89b22ddefb1ac2986c0d48444a667689 ]

    "nents" is an unsigned int, so if ib_map_mr_sg() returns a negative
    error code then it's type promoted to a high unsigned int which is
    treated as success.
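
    A sketch of the signedness fix (variable names illustrative; ib_map_mr_sg()
    returns int):

        int ret;

        ret = ib_map_mr_sg(mr, sg, sg_cnt, NULL, PAGE_SIZE);
        if (ret < 0)
                return ret;     /* keep the error signed instead of storing it */
        nents = ret;            /* only now convert to the unsigned count */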

    Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • [ Upstream commit 5d9a2b0e28759e319a623da33940dbb3ce952b7d ]

    VMA lookup is supposed to be performed while mmap_sem is held.
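
    A sketch of the locking, as a reminder that find_vma() must run under
    mmap_sem on kernels of this era (the hugetlb test reflects the 2MB-page
    context of the fixed commit and is illustrative):

        down_read(&current->mm->mmap_sem);
        vma = find_vma(current->mm, addr);
        if (vma && is_vm_hugetlb_page(vma))
                /* note the 2MB-page capability while the VMA is stable */;
        up_read(&current->mm->mmap_sem);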

    Fixes: f26c7c83395b ("i40iw: Add 2MB page support")
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • [ Upstream commit 474e5a86067e5f12c97d1db8b170c7f45b53097a ]

    The sgid_tbl->tbl[] array is allocated in bnxt_qplib_alloc_sgid_tbl().
    It has sgid_tbl->max elements. So the > should be >= to prevent
    accessing one element beyond the end of the array.
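
    A sketch of the corrected bound; the index variable name is illustrative:

        if (index >= sgid_tbl->max)      /* was '>', which allowed one-past-the-end */
                return -EINVAL;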

    Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
    Signed-off-by: Dan Carpenter
    Acked-by: Selvin Xavier
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter