18 Jan, 2020

1 commit


04 Oct, 2019

1 commit

  • Currently, RDS calls ib_dma_alloc_coherent() to allocate a large piece
    of contiguous DMA coherent memory to store struct rds_header for
    sending/receiving packets. The memory allocated is then partitioned
    into struct rds_header. This is not necessary and can be costly at
    times when memory is fragmented. Instead, RDS should use the DMA
    memory pool interface to handle this. The DMA addresses of the pre-
    allocated headers are stored in an array. At send/receive ring
    initialization and refill time, this array is dereferenced to get
    the DMA addresses. The array is not accessed during send/receive
    packet processing. (A sketch of the DMA pool pattern follows this
    entry.)

    Suggested-by: Håkon Bugge
    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
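
    A minimal sketch of the DMA pool pattern described above; the
    dma_pool_* calls are the stock kernel API, while the function and
    pool names are illustrative, not the actual RDS code:

        #include <linux/dmapool.h>

        static int example_hdr_pool(struct device *dev)
        {
                struct dma_pool *hdr_pool;
                dma_addr_t hdr_dma;
                void *hdr;

                /* One pool of small, identically sized coherent buffers
                 * instead of one large contiguous allocation. */
                hdr_pool = dma_pool_create("rds_hdr", dev,
                                           sizeof(struct rds_header),
                                           L1_CACHE_BYTES, 0);
                if (!hdr_pool)
                        return -ENOMEM;

                /* Allocate one header; hdr_dma is what would be stashed
                 * in the per-ring DMA address array at init/refill time. */
                hdr = dma_pool_zalloc(hdr_pool, GFP_KERNEL, &hdr_dma);
                if (!hdr) {
                        dma_pool_destroy(hdr_pool);
                        return -ENOMEM;
                }

                dma_pool_free(hdr_pool, hdr, hdr_dma);
                dma_pool_destroy(hdr_pool);
                return 0;
        }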
     

03 Oct, 2019

1 commit


10 Jul, 2019

1 commit

  • RDS composite message (rdma + control) user notification needs to be
    triggered once the full message is delivered, and such a fix was
    added as part of commit 941f8d55f6d61 ("RDS: RDMA: Fix the composite
    message user notification"). But rds_send_remove_from_sock is missing
    the data part notify check, and hence at times the user doesn't get
    the notification, which isn't desirable.

    One way is to fix rds_send_remove_from_sock to check for that case,
    but considering the ordering complexity with the completion handler,
    and since rdma + control messages are always dispatched back to back
    in the same send context, just delaying the signaled completion on the
    rdma work request also gets the desired behaviour, i.e. notifying the
    application only after the RDMA + control message send completes
    (sketched after this entry). So this patch updates the earlier fix
    with this approach. The fix of delaying signaled completion of the
    rdma op till the control message send completes was done by Venkat
    Venkatsubra in the downstream kernel.

    Reviewed-and-tested-by: Zhu Yanjun
    Reviewed-by: Gerd Rausch
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
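
    A hedged sketch of the delayed-signal idea above: suppress the
    completion signal on the RDMA work request and signal only the
    control send posted right behind it, so the notification fires once
    both have completed (variable names are illustrative):

        /* RDMA op posted first: no completion signal of its own. */
        rdma_wr.wr.send_flags &= ~IB_SEND_SIGNALED;

        /* Control message posted back to back in the same send context:
         * its signaled completion covers the pair. */
        ctrl_wr.send_flags |= IB_SEND_SIGNALED;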
     

10 Mar, 2019

1 commit

  • Pull rdma updates from Jason Gunthorpe:
    "This has been a slightly more active cycle than normal with ongoing
    core changes and quite a lot of collected driver updates.

    - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe

    - A new data transfer mode for HFI1 giving higher performance

    - Significant functional and bug fix update to the mlx5
    On-Demand-Paging MR feature

    - A chip hang reset recovery system for hns

    - Change mm->pinned_vm to an atomic64

    - Update bnxt_re to support a new 57500 chip

    - A sane netlink 'rdma link add' method for creating rxe devices and
    fixing the various unregistration race conditions in rxe's
    unregister flow

    - Allow looking up objects by an ID over netlink

    - Various reworking of the core to driver interface:
      - drivers should not assume umem SGLs are in PAGE_SIZE chunks
      - ucontext is accessed via udata not other means
      - start to make the core code responsible for object memory
        allocation
      - drivers should convert struct device to struct ib_device via a
        helper
      - drivers have more tools to avoid use after unregister problems"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (280 commits)
    net/mlx5: ODP support for XRC transport is not enabled by default in FW
    IB/hfi1: Close race condition on user context disable and close
    RDMA/umem: Revert broken 'off by one' fix
    RDMA/umem: minor bug fix in error handling path
    RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
    cxgb4: kfree mhp after the debug print
    IB/rdmavt: Fix concurrency panics in QP post_send and modify to error
    IB/rdmavt: Fix loopback send with invalidate ordering
    IB/iser: Fix dma_nents type definition
    IB/mlx5: Set correct write permissions for implicit ODP MR
    bnxt_re: Clean cq for kernel consumers only
    RDMA/uverbs: Don't do double free of allocated PD
    RDMA: Handle ucontext allocations by IB/core
    RDMA/core: Fix a WARN() message
    bnxt_re: fix the regression due to changes in alloc_pbl
    IB/mlx4: Increase the timeout for CM cache
    IB/core: Abort page fault handler silently during owning process exit
    IB/mlx5: Validate correct PD before prefetch MR
    IB/mlx5: Protect against prefetch of invalid MR
    RDMA/uverbs: Store PR pointer before it is overwritten
    ...

    Linus Torvalds
     

05 Feb, 2019

3 commits

  • For RDMA transports, RDS TOS is an extension of IB QoS (Annex A13)
    to provide clients the ability to segregate traffic flows for
    different types of data. RDMA CM abstracts it for ULPs using
    rdma_set_service_type() (a usage sketch follows this entry).
    Internally, each traffic flow is represented by a connection with
    all of its independent resources, like that of a normal connection,
    and is differentiated by service type. In other words, there can be
    multiple qp connections between an IP pair, and each supports a
    unique service type.

    The feature has been added from RDSv4.1 onwards and supports
    rolling upgrades. RDMA connection metadata also carries the tos
    information to set up the SL on end-to-end context. The original
    code was developed by Bang Nguyen in the downstream kernel back in
    2.6.32 kernel days, and it has evolved over a period of time.

    Reviewed-by: Sowmini Varadhan
    Signed-off-by: Santosh Shilimkar
    [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
    Signed-off-by: Zhu Yanjun

    Santosh Shilimkar
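
    A usage sketch of the RDMA CM hook named above; rdma_set_service_type()
    is the real API, the wrapper is illustrative:

        #include <rdma/rdma_cm.h>

        /* Tag a cm_id with a service type before connecting so the CM
         * resolves a path (and hence SL) for that QoS class. */
        static void example_set_tos(struct rdma_cm_id *cm_id, u8 tos)
        {
                rdma_set_service_type(cm_id, tos);
        }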
     
  • Linux 5.0-rc5

    Needed to merge the include/uapi changes so we have an up to date
    single-tree for these files. Patches already posted are also expected to
    need this for dependencies.

    Jason Gunthorpe
     
  • Keeping single-line wrapper functions is not useful. Hence remove the
    ib_sg_dma_address() and ib_sg_dma_len() functions (a before/after
    example follows this entry). This patch does not change any
    functionality.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Jason Gunthorpe

    Bart Van Assche
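
    The migration is mechanical, since the wrappers simply forwarded to
    the generic scatterlist accessors (a hedged before/after):

        /* before */
        u64 addr = ib_sg_dma_address(dev, sg);
        unsigned int len = ib_sg_dma_len(dev, sg);

        /* after */
        u64 addr = sg_dma_address(sg);
        unsigned int len = sg_dma_len(sg);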
     

07 Jan, 2019

1 commit


17 Aug, 2018

1 commit

  • rdma.git merge resolution for the 4.19 merge window

    Conflicts:
    drivers/infiniband/core/rdma_core.c
    - Use the rdma code and revise with the new spelling for
    atomic_fetch_add_unless
    drivers/nvme/host/rdma.c
    - Replace max_sge with max_send_sge in new blk code
    drivers/nvme/target/rdma.c
    - Use the blk code and revise to use NULL for ib_post_recv when
    appropriate
    - Replace max_sge with max_recv_sge in new blk code
    net/rds/ib_send.c
    - Use the net code and revise to use NULL for ib_post_recv when
    appropriate

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

02 Aug, 2018

1 commit

  • The variable 'rds_ibdev' is assigned but never used, so it can be
    removed.

    This fixes the following clang warning:
    net/rds/ib_send.c:762:24: warning: variable ‘rds_ibdev’ set but not used [-Wunused-but-set-variable]

    Signed-off-by: YueHaibing
    Signed-off-by: David S. Miller

    YueHaibing
     

31 Jul, 2018

1 commit

  • Since neither ib_post_send() nor ib_post_recv() modify the data structure
    their second argument points at, declare that argument const. This change
    makes it necessary to declare the 'bad_wr' argument const too and also to
    modify all ULPs that call ib_post_send(), ib_post_recv() or
    ib_post_srq_recv(). This patch does not change any functionality but makes
    it possible for the compiler to verify whether the
    ib_post_(send|recv|srq_recv) functions really do not modify the posted
    work request (the resulting prototypes are shown after this entry).

    To make this possible, only one cast had to be introduced that casts away
    constness, namely in rpcrdma_post_recvs(). The only way I can think of to
    avoid that cast is to introduce an additional loop in that function or to
    change the data type of bad_wr from struct ib_recv_wr ** into int
    (an index that refers to an element in the work request list). However,
    both approaches would require even more extensive changes than this
    patch.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Chuck Lever
    Signed-off-by: Jason Gunthorpe

    Bart Van Assche
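
    The constified prototypes then look like the following:

        int ib_post_send(struct ib_qp *qp, const struct ib_send_wr *send_wr,
                         const struct ib_send_wr **bad_send_wr);
        int ib_post_recv(struct ib_qp *qp, const struct ib_recv_wr *recv_wr,
                         const struct ib_recv_wr **bad_recv_wr);
        int ib_post_srq_recv(struct ib_srq *srq,
                             const struct ib_recv_wr *recv_wr,
                             const struct ib_recv_wr **bad_recv_wr);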
     

24 Jul, 2018

1 commit

  • This patch changes the internal representation of an IP address to use
    struct in6_addr. An IPv4 address is stored as an IPv4-mapped address
    (see the sketch after this entry). All the functions which take an IP
    address as argument are also changed to use struct in6_addr. But the
    RDS socket layer is not modified, so it still does not accept IPv6
    addresses from an application, and the RDS layer neither accepts nor
    initiates IPv6 connections.

    v2: Fixed sparse warnings.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
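
    A hedged sketch of the unified representation, using the stock
    kernel helpers for IPv4-mapped addresses (the wrapper is
    illustrative):

        #include <net/ipv6.h>

        static void example_store_v4(__be32 ip4, struct in6_addr *addr6)
        {
                /* Store an IPv4 address as ::ffff:a.b.c.d */
                ipv6_addr_set_v4mapped(ip4, addr6);
        }

        /* Later, ipv6_addr_v4mapped(addr6) tells the code whether the
         * peer is really IPv4. */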
     

26 Oct, 2017

2 commits

  • The number of unsignaled work-requests posted to the IB send queue is
    tracked by a counter in the rds_ib_connection struct. When it reaches
    zero, or the caller explicitly asks for it, the send-signaled bit is
    set in send_flags and the counter is reset. This is performed by the
    rds_ib_set_wr_signal_state() function (paraphrased after this entry).

    However, this function is not always used, which yields inaccurate
    accounting. This commit fixes this, refactors some duplicated code
    related to the matter, and makes the actual parameter type to the
    function consistent.

    Signed-off-by: Håkon Bugge
    Signed-off-by: David S. Miller

    Håkon Bugge
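
    A hedged paraphrase of the helper's logic; i_unsignaled_wrs and
    rds_ib_sysctl_max_unsig_wrs follow the RDS naming, the rest is
    simplified:

        /* Called for every WR posted on the send queue. */
        if (ic->i_unsignaled_wrs-- == 0 || notify) {
                ic->i_unsignaled_wrs = rds_ib_sysctl_max_unsig_wrs;
                send->s_wr.send_flags |= IB_SEND_SIGNALED;
        }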
     
  • send_flags needs to be initialized before calling
    rds_ib_set_wr_signal_state().

    Signed-off-by: Håkon Bugge
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Håkon Bugge
     

18 Feb, 2017

1 commit

  • In the function rds_ib_xmit_atomic, if ib_ring is not allocated
    successfully, it is not necessary to unallocate it.

    Cc: Joe Jin
    Cc: Junxiao Bi
    Signed-off-by: Zhu Yanjun
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Zhu Yanjun
     

03 Jan, 2017

2 commits

  • When an application sends an RDS RDMA composite message consisting of
    an RDMA transfer to be followed up by a non-RDMA payload, it expects
    to be notified *only* when the full message gets delivered. RDS RDMA
    notification doesn't behave this way, though.

    Thanks to Venkat for debugging and root-causing the issue,
    where only the first part of the message (RDMA) was
    successfully delivered but the remainder payload delivery failed.
    In that case, the application should not be notified with
    a false positive of message delivery success.

    Fix this case by making sure the user gets notified only after
    the full message delivery.

    Reviewed-by: Venkat Venkatsubra
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     
  • Also use pr_* for it.

    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     

02 Jul, 2016

1 commit

  • Refactor code to avoid separate indirections for single-path
    and multipath transports. All transports (both single and mp-capable)
    will get a pointer to the rds_conn_path, and can trivially derive
    the rds_connection from the ->cp_conn.

    Acked-by: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

15 Jun, 2016

1 commit

  • In preparation for multipath RDS, split the rds_connection
    structure into a base structure, and a per-path struct rds_conn_path.
    The base structure tracks information and locks common to all
    paths. The workqs for send/recv/shutdown etc are tracked per
    rds_conn_path. Thus the workq callbacks now work with rds_conn_path.

    This commit allows for one rds_conn_path per rds_connection, and will
    be extended into multiple conn_paths in subsequent commits (a
    simplified sketch of the split follows this entry).

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
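
    A much-simplified sketch of the split (the real structs carry many
    more fields):

        struct rds_conn_path {
                struct rds_connection   *cp_conn;  /* back to shared state */
                struct delayed_work     cp_send_w; /* per-path send workq */
                struct delayed_work     cp_recv_w; /* per-path recv workq */
                struct work_struct      cp_down_w; /* per-path shutdown */
        };

        struct rds_connection {
                __be32                  c_laddr;   /* common to all paths */
                __be32                  c_faddr;
                struct rds_conn_path    c_path[1]; /* extended to N later */
        };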
     

03 Mar, 2016

1 commit


08 Nov, 2015

1 commit

  • Pull rdma updates from Doug Ledford:
    "This is my initial round of 4.4 merge window patches. There are a few
    other things I wish to get in for 4.4 that aren't in this pull, as
    this represents what has gone through merge/build/run testing and not
    what is the last few items for which testing is not yet complete.

    - "Checksum offload support in user space" enablement
    - Misc cxgb4 fixes, add T6 support
    - Misc usnic fixes
    - 32 bit build warning fixes
    - Misc ocrdma fixes
    - Multicast loopback prevention extension
    - Extend the GID cache to store and return attributes of GIDs
    - Misc iSER updates
    - iSER clustering update
    - Network NameSpace support for rdma CM
    - Work Request cleanup series
    - New Memory Registration API"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits)
    IB/core, cma: Make __attribute_const__ declarations sparse-friendly
    IB/core: Remove old fast registration API
    IB/ipath: Remove fast registration from the code
    IB/hfi1: Remove fast registration from the code
    RDMA/nes: Remove old FRWR API
    IB/qib: Remove old FRWR API
    iw_cxgb4: Remove old FRWR API
    RDMA/cxgb3: Remove old FRWR API
    RDMA/ocrdma: Remove old FRWR API
    IB/mlx4: Remove old FRWR API support
    IB/mlx5: Remove old FRWR API support
    IB/srp: Dont allocate a page vector when using fast_reg
    IB/srp: Remove srp_finish_mapping
    IB/srp: Convert to new registration API
    IB/srp: Split srp_map_sg
    RDS/IW: Convert to new memory registration API
    svcrdma: Port to new memory registration API
    xprtrdma: Port to new memory registration API
    iser-target: Port to new memory registration API
    IB/iser: Port to new fast registration API
    ...

    Linus Torvalds
     

08 Oct, 2015

1 commit

  • This patch splits up struct ib_send_wr so that all non-trivial verbs
    use their own structure which embeds struct ib_send_wr (the embedding
    pattern is sketched after this entry). This dramatically shrinks the
    size of a WR for most common operations:

    sizeof(struct ib_send_wr) (old): 96

    sizeof(struct ib_send_wr): 48
    sizeof(struct ib_rdma_wr): 64
    sizeof(struct ib_atomic_wr): 96
    sizeof(struct ib_ud_wr): 88
    sizeof(struct ib_fast_reg_wr): 88
    sizeof(struct ib_bind_mw_wr): 96
    sizeof(struct ib_sig_handover_wr): 80

    And with Sagi's pending MR rework the fast registration WR will also be
    down to a reasonable size:

    sizeof(struct ib_fastreg_wr): 64

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche [srp, srpt]
    Reviewed-by: Chuck Lever [sunrpc]
    Tested-by: Haggai Eran
    Tested-by: Sagi Grimberg
    Tested-by: Steve Wise

    Christoph Hellwig
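
    The embedding pattern in practice: operation-specific fields move out
    of struct ib_send_wr into wrappers that embed it, and drivers recover
    the wrapper with container_of(). For RDMA reads/writes:

        struct ib_rdma_wr {
                struct ib_send_wr       wr;
                u64                     remote_addr;
                u32                     rkey;
        };

        static inline struct ib_rdma_wr *rdma_wr(struct ib_send_wr *wr)
        {
                return container_of(wr, struct ib_rdma_wr, wr);
        }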
     

06 Oct, 2015

1 commit

  • Similar to what we did with receive CQ completion handling, we split
    the transmit completion handler so that it lets us implement batched
    work completion handling (sketched after this entry).

    We re-use the cq_poll routine and make use of RDS_IB_SEND_OP to
    identify the send vs receive completion event handler invocation.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
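
    A hedged sketch of batched reaping: the tasklet polls a batch of work
    completions per pass instead of taking one event per completion (the
    batch constant and handler names are illustrative):

        struct ib_wc wcs[WC_BATCH];
        int nr, i;

        while ((nr = ib_poll_cq(cq, WC_BATCH, wcs)) > 0) {
                for (i = 0; i < nr; i++) {
                        if (wcs[i].wr_id & RDS_IB_SEND_OP)
                                handle_send_wc(&wcs[i]);
                        else
                                handle_recv_wc(&wcs[i]);
                }
        }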
     

09 Sep, 2015

1 commit

  • Pull infiniband/rdma updates from Doug Ledford:
    "This is a fairly sizeable set of changes. I've put them through a
    decent amount of testing prior to sending the pull request due to
    that.

    There are still a few fixups that I know are coming, but I wanted to
    go ahead and get the big, sizable chunk into your hands sooner rather
    than waiting for those last few fixups.

    Of note is the fact that this creates what is intended to be a
    temporary area in the drivers/staging tree specifically for some
    cleanups and additions that are coming for the RDMA stack. We
    deprecated two drivers (ipath and amso1100) and are waiting to hear
    back if we can deprecate another one (ehca). We also put Intel's new
    hfi1 driver into this area because it needs to be refactored and a
    transfer library created out of the factored out code, and then it and
    the qib driver and the soft-roce driver should all be modified to use
    that library.

    I expect drivers/staging/rdma to be around for three or four kernel
    releases and then to go away as all of the work is completed and final
    deletions of deprecated drivers are done.

    Summary of changes for 4.3:

    - Create drivers/staging/rdma
    - Move amso1100 driver to staging/rdma and schedule for deletion
    - Move ipath driver to staging/rdma and schedule for deletion
    - Add hfi1 driver to staging/rdma and set TODO for move to regular
    tree
    - Initial support for namespaces to be used on RDMA devices
    - Add RoCE GID table handling to the RDMA core caching code
    - Infrastructure to support handling of devices with differing read
    and write scatter gather capabilities
    - Various iSER updates
    - Kill off unsafe usage of global mr registrations
    - Update SRP driver
    - Misc mlx4 driver updates
    - Support for the mr_alloc verb
    - Support for a netlink interface between kernel and user space cache
    daemon to speed path record queries and route resolution
    - Initial support for safe hot removal of verbs devices"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits)
    IB/ipoib: Suppress warning for send only join failures
    IB/ipoib: Clean up send-only multicast joins
    IB/srp: Fix possible protection fault
    IB/core: Move SM class defines from ib_mad.h to ib_smi.h
    IB/core: Remove unnecessary defines from ib_mad.h
    IB/hfi1: Add PSM2 user space header to header_install
    IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY
    mlx5: Fix incorrect wc pkey_index assignment for GSI messages
    IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow
    IB/uverbs: reject invalid or unknown opcodes
    IB/cxgb4: Fix if statement in pick_local_ip6adddrs
    IB/sa: Fix rdma netlink message flags
    IB/ucma: HW Device hot-removal support
    IB/mlx4_ib: Disassociate support
    IB/uverbs: Enable device removal when there are active user space applications
    IB/uverbs: Explicitly pass ib_dev to uverbs commands
    IB/uverbs: Fix race between ib_uverbs_open and remove_one
    IB/uverbs: Fix reference counting usage of event files
    IB/core: Make ib_dealloc_pd return void
    IB/srp: Create an insecure all physical rkey only if needed
    ...

    Linus Torvalds
     

31 Aug, 2015

1 commit


26 Aug, 2015

1 commit

  • WRs (Work Requests) always generate a WC (Work Completion) with a
    signaled send. The default RDS IB code is set up for unsignaled
    completion. Since an RDS connection is persistent, we can end up
    sending data even after a large send when the remote end is
    not active (for any reason).

    By doing a signaled send at least once per large send,
    we can at least detect the problem in the work completion
    handler, thereby avoiding sending more data to an
    inactive remote.

    Reviewed-by: Ajaykumar Hotchandani
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    santosh.shilimkar@oracle.com
     

02 Jun, 2015

2 commits

  • Doug Ledford
     
  • The BUG_ON at line 452/453 is triggered in function rds_send_xmit.

    441         while (ret) {
    442                 tmp = min_t(int, ret, sg->length -
    443                             conn->c_xmit_data_off);
    444                 conn->c_xmit_data_off += tmp;
    445                 ret -= tmp;
    446                 if (conn->c_xmit_data_off == sg->length) {
    447                         conn->c_xmit_data_off = 0;
    448                         sg++;
    449                         conn->c_xmit_sg++;
    450                         if (ret != 0 && conn->c_xmit_sg == rm->data.op_nents)
    451                                 printk(KERN_ERR "conn %p rm %p sg %p ret %d\n", conn, rm, sg, ret);
    452                         BUG_ON(ret != 0 &&
    453                                conn->c_xmit_sg == rm->data.op_nents);
    454                 }
    455         }

    It is complaining that the total sent length is bigger than what we
    want to send.

    rds_ib_xmit() is wrong for the second and later calls for the same
    rds_message, returning a wrong value.

    The sg and off passed by rds_send_xmit to rds_ib_xmit are based on
    scatterlist.offset/length, but the rds_ib_xmit action is based on
    scatterlist.dma_address/dma_length. In case dma_length is larger than
    length, there is a problem. For the 2nd and later calls of rds_ib_xmit
    for the same rds_message, at least one of the following two is wrong:

    1) the scatterlist to start with; the chosen one can be far beyond the
    correct one.
    2) the offset to start with within the scatterlist.

    Fix: add op_dmasg and op_dmaoff to the rm_data_op structure, indicating
    the scatterlist and the offset within it to start with for rds_ib_xmit
    respectively. op_dmasg and op_dmaoff are initialized to zero when the
    message is first DMA-mapped and are changed when filling send slots
    (a sketch follows this entry).

    The same applies to rds_iw_xmit too.

    Signed-off-by: Wengang Wang
    Signed-off-by: Doug Ledford

    Wengang Wang
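
    A hedged sketch of the fix's bookkeeping, resuming the DMA-mapped
    view of the message where the previous call stopped (simplified;
    op_dmasg/op_dmaoff follow the commit text):

        struct scatterlist *scat = &rm->data.op_sg[rm->data.op_dmasg];
        unsigned int len;

        /* Both fields are zeroed when the message is first DMA-mapped. */
        len = min_t(unsigned int, RDS_FRAG_SIZE,
                    sg_dma_len(scat) - rm->data.op_dmaoff);

        rm->data.op_dmaoff += len;
        if (rm->data.op_dmaoff == sg_dma_len(scat)) {
                rm->data.op_dmasg++;    /* resume at the next DMA sg entry */
                rm->data.op_dmaoff = 0;
        }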
     

19 May, 2015

1 commit


08 Feb, 2015

1 commit


19 May, 2014

1 commit


04 Dec, 2013

1 commit

  • After a congestion update on a local connection, when rds_ib_xmit
    returns fewer bytes than there are in the message, rds_send_xmit
    calls back rds_ib_xmit with an offset that causes
    BUG_ON(off & RDS_FRAG_SIZE) to trigger.

    For a 4Kb PAGE_SIZE, rds_ib_xmit returns min(8240,4096)=4096 when
    actually the message contains 8240 bytes. rds_send_xmit thinks there
    is more to send and calls rds_ib_xmit again with a data offset "off"
    of 4096-48 (rds header) = 4048 bytes, thus hitting the
    BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].

    The commit 6094628bfd94323fc1cea05ec2c6affd98c18f7f
    "rds: prevent BUG_ON triggering on congestion map updates" introduced
    this regression. That change was addressing the triggering of a different
    BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE:
        BUG_ON(ret != 0 &&
               conn->c_xmit_sg == rm->data.op_nents);

    This was the sequence it was going through:

    (rds_ib_xmit)
        /* Do not send cong updates to IB loopback */
        if (conn->c_loopback
            && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
                rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
                return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
        }
    rds_ib_xmit returns 8240

    rds_send_xmit:
        c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the
                          first time)
                        = 8192
        c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
    rds_ib_xmit returns 8240

    rds_send_xmit:
        c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
        and so on (c_xmit_data_off 24672, 32912, 41152, 49392, 57632)
    rds_ib_xmit returns 8240

    On this iteration this sequence causes the BUG_ON in rds_send_xmit:

        while (ret) {
                tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
                [tmp = 65536 - 57632 = 7904]
                conn->c_xmit_data_off += tmp;
                [c_xmit_data_off = 57632 + 7904 = 65536]
                ret -= tmp;
                [ret = 8240 - 7904 = 336]
                if (conn->c_xmit_data_off == sg->length) {
                        conn->c_xmit_data_off = 0;
                        sg++;
                        conn->c_xmit_sg++;
                        BUG_ON(ret != 0 &&
                               conn->c_xmit_sg == rm->data.op_nents);
                        [c_xmit_sg = 1, rm->data.op_nents = 1]

    What the current fix does:
    Since the congestion update over loopback is not actually transmitted
    as a message, all that rds_ib_xmit needs to do is let the caller think
    the full message has been transmitted and not return partial bytes.
    It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb.
    And 64Kb+48 when page size is 64Kb.

    Reported-by: Josh Hunt
    Tested-by: Honggang Li
    Acked-by: Bang Nguyen
    Signed-off-by: Venkat Venkatsubra
    Signed-off-by: David S. Miller

    Venkat Venkatsubra
     

17 Jun, 2011

1 commit


31 Mar, 2011

1 commit


09 Mar, 2011

1 commit

  • Recently had this bug halt reported to me:

    kernel BUG at net/rds/send.c:329!
    Oops: Exception in kernel mode, sig: 5 [#1]
    SMP NR_CPUS=1024 NUMA pSeries
    Modules linked in: rds sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg
    ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt
    dm_mod [last unloaded: scsi_wait_scan]
    NIP: d000000003ca68f4 LR: d000000003ca67fc CTR: d000000003ca8770
    REGS: c000000175cab980 TRAP: 0700 Not tainted (2.6.32-118.el6.ppc64)
    MSR: 8000000000029032 CR: 44000022 XER: 00000000
    TASK = c00000017586ec90[1896] 'krdsd' THREAD: c000000175ca8000 CPU: 0
    GPR00: 0000000000000150 c000000175cabc00 d000000003cb7340 0000000000002030
    GPR04: ffffffffffffffff 0000000000000030 0000000000000000 0000000000000030
    GPR08: 0000000000000001 0000000000000001 c0000001756b1e30 0000000000010000
    GPR12: d000000003caac90 c000000000fa2500 c0000001742b2858 c0000001742b2a00
    GPR16: c0000001742b2a08 c0000001742b2820 0000000000000001 0000000000000001
    GPR20: 0000000000000040 c0000001742b2814 c000000175cabc70 0800000000000000
    GPR24: 0000000000000004 0200000000000000 0000000000000000 c0000001742b2860
    GPR28: 0000000000000000 c0000001756b1c80 d000000003cb68e8 c0000001742b27b8
    NIP [d000000003ca68f4] .rds_send_xmit+0x4c4/0x8a0 [rds]
    LR [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
    Call Trace:
    [c000000175cabc00] [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
    (unreliable)
    [c000000175cabd30] [d000000003ca7e64] .rds_send_worker+0x54/0x100 [rds]
    [c000000175cabdb0] [c0000000000b475c] .worker_thread+0x1dc/0x3c0
    [c000000175cabed0] [c0000000000baa9c] .kthread+0xbc/0xd0
    [c000000175cabf90] [c000000000032114] .kernel_thread+0x54/0x70
    Instruction dump:
    4bfffd50 60000000 60000000 39080001 935f004c f91f0040 41820024 813d017c
    7d094a78 7d290074 7929d182 394a0020 40e2ff68 4bffffa4 39200000
    Kernel panic - not syncing: Fatal exception
    Call Trace:
    [c000000175cab560] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable)
    [c000000175cab610] [c0000000005a365c] .panic+0x80/0x1b4
    [c000000175cab6a0] [c00000000002fbcc] .die+0x21c/0x2a0
    [c000000175cab750] [c000000000030000] ._exception+0x110/0x220
    [c000000175cab910] [c000000000004b9c] program_check_common+0x11c/0x180

    Signed-off-by: David S. Miller

    Neil Horman
     

09 Sep, 2010

4 commits

  • Add two CMSGs for masked versions of cswp and fadd. The args
    struct is modified to use a union for the different atomic op types'
    arguments (sketched after this entry). IB is changed to do masked
    atomic ops. The atomic op type in rds_message is similarly unionized.

    Signed-off-by: Andy Grover

    Andy Grover
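
    A hedged sketch of the unionized argument layout (illustrative, not
    the exact uapi struct):

        struct example_atomic_args {
                union {
                        struct { __u64 compare, swap; } cswp;
                        struct { __u64 add; } fadd;
                        struct {
                                __u64 compare, swap;
                                __u64 compare_mask, swap_mask;
                        } m_cswp;
                        struct { __u64 add, nocarry_mask; } m_fadd;
                };
        };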
     
  • This prints the constant identifier for work completion status and rdma
    cm event types, like we already do for IB event types.

    A core string array helper is added that each string type uses.

    Signed-off-by: Zach Brown

    Zach Brown
     
  • We're seeing bugs today where IB connection shutdown clears the send
    ring while the tasklet is processing completed sends. Implementation
    details cause this to dereference a null pointer. Shutdown needs to
    wait for send completion to stop before tearing down the connection. We
    can't simply wait for the ring to empty because it may contain
    unsignaled sends that will never be processed.

    This patch tracks the number of signaled sends that we've posted and
    waits for them to complete (a sketch of the scheme follows this
    entry). It also makes sure that the tasklet has finished executing.

    Signed-off-by: Zach Brown

    Zach Brown
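
    A hedged sketch of the counting scheme (names illustrative):

        /* Post path: one more signaled send in flight. */
        if (send->s_wr.send_flags & IB_SEND_SIGNALED)
                atomic_inc(&ic->i_signaled_sends);

        /* Completion handler: one fewer; wake the waiter at zero. */
        if (atomic_dec_and_test(&ic->i_signaled_sends))
                wake_up(&ic->i_send_wait);

        /* Shutdown: block until every signaled send has completed. */
        wait_event(ic->i_send_wait,
                   !atomic_read(&ic->i_signaled_sends));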
     
  • rds_send_xmit() was changed to hold an interrupt masking spinlock instead of a
    mutex so that it could be called from the IB receive tasklet path. This broke
    the TCP transport because its xmit method can block and masks and unmasks
    interrupts.

    This patch serializes callers to rds_send_xmit() with a simple bit
    instead of the current spinlock or previous mutex (a sketch follows at
    the end of this entry). This enables rds_send_xmit() to be called from
    any context and to call functions which block. Getting rid of the
    c_send_lock exposes the bare c_lock acquisitions which are changed to
    block interrupts.

    A waitqueue is added so that rds_conn_shutdown() can wait for callers to leave
    rds_send_xmit() before tearing down partial send state. This lets us get rid
    of c_senders.

    rds_send_xmit() is changed to check the conn state after acquiring the
    RDS_IN_XMIT bit to resolve races with the shutdown path. Previously both
    worked with the conn state and then the lock in the same order, allowing them
    to race and execute the paths concurrently.

    rds_send_reset() isn't racing with rds_send_xmit() now that rds_conn_shutdown()
    properly ensures that rds_send_xmit() can't start once the conn state has been
    changed. We can remove its previous use of the spinlock.

    Finally, c_send_generation is redundant. Callers can race to test the c_flags
    bit by simply retrying instead of racing to test the c_send_generation atomic.

    Signed-off-by: Zach Brown

    Zach Brown
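
    A hedged sketch of the bit-based serialization described above
    (simplified from the shape of the real code):

        /* Only one caller transmits at a time. */
        if (test_and_set_bit(RDS_IN_XMIT, &conn->c_flags))
                return -ENOMEM; /* contended; the caller simply retries */

        /* ... send loop: may block, may run in any context ... */

        clear_bit(RDS_IN_XMIT, &conn->c_flags);
        smp_mb__after_atomic();
        wake_up_all(&conn->c_waitq);

        /* Shutdown side: wait for the transmitter to leave. */
        wait_event(conn->c_waitq,
                   !test_bit(RDS_IN_XMIT, &conn->c_flags));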