17 Jun, 2011

1 commit


31 Mar, 2011

1 commit


09 Mar, 2011

1 commit

  • Recently had this bug halt reported to me:

    kernel BUG at net/rds/send.c:329!
    Oops: Exception in kernel mode, sig: 5 [#1]
    SMP NR_CPUS=1024 NUMA pSeries
    Modules linked in: rds sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg
    ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt
    dm_mod [last unloaded: scsi_wait_scan]
    NIP: d000000003ca68f4 LR: d000000003ca67fc CTR: d000000003ca8770
    REGS: c000000175cab980 TRAP: 0700 Not tainted (2.6.32-118.el6.ppc64)
    MSR: 8000000000029032 CR: 44000022 XER: 00000000
    TASK = c00000017586ec90[1896] 'krdsd' THREAD: c000000175ca8000 CPU: 0
    GPR00: 0000000000000150 c000000175cabc00 d000000003cb7340 0000000000002030
    GPR04: ffffffffffffffff 0000000000000030 0000000000000000 0000000000000030
    GPR08: 0000000000000001 0000000000000001 c0000001756b1e30 0000000000010000
    GPR12: d000000003caac90 c000000000fa2500 c0000001742b2858 c0000001742b2a00
    GPR16: c0000001742b2a08 c0000001742b2820 0000000000000001 0000000000000001
    GPR20: 0000000000000040 c0000001742b2814 c000000175cabc70 0800000000000000
    GPR24: 0000000000000004 0200000000000000 0000000000000000 c0000001742b2860
    GPR28: 0000000000000000 c0000001756b1c80 d000000003cb68e8 c0000001742b27b8
    NIP [d000000003ca68f4] .rds_send_xmit+0x4c4/0x8a0 [rds]
    LR [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
    Call Trace:
    [c000000175cabc00] [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds] (unreliable)
    [c000000175cabd30] [d000000003ca7e64] .rds_send_worker+0x54/0x100 [rds]
    [c000000175cabdb0] [c0000000000b475c] .worker_thread+0x1dc/0x3c0
    [c000000175cabed0] [c0000000000baa9c] .kthread+0xbc/0xd0
    [c000000175cabf90] [c000000000032114] .kernel_thread+0x54/0x70
    Instruction dump:
    4bfffd50 60000000 60000000 39080001 935f004c f91f0040 41820024 813d017c
    7d094a78 7d290074 7929d182 394a0020 40e2ff68 4bffffa4 39200000
    Kernel panic - not syncing: Fatal exception
    Call Trace:
    [c000000175cab560] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable)
    [c000000175cab610] [c0000000005a365c] .panic+0x80/0x1b4
    [c000000175cab6a0] [c00000000002fbcc] .die+0x21c/0x2a0
    [c000000175cab750] [c000000000030000] ._exception+0x110/0x220
    [c000000175cab910] [c000000000004b9c] program_check_common+0x11c/0x180

    Signed-off-by: David S. Miller

    Neil Horman
     

09 Sep, 2010

23 commits

  • Add two CMSGs for masked versions of cswp and fadd. The args
    struct is modified to use a union for each atomic op type's
    arguments. Change IB to do masked atomic ops. The atomic op type
    in rds_message is similarly unionized.

    Signed-off-by: Andy Grover

    Andy Grover
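
The union layout described in this commit can be sketched in user-space C. This is only an illustration: `atomic_args`, the `OP_*` tags, and `apply_atomic()` are hypothetical names, not the RDS ABI, and the masked semantics (compare only under `compare_mask`, replace only the bits in `swap_mask`, discard carries out of bits set in `nocarry_mask`) are an assumption modelled on how masked atomics are commonly defined for InfiniBand. The function models the result of each op on local memory and is not itself atomic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical op tags; illustrative only, not the RDS CMSG numbering. */
enum atomic_op_type { OP_CSWP, OP_FADD, OP_MASKED_CSWP, OP_MASKED_FADD };

struct atomic_args {
	enum atomic_op_type type;
	union {	/* one member per op type (C11 anonymous union) */
		struct { uint64_t compare, swap; } cswp;
		struct { uint64_t add; } fadd;
		struct { uint64_t compare, swap, compare_mask, swap_mask; } m_cswp;
		struct { uint64_t add, nocarry_mask; } m_fadd;
	};
};

/* Assumed semantics: the carry out of any bit set in nocarry_mask is dropped. */
static uint64_t masked_add(uint64_t a, uint64_t b, uint64_t nocarry_mask)
{
	uint64_t sum = 0, carry = 0;

	for (int i = 0; i < 64; i++) {
		uint64_t s = ((a >> i) & 1) + ((b >> i) & 1) + carry;

		sum |= (s & 1) << i;
		carry = ((nocarry_mask >> i) & 1) ? 0 : (s >> 1);
	}
	return sum;
}

/* Models the effect of one op on *target and returns the prior value. */
uint64_t apply_atomic(uint64_t *target, const struct atomic_args *a)
{
	uint64_t old;

	assert(target && a);
	old = *target;

	switch (a->type) {
	case OP_CSWP:
		if (old == a->cswp.compare)
			*target = a->cswp.swap;
		break;
	case OP_FADD:
		*target = old + a->fadd.add;
		break;
	case OP_MASKED_CSWP:
		/* Compare only the masked bits, then swap only the masked bits. */
		if ((old & a->m_cswp.compare_mask) ==
		    (a->m_cswp.compare & a->m_cswp.compare_mask))
			*target = (old & ~a->m_cswp.swap_mask) |
				  (a->m_cswp.swap & a->m_cswp.swap_mask);
		break;
	case OP_MASKED_FADD:
		*target = masked_add(old, a->m_fadd.add, a->m_fadd.nocarry_mask);
		break;
	}
	return old;
}
```

Because only one op type is active per message, the union keeps the struct no larger than its biggest member while letting each op name its own arguments.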
     
  • This prints the constant identifier for work completion status and rdma
    cm event types, like we already do for IB event types.

    A core string array helper is added that each string type uses.

    Signed-off-by: Zach Brown

    Zach Brown
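
A shared string-array helper of the kind this commit mentions might look like the sketch below. The names `str_array_lookup()`, `wc_status_str()`, and the three-entry `wc_status` enum are hypothetical stand-ins chosen for illustration; the real status and event lists are longer and live in the IB and RDMA CM headers.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Returned for any index that is out of range or has no name. */
static const char unknown_str[] = "unknown";

/* Hypothetical core helper: bounds-checked lookup shared by all string types. */
const char *str_array_lookup(const char *const *array, size_t elements,
			     size_t index)
{
	assert(array != NULL);
	if (index < elements && array[index])
		return array[index];
	return unknown_str;
}

/* Hypothetical, abbreviated status list used only for this example. */
enum wc_status { WC_SUCCESS, WC_LOC_LEN_ERR, WC_LOC_QP_OP_ERR };

static const char *const wc_status_strings[] = {
	[WC_SUCCESS]       = "WC_SUCCESS",
	[WC_LOC_LEN_ERR]   = "WC_LOC_LEN_ERR",
	[WC_LOC_QP_OP_ERR] = "WC_LOC_QP_OP_ERR",
};

/* Each string type is a thin wrapper around the core helper. */
const char *wc_status_str(unsigned int status)
{
	return str_array_lookup(wc_status_strings,
				ARRAY_SIZE(wc_status_strings), status);
}
```

Using designated initializers keeps each name next to its constant, and the bounds check means an unexpected status value prints "unknown" rather than indexing past the table.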
     
  • We're seeing bugs today where IB connection shutdown clears the send
    ring while the tasklet is processing completed sends. Implementation
    details cause this to dereference a null pointer. Shutdown needs to
    wait for send completion to stop before tearing down the connection. We
    can't simply wait for the ring to empty because it may contain
    unsignaled sends that will never be processed.

    This patch tracks the number of signaled sends that we've posted and
    waits for them to complete. It also makes sure that the tasklet has
    finished executing.

    Signed-off-by: Zach Brown

    Zach Brown
     
  • rds_send_xmit() was changed to hold an interrupt masking spinlock instead of a
    mutex so that it could be called from the IB receive tasklet path. This broke
    the TCP transport because its xmit method can block and masks and unmasks
    interrupts.

    This patch serializes callers to rds_send_xmit() with a simple bit instead of
    the current spinlock or previous mutex. This enables rds_send_xmit() to be
    called from any context and to call functions which block. Getting rid of the
    c_send_lock exposes the bare c_lock acquisitions which are changed to block
    interrupts.

    A waitqueue is added so that rds_conn_shutdown() can wait for callers to leave
    rds_send_xmit() before tearing down partial send state. This lets us get rid
    of c_senders.

    rds_send_xmit() is changed to check the conn state after acquiring the
    RDS_IN_XMIT bit to resolve races with the shutdown path. Previously both
    worked with the conn state and then the lock in the same order, allowing them
    to race and execute the paths concurrently.

    rds_send_reset() isn't racing with rds_send_xmit() now that rds_conn_shutdown()
    properly ensures that rds_send_xmit() can't start once the conn state has been
    changed. We can remove its previous use of the spinlock.

    Finally, c_send_generation is redundant. Callers can race to test the c_flags
    bit by simply retrying instead of racing to test the c_send_generation atomic.

    Signed-off-by: Zach Brown

    Zach Brown
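
The serialization scheme in this commit can be approximated in user-space C11 as follows. This is a sketch under stated assumptions, not the kernel code: `struct conn`, `send_xmit()`, and `conn_shutdown()` are hypothetical names, an `atomic_flag` stands in for the RDS_IN_XMIT bit in c_flags, and a busy-wait replaces the waitqueue that the real shutdown path sleeps on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum conn_state { CONN_DOWN, CONN_UP };	/* hypothetical states */

struct conn {
	atomic_flag in_xmit;	/* stands in for the RDS_IN_XMIT bit */
	atomic_int state;
	int sent;		/* only touched while in_xmit is held */
};

void conn_init(struct conn *c)
{
	assert(c != NULL);
	atomic_flag_clear(&c->in_xmit);
	atomic_init(&c->state, CONN_UP);
	c->sent = 0;
}

/*
 * Returns true if this caller performed a send pass. Returns false if
 * another caller already holds the bit or the connection is down; the
 * caller can simply retry later instead of blocking on a lock.
 */
bool send_xmit(struct conn *c)
{
	if (atomic_flag_test_and_set(&c->in_xmit))
		return false;		/* someone else is transmitting */

	/* Check the state only after owning the bit, to close the race
	 * with shutdown. */
	if (atomic_load(&c->state) != CONN_UP) {
		atomic_flag_clear(&c->in_xmit);
		return false;
	}

	c->sent++;			/* placeholder for the real send work */
	atomic_flag_clear(&c->in_xmit);
	return true;
}

/* Change the state first, then wait for any in-flight sender to leave. */
void conn_shutdown(struct conn *c)
{
	atomic_store(&c->state, CONN_DOWN);
	while (atomic_flag_test_and_set(&c->in_xmit))
		;	/* the kernel would sleep on a waitqueue here */
	atomic_flag_clear(&c->in_xmit);
}
```

The ordering is the point of the sketch: senders take the bit and then test the state, while shutdown sets the state and then waits on the bit, so a sender that starts after shutdown has changed the state can never do send work.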
     
  • rds_ib_xmit_rdma() was calling ib_get_client_data() to get at the rds_ibdevice
    just to get the max_sge for the transmit. This patch instead has it get it
    directly off the rds_ibdev which is stored on the connection.

    The current code won't free the rds_ibdev until all the IB connections that use
    it are freed. So it's safe to reference the rds_ibdev this way. In the future
    it also makes it easier to support proper reference counting of the rds_ibdev
    struct.

    As an additional bonus, this gets rid of the performance hit of calling in to
    the IB stack to look up the rds_ibdev. The current implementation in the IB
    stack acquires an interrupt blocking spinlock to protect the registration of
    client callback data.

    Signed-off-by: Zach Brown

    Zach Brown
     
  • This makes sure we have the proper number of references in
    rds_ib_xmit_atomic and rds_ib_xmit_rdma. We also consistently
    drop references the same way for all message types as the IOs end.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The RDS send_xmit code was trying to get fancy with message
    counting and was dropping the final reference on the RDMA messages
    too early. This resulted in memory corruption and oopsen.

    The fix here is to always add a ref as the parts of the message pass
    through rds_send_xmit, and always drop a ref as the parts of the message
    go through completion handling.

    Signed-off-by: Chris Mason

    Chris Mason
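
The rule in this fix (one reference taken per part on the way out, one dropped per part at completion) can be sketched with a plain reference count. The names `struct msg`, `xmit_part()`, and `complete_part()` are hypothetical, and a `freed` flag stands in for actually releasing the message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct msg {
	atomic_int refcount;
	bool freed;	/* stand-in for actually freeing the message */
};

void msg_init(struct msg *m)
{
	atomic_init(&m->refcount, 1);	/* the creator's reference */
	m->freed = false;
}

void msg_get(struct msg *m)
{
	atomic_fetch_add(&m->refcount, 1);
}

void msg_put(struct msg *m)
{
	int old = atomic_fetch_sub(&m->refcount, 1);

	assert(old > 0);		/* catches a put without a matching get */
	if (old == 1)
		m->freed = true;	/* last reference dropped */
}

/* Every part handed to the transport pins the message... */
void xmit_part(struct msg *m)
{
	msg_get(m);
}

/* ...and every completion releases exactly the reference that part took. */
void complete_part(struct msg *m)
{
	msg_put(m);
}
```

With this pairing, the message cannot be released while any part is still in flight, regardless of the order in which the parts complete.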
     
  • Signed-off-by: Andy Grover

    Andy Grover
     
  • Previously, RDS would wait until the final send WR had completed
    and then handle cleanup. With silent ops, we do not know
    if an atomic, rdma, or data op will be last. This patch
    handles any of these cases by keeping a pointer to the last
    op in the message in m_last_op.

    When the TX completion event fires, rds dispatches to per-op-type
    cleanup functions, and then does whole-message cleanup, if the
    last op equalled m_last_op.

    This patch also moves towards having op-specific functions take
    the op struct, instead of the overall rm struct.

    rds_ib_connection has a pointer to keep track of a partially
    completed data send operation. This patch changes it from an
    rds_message pointer to the narrower rm_data_op pointer, and
    modifies places that use this pointer as needed.

    Signed-off-by: Andy Grover

    Andy Grover
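
The completion flow this commit describes can be sketched as below: per-op cleanup always runs, and whole-message cleanup runs only when the completed op is the one recorded as last. All names here (`struct op`, `struct message`, `last_op`, `op_complete()`) are hypothetical simplifications of the rm structures, and the boolean fields stand in for real unmap and teardown work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op_type { OP_ATOMIC, OP_RDMA, OP_DATA };

struct op {
	enum op_type type;
	bool active;
	bool cleaned;		/* stand-in for per-op unmap/cleanup */
};

struct message {
	struct op atomic, rdma, data;
	struct op *last_op;	/* plays the role of m_last_op */
	bool finished;		/* stand-in for whole-message cleanup */
};

void message_init(struct message *m)
{
	m->atomic = (struct op){ OP_ATOMIC, false, false };
	m->rdma   = (struct op){ OP_RDMA, false, false };
	m->data   = (struct op){ OP_DATA, false, false };
	m->last_op = NULL;
	m->finished = false;
}

/* Ops are added in submission order; the most recent becomes last_op. */
void message_add_op(struct message *m, struct op *op)
{
	op->active = true;
	m->last_op = op;
}

/* Called once per completed op; takes the op rather than the message type. */
void op_complete(struct message *m, struct op *op)
{
	assert(op->active);

	switch (op->type) {	/* dispatch to per-op-type cleanup */
	case OP_ATOMIC:
	case OP_RDMA:
	case OP_DATA:
		op->cleaned = true;
		break;
	}
	op->active = false;

	if (op == m->last_op)
		m->finished = true;	/* whole-message cleanup */
}
```

Comparing against a stored pointer means the handler does not need to know in advance whether an atomic, rdma, or data op ends the message.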
     
  • For consistency.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • A big changeset, but it's all pretty dumb.

    struct rds_rdma_op was already embedded in struct rm_rdma_op.
    Remove rds_rdma_op and put its members in rm_rdma_op. Rename
    members with "op_" prefix instead of "r_", for consistency.

    Of course this breaks a lot, so fixup the code accordingly.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Signed-off-by: Andy Grover

    Andy Grover
     
  • Maybe things worked fine with the flow control code running
    even in the non-flow-control case, but making it explicitly
    conditional helps the non-fc case be easier to read.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Removed unsignaled_bytes sysctl and code to signal
    based on it. I believe unsignaled_wrs is more than
    sufficient for our purposes.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Now that the header always goes first, it is possible to
    simplify rds_ib_xmit. Instead of having a path to handle 0-byte
    dgrams and another path to handle >0, these can both be handled
    in one path. This lets us eliminate xmit_populate_wr().

    Rename sent to bytes_sent, to differentiate it better from the other
    variable named "send".

    Signed-off-by: Andy Grover

    Andy Grover
     
  • These functions were to cope with differently ordered
    sg entries depending on RDS 3.0 or 3.1+. Now that
    we've dropped 3.0 compatibility we no longer need them.

    Also, modify usage sites for these to refer to sge[0] or [1]
    directly. Reorder code to initialize header sgs first.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Signed-off-by: Andy Grover

    Andy Grover
     
  • Both atomics and rdmas need to convert ib-specific completion codes
    into RDS status codes. Rename rds_ib_rdma_send_complete to
    rds_ib_send_complete, and have it take a pointer to the function to
    call with the new error code.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Implement a CMSG-based interface to do FADD and CSWP ops.

    Alter send routines to handle atomic ops.

    Add atomic counters to stats.

    Add xmit_atomic() to struct rds_transport

    Inline rds_ib_send_unmap_rdma into unmap_rm

    Signed-off-by: Andy Grover

    Andy Grover
     
  • This eliminates a separate memory alloc, although
    it is now necessary to add an "r_active" flag, since
    it is no longer possible to use the m_rdma_op pointer as an
    indicator of whether an rdma op is present.

    rdma SGs are allocated from the rm sg pool.

    rds_rm_size also gets bigger. It's a little inefficient to
    run through CMSGs twice, but it makes later steps a lot smoother.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • RDMA is now an intrinsic part of RDS, so it's easier to just have
    a single header.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Clearly separate rdma-related variables in rm from data-related ones.
    This is in anticipation of adding atomic support.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Favor "if (foo)" style over "if (foo != NULL)".

    Signed-off-by: Andy Grover

    Andy Grover
     

17 Mar, 2010

3 commits

  • If the RDMA op has aborted with a remote access error,
    in addition to what we already do (tell userspace it has
    completed with an error) also unmap it and put() the rm.

    Otherwise, hangs may occur on arches that track maps and
    will not exit without proper cleanup.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Sherman Pun
     
  • We have two kinds of loopback: software (via loop transport)
    and hardware (via IB). sw is used for 127.0.0.1, and doesn't
    support rdma ops. hw is used for sends to local device IPs,
    and supports rdma. Both are used in different cases.

    For both of these, when there is a congestion map update, we
    want to call rds_cong_map_updated() but not actually send
    anything -- since loopback local and foreign congestion maps
    point to the same spot, they're already in sync.

    The old code never called sw loop's xmit_cong_map(), so
    rds_cong_map_updated() wasn't being called for it. sw loop
    ports would not work right with the congestion monitor.

    Fixing that meant that hw loopback now would send congestion maps
    to itself. This is also undesirable (racy), so we check for this
    case in the ib-specific xmit code.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover
     
  • BUGging on a runtime error code should be avoided. This
    patch also eliminates all other BUG()s that have no real
    reason to exist.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover
     

30 Nov, 2009

1 commit


10 Apr, 2009

2 commits


27 Feb, 2009

1 commit