09 Sep, 2010

30 commits

  • Add two CMSGs for masked versions of cswp and fadd. The args
    struct is modified to use a union for the different atomic op types'
    arguments. Change IB to do masked atomic ops. The atomic op type
    in rds_message is similarly unionized.

    Signed-off-by: Andy Grover

    Andy Grover
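
    A minimal sketch of the union idea above, with illustrative field names
    rather than the exact uapi layout (__u64 comes from <linux/types.h>):

    struct atomic_op_args {
            __u64 local_addr;       /* user address that receives the old value */
            __u64 remote_addr;      /* remote address the atomic op is applied to */
            union {
                    struct { __u64 add; } fadd;
                    struct { __u64 compare, swap; } cswp;
                    struct { __u64 add, nocarry_mask; } m_fadd;
                    struct { __u64 compare, swap,
                                   compare_mask, swap_mask; } m_cswp;
            };
    };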
     
  • This prints the constant identifier for work completion status and rdma
    cm event types, like we already do for IB event types.

    A core string array helper is added that each string type uses.

    Signed-off-by: Zach Brown

    Zach Brown
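
    A hedged sketch of that pattern; the array and helper names here are
    illustrative, only the IB_WC_* constants are real:

    static const char * const wc_status_strings[] = {
            [IB_WC_SUCCESS]     = "success",
            [IB_WC_LOC_LEN_ERR] = "local length error",
            /* ... one entry per IB_WC_* status ... */
    };

    static const char *str_from_array(const char * const *array,
                                      size_t len, unsigned int index)
    {
            if (index < len && array[index])
                    return array[index];
            return "unknown";
    }

    /* e.g. str_from_array(wc_status_strings,
     *                     ARRAY_SIZE(wc_status_strings), wc->status) */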
     
  • Right now there's nothing to stop the various paths that use
    rs->rs_transport from racing with rmmod and executing freed transport
    code. The simple fix is to have binding to a transport also hold a
    reference to the transport's module, removing this class of races.

    We already had an unused t_owner field which was set for the modular
    transports and which wasn't set for the built-in loop transport.

    Signed-off-by: Zach Brown

    Zach Brown
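
    A sketch of the idea, assuming a bind-time helper along these lines (the
    patch itself may structure it differently):

    static int rds_bind_transport(struct rds_sock *rs,
                                  struct rds_transport *trans)
    {
            /* Pin the transport module so rmmod can't free the code that
             * rs->rs_transport points into. try_module_get(NULL) succeeds,
             * which covers the built-in loop transport's unset t_owner. */
            if (!try_module_get(trans->t_owner))
                    return -ENODEV;

            rs->rs_transport = trans;
            return 0;
    }

    The matching module_put(trans->t_owner) then happens when the socket is
    released.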
     
  • rs_transport is now also used by the rdma paths once the socket is
    bound. We don't need this stale comment to tell us what cscope can.

    Signed-off-by: Zach Brown

    Zach Brown
     
  • The trivial amount of memory saved isn't worth the cost of dealing with section
    mismatches.

    Signed-off-by: Zach Brown

    Zach Brown
     
  • rds_send_xmit() was changed to hold an interrupt masking spinlock instead of a
    mutex so that it could be called from the IB receive tasklet path. This broke
    the TCP transport because its xmit method can block and because it masks and
    unmasks interrupts itself.

    This patch serializes callers to rds_send_xmit() with a simple bit instead of
    the current spinlock or previous mutex. This enables rds_send_xmit() to be
    called from any context and to call functions which block. Getting rid of the
    c_send_lock exposes the bare c_lock acquisitions which are changed to block
    interrupts.

    A waitqueue is added so that rds_conn_shutdown() can wait for callers to leave
    rds_send_xmit() before tearing down partial send state. This lets us get rid
    of c_senders.

    rds_send_xmit() is changed to check the conn state after acquiring the
    RDS_IN_XMIT bit to resolve races with the shutdown path. Previously both
    worked with the conn state and then the lock in the same order, allowing them
    to race and execute the paths concurrently.

    rds_send_reset() isn't racing with rds_send_xmit() now that rds_conn_shutdown()
    properly ensures that rds_send_xmit() can't start once the conn state has been
    changed. We can remove its previous use of the spinlock.

    Finally, c_send_generation is redundant. Callers can race to test the c_flags
    bit by simply retrying instead of racing to test the c_send_generation atomic.

    Signed-off-by: Zach Brown

    Zach Brown
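
    A hedged sketch of the bit-and-waitqueue pattern; RDS_IN_XMIT and c_flags
    come from the text above, while the waitqueue and helper names are
    assumptions:

    static int acquire_in_xmit(struct rds_connection *conn)
    {
            /* only one caller at a time gets to run rds_send_xmit() */
            return test_and_set_bit(RDS_IN_XMIT, &conn->c_flags) == 0;
    }

    static void release_in_xmit(struct rds_connection *conn)
    {
            clear_bit(RDS_IN_XMIT, &conn->c_flags);
            smp_mb__after_clear_bit();
            wake_up_all(&conn->c_waitq);
    }

    /* rds_conn_shutdown() can then drain senders with:
     *      wait_event(conn->c_waitq,
     *                 !test_bit(RDS_IN_XMIT, &conn->c_flags));
     */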
     
  • rds_send_acked_before() wasn't blocking interrupts when acquiring c_lock from
    user context but nothing calls it. Rather than fix its use of c_lock we just
    remove the function.

    Signed-off-by: Zach Brown

    Zach Brown
     
  • A few paths had the same block of code to queue a connection's connect work if
    it was in the right state. Let's move this into a helper function.

    Signed-off-by: Zach Brown

    Zach Brown
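
    A sketch of what such a helper can look like; the identifiers follow the
    RDS code but should be read as assumptions rather than quotes from the
    patch:

    static void rds_conn_connect_if_down(struct rds_connection *conn)
    {
            if (rds_conn_state(conn) == RDS_CONN_DOWN &&
                !test_and_set_bit(RDS_RECONNECT_PENDING, &conn->c_flags))
                    queue_delayed_work(rds_wq, &conn->c_conn_w, 0);
    }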
     
  • This is the first in a long line of patches that tries to fix races
    between RDS connection shutdown and RDS traffic.

    Here we are maintaining a count of active senders to make sure
    the connection doesn't go away while they are using it.

    Signed-off-by: Chris Mason

    Chris Mason
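
    A minimal sketch of the counter, assuming an atomic_t field named
    c_senders (the name also shows up in a later entry above):

    /* entering the send path */
    atomic_inc(&conn->c_senders);

    /* ... transmit work ... */

    /* leaving the send path */
    atomic_dec(&conn->c_senders);

    /* Shutdown then waits for atomic_read(&conn->c_senders) to reach zero
     * before it frees per-connection send state. */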
     
  • The RDS bind lookups are somewhat expensive in terms of CPU
    time and locking overhead. This commit changes them into a
    faster RCU-based hash table instead of the rbtrees they were using
    before.

    On large NUMA systems it is a significant improvement.

    Signed-off-by: Chris Mason

    Chris Mason
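
    A hedged sketch of an RCU read-side bind lookup over hash buckets of
    hlists, written against the current three-argument
    hlist_for_each_entry_rcu(); the bucket and reference helpers are
    hypothetical:

    static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port)
    {
            struct hlist_head *head = rds_bind_bucket(addr, port);  /* hypothetical */
            struct rds_sock *rs;

            rcu_read_lock();
            hlist_for_each_entry_rcu(rs, head, rs_bound_node) {
                    if (rs->rs_bound_addr == addr &&
                        rs->rs_bound_port == port) {
                            rds_sock_addref(rs);    /* hypothetical ref helper */
                            rcu_read_unlock();
                            return rs;
                    }
            }
            rcu_read_unlock();
            return NULL;
    }

    Writers still take a lock and use hlist_add_head_rcu()/hlist_del_init_rcu()
    so readers never see a half-updated bucket.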
     
  • This removes a global waitqueue used to wait for rds messages
    and replaces it with a waitqueue inside the rds_message struct.

    The global waitqueue turns into a global lock and significantly
    bottlenecks operations on large machines.

    Signed-off-by: Chris Mason

    Chris Mason
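
    A sketch of the per-message wait, with assumed field and flag names:

    static void rds_message_wait(struct rds_message *rm)
    {
            /* rm->m_flush_wait is the new per-message wait_queue_head_t */
            wait_event(rm->m_flush_wait,
                       !test_bit(RDS_MSG_MAPPED, &rm->m_flags));
    }

    static void rds_message_unmapped(struct rds_message *rm)
    {
            clear_bit(RDS_MSG_MAPPED, &rm->m_flags);
            smp_mb__after_clear_bit();
            wake_up(&rm->m_flush_wait);
    }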
     
  • rds_send_xmit is required to loop around after it releases the lock
    because someone else could have done a trylock, found someone working on the
    list and backed off.

    But, once we drop our lock, it is possible that someone else does come
    in and make progress on the list. We should detect this and not loop
    around if another process is actually working on the list.

    This patch adds a generation counter that is bumped every time we
    get the lock and do some send work. If the retry notices someone else
    has bumped the generation counter, it does not need to loop around and
    continue working.

    Signed-off-by: Chris Mason
    Signed-off-by: Andy Grover

    Chris Mason
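
    A sketch of the check as a fragment; c_send_generation is named in a later
    entry above, and the surrounding details are illustrative:

    /* with c_send_lock held, before doing send work */
    send_gen = atomic_inc_return(&conn->c_send_generation);

    /* ... send work, then the lock is dropped ... */

    /* Loop around only if nobody else bumped the counter in the meantime;
     * if they did, they are (or were) making progress on the list. */
    if (!list_empty(&conn->c_send_queue) &&
        send_gen == atomic_read(&conn->c_send_generation))
            goto restart;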
     
  • Signed-off-by: Andy Grover

    Andy Grover
     
  • This change allows us to call rds_send_xmit() from a tasklet,
    which is crucial to our new operating model.

    * Change c_send_lock to a spinlock
    * Update stats fields "sem_" to "_lock"
    * Remove unneeded rds_conn_is_sending()

    About locking between shutdown and send -- send checks if the
    connection is up. Shutdown puts the connection into
    DISCONNECTING. After this, all threads entering send will exit
    immediately. However, a thread could be *in* send_xmit(), so
    shutdown acquires the c_send_lock to ensure everyone is out
    before proceeding with connection shutdown.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • We now ask the transport to give us a rm for the congestion
    map, and then we handle it normally. Previously, the
    transport defined a function that we would call to send
    a congestion map.

    Convert TCP and loop transports to new cong map method.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Previously, RDS would wait until the final send WR had completed
    and then handle cleanup. With silent ops, we do not know
    if an atomic, rdma, or data op will be last. This patch
    handles any of these cases by keeping a pointer to the last
    op in the message in m_last_op.

    When the TX completion event fires, rds dispatches to per-op-type
    cleanup functions, and then does whole-message cleanup if the op
    that completed equals m_last_op.

    This patch also moves towards having op-specific functions take
    the op struct, instead of the overall rm struct.

    rds_ib_connection has a pointer to keep track of a partially-
    completed data send operation. This patch changes it from an
    rds_message pointer to the narrower rm_data_op pointer, and
    modifies places that use this pointer as needed.

    Signed-off-by: Andy Grover

    Andy Grover
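
    A hedged sketch of the completion dispatch; m_last_op is from the text
    above, while the op-type constants and unmap helpers are illustrative:

    static void rds_ib_complete_op(struct rds_ib_connection *ic,
                                   struct rds_message *rm,
                                   void *op, int op_type, int wc_status)
    {
            /* per-op-type cleanup first */
            switch (op_type) {
            case RDS_OP_DATA:
                    unmap_data_op(ic, op, wc_status);
                    break;
            case RDS_OP_RDMA:
                    unmap_rdma_op(ic, op, wc_status);
                    break;
            case RDS_OP_ATOMIC:
                    unmap_atomic_op(ic, op, wc_status);
                    break;
            }

            /* whole-message cleanup only when the op that just completed is
             * the one recorded as last, whatever its type */
            if (op == rm->m_last_op)
                    rds_message_put(rm);    /* stand-in for the full rm teardown */
    }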
     
  • Add a flag to the API so users can indicate they want
    silent operations. This is needed because silent ops
    cannot be used with USE_ONCE MRs, so we can't just
    assume silent.

    Also, change send_xmit to do atomic op before rdma op if
    both are present, and centralize the hairy logic to determine if
    we want to attempt silent, or not.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Also, add a comment.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Simplify rds_send_xmit().

    Send a congestion map (via xmit_cong_map) without
    decrementing send_quota.

    Move resetting of conn xmit variables to end of loop.

    Update comments.

    Implement a special case to turn off sending an rds header
    when there is an atomic op and no other data.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • For consistency.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • A big changeset, but it's all pretty dumb.

    struct rds_rdma_op was already embedded in struct rm_rdma_op.
    Remove rds_rdma_op and put its members in rm_rdma_op. Rename
    members with "op_" prefix instead of "r_", for consistency.

    Of course this breaks a lot, so fixup the code accordingly.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Add atomic_free_op function, analogous to rdma_free_op,
    and call it in rds_message_purge().

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Signed-off-by: Andy Grover

    Andy Grover
     
  • Signed-off-by: Andy Grover

    Andy Grover
     
  • Implement a CMSG-based interface to do FADD and CSWP ops.

    Alter send routines to handle atomic ops.

    Add atomic counters to stats.

    Add xmit_atomic() to struct rds_transport

    Inline rds_ib_send_unmap_rdma into unmap_rm

    Signed-off-by: Andy Grover

    Andy Grover
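
    A hedged sketch of the sendmsg-side hookup; xmit_atomic() is named above,
    while the cmsg constants and the handler follow the RDS uapi/code but
    should be read as assumptions:

    /* in the cmsg walk inside rds_sendmsg(), roughly: */
    switch (cmsg->cmsg_type) {
    case RDS_CMSG_ATOMIC_CSWP:
    case RDS_CMSG_ATOMIC_FADD:
            ret = rds_cmsg_atomic(rs, rm, cmsg);    /* fill in rm's atomic op */
            break;
    /* ... existing RDMA and congestion-monitor cases ... */
    }

    /* new transport hook, called from the send path:
     *      int (*xmit_atomic)(struct rds_connection *conn,
     *                         struct rm_atomic_op *op);
     */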
     
  • This eliminates a separate memory alloc, although
    it is now necessary to add an "r_active" flag, since
    it is no longer possible to use the m_rdma_op pointer as an
    indicator of whether an rdma op is present.

    rdma SGs allocated from rm sg pool.

    rds_rm_size also gets bigger. It's a little inefficient to
    run through CMSGs twice, but it makes later steps a lot smoother.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • RDMA is now an intrinsic part of RDS, so it's easier to just have
    a single header.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • r_m_copy_from_user used to allocate the rm as well as kernel
    buffers for the data, and then copy the data in. Now, sendmsg()
    allocates the rm, although the data buffer alloc still happens
    in r_m_copy_from_user.

    SGs are still allocated with rm, but now r_m_alloc_sgs() is
    used to reserve them. This allows multiple SG lists to be
    allocated from the one rm -- this is important once we also
    want to alloc our rdma sgl from this pool.

    Signed-off-by: Andy Grover

    Andy Grover
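
    A sketch of the reservation helper; the pool fields are assumptions about
    the layout, not quotes from the patch:

    static struct scatterlist *rds_message_alloc_sgs(struct rds_message *rm,
                                                     int nents)
    {
            struct scatterlist *sg_first;

            if (rm->m_used_sgs + nents > rm->m_total_sgs)
                    return NULL;                    /* the rm's sg pool is spent */

            sg_first = &rm->m_sg_pool[rm->m_used_sgs];      /* hypothetical array */
            sg_init_table(sg_first, nents);
            rm->m_used_sgs += nents;
            return sg_first;
    }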
     
  • Clearly separate rdma-related variables in rm from data-related ones.
    This is in anticipation of adding atomic support.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • This fits better in connection.c, rather than threads.c.

    Signed-off-by: Andy Grover

    Andy Grover
     

21 Apr, 2010

1 commit

  • Define a new function to return the waitqueue of a "struct sock".

    static inline wait_queue_head_t *sk_sleep(struct sock *sk)
    {
            return sk->sk_sleep;
    }

    Replace all read occurrences of sk_sleep with a call to this function.

    Needed for a future RCU conversion: sk_sleep won't be a directly
    available field anymore.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
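
    A typical call-site conversion then looks like this (one of the common
    wakeup patterns, shown for illustration):

    /* before */
    if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
            wake_up_interruptible(sk->sk_sleep);

    /* after */
    if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
            wake_up_interruptible(sk_sleep(sk));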
     

17 Mar, 2010

1 commit

  • rds_poll_waitq's listeners will be awoken if we receive a congestion
    notification. Bad performance may result because *all* polled sockets
    contend for this single lock. However, it should not be necessary to
    wake pollers when a congestion update arrives if they have never
    experienced congestion, and not putting these on the waitq will
    hopefully greatly reduce contention.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover
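
    A minimal sketch of the gating inside rds_poll(); the per-socket flag and
    the rds_sk_to_rs() accessor follow the RDS code but are assumptions here:

    static unsigned int rds_poll(struct file *file, struct socket *sock,
                                 poll_table *wait)
    {
            struct rds_sock *rs = rds_sk_to_rs(sock->sk);
            unsigned int mask = 0;

            /* Only sockets that have actually seen congestion sleep on the
             * shared rds_poll_waitq, so a congestion update no longer wakes
             * and serializes every polling socket in the system. */
            if (rs->rs_seen_congestion)
                    poll_wait(file, &rds_poll_waitq, wait);

            /* ... the usual per-socket readiness checks fill in mask ... */
            return mask;
    }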
     

24 Aug, 2009

1 commit

  • Now that transports can be loaded in arbitrary order,
    it is important for rds_trans_get_preferred() to look
    for them in a particular order, instead of walking the list
    until it finds a transport that works for a given address.
    Now, each transport registers for a specific transport slot,
    and these are ordered so that preferred transports come first,
    and then if they are not loaded, other transports are queried.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover
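
    A sketch of slot-ordered selection; the identifiers follow the RDS code
    (RDS_TRANS_COUNT, laddr_check) but the body is illustrative:

    static struct rds_transport *transports[RDS_TRANS_COUNT];

    struct rds_transport *rds_trans_get_preferred(__be32 addr)
    {
            struct rds_transport *ret = NULL;
            int i;

            /* Slots are ordered by preference, so the first registered
             * transport that accepts this local address wins. */
            for (i = 0; i < RDS_TRANS_COUNT; i++) {
                    if (transports[i] &&
                        transports[i]->laddr_check(addr) == 0) {
                            ret = transports[i];
                            break;
                    }
            }
            return ret;
    }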
     

22 Apr, 2009

1 commit

  • In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU()
    does not agree with that specified by DEFINE_PER_CPU(). This means that
    architectures that have a small data section references relative to a base
    register may throw up linkage errors due to too great a displacement between
    where the base register points and the per-CPU variable.

    On FRV, the .h declaration says that the variable is in the .sdata section, but
    the .c definition says it's actually in the .data section. The linker throws
    up the following errors:

    kernel/built-in.o: In function `release_task':
    kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
    kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o

    To fix this, DECLARE_PER_CPU() should simply apply the same section attribute
    as does DEFINE_PER_CPU(). However, this is made slightly more complex by
    virtue of the fact that there are several variants on DEFINE, so these need to
    be matched by variants on DECLARE.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
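
    A simplified illustration of the requirement; these are not the kernel's
    actual per-cpu macros, and the MY_ prefix marks them as hypothetical:

    /* The declaration and the definition must place the variable in the same
     * section, or base-register-relative relocations such as FRV's GPREL12
     * may not reach it. */
    #define MY_DECLARE_PER_CPU(type, name) \
            extern __attribute__((section(".data.percpu"))) \
            __typeof__(type) per_cpu__##name

    #define MY_DEFINE_PER_CPU(type, name) \
            __attribute__((section(".data.percpu"))) \
            __typeof__(type) per_cpu__##name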
     

02 Apr, 2009

1 commit

  • We have a 64bit value that needs to be set atomically.
    This is easy and quick on all 64bit archs, and can also be done
    on x86/32 with set_64bit() (uses cmpxchg8b). However other
    32b archs don't have this.

    I actually changed this to the current state in preparation for
    mainline because the old way (using a spinlock on 32b) resulted in
    unsightly #ifdefs in the code. But obviously, being correct takes
    precedence.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover
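
    One portable direction (a sketch of an option, not necessarily the fix
    that landed) is atomic64_t, whose generic fallback protects the value
    with a lock on 32-bit architectures that lack a native 64-bit cmpxchg:

    #include <linux/atomic.h>

    static atomic64_t next_seq;     /* hypothetical 64-bit value */

    static void set_seq(u64 seq)
    {
            atomic64_set(&next_seq, seq);   /* atomic on 32-bit and 64-bit */
    }

    static u64 get_seq(void)
    {
            return atomic64_read(&next_seq);
    }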
     

27 Feb, 2009

1 commit