05 Jul, 2017

1 commit


15 Jun, 2016

1 commit

  • In preparation for multipath RDS, split the rds_connection
    structure into a base structure, and a per-path struct rds_conn_path.
    The base structure tracks information and locks common to all
    paths. The workqueues for send/recv/shutdown etc. are tracked per
    rds_conn_path, so the workqueue callbacks now operate on a rds_conn_path.

    This commit allows for one rds_conn_path per rds_connection, and will
    be extended into multiple conn_paths in subsequent commits.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

11 Jun, 2016

1 commit

  • alloc_workqueue() replaces the deprecated create_workqueue().

    Since the driver is InfiniBand, which can be used as a block device,
    and the workqueue appears to be involved in the regular operation of
    the device, a dedicated workqueue is used with WQ_MEM_RECLAIM set to
    guarantee forward progress under memory pressure. Since there are
    only a fixed number of work items, an explicit concurrency limit is
    unnecessary here.

    Signed-off-by: Bhaktipriya Shridhar
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Bhaktipriya Shridhar
     

03 Mar, 2016

5 commits


06 Oct, 2015

4 commits

  • 8K message sizes are a pretty important use case for current RDS
    workloads, so we make provision to have 8K MRs available from the
    pool. Based on the number of SGs in the RDS message, we pick a pool
    to use.

    Also, to make sure that we don't under-utilise MRs when, say, 8K
    messages are dominating, which could lead to the 8K pool being
    exhausted, we fall back to the 1M pool until the 8K pool recovers.

    This helps to push at least ~55 kB/s of bidirectional data, which
    is a nice improvement.

    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     
  • Fix below warning by marking rds_ib_fmr_wq static

    net/rds/ib_rdma.c:87:25: warning: symbol 'rds_ib_fmr_wq' was not declared. Should it be static?

    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     
  • rds_ib_mr already keeps the pool handle it is associated with.
    Let's use that instead of the roundabout way of fetching it
    from rds_ib_device.

    No functional change.

    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     
  • The RDS IB MR pool has its own workqueue, 'rds_ib_fmr_wq', so we
    need to use queue_delayed_work() on it to kick the work. This was
    hurting performance, since pool maintenance was being triggered
    less often from the other paths.

    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     

01 Oct, 2015

1 commit

  • synchronize_rcu() was unnecessarily slowing down the socket shutdown
    path. It is used just to kfree() the IP addresses in
    rds_ib_remove_ipaddr(), which is a perfect use case for kfree_rcu().

    So let's use that to gain some speedup.

    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     

26 Aug, 2015

6 commits


15 Jul, 2015

1 commit

  • Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")

    A drop of rds_ib_device.refcount is missing in the case where
    rds_ib_alloc_fmr() fails (the MR pool running out). This leads to
    a refcount overflow.

    The BUG_ON at line 117 (see below) fires. From the vmcore:
    s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is
    -2147475448. That is evidence the MR pool is used up, so
    rds_ib_alloc_fmr() is very likely to return ERR_PTR(-EAGAIN).

    115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
    116 {
    117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
    118         if (atomic_dec_and_test(&rds_ibdev->refcount))
    119                 queue_work(rds_wq, &rds_ibdev->free_work);
    120 }

    The fix is to drop the refcount when rds_ib_alloc_fmr() fails.

    Signed-off-by: Wengang Wang
    Reviewed-by: Haggai Eran
    Signed-off-by: Doug Ledford

    Wengang Wang
     

27 Aug, 2014

1 commit


16 Sep, 2011

1 commit

  • The functionality of xlist and llist is almost the same. This patch
    replaces xlist with llist to avoid code duplication.

    Known issue: it is not clear how to test this; special hardware may
    be needed.

    Signed-off-by: Huang Ying
    Cc: Chris Mason
    Cc: Andy Grover
    Cc: "David S. Miller"
    Signed-off-by: David S. Miller

    Huang Ying
     

01 Feb, 2011

1 commit

  • With cmwq, there's no reason to use a dedicated rds_ib_fmr_wq - it's
    not in the memory-reclaim path, and the maximum number of concurrent
    work items is bounded by the number of devices. Drop it and use
    system_wq instead. This makes rds_ib_fmr_init/exit() noops, so both
    are removed.

    Signed-off-by: Tejun Heo
    Cc: Andy Grover

    Tejun Heo
     

21 Oct, 2010

1 commit


20 Sep, 2010

1 commit

  • This is basically just a cleanup. IRQs were disabled on the previous
    line so we don't need to do it again here. In the current code IRQs
    would get turned on one line earlier than intended.

    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
     

09 Sep, 2010

15 commits

  • The RDS IB device list wasn't protected by any locking. Traversal in
    both the get_mr and FMR-flushing paths could race with addition and
    removal.

    List manipulation is done with RCU primitives and is protected by
    the write side of a rwsem. The list traversal in the get_mr fast
    path is protected by an RCU read critical section. The FMR list
    traversal is more problematic because it can block while traversing
    the list. We protect this with the read side of the rwsem.

    Signed-off-by: Zach Brown

    Zach Brown
     
  • Flushing FMRs is somewhat expensive, and is currently kicked off when
    the interrupt handler notices that we are getting low. The result of
    this is that FMR flushing only happens from the interrupt cpus.

    This spreads the load more effectively by triggering flushes just before
    we allocate a new FMR.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The trivial amount of memory saved isn't worth the cost of dealing with section
    mismatches.

    Signed-off-by: Zach Brown

    Zach Brown
     
  • This patch moves the FMR flushing work into its own multi-threaded
    work queue. This is to maintain performance in preparation for
    returning the main krdsd work queue back to a single-threaded work
    queue, to avoid deep-rooted concurrency bugs.

    This is also good because it further separates FMRs, which might be removed
    some day, from the rest of the code base.

    Signed-off-by: Zach Brown

    Zach Brown
     
  • IB connections were not being destroyed during rmmod.

    First, the IB device removal callback was recently changed to
    disconnect connections that used the removing device rather than
    destroy them. So connections with devices during rmmod were not
    being destroyed.

    Second, rds_ib_destroy_nodev_conns() was being called before connections are
    disassociated with devices. It would almost never find connections in the
    nodev list.

    We first get rid of rds_ib_destroy_conns(), which is no longer called, and
    refactor the existing caller into the main body of the function and get rid of
    the list and lock wrappers.

    Then we call rds_ib_destroy_nodev_conns() *after* ib_unregister_client() has
    removed the IB device from all the conns and put the conns on the nodev list.

    The result is that IB connections are destroyed by rmmod.

    Signed-off-by: Zach Brown

    Zach Brown
     
  • Andy Grover
     
  • Using a delayed work queue helps us make sure a healthy number of FMRs
    have queued up over the limit. It makes for a large improvement in RDMA
    iops.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • FMR allocation and recycling is performance critical and fairly lock
    intensive. The current code has a per-connection lock that all
    processes bang on, and it becomes a major bottleneck on large
    systems.

    This changes things to use a number of cmpxchg-based lists instead,
    allowing us to go through the whole FMR lifecycle without locking
    inside RDS.

    Zach Brown pointed out that our usage of cmpxchg for xlist removal
    is racy if someone manages to remove and add back an FMR struct to
    the list while another CPU can see the FMR's address at the head of
    the list.

    The second CPU might assume the list hasn't changed when in fact any
    number of operations might have happened in between the deletion and
    reinsertion.

    This commit maintains a per-CPU count of CPUs that are currently
    in xlist removal, and establishes a grace period to make sure that
    nobody can see an entry we have just removed from the list.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The RDS IB client .remove callback used to free the rds_ibdev for the given
    device unconditionally. This could race other users of the struct. This patch
    adds refcounting so that we only free the rds_ibdev once all of its users are
    done.

    Many rds_ibdev users are tied to connections. We give the connection a
    reference and change these users to reference the device in the connection
    instead of looking it up in the IB client data. The only user of the IB client
    data remaining is the first lookup of the device as connections are built up.

    Incrementing the reference count of a device found in the IB client data could
    race with final freeing so we use an RCU grace period to make sure that freeing
    won't happen until those lookups are done.

    MRs need the rds_ibdev to get at the pool that they're freed into. They exist
    outside a connection and many MRs can reference different devices from one
    socket, so it was natural to have each MR hold a reference. MR refs can be
    dropped from interrupt handlers and final device teardown can block so we push
    it off to a work struct. Pool teardown had to be fixed to cancel its pending
    work instead of deadlocking waiting for all queued work, including itself, to
    finish.

    MRs get their reference from the global device list, which gets a reference.
    It is left unprotected by locks and remains racy. A simple global lock would
    be a significant bottleneck. More scalable (complicated) locking should be
    done carefully in a later patch.

    Signed-off-by: Zach Brown

    Zach Brown
     
  • The RDS bind lookups are somewhat expensive in terms of CPU
    time and locking overhead. This commit changes them into a
    faster RCU based hash tree instead of the rbtrees they were using
    before.

    On large NUMA systems it is a significant improvement.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Allocate send/recv rings in memory that is node-local to the HCA.
    This significantly helps performance.

    Signed-off-by: Andy Grover

    Andy Grover
     
  • Signed-off-by: Andy Grover

    Andy Grover
     
  • rds_ib_get_device() is called very often, as we turn an
    IP address into a corresponding device structure. It currently
    takes a global spinlock as it walks different lists to find active
    devices.

    This commit changes the lists over to RCU, which isn't very complex
    because they are not updated very often at all.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Implement a CMSG-based interface to do FADD and CSWP ops.

    Alter send routines to handle atomic ops.

    Add atomic counters to stats.

    Add xmit_atomic() to struct rds_transport

    Inline rds_ib_send_unmap_rdma into unmap_rm

    Signed-off-by: Andy Grover

    Andy Grover
     
  • RDMA is now an intrinsic part of RDS, so it's easier to just have
    a single header.

    Signed-off-by: Andy Grover

    Andy Grover