03 Jan, 2017

1 commit


03 Mar, 2016

1 commit


06 Oct, 2015

3 commits

  • 8K message sizes are a pretty important use case for current RDS
    workloads, so we make provision to have 8K MRs available from the pool.
    Based on the number of SGs in the RDS message, we pick a pool to use.

    Also, to make sure that we don't under-utilize MRs when, say, 8K
    messages are dominating, which could lead to the 8K pool being
    exhausted, we fall back to the 1M pool until the 8K pool recovers for
    use (see the pool-selection sketch after this commit list).

    This helps to push at least ~55 kB/s of bidirectional data, which is
    a nice improvement.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     
  • Similar to what we did with receive CQ completion handling, we split
    the transmit completion handler so that it lets us implement batched
    work completion handling.

    We re-use the cq_poll routine and make use of RDS_IB_SEND_OP to
    identify send vs. receive completion event handler invocation.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     
  • For better performance, we split the receive completion IRQ handler.
    That lets us acknowledge several WCE events in one call. We also limit
    the WCs to a maximum of 32 to avoid latency. Acknowledging several
    completions in one call, instead of one call per completion, performs
    better since less mutual-exclusion locking is required (see the
    batched-poll sketch after this commit list).

    In the next patch, send completion is also split; it re-uses poll_cq(),
    and hence the code is moved to ib_cm.c.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
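
    Below is a minimal sketch of the two-pool MR selection described in the
    8K-MR commit above. The names (mr_pool, pick_mr_pool, MR_8K_SG_THRESHOLD)
    are hypothetical, not the actual net/rds identifiers: the pool is chosen
    from the scatter/gather count, and the 8K path falls back to the 1M pool
    when it is exhausted.

        #include <linux/atomic.h>

        #define MR_8K_SG_THRESHOLD 2      /* hypothetical SG-count cutoff for the 8K pool */

        struct mr_pool {
            unsigned int max_items;       /* pool capacity */
            atomic_t     item_count;      /* MRs currently handed out */
        };

        static struct mr_pool pool_8k;    /* backs small (<= 8K) messages */
        static struct mr_pool pool_1m;    /* backs large messages */

        /* Pick a pool based on the number of SG entries in the RDS message. */
        static struct mr_pool *pick_mr_pool(unsigned int nr_sg)
        {
            if (nr_sg <= MR_8K_SG_THRESHOLD &&
                atomic_read(&pool_8k.item_count) < pool_8k.max_items)
                return &pool_8k;

            /* Large message, or the 8K pool is exhausted: use the 1M pool. */
            return &pool_1m;
        }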
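
    And a sketch of the batched completion polling described in the two
    completion-handler commits above. drain_cq() and POLL_CQ_BATCH are
    hypothetical names; ib_poll_cq() is the standard verbs call, and the
    32-entry array mirrors the WC cap mentioned in the commit message.

        #include <rdma/ib_verbs.h>

        #define POLL_CQ_BATCH 32          /* cap on WCs drained per ib_poll_cq() call */

        /*
         * Drain a completion queue in batches: each ib_poll_cq() call returns
         * up to POLL_CQ_BATCH work completions, so events are handled in
         * groups instead of taking the completion lock once per event.
         */
        static void drain_cq(struct ib_cq *cq, void (*handle_wc)(struct ib_wc *wc))
        {
            struct ib_wc wcs[POLL_CQ_BATCH];
            int nr, i;

            while ((nr = ib_poll_cq(cq, POLL_CQ_BATCH, wcs)) > 0)
                for (i = 0; i < nr; i++)
                    handle_wc(&wcs[i]);
        }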
     

09 Sep, 2010

1 commit


16 Sep, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
    powerpc64: convert to dynamic percpu allocator
    sparc64: use embedding percpu first chunk allocator
    percpu: kill lpage first chunk allocator
    x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
    percpu: update embedding first chunk allocator to handle sparse units
    percpu: use group information to allocate vmap areas sparsely
    vmalloc: implement pcpu_get_vm_areas()
    vmalloc: separate out insert_vmalloc_vm()
    percpu: add chunk->base_addr
    percpu: add pcpu_unit_offsets[]
    percpu: introduce pcpu_alloc_info and pcpu_group_info
    percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
    percpu: add @align to pcpu_fc_alloc_fn_t
    percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
    percpu: drop @static_size from first chunk allocators
    percpu: generalize first chunk allocator selection
    percpu: build first chunk allocators selectively
    percpu: rename 4k first chunk allocator to page
    percpu: improve boot messages
    percpu: fix pcpu_reclaim() locking
    ...

    Fix trivial conflict, as noted by Tejun Heo, in kernel/sched.c

    Linus Torvalds
     

06 Aug, 2009

1 commit


24 Jun, 2009

1 commit

  • There are a few places where ___cacheline_aligned* is used with
    DEFINE_PER_CPU(). Use DEFINE_PER_CPU_SHARED_ALIGNED() instead.

    DEFINE_PER_CPU_SHARED_ALIGNED() applies alignment only on SMP. While
    all other converted places used the _in_smp variant or are only
    compiled for SMP, net/rds used an unconditional ____cacheline_aligned.
    I don't see any reason these data structures should be aligned on UP,
    so they are converted together.

    Signed-off-by: Tejun Heo
    Cc: Mike Frysinger
    Cc: Tony Luck
    Cc: Andy Grover

    Tejun Heo
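
    A minimal illustration of the conversion described above, using a
    hypothetical per-cpu variable. DEFINE_PER_CPU() and
    DEFINE_PER_CPU_SHARED_ALIGNED() are the real kernel macros; the latter
    only applies cacheline alignment when CONFIG_SMP is set.

        #include <linux/percpu.h>
        #include <linux/cache.h>

        struct example_stats {            /* hypothetical per-cpu structure */
            unsigned long rx;
            unsigned long tx;
        };

        /* Before: unconditionally cacheline aligned, even on UP builds. */
        static DEFINE_PER_CPU(struct example_stats, old_stats) ____cacheline_aligned;

        /* After: aligned (and placed in the shared-aligned section) only on SMP. */
        static DEFINE_PER_CPU_SHARED_ALIGNED(struct example_stats, new_stats);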
     

27 Feb, 2009

1 commit