Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

19 May, 2008

1 commit

82524746c rcu: split list.h and move rcu-protected lists into rculist.h ... Browse Code »

Move rcu-protected lists from list.h into a new header file rculist.h.

This is done because list are a very used primitive structure all over the
kernel and it's currently impossible to include other header files in this
list.h without creating some circular dependencies.

For example, list.h implements rcu-protected list and uses rcu_dereference()
without including rcupdate.h. It actually compiles because users of
rcu_dereference() are macros. Others RCU functions could be used too but
aren't probably because of this.

Therefore this patch creates rculist.h which includes rcupdates without to
many changes/troubles.

Signed-off-by: Franck Bui-Huu
Acked-by: Paul E. McKenney
Acked-by: Josh Triplett
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Franck Bui-Huu
2008-05-19 16:01:37 +0800

18 Apr, 2008

2 commits

636bdeaa1 dmaengine: ack to flags: make use of the unused bits in the 'ack' field ... Browse Code »

'ack' is currently a simple integer that flags whether or not a client is done
touching fields in the given descriptor. It is effectively just a single bit
of information. Converting this to a flags parameter allows the other bits to
be put to use to control completion actions, like dma-unmap, and capture
results, like xor-zero-sum == 0.

Changes are one of:
1/ convert all open-coded ->ack manipulations to use async_tx_ack
and async_tx_test_ack.
2/ set the ack bit at prep time where possible
3/ make drivers store the flags at prep time
4/ add flags to the device_prep_dma_interrupt prototype

Acked-by: Maciej Sosnowski
Signed-off-by: Dan Williams

Dan Williams
2008-04-18 04:25:54 +0800
19242d723 async_tx: fix multiple dependency submission ... Browse Code »

Shrink struct dma_async_tx_descriptor and introduce
async_tx_channel_switch to properly inject a channel switch interrupt in
the descriptor stream. This simplifies the locking model as drivers no
longer need to handle dma_async_tx_descriptor.lock.

Acked-by: Shannon Nelson
Signed-off-by: Dan Williams

Dan Williams
2008-04-18 04:25:05 +0800

19 Mar, 2008

1 commit

8d8002f64 async_tx: avoid the async xor_zero_sum path when src_cnt > device->max_xor ... Browse Code »

If the channel cannot perform the operation in one call to
->device_prep_dma_zero_sum, then fallback to the xor+page_is_zero path.
This only affects users with arrays larger than 16 devices on iop13xx or
32 devices on iop3xx.

Cc:
Cc: Neil Brown
Signed-off-by: Dan Williams

Dan Williams
2008-03-19 08:01:00 +0800

14 Mar, 2008

1 commit

3280ab3e8 async_tx: checkpatch says s/__FUNCTION__/__func__/g ... Browse Code »

Signed-off-by: Dan Williams

Dan Williams
2008-03-14 01:57:10 +0800

07 Feb, 2008

6 commits

47437b2c9 async_tx: allow architecture specific async_tx_find_channel implementations ... Browse Code »

The source and destination addresses are included to allow channel
selection based on address alignment.

Signed-off-by: Dan Williams
Reviewed-by: Haavard Skinnemoen

Dan Williams
2008-02-07 01:12:18 +0800
d4c56f97f async_tx: replace 'int_en' with operation preparation flags ... Browse Code »

Pass a full set of flags to drivers' per-operation 'prep' routines.
Currently the only flag passed is DMA_PREP_INTERRUPT. The expectation is
that arch-specific async_tx_find_channel() implementations can exploit this
capability to find the best channel for an operation.

Signed-off-by: Dan Williams
Acked-by: Shannon Nelson
Reviewed-by: Haavard Skinnemoen

Dan Williams
2008-02-07 01:12:18 +0800
0036731c8 async_tx: kill tx_set_src and tx_set_dest methods ... Browse Code »

The tx_set_src and tx_set_dest methods were originally implemented to allow
an array of addresses to be passed down from async_xor to the dmaengine
driver while minimizing stack overhead. Removing these methods allows
drivers to have all transaction parameters available at 'prep' time, saves
two function pointers in struct dma_async_tx_descriptor, and reduces the
number of indirect branches..

A consequence of moving this data to the 'prep' routine is that
multi-source routines like async_xor need temporary storage to convert an
array of linear addresses into an array of dma addresses. In order to keep
the same stack footprint of the previous implementation the input array is
reused as storage for the dma addresses. This requires that
sizeof(dma_addr_t) be less than or equal to sizeof(void *). As a
consequence CONFIG_DMADEVICES now depends on !CONFIG_HIGHMEM64G. It also
requires that drivers be able to make descriptor resources available when
the 'prep' routine is polled.

Signed-off-by: Dan Williams
Acked-by: Shannon Nelson

Dan Williams
2008-02-07 01:12:17 +0800
d909b3475 async_tx: kill ASYNC_TX_ASSUME_COHERENT ... Browse Code »

Remove the unused ASYNC_TX_ASSUME_COHERENT flag. Async_tx is
meant to hide the difference between asynchronous hardware and synchronous
software operations, this flag requires clients to understand cache
coherency consequences of the async path.

Signed-off-by: Dan Williams
Reviewed-by: Haavard Skinnemoen

Dan Williams
2008-02-07 01:12:17 +0800
cf8f68aa7 async_tx: use LIST_HEAD instead of LIST_HEAD_INIT ... Browse Code »

single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.

Signed-off-by: Denis Cheng
Signed-off-by: Dan Williams

Denis Cheng
2008-02-07 01:12:17 +0800
1367a3d31 async_tx: fix compile breakage, mark do_async_xor __always_inline ... Browse Code »

do_async_xor must be compiled away on !HAS_DMA archs.

Signed-off-by: Dan Williams
Acked-by: Cornelia Huck

Dan Williams
2008-02-07 01:12:17 +0800

25 Sep, 2007

1 commit

6247cdc2c async_tx: fix dma_wait_for_async_tx ... Browse Code »

Fix dma_wait_for_async_tx to not loop forever in the case where a
dependency chain is longer than two entries. This condition will not
happen with current in-kernel drivers, but fix it for future drivers.

Found-by: Saeed Bishara
Signed-off-by: Dan Williams

Dan Williams
2007-09-25 01:26:26 +0800

20 Jul, 2007

1 commit

eb0645a8b async_tx: fix kmap_atomic usage in async_memcpy ... Browse Code »

Andrew Morton:
[async_memcpy] is very wrong if both ASYNC_TX_KMAP_DST and
ASYNC_TX_KMAP_SRC can ever be set. We'll end up using the same kmap
slot for both src add dest and we get either corrupted data or a BUG.

Evgeniy Polyakov:
Btw, shouldn't it always be kmap_atomic() even if flag is not set.
That pages are usual one returned by alloc_page().

So fix the usage of kmap_atomic and kill the ASYNC_TX_KMAP_DST and
ASYNC_TX_KMAP_SRC flags.

Cc: Andrew Morton
Cc: Evgeniy Polyakov
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Williams
2007-07-20 23:44:19 +0800

13 Jul, 2007

1 commit

9bc89cd82 async_tx: add the async_tx api ... Browse Code »

The async_tx api provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies. It is implemented as a dmaengine client that smooths over
the details of different hardware offload engine implementations. Code
that is written to the api can optimize for asynchronous operation and the
api will fit the chain of operations to the available offload resources.

I imagine that any piece of ADMA hardware would register with the
'async_*' subsystem, and a call to async_X would be routed as
appropriate, or be run in-line. - Neil Brown

async_tx exploits the capabilities of struct dma_async_tx_descriptor to
provide an api of the following general format:

struct dma_async_tx_descriptor *
async_(..., struct dma_async_tx_descriptor *depend_tx,
dma_async_tx_callback cb_fn, void *cb_param)
{
struct dma_chan *chan = async_tx_find_channel(depend_tx, );
struct dma_device *device = chan ? chan->device : NULL;
int int_en = cb_fn ? 1 : 0;
struct dma_async_tx_descriptor *tx = device ?
device->device_prep_dma_(chan, len, int_en) : NULL;

if (tx) { /* run asynchronously */
...
tx->tx_set_dest(addr, tx, index);
...
tx->tx_set_src(addr, tx, index);
...
async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param);
} else { /* run synchronously */
...

...
async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param);
}

return tx;
}

async_tx_find_channel() returns a capable channel from its pool. The
channel pool is organized as a per-cpu array of channel pointers. The
async_tx_rebalance() routine is tasked with managing these arrays. In the
uniprocessor case async_tx_rebalance() tries to spread responsibility
evenly over channels of similar capabilities. For example if there are two
copy+xor channels, one will handle copy operations and the other will
handle xor. In the SMP case async_tx_rebalance() attempts to spread the
operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor
channel0 while cpu1 gets copy channel 1 and xor channel 1. When a
dependency is specified async_tx_find_channel defaults to keeping the
operation on the same channel. A xor->copy->xor chain will stay on one
channel if it supports both operation types, otherwise the transaction will
transition between a copy and a xor resource.

Currently the raid5 implementation in the MD raid456 driver has been
converted to the async_tx api. A driver for the offload engines on the
Intel Xscale series of I/O processors, iop-adma, is provided in a later
commit. With the iop-adma driver and async_tx, raid456 is able to offload
copy, xor, and xor-zero-sum operations to hardware engines.

On iop342 tiobench showed higher throughput for sequential writes (20 - 30%
improvement) and sequential reads to a degraded array (40 - 55%
improvement). For the other cases performance was roughly equal, +/- a few
percentage points. On a x86-smp platform the performance of the async_tx
implementation (in synchronous mode) was also +/- a few percentage points
of the original implementation. According to 'top' on iop342 CPU
utilization drops from ~50% to ~15% during a 'resync' while the speed
according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s.

The tiobench command line used for testing was: tiobench --size 2048
--block 4096 --block 131072 --dir /mnt/raid --numruns 5
* iop342 had 1GB of memory available

Details:
* if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making
async_tx_find_channel a static inline routine that always returns NULL
* when a callback is specified for a given transaction an interrupt will
fire at operation completion time and the callback will occur in a
tasklet. if the the channel does not support interrupts then a live
polling wait will be performed
* the api is written as a dmaengine client that requests all available
channels
* In support of dependencies the api implicitly schedules channel-switch
interrupts. The interrupt triggers the cleanup tasklet which causes
pending operations to be scheduled on the next channel
* Xor engines treat an xor destination address differently than a software
xor routine. To the software routine the destination address is an implied
source, whereas engines treat it as a write-only destination. This patch
modifies the xor_blocks routine to take a an explicit destination address
to mirror the hardware.

Changelog:
* fixed a leftover debug print
* don't allow callbacks in async_interrupt_cond
* fixed xor_block changes
* fixed usage of ASYNC_TX_XOR_DROP_DEST
* drop dma mapping methods, suggested by Chris Leech
* printk warning fixups from Andrew Morton
* don't use inline in C files, Adrian Bunk
* select the API when MD is enabled
* BUG_ON xor source counts
Signed-off-by: Dan Williams
Acked-By: NeilBrown

Dan Williams
2007-07-13 23:06:14 +0800