05 Jun, 2019
1 commit
-
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin st fifth floor boston ma 02110
1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 111 file(s).
Signed-off-by: Thomas Gleixner
Reviewed-by: Alexios Zavras
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000436.567572064@linutronix.de
Signed-off-by: Greg Kroah-Hartman
07 Jan, 2016
1 commit
-
These async_XX functions are called from md/raid5 in an atomic
section, between get_cpu() and put_cpu(), so they must not sleep.
So use GFP_NOWAIT rather than GFP_IO.
Dan Williams writes: Longer term async_tx needs to be merged into md
directly as we can allocate this unmap data statically per-stripe
rather than per request.
Fixes: 7476bd79fc01 ("async_pq: convert to dmaengine_unmap_data")
Cc: stable@vger.kernel.org (v3.13+)
Reported-and-tested-by: Stanislav Samsonov
Acked-by: Dan Williams
Signed-off-by: NeilBrown
Signed-off-by: Vinod Koul
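A minimal sketch of the kind of change described above (the call site is
illustrative; the real ones live under crypto/async_tx/):

        struct dmaengine_unmap_data *unmap = NULL;

        /* between get_cpu() and put_cpu() sleeping is not allowed, so the
         * unmap data must be allocated with GFP_NOWAIT rather than a flag
         * that may block on I/O */
        if (device)
                unmap = dmaengine_get_unmap_data(device->dev, 2, GFP_NOWAIT);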
15 Nov, 2013
2 commits
-
Remove no longer needed DMA unmap flags:
- DMA_COMPL_SKIP_SRC_UNMAP
- DMA_COMPL_SKIP_DEST_UNMAP
- DMA_COMPL_SRC_UNMAP_SINGLE
- DMA_COMPL_DEST_UNMAP_SINGLE
Cc: Vinod Koul
Cc: Tomasz Figa
Cc: Dave Jiang
Signed-off-by: Bartlomiej Zolnierkiewicz
Signed-off-by: Kyungmin Park
Acked-by: Jon Mason
Acked-by: Mark Brown
[djbw: clean up straggling skip unmap flags in ntb]
Signed-off-by: Dan Williams
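A hedged before/after sketch of a caller; only the flag names come from
this commit, the surrounding combination is illustrative:

        /* before: callers opted out of the engine's automatic unmap */
        flags = DMA_CTRL_ACK | DMA_COMPL_SKIP_SRC_UNMAP |
                DMA_COMPL_SKIP_DEST_UNMAP;
        /* after: the DMA_COMPL_* flags are gone; unmapping is driven by
         * the generic dmaengine_unmap_data object instead (next commit) */
        flags = DMA_CTRL_ACK;
-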
Use the generic unmap object to unmap dma buffers.
Cc: Vinod Koul
Cc: Tomasz Figa
Cc: Dave Jiang
Reported-by: Bartlomiej Zolnierkiewicz
[bzolnier: add missing unmap->len initialization]
[bzolnier: fix whitespace damage]
Signed-off-by: Bartlomiej Zolnierkiewicz
Signed-off-by: Kyungmin Park
[djbw: add DMA_ENGINE=n support]
Signed-off-by: Dan Williams
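A minimal sketch of the generic unmap object in a memcpy-style user
(the helpers are from include/linux/dmaengine.h; the surrounding
variables are illustrative):

        struct dmaengine_unmap_data *unmap;

        unmap = dmaengine_get_unmap_data(device->dev, 2, GFP_NOWAIT);
        if (unmap) {
                unmap->to_cnt = 1;
                unmap->from_cnt = 1;
                unmap->len = len;       /* the initialization noted above */
                unmap->addr[0] = dma_map_page(device->dev, src, src_offset,
                                              len, DMA_TO_DEVICE);
                unmap->addr[1] = dma_map_page(device->dev, dest, dest_offset,
                                              len, DMA_FROM_DEVICE);
                tx = device->device_prep_dma_memcpy(chan, unmap->addr[1],
                                                    unmap->addr[0], len,
                                                    DMA_PREP_INTERRUPT);
                if (tx)
                        dma_set_unmap(tx, unmap); /* tx takes a reference */
                dmaengine_unmap_put(unmap);       /* drop ours */
        }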
08 Jan, 2013
1 commit
-
Do DMA unmap on ->device_prep_dma_memcpy failure.
Cc: Dan Williams
Cc: Tomasz Figa
Signed-off-by: Bartlomiej Zolnierkiewicz
Signed-off-by: Kyungmin Park
Signed-off-by: Dan Williams
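A sketch of the error path this commit adds (dma_map_single() and
dma_unmap_single() are the standard DMA API; the surrounding code is
illustrative):

        dma_src = dma_map_single(dev, src, len, DMA_TO_DEVICE);
        dma_dst = dma_map_single(dev, dest, len, DMA_FROM_DEVICE);
        tx = device->device_prep_dma_memcpy(chan, dma_dst, dma_src, len,
                                            flags);
        if (!tx) {
                /* the fix: undo the mappings instead of leaking them */
                dma_unmap_single(dev, dma_src, len, DMA_TO_DEVICE);
                dma_unmap_single(dev, dma_dst, len, DMA_FROM_DEVICE);
                return NULL;
        }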
20 Mar, 2012
1 commit
-
Signed-off-by: Cong Wang
01 Nov, 2011
1 commit
-
Part of the include cleanups means that the implicit
inclusion of module.h via device.h is going away. So
fix things up in advance.
Signed-off-by: Paul Gortmaker
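The fix amounts to making the dependency explicit, e.g.:

        #include <linux/module.h>       /* no longer implicit via device.h */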
27 Oct, 2010
1 commit
-
Ensure kmap_atomic() usage is strictly nested
Signed-off-by: Peter Zijlstra
Reviewed-by: Rik van Riel
Acked-by: Chris Metcalf
Cc: David Howells
Cc: Hugh Dickins
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc: Steven Rostedt
Cc: Russell King
Cc: Ralf Baechle
Cc: David Miller
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
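A minimal sketch of what "strictly nested" means here, using the modern
single-argument kmap_atomic() (last mapped is first unmapped, which a
stack-based kmap_atomic implementation requires):

        a = kmap_atomic(page_a);
        b = kmap_atomic(page_b);
        /* ... use a and b ... */
        kunmap_atomic(b);       /* unmap in reverse order of mapping */
        kunmap_atomic(a);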
09 Sep, 2009
2 commits
-
Some engines have transfer size and address alignment restrictions. Add
a per-operation alignment property to struct dma_device that the async
routines and dmatest can use to check alignment capabilities.
Signed-off-by: Dan Williams
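A sketch of the check (is_dma_copy_aligned() and the per-operation
*_align fields are from include/linux/dmaengine.h; the surrounding
variables are illustrative):

        struct dma_device *device = chan ? chan->device : NULL;

        /* only take the async path if src, dest and len all satisfy the
         * engine's alignment; copy_align is the log2 of the requirement */
        if (device && !is_dma_copy_aligned(device, src_offset, dest_offset,
                                           len))
                device = NULL;  /* fall back to the synchronous path */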
-
Some engines optimize operation by reading ahead in the descriptor chain
such that descriptor2 may start execution before descriptor1 completes.
If descriptor2 depends on the result from descriptor1 then a fence is
required (on descriptor2) to disable this optimization. The async_tx
api could implicitly identify dependencies via the 'depend_tx'
parameter, but that would constrain cases where the dependency chain
only specifies a completion order rather than a data dependency. So,
provide an ASYNC_TX_FENCE to explicitly identify data dependencies.
Signed-off-by: Dan Williams
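A sketch of flagging a data dependency (init_async_submit() and
async_xor() are async_tx entry points; the particular chain is
illustrative):

        struct async_submit_ctl submit;

        /* the xor consumes data produced by tx, so fence it; a pure
         * completion-order dependency would omit ASYNC_TX_FENCE */
        init_async_submit(&submit, ASYNC_TX_FENCE, tx, NULL, NULL, NULL);
        tx = async_xor(dest, srcs, 0, src_cnt, len, &submit);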
30 Aug, 2009
1 commit
-
If module_init and module_exit are nops then neither need to be defined.
[ Impact: pure cleanup ]
Reviewed-by: Andre Noll
Acked-by: Maciej Sosnowski
Signed-off-by: Dan Williams
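The pattern being removed, sketched (the stub names are illustrative):

        static int __init async_tx_init(void)
        {
                return 0;       /* does nothing */
        }
        static void __exit async_tx_exit(void)
        {
        }
        module_init(async_tx_init);
        module_exit(async_tx_exit);
        /* all of the above can simply be deleted */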
04 Jun, 2009
2 commits
-
Prepare the api for the arrival of a new parameter, 'scribble'. This
will allow callers to identify scratchpad memory for dma address or page
address conversions. As this adds yet another parameter, take this
opportunity to convert the common submission parameters (flags,
dependency, callback, and callback argument) into an object that is
passed by reference.
Also, take this opportunity to fix up the kerneldoc and add notes about
the relevant ASYNC_TX_* flags for each routine.
[ Impact: moves api pass-by-value parameters to a pass-by-reference struct ]
Signed-off-by: Andre Noll
Acked-by: Maciej Sosnowski
Signed-off-by: Dan Williams
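A sketch of the resulting calling convention (struct async_submit_ctl
and init_async_submit() are from include/linux/async_tx.h; NDISKS, cb
and cb_arg are illustrative):

        struct async_submit_ctl submit;
        addr_conv_t addr_conv[NDISKS];  /* the new 'scribble' scratchpad */

        /* flags, dependency, callback, callback argument and scribble
         * now travel together by reference */
        init_async_submit(&submit, ASYNC_TX_ACK, depend_tx, cb, cb_arg,
                          addr_conv);
        tx = async_xor(dest, srcs, 0, src_cnt, len, &submit);
-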
In support of inter-channel chaining async_tx utilizes an ack flag to
gate whether a dependent operation can be chained to another. While the
flag is not set the chain can be considered open for appending. Setting
the ack flag closes the chain and flags the descriptor for garbage
collection. The ASYNC_TX_DEP_ACK flag essentially means "close the
chain after adding this dependency". Since each operation can only have
one child the api now implicitly sets the ack flag at dependency
submission time. This removes an unnecessary management burden from
clients of the api.
[ Impact: clean up and enforce one dependency per operation ]
Reviewed-by: Andre Noll
Acked-by: Maciej Sosnowski
Signed-off-by: Dan Williams
18 Jul, 2008
2 commits
-
All callers of async_tx_sync_epilog have called async_tx_quiesce on the
depend_tx, so async_tx_sync_epilog need only call the callback to
complete the operation.
Signed-off-by: Dan Williams
-
Replace open coded "wait and acknowledge" instances with async_tx_quiesce.
Signed-off-by: Dan Williams
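The pattern being factored out, sketched against the helper's behavior
(dma_wait_for_async_tx(), async_tx_test_ack() and async_tx_ack() are
existing dmaengine/async_tx interfaces):

        /* before: open-coded wait and acknowledge */
        if (depend_tx) {
                BUG_ON(async_tx_test_ack(depend_tx));
                if (dma_wait_for_async_tx(depend_tx) == DMA_ERROR)
                        panic("%s: DMA_ERROR waiting for depend_tx\n",
                              __func__);
                async_tx_ack(depend_tx);
        }

        /* after: */
        async_tx_quiesce(&depend_tx);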
18 Apr, 2008
1 commit
-
'ack' is currently a simple integer that flags whether or not a client is done
touching fields in the given descriptor. It is effectively just a single bit
of information. Converting this to a flags parameter allows the other bits to
be put to use to control completion actions, like dma-unmap, and capture
results, like xor-zero-sum == 0.
Changes are one of:
1/ convert all open-coded ->ack manipulations to use async_tx_ack
and async_tx_test_ack.
2/ set the ack bit at prep time where possible
3/ make drivers store the flags at prep time
4/ add flags to the device_prep_dma_interrupt prototype
Acked-by: Maciej Sosnowski
Signed-off-by: Dan Williams
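The accessors referred to in 1/, roughly as they appear in
include/linux/dmaengine.h:

        static inline void async_tx_ack(struct dma_async_tx_descriptor *tx)
        {
                tx->flags |= DMA_CTRL_ACK;
        }

        static inline bool async_tx_test_ack(struct dma_async_tx_descriptor *tx)
        {
                return tx->flags & DMA_CTRL_ACK;
        }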
14 Mar, 2008
1 commit
-
Signed-off-by: Dan Williams
07 Feb, 2008
4 commits
-
The source and destination addresses are included to allow channel
selection based on address alignment.
Signed-off-by: Dan Williams
Reviewed-by: Haavard Skinnemoen
Pass a full set of flags to drivers' per-operation 'prep' routines.
Currently the only flag passed is DMA_PREP_INTERRUPT. The expectation is
that arch-specific async_tx_find_channel() implementations can exploit this
capability to find the best channel for an operation.
Signed-off-by: Dan Williams
Acked-by: Shannon Nelson
Reviewed-by: Haavard Skinnemoen
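A sketch of a prep call carrying the new flags (device_prep_dma_memcpy
is the dmaengine prep hook; the variables are illustrative):

        unsigned long flags = cb_fn ? DMA_PREP_INTERRUPT : 0;

        tx = device->device_prep_dma_memcpy(chan, dma_dest, dma_src, len,
                                            flags);
-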
The tx_set_src and tx_set_dest methods were originally implemented to allow
an array of addresses to be passed down from async_xor to the dmaengine
driver while minimizing stack overhead. Removing these methods allows
drivers to have all transaction parameters available at 'prep' time, saves
two function pointers in struct dma_async_tx_descriptor, and reduces the
number of indirect branches.
A consequence of moving this data to the 'prep' routine is that
multi-source routines like async_xor need temporary storage to convert an
array of linear addresses into an array of dma addresses. In order to keep
the same stack footprint of the previous implementation the input array is
reused as storage for the dma addresses. This requires that
sizeof(dma_addr_t) be less than or equal to sizeof(void *). As a
consequence CONFIG_DMADEVICES now depends on !CONFIG_HIGHMEM64G. It also
requires that drivers be able to make descriptor resources available when
the 'prep' routine is polled.
Signed-off-by: Dan Williams
Acked-by: Shannon Nelson
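A sketch of the in-place conversion trick (the cast is legal only under
the sizeof(dma_addr_t) <= sizeof(void *) constraint noted above; names
are illustrative):

        /* reuse the page-pointer array as dma_addr_t storage so the stack
         * footprint matches the old tx_set_src/tx_set_dest scheme */
        dma_addr_t *dma_src = (dma_addr_t *)src_list;
        int i;

        for (i = 0; i < src_cnt; i++)
                dma_src[i] = dma_map_page(device->dev, src_list[i], offset,
                                          len, DMA_TO_DEVICE);
-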
Remove the unused ASYNC_TX_ASSUME_COHERENT flag. Async_tx is
meant to hide the difference between asynchronous hardware and synchronous
software operations; this flag requires clients to understand the cache
coherency consequences of the async path.
Signed-off-by: Dan Williams
Reviewed-by: Haavard Skinnemoen
20 Jul, 2007
1 commit
-
Andrew Morton:
[async_memcpy] is very wrong if both ASYNC_TX_KMAP_DST and
ASYNC_TX_KMAP_SRC can ever be set. We'll end up using the same kmap
slot for both src and dest and we get either corrupted data or a BUG.
Evgeniy Polyakov:
Btw, shouldn't it always be kmap_atomic() even if the flag is not set?
Those pages are usually ones returned by alloc_page().
So fix the usage of kmap_atomic and kill the ASYNC_TX_KMAP_DST and
ASYNC_TX_KMAP_SRC flags.
Cc: Andrew Morton
Cc: Evgeniy Polyakov
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
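A sketch of the fixed synchronous copy (the modern single-argument
kmap_atomic() is shown; the 2007 fix used distinct KM_USER0/KM_USER1
slots to the same effect):

        void *dest_buf, *src_buf;

        /* map src and dest separately, strictly nested */
        dest_buf = kmap_atomic(dest) + dest_offset;
        src_buf = kmap_atomic(src) + src_offset;

        memcpy(dest_buf, src_buf, len);

        kunmap_atomic(src_buf);
        kunmap_atomic(dest_buf);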
13 Jul, 2007
1 commit
-
The async_tx api provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies. It is implemented as a dmaengine client that smooths over
the details of different hardware offload engine implementations. Code
that is written to the api can optimize for asynchronous operation and the
api will fit the chain of operations to the available offload resources.
    I imagine that any piece of ADMA hardware would register with the
    'async_*' subsystem, and a call to async_X would be routed as
    appropriate, or be run in-line. - Neil Brown
async_tx exploits the capabilities of struct dma_async_tx_descriptor to
provide an api of the following general format:

struct dma_async_tx_descriptor *
async_<operation>(..., struct dma_async_tx_descriptor *depend_tx,
                  dma_async_tx_callback cb_fn, void *cb_param)
{
        struct dma_chan *chan = async_tx_find_channel(depend_tx, <operation>);
        struct dma_device *device = chan ? chan->device : NULL;
        int int_en = cb_fn ? 1 : 0;
        struct dma_async_tx_descriptor *tx = device ?
                device->device_prep_dma_<operation>(chan, len, int_en) : NULL;

        if (tx) { /* run <operation> asynchronously */
                ...
                tx->tx_set_dest(addr, tx, index);
                ...
                tx->tx_set_src(addr, tx, index);
                ...
                async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param);
        } else { /* run <operation> synchronously */
                ...
                <operation>
                ...
                async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param);
        }

        return tx;
}

async_tx_find_channel() returns a capable channel from its pool. The
channel pool is organized as a per-cpu array of channel pointers. The
async_tx_rebalance() routine is tasked with managing these arrays. In the
uniprocessor case async_tx_rebalance() tries to spread responsibility
evenly over channels of similar capabilities. For example if there are two
copy+xor channels, one will handle copy operations and the other will
handle xor. In the SMP case async_tx_rebalance() attempts to spread the
operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor
channel0 while cpu1 gets copy channel 1 and xor channel 1. When a
dependency is specified async_tx_find_channel defaults to keeping the
operation on the same channel. A xor->copy->xor chain will stay on one
channel if it supports both operation types, otherwise the transaction will
transition between a copy and a xor resource.
Currently the raid5 implementation in the MD raid456 driver has been
converted to the async_tx api. A driver for the offload engines on the
Intel Xscale series of I/O processors, iop-adma, is provided in a later
commit. With the iop-adma driver and async_tx, raid456 is able to offload
copy, xor, and xor-zero-sum operations to hardware engines.
On iop342 tiobench showed higher throughput for sequential writes (20 - 30%
improvement) and sequential reads to a degraded array (40 - 55%
improvement). For the other cases performance was roughly equal, +/- a few
percentage points. On a x86-smp platform the performance of the async_tx
implementation (in synchronous mode) was also +/- a few percentage points
of the original implementation. According to 'top' on iop342 CPU
utilization drops from ~50% to ~15% during a 'resync' while the speed
according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s.
The tiobench command line used for testing was: tiobench --size 2048
--block 4096 --block 131072 --dir /mnt/raid --numruns 5
* iop342 had 1GB of memory available
Details:
* if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making
async_tx_find_channel a static inline routine that always returns NULL
(see the sketch after this entry)
* when a callback is specified for a given transaction an interrupt will
fire at operation completion time and the callback will occur in a
tasklet. if the channel does not support interrupts then a live
polling wait will be performed
* the api is written as a dmaengine client that requests all available
channels
* In support of dependencies the api implicitly schedules channel-switch
interrupts. The interrupt triggers the cleanup tasklet which causes
pending operations to be scheduled on the next channel
* Xor engines treat an xor destination address differently than a software
xor routine. To the software routine the destination address is an implied
source, whereas engines treat it as a write-only destination. This patch
modifies the xor_blocks routine to take an explicit destination address
to mirror the hardware.
Changelog:
* fixed a leftover debug print
* don't allow callbacks in async_interrupt_cond
* fixed xor_block changes
* fixed usage of ASYNC_TX_XOR_DROP_DEST
* drop dma mapping methods, suggested by Chris Leech
* printk warning fixups from Andrew Morton
* don't use inline in C files, Adrian Bunk
* select the API when MD is enabled
* BUG_ON xor source counts
Signed-off-by: Dan Williams
Acked-By: NeilBrown
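The CONFIG_DMA_ENGINE=n detail from the list above, sketched (the
two-argument signature matches this commit's api; later commits add
address parameters):

        #ifdef CONFIG_DMA_ENGINE
        struct dma_chan *
        async_tx_find_channel(struct dma_async_tx_descriptor *depend_tx,
                              enum dma_transaction_type tx_type);
        #else
        static inline struct dma_chan *
        async_tx_find_channel(struct dma_async_tx_descriptor *depend_tx,
                              enum dma_transaction_type tx_type)
        {
                return NULL;    /* every async_<operation> runs synchronously */
        }
        #endif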