19 Oct, 2007

1 commit

  • Hell knows what happened in commit 63b05203af57e7de4f3bb63b8b81d43bc196d32b
    during 2.6.9 development. The commit introduced an io_wait field which was
    write-only then and still remains write-only.

    Also garbage collect macros which "use" io_wait.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

17 Oct, 2007

1 commit

  • Some months back I proposed changing the schedule() call in
    read_events to an io_schedule():
    http://osdir.com/ml/linux.kernel.aio.general/2006-10/msg00024.html
    This was rejected as there are AIO operations that do not initiate
    disk I/O. I've had another look at the problem, and the only AIO
    operation that will not initiate disk I/O is IOCB_CMD_NOOP. However,
    this command isn't even wired up!

    Given that it doesn't work, and hasn't for *years*, I'm going to
    suggest again that we do proper I/O accounting when using AIO.
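
    A minimal sketch of the proposed change (not the verbatim fs/aio.c code;
    the surrounding retry and copy-out logic is omitted): the read_events()
    wait loop sleeps via io_schedule() instead of schedule(), so the time
    spent waiting for completions is accounted as I/O wait.

        add_wait_queue_exclusive(&ctx->wait, &wait);
        do {
            set_task_state(tsk, TASK_INTERRUPTIBLE);
            ret = aio_read_evt(ctx, &ent);
            if (ret)
                break;
            if (min_nr <= i)
                break;
            io_schedule();                  /* was: schedule() */
            if (signal_pending(tsk)) {
                ret = -EINTR;
                break;
            }
        } while (1);
        set_task_state(tsk, TASK_RUNNING);
        remove_wait_queue(&ctx->wait, &wait);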

    Signed-off-by: Jeff Moyer
    Acked-by: Zach Brown
    Cc: Benjamin LaHaise
    Cc: Suparna Bhattacharya
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
     

09 Oct, 2007

1 commit

  • When the IOCB_FLAG_RESFD flag is set and iocb->aio_resfd is incorrect,
    the statement 'goto out_put_req' is executed. At the label 'out_put_req',
    aio_put_req(..) is called, which requires 'req->ki_filp' to be set.
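
    A rough sketch of the resulting ordering in io_submit_one() (simplified;
    the point is that ki_filp is assigned before any failure path can jump to
    out_put_req):

        req = aio_get_req(ctx);
        if (unlikely(!req)) {
            fput(file);
            return -EAGAIN;
        }
        req->ki_filp = file;        /* set before any 'goto out_put_req' */

        if (iocb->aio_flags & IOCB_FLAG_RESFD) {
            req->ki_eventfd = eventfd_fget((int) iocb->aio_resfd);
            if (IS_ERR(req->ki_eventfd)) {
                ret = PTR_ERR(req->ki_eventfd);
                goto out_put_req;   /* aio_put_req() relies on ki_filp */
            }
        }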

    Signed-off-by: Yan Zheng
    Cc: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yan Zheng
     

11 May, 2007

1 commit

  • This is an example about how to add eventfd support to the current KAIO code,
    in order to enable KAIO to post readiness events to a pollable fd (hence
    compatible with POSIX select/poll). The KAIO code simply signals the eventfd
    fd when events are ready, and this triggers a POLLIN in the fd. This patch
    uses a reserved-for-future-use member of the struct iocb to pass an eventfd
    file descriptor, which KAIO will use to post events every time a request
    completes. At that point, an aio_getevents() will return the completed result
    to a struct io_event. I made a quick test program to verify the patch, and it
    runs fine here:

    http://www.xmailserver.org/eventfd-aio-test.c

    The test program uses poll(2), but it'd, of course, work with select and epoll
    too.

    This allows scheduling both block I/O and other pollable-device requests,
    and waiting for the results using select/poll/epoll. In a typical
    scenario, an application would submit KAIO requests using aio_submit(),
    would also use epoll_ctl() on the whole other class of devices (which,
    with the addition of signals, timers and user events, is now pretty much
    complete), and then would:

        epoll_wait(...);
        for_each_event {
            if (curr_event_is_kaiofd) {
                aio_getevents();
                dispatch_aio_events();
            } else {
                dispatch_epoll_event();
            }
        }
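
    As a rough userspace illustration (this is not code from the test program
    linked above; the helper name and error handling are simplified), wiring
    an eventfd into an iocb comes down to setting IOCB_FLAG_RESFD and
    aio_resfd:

        /* hypothetical helper: submit-side setup of a resfd-notifying iocb */
        #include <linux/aio_abi.h>
        #include <sys/eventfd.h>
        #include <string.h>

        static void setup_resfd_iocb(struct iocb *cb, int fd, int efd,
                                     void *buf, size_t len)
        {
            memset(cb, 0, sizeof(*cb));
            cb->aio_fildes     = fd;
            cb->aio_lio_opcode = IOCB_CMD_PREAD;
            cb->aio_buf        = (__u64)(unsigned long) buf;
            cb->aio_nbytes     = len;
            cb->aio_flags      = IOCB_FLAG_RESFD;  /* post completions to... */
            cb->aio_resfd      = efd;              /* ...this eventfd */
        }

    The eventfd can then be registered with poll/select/epoll; when it becomes
    readable, the completions can be collected with the getevents call
    described above.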

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

10 May, 2007

2 commits

  • flush_work(wq, work) doesn't need the first parameter; we can use cwq->wq
    (this was possible from the very beginning, I missed this). So we can unify
    flush_work_keventd and flush_work.

    Also, rename flush_work() to cancel_work_sync() and fix all callers.
    Perhaps this is not the best name, but "flush_work" is really bad.

    (akpm: this is why the earlier patches bypassed maintainers)

    Signed-off-by: Oleg Nesterov
    Cc: Jeff Garzik
    Cc: "David S. Miller"
    Cc: Jens Axboe
    Cc: Tejun Heo
    Cc: Auke Kok ,
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Migrate AIO over to use flush_work().

    Cc: "Maciej W. Rozycki"
    Cc: David Howells
    Cc: Zach Brown
    Cc: Benjamin LaHaise
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

08 May, 2007

1 commit

  • This patch provides a new macro

    KMEM_CACHE(<struct>, <flags>)

    to simplify slab creation. KMEM_CACHE creates a slab with the name of the
    struct, with the size of the struct and with the alignment of the struct.
    Additional slab flags may be specified if necessary.

    Example:

        struct test_slab {
            int a, b, c;
            struct list_head list;
        } __cacheline_aligned_in_smp;

        test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC);

    will create a new slab named "test_slab" of size sizeof(struct test_slab),
    aligned to the alignment of struct test_slab. If creation fails, we panic
    (because of SLAB_PANIC).
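
    For reference, the macro boils down to roughly the following (the trailing
    ctor/dtor arguments reflect the kmem_cache_create() signature of that era;
    treat this as a sketch rather than the exact definition):

        #define KMEM_CACHE(__struct, __flags)                           \
                kmem_cache_create(#__struct, sizeof(struct __struct),   \
                        __alignof__(struct __struct), (__flags), NULL, NULL)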

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

28 Mar, 2007

1 commit

  • The user can generate console output if they cause do_mmap() to fail
    during sys_io_setup(). This was seen in a regression test that does
    exactly that by spinning calling mmap() until it gets -ENOMEM before
    calling io_setup().

    We don't need this printk at all, just remove it.

    Signed-off-by: Zach Brown
    Signed-off-by: Linus Torvalds

    Zach Brown
     

04 Feb, 2007

1 commit

  • An AIO bug was reported in which a sleeping function is called from softirq
    context:

    BUG: warning at kernel/mutex.c:132/__mutex_lock_common()
    Call Trace:
    [] __mutex_lock_slowpath+0x640/0x6c0
    [] mutex_lock+0x20/0x40
    [] flush_workqueue+0xb0/0x1a0
    [] __put_ioctx+0xc0/0x240
    [] aio_complete+0x2f0/0x420
    [] finished_one_bio+0x200/0x2a0
    [] dio_bio_complete+0x1c0/0x200
    [] dio_bio_end_aio+0x60/0x80
    [] bio_endio+0x110/0x1c0
    [] __end_that_request_first+0x180/0xba0
    [] end_that_request_chunk+0x30/0x60
    [] scsi_end_request+0x50/0x300 [scsi_mod]
    [] scsi_io_completion+0x200/0x8a0 [scsi_mod]
    [] sd_rw_intr+0x330/0x860 [sd_mod]
    [] scsi_finish_command+0x100/0x1c0 [scsi_mod]
    [] scsi_softirq_done+0x230/0x300 [scsi_mod]
    [] blk_done_softirq+0x160/0x1c0
    [] __do_softirq+0x200/0x240
    [] do_softirq+0x70/0xc0

    See report: http://marc.theaimsgroup.com/?l=linux-kernel&m=116599593200888&w=2

    flush_workqueue() is not allowed to be called in softirq context.
    However, aio_complete(), called from an I/O interrupt, can potentially call
    put_ioctx with the last ref count on the ioctx and trigger the bug. It is
    simply incorrect to perform ioctx freeing from aio_complete.

    The bug is triggerable via a race between io_destroy() and aio_complete().
    A possible scenario:

    cpu0                                cpu1
    io_destroy                          aio_complete
      wait_for_all_aios {                 __aio_put_req
        ...                                 ctx->reqs_active--;
        if (!ctx->reqs_active)
            return;
      }
      ...
      put_ioctx(ioctx)

                                          put_ioctx(ctx);
                                            __put_ioctx
                                              bam! Bug trigger!

    The real problem is that the check of ctx->reqs_active in
    wait_for_all_aios() is incorrect: access to reqs_active is not
    properly protected by the spinlock.

    This patch adds that protective spinlock, and at the same time removes
    all duplicate ref counting for each kiocb, as reqs_active already acts
    as a ref count on each active ioctx. This also ensures that the buggy
    call to flush_workqueue() in softirq context is eliminated.
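
    A sketch of the protected check in wait_for_all_aios() after the fix
    (simplified; the exact code may differ):

        static void wait_for_all_aios(struct kioctx *ctx)
        {
            struct task_struct *tsk = current;
            DECLARE_WAITQUEUE(wait, tsk);

            spin_lock_irq(&ctx->ctx_lock);      /* protect reqs_active */
            if (!ctx->reqs_active)
                goto out;

            add_wait_queue(&ctx->wait, &wait);
            set_task_state(tsk, TASK_UNINTERRUPTIBLE);
            while (ctx->reqs_active) {
                spin_unlock_irq(&ctx->ctx_lock);
                schedule();
                set_task_state(tsk, TASK_UNINTERRUPTIBLE);
                spin_lock_irq(&ctx->ctx_lock);
            }
            __set_task_state(tsk, TASK_RUNNING);
            remove_wait_queue(&ctx->wait, &wait);
        out:
            spin_unlock_irq(&ctx->ctx_lock);
        }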

    Signed-off-by: "Ken Chen"
    Cc: Zach Brown
    Cc: Suparna Bhattacharya
    Cc: Benjamin LaHaise
    Cc: Badari Pulavarty
    Cc:
    Acked-by: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken Chen
     

31 Dec, 2006

1 commit

  • lockdep found an AB-BC-CA lock inversion in retry-based AIO:

    1) The task struct's alloc_lock (A) is acquired in process context with
    interrupts enabled. An interrupt might arrive and call wake_up() which
    grabs the wait queue's q->lock (B).

    2) When performing retry-based AIO the AIO core registers
    aio_wake_function() as the wake function for iocb->ki_wait. It is called
    with the wait queue's q->lock (B) held and then tries to add the iocb to
    the run list after acquiring the ctx_lock (C).

    3) aio_kick_handler() holds the ctx_lock (C) while acquiring the
    alloc_lock (A) via lock_task() and unuse_mm(). Lockdep emits a warning
    saying that we're trying to connect the irq-safe q->lock to the
    irq-unsafe alloc_lock via ctx_lock.

    This fixes the inversion by calling unuse_mm() in the AIO kick handler path
    after we've released the ctx_lock. As Ben LaHaise pointed out, __put_ioctx
    could set ctx->mm to NULL, so we must only access ctx->mm while we have the
    lock.
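
    A sketch of the reordered kick handler (simplified; field names and the
    work function signature are shown approximately): the important part is
    that ctx->mm is sampled while ctx_lock is held, and unuse_mm() runs only
    after the lock is dropped.

        static void aio_kick_handler(struct work_struct *work)
        {
            struct kioctx *ctx = container_of(work, struct kioctx, wq.work);
            mm_segment_t oldfs = get_fs();
            struct mm_struct *mm;
            int requeue;

            set_fs(USER_DS);
            use_mm(ctx->mm);
            spin_lock_irq(&ctx->ctx_lock);
            requeue = __aio_run_iocbs(ctx);
            mm = ctx->mm;               /* sample under ctx_lock */
            spin_unlock_irq(&ctx->ctx_lock);
            unuse_mm(mm);               /* no longer under ctx_lock */
            set_fs(oldfs);
            if (requeue)
                queue_delayed_work(aio_wq, &ctx->wq, 0);
        }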

    Signed-off-by: Zach Brown
    Signed-off-by: Suparna Bhattacharya
    Acked-by: Benjamin LaHaise
    Cc: "Chen, Kenneth W"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

14 Dec, 2006

1 commit

  • activate_mm() is not the right thing to be using in use_mm(). It should be
    switch_mm().

    On normal x86, they're synonymous, but for the Xen patches I'm adding a
    hook which assumes that activate_mm is only used the first time a new mm
    is used after creation (I have another hook for dealing with dup_mm). I
    think this use of activate_mm() is the only place where it could be used
    a second time on an mm.

    From a quick look at the other architectures I think this is OK (most
    simply implement one in terms of the other), but some are doing some
    subtly different stuff between the two.
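
    The change amounts to roughly this in use_mm() (sketch, not the exact
    diff):

        void use_mm(struct mm_struct *mm)
        {
            struct mm_struct *active_mm;
            struct task_struct *tsk = current;

            task_lock(tsk);
            active_mm = tsk->active_mm;
            atomic_inc(&mm->mm_count);
            tsk->mm = mm;
            tsk->active_mm = mm;
            switch_mm(active_mm, mm, tsk);  /* was: activate_mm(active_mm, mm) */
            task_unlock(tsk);

            mmdrop(active_mm);
        }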

    Acked-by: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeremy Fitzhardinge
     

08 Dec, 2006

3 commits

  • Remove the ki_retried member from struct kiocb. I think the idea was
    bounced around a while back, but Arnaldo gave another reason to dig it up
    when he pointed out that the last cacheline of struct kiocb only contains
    4 bytes. By removing the debugging member, we save more than 8 bytes on
    64-bit machines.

    Signed-off-by: Benjamin LaHaise
    Acked-by: Ken Chen
    Acked-by: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin LaHaise
     
  • io_submit_one assigns ki_left = ki_nbytes = iocb->aio_nbytes, then calls
    down to aio_setup_iocb, then to aio_setup_single_vector. In there,
    ki_nbytes is reassigned to the same value it got two call frames above.
    There is no need to do so.

    Signed-off-by: Ken Chen
    Acked-by: Zach Brown
    Cc: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
     
  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

        #!/bin/sh
        #
        # Replace one string by another in all the kernel sources.
        #

        set -e

        for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
            quilt add $file
            sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
            mv /tmp/$$ $file
            quilt refresh
        done

    The script was run like this:

        sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

22 Nov, 2006

2 commits

  • Pass the work_struct pointer to the work function rather than context data.
    The work function can use container_of() to work out the data.

    For the cases where the container of the work_struct may go away the moment the
    pending bit is cleared, it is made possible to defer the release of the
    structure by deferring the clearing of the pending bit.

    To make this work, an extra flag is introduced into the management side of the
    work_struct. This governs auto-release of the structure upon execution.

    Ordinarily, the work queue executor would release the work_struct for further
    scheduling or deallocation by clearing the pending bit prior to jumping to the
    work function. This means that, unless the driver makes some guarantee itself
    that the work_struct won't go away, the work function may not access anything
    else in the work_struct or its container lest they be deallocated. This is a
    problem if the auxiliary data is taken away (as done by the last patch).

    However, if the pending bit is *not* cleared before jumping to the work
    function, then the work function *may* access the work_struct and its container
    with no problems. But then the work function must itself release the
    work_struct by calling work_release().

    In most cases, automatic release is fine, so this is the default. Special
    initiators exist for the non-auto-release case (ending in _NAR).
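
    In practice a work function now recovers its context like this (the
    my_device/my_reset_handler names are made up for the example):

        /* hypothetical driver structure embedding a work item */
        struct my_device {
            struct work_struct reset_work;
            int                port;
        };

        static void my_reset_handler(struct work_struct *work)
        {
            /* recover the containing structure from the work_struct pointer */
            struct my_device *dev =
                container_of(work, struct my_device, reset_work);

            printk(KERN_INFO "resetting port %d\n", dev->port);
        }

        static void my_device_init(struct my_device *dev)
        {
            /* INIT_WORK now takes only the work item and the handler,
             * not a separate data pointer */
            INIT_WORK(&dev->reset_work, my_reset_handler);
        }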

    Signed-Off-By: David Howells

    David Howells
     
  • Separate delayable work items from non-delayable work items by splitting them
    into a separate structure (delayed_work), which incorporates a work_struct and
    the timer_list removed from work_struct.

    The work_struct struct is huge, and this limits its usefulness. On a 64-bit
    architecture it's nearly 100 bytes in size. This reduces that by half for the
    non-delayable type of event.
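
    The new structure simply embeds the old one, so only users of delayed work
    pay for the timer:

        struct delayed_work {
            struct work_struct work;    /* the work item itself */
            struct timer_list  timer;   /* only delayed work carries a timer */
        };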

    Signed-Off-By: David Howells

    David Howells
     

01 Oct, 2006

2 commits

  • This work was initially done by Zach Brown to add support for vectored aio.
    These are the core changes for AIO to support
    IOCB_CMD_PREADV/IOCB_CMD_PWRITEV.
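
    From userspace, a vectored request is expressed by pointing aio_buf at an
    iovec array and putting the vector count in aio_nbytes, roughly like this
    (illustrative sketch, error handling omitted):

        #include <linux/aio_abi.h>
        #include <sys/uio.h>
        #include <string.h>

        static void setup_preadv_iocb(struct iocb *cb, int fd,
                                      const struct iovec *iov, int iovcnt,
                                      long long offset)
        {
            memset(cb, 0, sizeof(*cb));
            cb->aio_fildes     = fd;
            cb->aio_lio_opcode = IOCB_CMD_PREADV;            /* vectored read */
            cb->aio_buf        = (__u64)(unsigned long) iov; /* iovec array */
            cb->aio_nbytes     = iovcnt;                     /* number of iovecs */
            cb->aio_offset     = offset;
        }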

    [akpm@osdl.org: huge build fix]
    Signed-off-by: Zach Brown
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Badari Pulavarty
    Acked-by: Benjamin LaHaise
    Acked-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • This patch vectorizes aio_read() and aio_write() methods to prepare for
    collapsing all aio & vectored operations into one interface - which is
    aio_read()/aio_write().

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Christoph Hellwig
    Cc: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

27 Jun, 2006

1 commit

  • acquired (aquired)
    contiguous (contigious)
    successful (succesful, succesfull)
    surprise (suprise)
    whether (weather)
    some other misspellings

    Signed-off-by: Andreas Mohr
    Signed-off-by: Adrian Bunk

    Andreas Mohr
     

14 Nov, 2005

2 commits

  • aio: replace locking comments with assert_spin_locked()

    Signed-off-by: Zach Brown
    Acked-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • Sync iocbs have a life cycle that doesn't need a kioctx. Their retrying, if
    any, is done in the context of their owner who has allocated them on the
    stack.

    The sole user of a sync iocb's ctx reference was aio_complete() checking for
    an elevated iocb ref count that could never happen. No path which grabs an
    iocb ref has access to sync iocbs.

    If we were to implement sync iocb cancelation it would be done by the owner of
    the iocb using its on-stack reference.

    Removing this chunk from aio_complete allows us to remove the entire kioctx
    instance from mm_struct, reducing its size by a third. On an i386 test box
    the slab size went from 768 to 504 bytes, and objects per page went from 5
    to 8.

    Signed-off-by: Zach Brown
    Acked-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

07 Nov, 2005

1 commit

  • AIO was adding a new context's max requests to the global total before
    testing if that resulting total was over the global limit. This let
    innocent tasks get their new limit tested along with a racing guilty task
    that was crossing the limit. This patch serializes the _nr accounting with
    a spinlock. It also switches to using unsigned long for the global totals.
    Individual contexts are still limited to an unsigned int's worth of
    requests by the syscall interface.
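
    Roughly, the accounting becomes a test-then-commit under a spinlock
    (sketch; the lock name is assumed):

        static DEFINE_SPINLOCK(aio_nr_lock);    /* name assumed for the sketch */

        /* in ioctx_alloc(), while sizing the new context: */
        spin_lock(&aio_nr_lock);
        if (aio_nr + ctx->max_reqs > aio_max_nr ||
            aio_nr + ctx->max_reqs < aio_nr)    /* unsigned overflow check */
            ctx->max_reqs = 0;                  /* rejected with -EAGAIN later */
        else
            aio_nr += ctx->max_reqs;
        spin_unlock(&aio_nr_lock);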

    The problem and fix were verified with a simple program that spun creating
    and destroying a context while holding on to another long lived context.
    Before the patch a task creating a tiny context could get a spurious EAGAIN
    if it raced with a task creating a very large context that overran the
    limit.

    Signed-off-by: Zach Brown
    Cc: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

24 Oct, 2005

1 commit

  • Another case of a missing call to security_file_permission: the aio
    functions (namely, io_submit) do not check credentials with security
    modules.

    Below is a simple patch for the problem. It seems that it is enough to
    check for rights at request submission time.
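
    The check amounts to something like this at submission time (sketch only;
    the exact placement and mask handling in the real patch may differ):

        /* in the submission path, before the request is queued */
        ret = security_file_permission(file,
                (opcode == IOCB_CMD_PREAD) ? MAY_READ : MAY_WRITE);
        if (unlikely(ret))
            goto out_put_req;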

    Signed-off-by: Kostik Belousov
    Signed-off-by: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kostik Belousov
     

18 Oct, 2005

1 commit

  • lock_kiocb() was introduced to serialize retrying and cancellation. In the
    process of doing so it tried to sleep waiting for KIF_LOCKED while holding
    the ctx_lock spinlock. Recent fixes have ensured that multiple concurrent
    retries won't be attempted for a given iocb. Cancel has other problems and
    has no significant in-tree users that have been complaining about it. So
    for the immediate future we'll revert sleeping with the lock held and will
    address proper cancellation and retry serialization in the future.

    Signed-off-by: Zach Brown
    Acked-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

01 Oct, 2005

3 commits

  • Recently aio_p{read,write} changed to perform retries internally rather
    than returning -EIOCBRETRY. This inadvertently resulted in always calling
    aio_{read,write} with ki_left at 0, which would in turn immediately return
    0. Harmless, but we can avoid this call by checking in the caller.

    Signed-off-by: Zach Brown
    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • Only one of the run or kick path is supposed to put an iocb on the run
    list. If both of them do it then one of them can end up referencing a
    freed iocb. The kick path could delete the task_list item from the wait
    queue before getting the ctx_lock and putting the iocb on the run list.
    The run path was testing the task_list item outside the lock so that it
    could catch ki_retry methods that return -EIOCBRETRY *without* putting the
    iocb on a wait queue and promising to call kick_iocb. This unlocked check
    could then race with the kick path to cause both to try and put the iocb on
    the run list.

    The patch stops the run path from testing task_list by requiring that any
    ki_retry that returns -EIOCBRETRY *must* guarantee that kick_iocb() will be
    called in the future. aio_p{read,write}, the only in-tree -EIOCBRETRY
    users, are updated.

    Signed-off-by: Zach Brown
    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • Only one of the run or kick path is supposed to put an iocb on the run
    list. If both of them do it then one of them can end up referencing a
    freed iocb. The kick path could set the Kicked bit before acquiring the
    ctx_lock and putting the iocb on the run list. The run path, while holding
    the ctx_lock, could see this partial kick and mistake it for a kick that
    was deferred while it was doing work with the run_list NULLed out. It
    would then race with the kick thread to add the iocb to the run list.

    This patch moves the kick setting under the ctx_lock so that only one of
    the kick or run path queues the iocb on the run list, as intended.

    Signed-off-by: Zach Brown
    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

18 Sep, 2005

1 commit

  • Add smp_mb__after_clear_bit() to unlock_kiocb()

    AIO's use of wait_on_bit_lock()/wake_up_bit() forgot to add a barrier
    between clearing its lock bit and calling wake_up_bit(), so wake_up_bit()'s
    unlocked waitqueue_active() check can race. This puts AIO's use in line
    with the others and with the comment above wake_up_bit().
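
    The fix, in essence (sketch of unlock_kiocb()):

        static inline void unlock_kiocb(struct kiocb *iocb)
        {
            clear_bit(KIF_LOCKED, &iocb->ki_flags);
            smp_mb__after_clear_bit();  /* order the clear before the unlocked
                                           waitqueue_active() inside
                                           wake_up_bit() */
            wake_up_bit(&iocb->ki_flags, KIF_LOCKED);
        }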

    Signed-off-by: Zach Brown
    Acked-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

10 Sep, 2005

3 commits

  • Patch to eliminate struct files_struct.file_lock spinlock on the reader side
    and use rcu refcounting rcuref_xxx api for the f_count refcounter. The
    updates to the fdtable are done by allocating a new fdtable structure and
    setting files->fdt to point to the new structure. The fdtable structure is
    protected by RCU thereby allowing lock-free lookup. For fd arrays/sets that
    are vmalloced, we use keventd to free them since RCU callbacks can't sleep. A
    global list of fdtable to be freed is not scalable, so we use a per-cpu list.
    If keventd is already handling the current cpu's work, we use a timer to defer
    queueing of that work.

    Since the last publication, this patch has been re-written to avoid using
    explicit memory barriers and to use the rcu_assign_pointer() and
    rcu_dereference() primitives instead. This required that the fd information
    be kept in a separate structure (fdtable) and updated atomically.
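
    The resulting read and update patterns look roughly like this (sketch;
    the free callback name is illustrative):

        struct fdtable *fdt;
        struct file *file;

        /* reader side: lock-free lookup of an fd */
        rcu_read_lock();
        fdt = rcu_dereference(files->fdt);
        file = fdt->fd[fd];
        rcu_read_unlock();

        /* update side: publish a new table, then free the old one only
         * after a grace period */
        rcu_assign_pointer(files->fdt, new_fdt);
        call_rcu(&old_fdt->rcu, free_fdtable_rcu);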

    Signed-off-by: Dipankar Sarma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
     
  • Implement a per-kiocb lock to serialise retry operations and cancel. This
    is done using wait_on_bit_lock() on the KIF_LOCKED bit of kiocb->ki_flags.
    Also, make the cancellation path lock the kiocb and subsequently release
    all references to it if the cancel was successful. This version includes a
    fix for the deadlock with __aio_run_iocbs.
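
    The lock side looks roughly like this (sketch based on the description
    above; wait_on_bit_lock() of that era takes an action callback):

        static int lock_kiocb_action(void *word)
        {
            schedule();
            return 0;
        }

        static inline void lock_kiocb(struct kiocb *iocb)
        {
            wait_on_bit_lock(&iocb->ki_flags, KIF_LOCKED,
                             lock_kiocb_action, TASK_UNINTERRUPTIBLE);
        }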

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin LaHaise
     
  • Note that, other than a few exceptions, most current filesystems and/or
    drivers do not have aio cancel specifically defined (the kiocb->ki_cancel
    field is mostly NULL). However, the sys_io_cancel system call universally
    sets the return code to -EAGAIN. This gives applications the wrong
    impression that this call is implemented but just never works. We have
    customer inquiries about this issue.

    Changed by Benjamin LaHaise to EINVAL instead of ENOSYS

    Signed-off-by: S. Wendy Cheng
    Acked-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wendy Cheng
     

05 Sep, 2005

1 commit

  • Normally, activate_mm() is called from exec(), and thus it used to be a
    no-op because we use a completely new "MM context" on the host (for
    instance, a new process), and so we didn't need to flush any "TLB entries"
    (which for us are the set of memory mappings for the host process from the
    virtual "RAM" file).

    Kernel threads, instead, are usually handled in a different way. So, when
    for AIO we call use_mm(), things used to break and so Benjamin implemented
    activate_mm(). However, that is only needed for AIO, and could slow down
    exec() inside UML, so be smart: detect being called for AIO (via
    PF_BORROWED_MM) and do the full flush only in that situation.

    Comment also the caller so that people won't go breaking UML without
    noticing. I also rely on the caller's locks for testing current->flags.

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    CC: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo 'Blaisorblade' Giarrusso