16 Dec, 2009
1 commit
-
Don't know the reason, but it appears the ki_wait field of iocb never gets used.
Signed-off-by: Shaohua Li
Cc: Jeff Moyer
Cc: Benjamin LaHaise
Cc: Zach Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Oct, 2009
1 commit
-
There's nothing block related about them; the backing device
is used by things like NFS etc as well. This gets rid of the
need to protect such calls by CONFIG_BLOCK.
Signed-off-by: Jens Axboe
28 Oct, 2009
1 commit
-
Hi,
Some workloads issue batches of small I/O, and the performance is poor
due to the call to blk_run_address_space for every single iocb. Nathan
Roberts pointed this out, and suggested that by deferring this call
until all I/Os in the iocb array are submitted to the block layer, we
can realize some impressive performance gains (up to 30% for sequential
4k reads in batches of 16).
Signed-off-by: Jeff Moyer
Signed-off-by: Jens Axboe
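From userspace, the batching this change targets looks roughly like the sketch below: sixteen 4k reads handed to the kernel in a single io_submit() call so per-call overheads are amortized. This is a hedged illustration using libaio (file name and sizes are arbitrary; O_DIRECT and most error handling are omitted for brevity):

/* build with: gcc -o batch batch.c -laio */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

#define BATCH 16
#define SZ    4096

int main(void)
{
        io_context_t ctx = 0;
        struct iocb cbs[BATCH], *cbp[BATCH];
        struct io_event events[BATCH];
        void *bufs[BATCH];
        int fd, i;

        fd = open("testfile", O_RDONLY);
        if (fd < 0 || io_setup(BATCH, &ctx) != 0)
                return 1;

        for (i = 0; i < BATCH; i++) {
                if (posix_memalign(&bufs[i], 512, SZ))
                        return 1;
                /* sequential 4k reads, one iocb per read */
                io_prep_pread(&cbs[i], fd, bufs[i], SZ, (long long)i * SZ);
                cbp[i] = &cbs[i];
        }

        /* one system call submits the whole batch of 16 iocbs */
        if (io_submit(ctx, BATCH, cbp) != BATCH)
                return 1;
        if (io_getevents(ctx, BATCH, BATCH, events, NULL) != BATCH)
                return 1;

        printf("completed %d reads\n", BATCH);
        io_destroy(ctx);
        return 0;
}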
23 Sep, 2009
1 commit
-
As mentioned in Documentation/CodingStyle, move EXPORT* macros
to the line immediately after the closing function brace line.
Also, move the __initcall() similarly.
Signed-off-by: H Hartley Sweeten
Cc: Zach Brown
Cc: Benjamin LaHaise
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
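For illustration, a minimal sketch of the placement being asked for (the function and initcall names here are hypothetical):

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/module.h>

/* hypothetical exported helper: EXPORT_SYMBOL sits right under the brace */
int aio_example_op(int arg)
{
        return arg ? 0 : -EINVAL;
}
EXPORT_SYMBOL(aio_example_op);

/* likewise, __initcall() goes directly after its function */
static int __init aio_example_init(void)
{
        return 0;
}
__initcall(aio_example_init);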
22 Sep, 2009
1 commit
-
Anyone who wants to do copy to/from user from a kernel thread needs
use_mm (like what fs/aio has). Move that into mm/, to make reusing and
exporting easier down the line, and make aio use it. The next intended user,
besides aio, will be vhost-net.
Acked-by: Andrea Arcangeli
Signed-off-by: Michael S. Tsirkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
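A rough sketch of the usage pattern being made reusable (the helper below is hypothetical; use_mm()/unuse_mm() are the interfaces being moved):

#include <linux/errno.h>
#include <linux/mmu_context.h>
#include <linux/sched.h>
#include <linux/uaccess.h>

/* hypothetical helper: a kernel thread borrows a user mm to copy a result out */
static int copy_result_to_user_mm(struct mm_struct *mm,
                                  void __user *dst, const void *src, size_t len)
{
        int ret = 0;

        use_mm(mm);                     /* adopt the user address space */
        if (copy_to_user(dst, src, len))
                ret = -EFAULT;
        unuse_mm(mm);                   /* and drop it again */
        return ret;
}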
01 Jul, 2009
1 commit
-
Change the eventfd interface to de-couple the eventfd memory context from
the file pointer instance.
Without this change, there is no clean, race-free way to handle the
POLLHUP event sent when the last instance of the file* goes away. Also,
now the internal eventfd APIs are using the eventfd context instead of the
file*.
This patch is required by KVM's IRQfd code, which is still under
development.
Signed-off-by: Davide Libenzi
Cc: Gregory Haskins
Cc: Rusty Russell
Cc: Benjamin LaHaise
Cc: Avi Kivity
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
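A hedged sketch of the context-based usage (the surrounding structure and helpers are invented for illustration; eventfd_ctx_fdget(), eventfd_signal() and eventfd_ctx_put() are the context-level calls referred to above):

#include <linux/err.h>
#include <linux/eventfd.h>

/* hypothetical completion object holding an eventfd context, not a file* */
struct my_completion {
        struct eventfd_ctx *evctx;
};

static int my_completion_setup(struct my_completion *c, int efd)
{
        c->evctx = eventfd_ctx_fdget(efd);      /* look the context up by fd */
        return IS_ERR(c->evctx) ? PTR_ERR(c->evctx) : 0;
}

static void my_completion_fire(struct my_completion *c)
{
        eventfd_signal(c->evctx, 1);            /* post one event */
}

static void my_completion_teardown(struct my_completion *c)
{
        eventfd_ctx_put(c->evctx);              /* drop the context reference */
}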
20 Mar, 2009
2 commits
-
The libaio test harness turned up a problem whereby lookup_ioctx on a
bogus io context was returning the 1 valid io context from the list
(harness/cases/3.p).
Because of that, an extra put_ioctx was done, and when the process
exited, it hit a BUG_ON in the put_ioctx macro called from exit_aio
(since we expect a users count of 1 and instead get 0).
The problem was introduced by "aio: make the lookup_ioctx() lockless"
(commit abf137dd7712132ee56d5b3143c2ff61a72a5faa).
Thanks to Zach for pointing out that hlist_for_each_entry_rcu will not
return with a NULL tpos at the end of the loop, even if the entry was
not found.
Signed-off-by: Jeff Moyer
Acked-by: Zach Brown
Acked-by: Jens Axboe
Cc: Benjamin LaHaise
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Remove a source of fput() call from inside IRQ context. Like Eric, I wasn't
able to reproduce an fput() call from IRQ context, but Jeff said he was
able to, with the attached test program. Independently of this, the bug is
conceptually there, so we might be better off fixing it. This patch adds an
optimization similar to the one we already do on ->ki_filp, on ->ki_eventfd.
Playing with ->f_count directly is not pretty in general, but the alternative
here would be to add a brand new delayed fput() infrastructure, that I'm not
sure is worth it.
Signed-off-by: Davide Libenzi
Cc: Benjamin LaHaise
Cc: Trond Myklebust
Cc: Eric Dumazet
Signed-off-by: Jeff Moyer
Cc: Zach Brown
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Jan, 2009
1 commit
-
Signed-off-by: Heiko Carstens
29 Dec, 2008
1 commit
-
The mm->ioctx_list is currently protected by a reader-writer lock,
so we always grab that lock on the read side for doing ioctx
lookups. As the workload is extremely reader biased, turn this into
an rcu hlist so we can make lookup_ioctx() lockless. Get rid of
the rwlock and use a spinlock for providing update side exclusion.
There's usually only 1 entry on this list, so it doesn't make sense
to look into fancier data structures.
Reviewed-by: Jeff Moyer
Signed-off-by: Jens Axboe
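A simplified sketch of what the lockless lookup ends up looking like (field and helper names are illustrative; the real code also deals with context teardown):

#include <linux/aio.h>
#include <linux/rculist.h>
#include <linux/sched.h>

static struct kioctx *lookup_ioctx_sketch(unsigned long ctx_id)
{
        struct mm_struct *mm = current->mm;
        struct kioctx *ctx, *ret = NULL;
        struct hlist_node *n;

        rcu_read_lock();
        hlist_for_each_entry_rcu(ctx, n, &mm->ioctx_list, list) {
                if (ctx->user_id == ctx_id) {
                        get_ioctx(ctx);   /* take a reference before leaving RCU */
                        ret = ctx;
                        break;
                }
        }
        rcu_read_unlock();
        return ret;
}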
27 Jul, 2008
1 commit
-
make it atomic_long_t; while we are at it, get rid of useless checks in affs,
hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0.
Signed-off-by: Al Viro
26 Jul, 2008
1 commit
-
Kill PF_BORROWED_MM. Change use_mm/unuse_mm to not play with ->flags, and
do s/PF_BORROWED_MM/PF_KTHREAD/ for a couple of other users.
No functional changes yet. But this allows us to do further
fixes/cleanups.
oom_kill/ptrace/etc often check "p->mm != NULL" to filter out the
kthreads, this is wrong because of use_mm(). The problem with
PF_BORROWED_MM is that we need task_lock() to avoid races. With this
patch we can check PF_KTHREAD directly, or use a simple lockless helper:

/* The result must not be dereferenced !!! */
struct mm_struct *__get_task_mm(struct task_struct *tsk)
{
        if (tsk->flags & PF_KTHREAD)
                return NULL;
        return tsk->mm;
}

Note also ecard_task(). It runs with ->mm != NULL, but it's the kernel
thread without PF_BORROWED_MM.
Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Jun, 2008
1 commit
-
use_mm() was changed to use switch_mm() instead of activate_mm(), since
then nobody calls (and nobody should call) activate_mm() with
PF_BORROWED_MM bit set.
As Jeff Dike pointed out, we can also remove the "old != new" check, it is
always true.
Signed-off-by: Oleg Nesterov
Cc: Jeff Dike
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Apr, 2008
1 commit
-
Add calls to the generic object debugging infrastructure and provide fixup
functions which allow the system to be kept alive when recoverable problems
have been detected by the object debugging core code.
Signed-off-by: Thomas Gleixner
Acked-by: Ingo Molnar
Cc: Greg KH
Cc: Randy Dunlap
Cc: Kay Sievers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Apr, 2008
3 commits
-
The FIXME comments are inaccurate.
The locking comment over lookup_ioctx() is wrong.
Signed-off-by: Jeff Moyer
Signed-off-by: Zach Brown
Signed-off-by: Shen Feng
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Some drivers have duplicated unlikely() macros. IS_ERR() already has
unlikely() in itself.
This patch cleans up such pointless code.
Signed-off-by: Hirofumi Nakagawa
Acked-by: David S. Miller
Acked-by: Jeff Garzik
Cc: Paul Clements
Cc: Richard Purdie
Cc: Alessandro Zummo
Cc: David Brownell
Cc: James Bottomley
Cc: Michael Halcrow
Cc: Anton Altaparmakov
Cc: Al Viro
Cc: Carsten Otte
Cc: Patrick McHardy
Cc: Paul Mundt
Cc: Jaroslav Kysela
Cc: Takashi Iwai
Acked-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Make the following needlessly global functions static:
- __put_ioctx()
- lookup_ioctx()
- io_submit_one()
Signed-off-by: Adrian Bunk
Cc: Zach Brown
Cc: Benjamin LaHaise
Cc: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Apr, 2008
1 commit
-
This patch wakes up a thread waiting in io_getevents if another thread
destroys the context. This was tested using a small program that spawns a
thread to wait in io_getevents while the parent thread destroys the io context
and then waits for the getevents thread to exit. Without this patch, the
program hangs indefinitely. With the patch, the program exits as expected.
Signed-off-by: Jeff Moyer
Cc: Zach Brown
Cc: Christopher Smith
Cc: Benjamin LaHaise
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Apr, 2008
2 commits
-
Jeff Roberson discovered a race when using kaio eventfd based notifications.
When it occurs it can lead to missed wakeups and hung userspace.
This patch fixes the race by moving the notification inside the spinlocked
section of kaio. The operation is safe since the eventfd spinlock and the kaio
one are unrelated.
Signed-off-by: Davide Libenzi
Cc: Zach Brown
Cc: Jeff Roberson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Use asmlinkage_protect in sys_io_getevents, because GCC for i386 with
CONFIG_FRAME_POINTER=n can decide to clobber an argument word on the
stack, i.e. the user struct pt_regs. Here the problem is not a tail
call, but just the compiler's use of the stack when it inlines and
optimizes the body of the called function. This seems to avoid it.
Signed-off-by: Roland McGrath
Signed-off-by: Linus Torvalds
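For reference, a sketch of the resulting pattern (the syscall below is hypothetical; the macro takes the argument count, the return value, and then each argument):

#include <linux/linkage.h>

/* hypothetical syscall body showing where asmlinkage_protect() sits */
asmlinkage long sys_example(long a, long b)
{
        long ret = a + b;                  /* placeholder work */

        asmlinkage_protect(2, ret, a, b);  /* keep a and b live on the stack */
        return ret;
}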
20 Mar, 2008
1 commit
-
My group ran into an AIO process hang on a 2.6.24 kernel with the process
sleeping indefinitely in io_getevents(2) waiting for the last wakeup to come
and it never would.
We ran the tests on x86_64 SMP. The hang only occurred on a Xeon box
("Clovertown") but not a Core2Duo ("Conroe"). On the Xeon, the L2 cache isn't
shared between all eight processors, but the L2 is shared between the
two processors on the Core2Duo we use.
My analysis of the hang is, if you go down to the second while-loop
in read_events(), what happens on processor #1:
1) add_wait_queue_exclusive() adds thread to ctx->wait
2) aio_read_evt() to check tail
3) if aio_read_evt() returned 0, call [io_]schedule() and sleep
In aio_complete() with processor #2:
A) info->tail = tail;
B) waitqueue_active(&ctx->wait)
C) if waitqueue_active() returned non-0, call wake_up()
The way the code is written, step 1 must be seen by all other processors
before processor 1 checks for pending events in step 2 (that were recorded by
step A) and step A by processor 2 must be seen by all other processors
(checked in step 2) before step B is done.
The race I believed I was seeing is that steps 1 and 2 were
effectively swapped due to the __list_add() being delayed by the L2
cache not shared by some of the other processors. Imagine:
proc 2: just before step A
proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet
proc 1, step 2: checks tail and sees no pending events
proc 2, step A: updates tail
proc 1, step 3: calls [io_]schedule() and sleeps
proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup
so proc 1 sleeps indefinitely
My patch adds a memory barrier between steps A and B. It ensures that the
update in step 1 gets seen on processor 2 before continuing. If processor 1
was just before step 1, the memory barrier makes sure that step A (update
tail) gets seen by the time processor 1 makes it to step 2 (check tail).
Before the patch our AIO process would hang virtually 100% of the time. After
the patch, we have yet to see the process ever hang.
Signed-off-by: Quentin Barnes
Reviewed-by: Zach Brown
Cc: Benjamin LaHaise
Cc:
Cc: Nick Piggin
Signed-off-by: Andrew Morton
[ We should probably disallow that "if (waitqueue_active()) wake_up()"
coding pattern, because it's so often buggy wrt memory ordering ]
Signed-off-by: Linus Torvalds
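A condensed sketch of the ordering after the fix, mirroring the lettered and numbered steps above rather than quoting fs/aio.c verbatim:

/* completion side (steps A-C), simplified */
info->tail = tail;                      /* A: publish the new event */
smp_mb();                               /* added barrier: make A visible before B */
if (waitqueue_active(&ctx->wait))       /* B: is anyone sleeping? */
        wake_up(&ctx->wait);            /* C */

/* waiting side (steps 1-3), simplified */
add_wait_queue_exclusive(&ctx->wait, &wait);    /* 1 */
if (!aio_read_evt(ctx, &ent))                   /* 2: re-check the tail */
        io_schedule();                          /* 3: sleep until woken */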
09 Feb, 2008
3 commits
-
An AIO read or write should return -EINVAL if the offset is negative.
This check matches the one in pread and pwrite.
This was found by the libaio test suite.
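A minimal sketch of the kind of check being added (illustrative, not the literal fs/aio.c hunk):

/* reject negative offsets up front, matching pread()/pwrite() behaviour */
if (unlikely(iocb->ki_pos < 0))
        return -EINVAL;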
Signed-off-by: Rusty Russell
Acked-by: Zach Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
When an AIO write gets an error after writing some data (eg. ENOSPC), it
should return the amount written already, not the error. Just like write()
is supposed to.
This was found by the libaio test suite.
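A sketch of the intended semantics (variable names are illustrative):

/* if some bytes were transferred before the error, report the short
   write, just as write(2) would, instead of returning the error code */
if (ret < 0 && bytes_done > 0)
        ret = bytes_done;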
Signed-off-by: Rusty Russell
Acked-By: Zach Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Jan, 2008
1 commit
-
FASTCALL is always empty after the x86 removal.
Signed-off-by: Harvey Harrison
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner
06 Dec, 2007
1 commit
-
On 2.6.24, top started showing 100% iowait on one CPU when a UML instance was
running (but completely idle). The UML code sits in io_getevents waiting for
an event to be submitted and completed.
Fix this by checking ctx->reqs_active before scheduling to determine whether
or not we are waiting for I/O.
Signed-off-by: Jeff Moyer
Cc: Zach Brown
Cc: Miklos Szeredi
Cc: Jeff Dike
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
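A minimal sketch of the check described above (simplified from the read_events() wait loop):

/* only account the sleep as iowait when AIO requests are actually in flight */
if (ctx->reqs_active)
        io_schedule();
else
        schedule();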
19 Oct, 2007
1 commit
-
Hell knows what happened in commit 63b05203af57e7de4f3bb63b8b81d43bc196d32b
during 2.6.9 development. The commit introduced the io_wait field, which was
write-only then and still remains write-only.
Also garbage collect macros which "use" io_wait.
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Oct, 2007
1 commit
-
Some months back I proposed changing the schedule() call in
read_events to an io_schedule():
http://osdir.com/ml/linux.kernel.aio.general/2006-10/msg00024.html
This was rejected as there are AIO operations that do not initiate
disk I/O. I've had another look at the problem, and the only AIO
operation that will not initiate disk I/O is IOCB_CMD_NOOP. However,
this command isn't even wired up!
Given that it doesn't work, and hasn't for *years*, I'm going to
suggest again that we do proper I/O accounting when using AIO.
Signed-off-by: Jeff Moyer
Acked-by: Zach Brown
Cc: Benjamin LaHaise
Cc: Suparna Bhattacharya
Cc: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Oct, 2007
1 commit
-
When IOCB_FLAG_RESFD flag is set and iocb->aio_resfd is incorrect,
statement 'goto out_put_req' is executed. At label 'out_put_req',
aio_put_req(..) is called, which requires 'req->ki_filp' set.
Signed-off-by: Yan Zheng
Cc: Zach Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 May, 2007
1 commit
-
This is an example about how to add eventfd support to the current KAIO code,
in order to enable KAIO to post readiness events to a pollable fd (hence
compatible with POSIX select/poll). The KAIO code simply signals the eventfd
fd when events are ready, and this triggers a POLLIN in the fd. This patch
uses a reserved for future use member of the struct iocb to pass an eventfd
file descriptor, that KAIO will use to post events every time a request
completes. At that point, an aio_getevents() will return the completed result
to a struct io_event. I made a quick test program to verify the patch, and it
runs fine here:
http://www.xmailserver.org/eventfd-aio-test.c
The test program uses poll(2), but it'd, of course, work with select and epoll
too.
This allows scheduling both block I/O and other poll-able device
requests, and waiting for results using select/poll/epoll. In a typical
scenario, an application would submit KAIO requests using aio_submit(), and
will also use epoll_ctl() on the whole other class of devices (which, with the
addition of signals, timers and user events, is now pretty much complete),
and then would:

epoll_wait(...);
for_each_event {
        if (curr_event_is_kaiofd) {
                aio_getevents();
                dispatch_aio_events();
        } else {
                dispatch_epoll_event();
        }
}

Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
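For completeness, a hedged userspace sketch of wiring an eventfd into an iocb; this assumes your libaio provides the io_set_eventfd() helper (error handling omitted, and the iocb must stay alive until the completion is reaped):

#include <libaio.h>
#include <sys/eventfd.h>

/* sketch: submit one read whose completion will be signalled on an eventfd */
static int submit_with_eventfd(io_context_t ctx, struct iocb *cb,
                               int fd, void *buf, size_t len)
{
        struct iocb *list[1] = { cb };
        int efd = eventfd(0, 0);

        io_prep_pread(cb, fd, buf, len, 0);
        io_set_eventfd(cb, efd);        /* completions bump the eventfd counter */
        io_submit(ctx, 1, list);
        return efd;                     /* add this fd to poll/select/epoll */
}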
10 May, 2007
2 commits
-
flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq
(this was possible from the very beginning, I missed this). So we can unify
flush_work_keventd and flush_work.
Also, rename flush_work() to cancel_work_sync() and fix all callers.
Perhaps this is not the best name, but "flush_work" is really bad.
(akpm: this is why the earlier patches bypassed maintainers)
Signed-off-by: Oleg Nesterov
Cc: Jeff Garzik
Cc: "David S. Miller"
Cc: Jens Axboe
Cc: Tejun Heo
Cc: Auke Kok ,
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Migrate AIO over to use flush_work().
Cc: "Maciej W. Rozycki"
Cc: David Howells
Cc: Zach Brown
Cc: Benjamin LaHaise
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 May, 2007
1 commit
-
This patch provides a new macro
KMEM_CACHE(<struct>, <flags>)
to simplify slab creation. KMEM_CACHE creates a slab with the name of the
struct, with the size of the struct and with the alignment of the struct.
Additional slab flags may be specified if necessary.

Example:
struct test_slab {
        int a,b,c;
        struct list_head;
} __cacheline_aligned_in_smp;

test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC)
will create a new slab named "test_slab" of the size sizeof(struct
test_slab) and aligned to the alignment of test_slab. If it fails then we
panic.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Mar, 2007
1 commit
-
The user can generate console output if they cause do_mmap() to fail
during sys_io_setup(). This was seen in a regression test that does
exactly that by spinning calling mmap() until it gets -ENOMEM before
calling io_setup().
We don't need this printk at all, just remove it.
Signed-off-by: Zach Brown
Signed-off-by: Linus Torvalds
12 Feb, 2007
2 commits
-
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.Signed-off-by: Robert P. J. Day
Cc: "Luck, Tony"
Cc: Andi Kleen
Cc: Roland McGrath
Cc: James Bottomley
Cc: Greg KH
Acked-by: Joel Becker
Cc: Steven Whitehouse
Cc: Jan Kara
Cc: Michael Halcrow
Cc: "David S. Miller"
Cc: Stephen Smalley
Cc: James Morris
Cc: Chris Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Remove the last vestiges of the long-deprecated "MAP_ANON" page protection
flag: use "MAP_ANONYMOUS" instead.Signed-off-by: Robert P. J. Day
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Feb, 2007
1 commit
-
An AIO bug was reported that a sleeping function is being called in softirq
context:

BUG: warning at kernel/mutex.c:132/__mutex_lock_common()
Call Trace:
[] __mutex_lock_slowpath+0x640/0x6c0
[] mutex_lock+0x20/0x40
[] flush_workqueue+0xb0/0x1a0
[] __put_ioctx+0xc0/0x240
[] aio_complete+0x2f0/0x420
[] finished_one_bio+0x200/0x2a0
[] dio_bio_complete+0x1c0/0x200
[] dio_bio_end_aio+0x60/0x80
[] bio_endio+0x110/0x1c0
[] __end_that_request_first+0x180/0xba0
[] end_that_request_chunk+0x30/0x60
[] scsi_end_request+0x50/0x300 [scsi_mod]
[] scsi_io_completion+0x200/0x8a0 [scsi_mod]
[] sd_rw_intr+0x330/0x860 [sd_mod]
[] scsi_finish_command+0x100/0x1c0 [scsi_mod]
[] scsi_softirq_done+0x230/0x300 [scsi_mod]
[] blk_done_softirq+0x160/0x1c0
[] __do_softirq+0x200/0x240
[] do_softirq+0x70/0xc0
See report: http://marc.theaimsgroup.com/?l=linux-kernel&m=116599593200888&w=2
flush_workqueue() is not allowed to be called in the softirq context.
However, aio_complete() called from I/O interrupt can potentially call
put_ioctx with the last ref count on ioctx and triggers the bug. It is simply
incorrect to perform ioctx freeing from aio_complete.
The bug is trigger-able from a race between io_destroy() and aio_complete().
A possible scenario:

cpu0                               cpu1
io_destroy                         aio_complete
  wait_for_all_aios {                __aio_put_req
     ...                               ctx->reqs_active--;
     if (!ctx->reqs_active)
        return;
  }
  ...
  put_ioctx(ioctx)                   put_ioctx(ctx);
                                       __put_ioctx
                                         bam! Bug trigger!

The real problem is that the condition check of ctx->reqs_active in
wait_for_all_aios() is incorrect in that access to reqs_active is not
being properly protected by a spin lock.
This patch adds that protective spin lock, and at the same time removes
all duplicate ref counting for each kiocb as reqs_active is already used
as a ref count for each active ioctx. This also ensures that buggy call
to flush_workqueue() in softirq context is eliminated.
Signed-off-by: "Ken Chen"
Cc: Zach Brown
Cc: Suparna Bhattacharya
Cc: Benjamin LaHaise
Cc: Badari Pulavarty
Cc:
Acked-by: Jeff Moyer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
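A minimal sketch of the locking rule the patch establishes (illustrative, not the literal wait_for_all_aios() code):

unsigned long flags;
int busy;

/* reqs_active may only be sampled while holding ctx->ctx_lock */
spin_lock_irqsave(&ctx->ctx_lock, flags);
busy = ctx->reqs_active;
spin_unlock_irqrestore(&ctx->ctx_lock, flags);

if (!busy)
        return;         /* nothing left in flight; safe to tear the ctx down */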
31 Dec, 2006
1 commit
-
lockdep found an AB-BC-CA lock inversion in retry-based AIO:
1) The task struct's alloc_lock (A) is acquired in process context with
interrupts enabled. An interrupt might arrive and call wake_up() which
grabs the wait queue's q->lock (B).
2) When performing retry-based AIO the AIO core registers
aio_wake_function() as the wake function for iocb->ki_wait. It is called
with the wait queue's q->lock (B) held and then tries to add the iocb to
the run list after acquiring the ctx_lock (C).
3) aio_kick_handler() holds the ctx_lock (C) while acquiring the
alloc_lock (A) via lock_task() and unuse_mm(). Lockdep emits a warning
saying that we're trying to connect the irq-safe q->lock to the
irq-unsafe alloc_lock via ctx_lock.
This fixes the inversion by calling unuse_mm() in the AIO kick handling path
after we've released the ctx_lock. As Ben LaHaise pointed out __put_ioctx
could set ctx->mm to NULL, so we must only access ctx->mm while we have the
lock.
Signed-off-by: Zach Brown
Signed-off-by: Suparna Bhattacharya
Acked-by: Benjamin LaHaise
Cc: "Chen, Kenneth W"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Dec, 2006
1 commit
-
activate_mm() is not the right thing to be using in use_mm(). It should be
switch_mm().
On normal x86, they're synonymous, but for the Xen patches I'm adding a
hook which assumes that activate_mm is only used the first time a new mm
is used after creation (I have another hook for dealing with dup_mm). I
think this use of activate_mm() is the only place where it could be used
a second time on an mm.
From a quick look at the other architectures I think this is OK (most
simply implement one in terms of the other), but some are doing some
subtly different stuff between the two.
Acked-by: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Dec, 2006
1 commit
-
Remove the ki_retried member from struct kiocb. I think the idea was
bounced around a while back, but Arnaldo pointed out another reason that we
should dig it up when he pointed out that the last cacheline of struct
kiocb only contains 4 bytes. By removing the debugging member, we save
more than 8 bytes on 64-bit machines.
Signed-off-by: Benjamin LaHaise
Acked-by: Ken Chen
Acked-by: Zach Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds