25 Mar, 2011
1 commit
-
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...

Fix up conflicts in fs/{aio.c,super.c}
23 Mar, 2011
1 commit
-
The test program below will hang because io_getevents() uses
add_wait_queue_exclusive(), which means the wake_up() in io_destroy() only
wakes up one of the threads. Fix this by using wake_up_all() in the aio
code paths where we want to make sure no one gets stuck.

// t.c -- compile with gcc -lpthread -laio t.c

#include <libaio.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static const int nthr = 2;

void *getev(void *ctx)
{
	struct io_event ev;
	io_getevents(ctx, 1, 1, &ev, NULL);
	printf("io_getevents returned\n");
	return NULL;
}

int main(int argc, char *argv[])
{
	io_context_t ctx = 0;
	pthread_t thread[nthr];
	int i;

	io_setup(1024, &ctx);

	for (i = 0; i < nthr; ++i)
		pthread_create(&thread[i], NULL, getev, ctx);

	sleep(1);

	io_destroy(ctx);

	for (i = 0; i < nthr; ++i)
		pthread_join(thread[i], NULL);

	return 0;
}

Signed-off-by: Roland Dreier
Reviewed-by: Jeff Moyer
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
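
The difference between wake_up() and wake_up_all() on a queue of exclusive
waiters can be illustrated with a toy model. This is only a sketch in plain
userspace C, not kernel code: struct toy_waiter, toy_wake(), toy_wake_up()
and toy_wake_up_all() are hypothetical names, and the model assumes the usual
rule that a plain wake-up stops after the first exclusive waiter it wakes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a wait queue entry; not the kernel's data structure. */
struct toy_waiter {
	bool exclusive;	/* queued as an exclusive waiter */
	bool awake;	/* set once a wake-up reaches this waiter */
};

/*
 * Wake sleeping waiters in queue order. Non-exclusive waiters are always
 * woken; the walk stops once nr_exclusive exclusive waiters have been
 * woken. nr_exclusive == 0 means "no limit". Returns the number woken.
 */
static size_t toy_wake(struct toy_waiter *q, size_t n, size_t nr_exclusive)
{
	size_t woken = 0, excl = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].awake)
			continue;
		q[i].awake = true;
		woken++;
		if (q[i].exclusive && nr_exclusive && ++excl == nr_exclusive)
			break;
	}
	return woken;
}

/* Analogue of wake_up(): at most one exclusive waiter is woken. */
static size_t toy_wake_up(struct toy_waiter *q, size_t n)
{
	return toy_wake(q, n, 1);
}

/* Analogue of wake_up_all(): every waiter is woken. */
static size_t toy_wake_up_all(struct toy_waiter *q, size_t n)
{
	return toy_wake(q, n, 0);
}
```

With two exclusive waiters queued, toy_wake_up() leaves the second one
asleep, which mirrors the hang in t.c; toy_wake_up_all() wakes both.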
16 Mar, 2011
1 commit
-
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix build failure introduced by s/freezeable/freezable/
workqueue: add system_freezeable_wq
rds/ib: use system_wq instead of rds_ib_fmr_wq
net/9p: replace p9_poll_task with a work
net/9p: use system_wq instead of p9_mux_wq
xfs: convert to alloc_workqueue()
reiserfs: make commit_wq use the default concurrency level
ocfs2: use system_wq instead of ocfs2_quota_wq
ext4: convert to alloc_workqueue()
scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path
scsi/be2iscsi,qla2xxx: convert to alloc_workqueue()
misc/iwmc3200top: use system_wq instead of dedicated workqueues
i2o: use alloc_workqueue() instead of create_workqueue()
acpi: kacpi*_wq don't need WQ_MEM_RECLAIM
fs/aio: aio_wq isn't used in memory reclaim path
input/tps6507x-ts: use system_wq instead of dedicated workqueue
cpufreq: use system_wq instead of dedicated workqueues
wireless/ipw2x00: use system_wq instead of dedicated workqueues
arm/omap: use system_wq in mailbox
workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER
10 Mar, 2011
4 commits
-
Conflicts:
block/blk-core.c
block/blk-flush.c
drivers/md/raid1.c
drivers/md/raid10.c
drivers/md/raid5.c
fs/nilfs2/btnode.c
fs/nilfs2/mdt.c

Signed-off-by: Jens Axboe
-
This should be useless now that we have on-stack plugging. So let's just
kill it.

Signed-off-by: Jens Axboe
-
Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe
-
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So let's kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe
26 Feb, 2011
2 commits
-
A race can occur when io_submit() races with io_destroy():

 CPU1                                           CPU2
io_submit()
  do_io_submit()
    ...
    ctx = lookup_ioctx(ctx_id);
                                                io_destroy()
    Now do_io_submit() holds the last reference to ctx.
    ...
    queue new AIO
    put_ioctx(ctx) - frees ctx with active AIOs

We solve this issue by checking whether ctx is being destroyed in AIO
submission path after adding new AIO to ctx. Then we are guaranteed that
either io_destroy() waits for new AIO or we see that ctx is being
destroyed and bail out.

Cc: Nick Piggin
Reviewed-by: Jeff Moyer
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.
lookup_ioctx doesn't implement the rcu lookup pattern properly.
rcu_read_lock does not prevent refcount going to zero, so we might take
a refcount on a zero count ioctx.

Fix the bug by atomically testing for zero refcount before incrementing.

[jack@suse.cz: added comment into the code]
Reviewed-by: Jeff Moyer
Signed-off-by: Nick Piggin
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
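
"Atomically testing for zero refcount before incrementing" can be sketched
in userspace with C11 atomics. This is an illustration of the idea only, not
the code from the patch: struct toy_ctx and toy_try_get() are hypothetical
names, and the compare-and-swap loop is an assumed way to get the "increment
unless zero" behaviour outside the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for an AIO context. */
struct toy_ctx {
	atomic_int users;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true when a reference was taken, false when the object
 * had already dropped to zero and must not be used.
 */
static bool toy_try_get(struct toy_ctx *ctx)
{
	int old = atomic_load(&ctx->users);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&ctx->users, &old, old + 1))
			return true;
	}
	return false;
}
```

A lookup that finds the object under a read-side critical section would call
toy_try_get() and treat a false return the same as "not found".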
27 Jan, 2011
1 commit
-
aio_wq isn't used during memory reclaim. Convert to alloc_workqueue()
without WQ_MEM_RECLAIM. It's possible to use system_wq but given that
the number of work items is determined from userland and the work item
may block, enforcing strict concurrency limit would be a good idea.

Also, move fput_work to system_wq so that aio_wq is used solely to
throttle the max concurrency of aio work items and fput_work doesn't
interact with other work items.

Signed-off-by: Tejun Heo
Acked-by: Jeff Moyer
Cc: Benjamin LaHaise
Cc: linux-aio@kvack.org
17 Jan, 2011
1 commit
-
Signed-off-by: Namhyung Kim
Signed-off-by: Al Viro
14 Jan, 2011
2 commits
-
aio_run_iocbs() is not used at all, so get rid of it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jeff Moyer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
'nr >= min_nr >= 0' always satisfies 'nr >= 0' so the check is unnecessary.
Signed-off-by: Namhyung Kim
Acked-by: Jeff Moyer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Oct, 2010
2 commits
-
Clones an existing reference to inode; caller must already hold one.
Signed-off-by: Al Viro
-
The aio batching code is using igrab to get an extra reference on the
inode so it can safely batch. igrab will go ahead and take the global
inode spinlock, which can be a bottleneck on large machines doing lots
of AIO.

In this case, igrab isn't required because we already have a reference
on the file handle. It is safe to just bump the i_count directly
on the inode.

Benchmarking shows this patch brings IOP/s on tons of flash up by about
2.5X.

Signed-off-by: Chris Mason
23 Sep, 2010
1 commit
-
OCFS2 can return ERESTARTSYS from its write function when the process is
signalled while waiting for a cluster lock (and the filesystem is mounted
with intr mount option). Generally, it seems reasonable to allow
filesystems to return this error code from its IO functions. As we must
not leak ERESTARTSYS (and similar error codes) to userspace as a result of
an AIO operation, we have to properly convert it to EINTR inside AIO code
(restarting the syscall isn't really an option because other AIO could
have been already submitted by the same io_submit syscall).

Signed-off-by: Jan Kara
Reviewed-by: Jeff Moyer
Cc: Christoph Hellwig
Cc: Zach Brown
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
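
The conversion described above amounts to a small mapping on the result
code. The sketch below is a userspace illustration, not the patch itself:
toy_fixup_aio_result() is a hypothetical name, the TOY_* constants are local
copies of kernel-internal values, and the set of "similar error codes" it
converts is an assumption, since the message names only ERESTARTSYS.

```c
#include <errno.h>

/*
 * Restart codes that are internal to the kernel and must not reach
 * userspace. The values mirror the kernel's include/linux/errno.h and are
 * redefined here only so the sketch builds in userspace.
 */
#define TOY_ERESTARTSYS			512
#define TOY_ERESTARTNOINTR		513
#define TOY_ERESTARTNOHAND		514
#define TOY_ERESTART_RESTARTBLOCK	516

/* Map kernel-internal restart codes to -EINTR; pass everything else through. */
static long toy_fixup_aio_result(long ret)
{
	switch (ret) {
	case -TOY_ERESTARTSYS:
	case -TOY_ERESTARTNOINTR:
	case -TOY_ERESTARTNOHAND:
	case -TOY_ERESTART_RESTARTBLOCK:
		return -EINTR;
	default:
		return ret;
	}
}
```

Positive byte counts and ordinary errors such as -EIO are returned
unchanged, so the helper can be applied to every completion result.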
15 Sep, 2010
1 commit
-
Tavis Ormandy pointed out that do_io_submit does not do proper bounds
checking on the passed-in iocb array:

        if (unlikely(nr < 0))
                return -EINVAL;

        if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(iocbpp)))))
                return -EFAULT;                       ^^^^^^^^^^^^^^^^^^

The attached patch checks for overflow, and if it is detected, the
number of iocbs submitted is scaled down to a number that will fit in
the long. This is an ok thing to do, as sys_io_submit is documented as
returning the number of iocbs submitted, so callers should handle a
return value of less than the 'nr' argument passed in.

Reported-by: Tavis Ormandy
Signed-off-by: Jeff Moyer
Signed-off-by: Linus Torvalds
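
Scaling the count down so the multiplication cannot overflow can be sketched
as follows. This is a userspace illustration under assumptions, not the code
from the patch: toy_clamp_nr() is a hypothetical helper, and it returns -1
for a negative count where the kernel would return -EINVAL.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Clamp a caller-supplied element count so that nr * elem_size cannot
 * overflow a long. Returns -1 for a negative count. elem_size must be
 * non-zero.
 */
static long toy_clamp_nr(long nr, size_t elem_size)
{
	long max_nr = (long)((unsigned long)LONG_MAX / elem_size);

	if (nr < 0)
		return -1;
	return nr > max_nr ? max_nr : nr;
}
```

A caller that passes an oversized nr simply sees fewer entries processed
than requested, which matches the documented "number of iocbs submitted"
return convention.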
06 Aug, 2010
1 commit
-
- sys_io_destroy(): actually return -EINVAL if the context pointed to
  is invalid
- sys_io_getevents(): An argument specifying timeout is not `when',
  but `timeout'.
- sys_io_getevents(): Should describe what is returned if this syscall
  succeeds.

Signed-off-by: Satoru Takeuchi
Signed-off-by: Randy Dunlap
Reviewed-by: Jeff Moyer
Signed-off-by: Linus Torvalds
28 May, 2010
2 commits
-
__aio_put_req() plays sick games with file refcount. What
it wants is fput() from atomic context; it's almost always
done with f_count > 1, so they only have to deal with delayed
work in rare cases when their reference happens to be the
last one. Current code decrements f_count and if it hasn't
hit 0, everything is fine. Otherwise it keeps a pointer
to struct file (with zero f_count!) around and has delayed
work do __fput() on it.

Better way to do it: use atomic_long_add_unless( , -1, 1)
instead of !atomic_long_dec_and_test(). IOW, decrement it
only if it's not the last reference, leave refcount alone
if it was. And use normal fput() in delayed work.

I've made that atomic_long_add_unless call a new helper -
fput_atomic(). Drops a reference to file if it's safe to
do in atomic (i.e. if that's not the last one), tells if
it had been able to do that. aio.c converted to it, __fput()
use is gone. req->ki_file *always* contributes to refcount
now. And __fput() became static.

Signed-off-by: Al Viro
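
The "decrement only if it's not the last reference" rule can be sketched in
userspace with C11 atomics. This is an imitation of the add-unless idea, not
the kernel helper: toy_add_unless(), struct toy_file and toy_fput_atomic()
are hypothetical names introduced for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add a to *v unless *v == u. Returns true if the addition happened.
 * Userspace imitation of the add-unless idea, built on C11 atomics.
 */
static bool toy_add_unless(atomic_long *v, long a, long u)
{
	long old = atomic_load(v);

	while (old != u) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Hypothetical file object with a reference count. */
struct toy_file {
	atomic_long f_count;
};

/*
 * Drop a reference only when it is not the last one. A false return
 * means the count was left alone and the final put must be deferred
 * to a context that is allowed to sleep.
 */
static bool toy_fput_atomic(struct toy_file *f)
{
	return toy_add_unless(&f->f_count, -1, 1);
}
```

Because the count never reaches zero in atomic context, the deferred path
can use the ordinary release routine on an object that still holds a valid
reference.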
-
The aio compat code was not converting the struct iovecs from 32bit to
64bit pointers, causing either EINVAL to be returned from io_getevents, or
EFAULT as the result of the I/O. This patch passes a compat flag to
io_submit to signal that pointer conversion is necessary for a given iocb
array.

A variant of this was tested by Michael Tokarev. I have also updated the
libaio test harness to exercise this code path with good success.
Further, I grabbed a copy of ltp and ran the
testcases/kernel/syscall/readv and writev tests there (compiled with -m32
on my 64bit system). All seems happy, but extra eyes on this would be
welcome.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_COMPAT=n build]
Signed-off-by: Jeff Moyer
Reported-by: Michael Tokarev
Cc: Zach Brown
Cc: [2.6.35.1]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
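
The pointer conversion that the compat path needs can be sketched in
userspace as a widening copy. This is only an illustration: struct
toy_compat_iovec and toy_widen_iovecs() are hypothetical names, and the
32-bit layout shown is an assumption about how a 32-bit task lays out its
iovec array, not a copy of the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Assumed layout of an iovec as passed by a 32-bit task. */
struct toy_compat_iovec {
	uint32_t iov_base;	/* 32-bit user pointer */
	uint32_t iov_len;	/* 32-bit length */
};

/* Widen n 32-bit iovecs into native struct iovec entries. */
static void toy_widen_iovecs(const struct toy_compat_iovec *src,
			     struct iovec *dst, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* Zero-extend the 32-bit address into a native pointer. */
		dst[i].iov_base = (void *)(uintptr_t)src[i].iov_base;
		dst[i].iov_len = src[i].iov_len;
	}
}
```

Reading the 32-bit array as if it were native would pair each base with the
wrong length on a 64-bit kernel, which is consistent with the EINVAL and
EFAULT symptoms described in the message.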
16 Dec, 2009
1 commit
-
Don't know the reason, but it appears ki_wait field of iocb never gets used.
Signed-off-by: Shaohua Li
Cc: Jeff Moyer
Cc: Benjamin LaHaise
Cc: Zach Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Oct, 2009
1 commit
-
There's nothing block related about them, the backing device
is used by things like NFS etc as well. This gets rid of the
need to protect such calls by CONFIG_BLOCK.

Signed-off-by: Jens Axboe
28 Oct, 2009
1 commit
-
Hi,
Some workloads issue batches of small I/O, and the performance is poor
due to the call to blk_run_address_space for every single iocb. Nathan
Roberts pointed this out, and suggested that by deferring this call
until all I/Os in the iocb array are submitted to the block layer, we
can realize some impressive performance gains (up to 30% for sequential
4k reads in batches of 16).

Signed-off-by: Jeff Moyer
Signed-off-by: Jens Axboe
23 Sep, 2009
1 commit
-
As mentioned in Documentation/CodingStyle, move EXPORT* macros
to the line immediately after the closing function brace line.

Also, move the __initcall() similarly.

Signed-off-by: H Hartley Sweeten
Cc: Zach Brown
Cc: Benjamin LaHaise
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Sep, 2009
1 commit
-
Anyone who wants to do copy to/from user from a kernel thread, needs
use_mm (like what fs/aio has). Move that into mm/, to make reusing and
exporting easier down the line, and make aio use it. Next intended user,
besides aio, will be vhost-net.

Acked-by: Andrea Arcangeli
Signed-off-by: Michael S. Tsirkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Jul, 2009
1 commit
-
Change the eventfd interface to decouple the eventfd memory context from
the file pointer instance.

Without such a change, there is no clean, race-free way to handle the
POLLHUP event sent when the last instance of the file* goes away. Also,
the internal eventfd APIs now use the eventfd context instead of the
file*.

This patch is required by KVM's IRQfd code, which is still under
development.

Signed-off-by: Davide Libenzi
Cc: Gregory Haskins
Cc: Rusty Russell
Cc: Benjamin LaHaise
Cc: Avi Kivity
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Mar, 2009
2 commits
-
The libaio test harness turned up a problem whereby lookup_ioctx on a
bogus io context was returning the 1 valid io context from the list
(harness/cases/3.p).

Because of that, an extra put_iocontext was done, and when the process
exited, it hit a BUG_ON in the put_iocontext macro called from exit_aio
(since we expect a users count of 1 and instead get 0).

The problem was introduced by "aio: make the lookup_ioctx() lockless"
(commit abf137dd7712132ee56d5b3143c2ff61a72a5faa).

Thanks to Zach for pointing out that hlist_for_each_entry_rcu will not
return with a NULL tpos at the end of the loop, even if the entry was
not found.

Signed-off-by: Jeff Moyer
Acked-by: Zach Brown
Acked-by: Jens Axboe
Cc: Benjamin LaHaise
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Remove a source of fput() calls from inside IRQ context. Like Eric, I
wasn't able to reproduce an fput() call from IRQ context, but Jeff said he was
able to, with the attached test program. Independently of this, the bug is
conceptually there, so we might be better off fixing it. This patch adds an
optimization on ->ki_eventfd similar to the one we already do on ->ki_filp.
Playing with ->f_count directly is not pretty in general, but the alternative
here would be to add a brand new delayed fput() infrastructure, which I'm not
sure is worth it.

Signed-off-by: Davide Libenzi
Cc: Benjamin LaHaise
Cc: Trond Myklebust
Cc: Eric Dumazet
Signed-off-by: Jeff Moyer
Cc: Zach Brown
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Jan, 2009
1 commit
-
Signed-off-by: Heiko Carstens
29 Dec, 2008
1 commit
-
The mm->ioctx_list is currently protected by a reader-writer lock,
so we always grab that lock on the read side for doing ioctx
lookups. As the workload is extremely reader biased, turn this into
an rcu hlist so we can make lookup_ioctx() lockless. Get rid of
the rwlock and use a spinlock for providing update side exclusion.

There's usually only 1 entry on this list, so it doesn't make sense
to look into fancier data structures.

Reviewed-by: Jeff Moyer
Signed-off-by: Jens Axboe
27 Jul, 2008
1 commit
-
make it atomic_long_t; while we are at it, get rid of useless checks in affs,
hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0.

Signed-off-by: Al Viro
26 Jul, 2008
1 commit
-
Kill PF_BORROWED_MM. Change use_mm/unuse_mm to not play with ->flags, and
do s/PF_BORROWED_MM/PF_KTHREAD/ for a couple of other users.

No functional changes yet. But this allows us to do further
fixes/cleanups.

oom_kill/ptrace/etc often check "p->mm != NULL" to filter out the
kthreads, this is wrong because of use_mm(). The problem with
PF_BORROWED_MM is that we need task_lock() to avoid races. With this
patch we can check PF_KTHREAD directly, or use a simple lockless helper:

	/* The result must not be dereferenced !!! */
	struct mm_struct *__get_task_mm(struct task_struct *tsk)
	{
		if (tsk->flags & PF_KTHREAD)
			return NULL;
		return tsk->mm;
	}

Note also ecard_task(). It runs with ->mm != NULL, but it's the kernel
thread without PF_BORROWED_MM.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Jun, 2008
1 commit
-
use_mm() was changed to use switch_mm() instead of activate_mm(), since
then nobody calls (and nobody should call) activate_mm() with
PF_BORROWED_MM bit set.

As Jeff Dike pointed out, we can also remove the "old != new" check, it is
always true.

Signed-off-by: Oleg Nesterov
Cc: Jeff Dike
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Apr, 2008
1 commit
-
Add calls to the generic object debugging infrastructure and provide fixup
functions which allow the system to be kept alive when recoverable problems
have been detected by the object debugging core code.

Signed-off-by: Thomas Gleixner
Acked-by: Ingo Molnar
Cc: Greg KH
Cc: Randy Dunlap
Cc: Kay Sievers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Apr, 2008
3 commits
-
The FIXME comments are inaccurate.
The locking comment over lookup_ioctx() is wrong.

Signed-off-by: Jeff Moyer
Signed-off-by: Zach Brown
Signed-off-by: Shen Feng
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Some drivers have duplicated unlikely() macros. IS_ERR() already has
unlikely() in itself.

This patch cleans up such pointless code.

Signed-off-by: Hirofumi Nakagawa
Acked-by: David S. Miller
Acked-by: Jeff Garzik
Cc: Paul Clements
Cc: Richard Purdie
Cc: Alessandro Zummo
Cc: David Brownell
Cc: James Bottomley
Cc: Michael Halcrow
Cc: Anton Altaparmakov
Cc: Al Viro
Cc: Carsten Otte
Cc: Patrick McHardy
Cc: Paul Mundt
Cc: Jaroslav Kysela
Cc: Takashi Iwai
Acked-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Make the following needlessly global functions static:
- __put_ioctx()
- lookup_ioctx()
- io_submit_one()

Signed-off-by: Adrian Bunk
Cc: Zach Brown
Cc: Benjamin LaHaise
Cc: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Apr, 2008
1 commit
-
This patch wakes up a thread waiting in io_getevents if another thread
destroys the context. This was tested using a small program that spawns a
thread to wait in io_getevents while the parent thread destroys the io context
and then waits for the getevents thread to exit. Without this patch, the
program hangs indefinitely. With the patch, the program exits as expected.

Signed-off-by: Jeff Moyer
Cc: Zach Brown
Cc: Christopher Smith
Cc: Benjamin LaHaise
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Apr, 2008
2 commits
-
Jeff Roberson discovered a race when using kaio eventfd based notifications.
When it occurs it can lead to missed wakeups and hung userspace.

This patch fixes the race by moving the notification inside the spinlocked
section of kaio. The operation is safe since eventfd spinlock and kaio one
are unrelated.

Signed-off-by: Davide Libenzi
Cc: Zach Brown
Cc: Jeff Roberson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Use asmlinkage_protect in sys_io_getevents, because GCC for i386 with
CONFIG_FRAME_POINTER=n can decide to clobber an argument word on the
stack, i.e. the user struct pt_regs. Here the problem is not a tail
call, but just the compiler's use of the stack when it inlines and
optimizes the body of the called function. This seems to avoid it.

Signed-off-by: Roland McGrath
Signed-off-by: Linus Torvalds