21 Dec, 2018
1 commit
-
commit a538e3ff9dabcdf6c3f477a373c629213d1c3066 upstream.
Matthew pointed out that the ioctx_table is susceptible to spectre v1,
because the index can be controlled by an attacker. The below patch
should mitigate the attack for all of the aio system calls.

Cc: stable@vger.kernel.org
Reported-by: Matthew Wilcox
Reported-by: Dan Carpenter
Signed-off-by: Jeff Moyer
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman
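
The mitigation described above clamps an attacker-controlled index before it is used, so that a mispredicted bounds check cannot speculatively read outside the table. In the kernel this is done with array_index_nospec() from <linux/nospec.h>. A minimal user-space sketch of the same branch-free masking idea follows. The names index_mask() and clamp_index() are hypothetical, the sketch assumes size is at most LONG_MAX and that >> on a negative long is an arithmetic shift (as with gcc and clang), and it is an illustration rather than the kernel's code.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: all-ones when index < size, zero otherwise.
 * There is no branch, so there is nothing for the CPU to mispredict.
 * Assumes size <= LONG_MAX and an arithmetic right shift on long. */
unsigned long index_mask(unsigned long index, unsigned long size)
{
	long v = (long)~(index | (size - 1UL - index));

	return (unsigned long)(v >> (sizeof(long) * CHAR_BIT - 1));
}

/* Hypothetical helper: returns index when it is in bounds, 0 otherwise. */
unsigned long clamp_index(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}
```

A caller would still perform the ordinary bounds check first and then use the clamped value for the actual array access.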
05 Jun, 2018
1 commit
-
commit 4faa99965e027cc057c5145ce45fa772caa04e8d upstream.
If io_destroy() gets to cancelling everything that can be cancelled and
gets to kiocb_cancel() calling the function driver has left in ->ki_cancel,
it becomes vulnerable to a race with IO completion. At that point req
is already taken off the list and aio_complete() does *NOT* spin until
we (in free_ioctx_users()) release ->ctx_lock. As the result, it proceeds
to kiocb_free(), freeing req just as it gets passed to ->ki_cancel().

Fix is simple - remove from the list after the call of kiocb_cancel(). All
instances of ->ki_cancel() already have to cope with being called with
iocb still on list - that's what happens in io_cancel(2).

Cc: stable@kernel.org
Fixes: 0460fef2a921 "aio: use cancellation list lazily"
Signed-off-by: Al Viro
Signed-off-by: Greg Kroah-Hartman
30 May, 2018
1 commit
-
commit baf10564fbb66ea222cae66fbff11c444590ffd9 upstream.
kill_ioctx() used to have an explicit RCU delay between removing the
reference from ->ioctx_table and percpu_ref_kill() dropping the refcount.
At some point that delay had been removed, on the theory that
percpu_ref_kill() itself contained an RCU delay. Unfortunately, that was
the wrong kind of RCU delay and it didn't care about rcu_read_lock() used
by lookup_ioctx(). As the result, we could get ctx freed right under
lookup_ioctx(). Tejun has fixed that in a6d7cff472e ("fs/aio: Add explicit
RCU grace period when freeing kioctx"); however, that fix is not enough.

Suppose io_destroy() from one thread races with e.g. io_setup() from another;
CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2
has picked it (under rcu_read_lock()). Then CPU1 proceeds to drop the
refcount, getting it to 0 and triggering a call of free_ioctx_users(),
which proceeds to drop the secondary refcount and once that reaches zero
calls free_ioctx_reqs(). That does
INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
queue_rcu_work(system_wq, &ctx->free_rwork);
and schedules freeing the whole thing after RCU delay.

In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the
refcount from 0 to 1 and returned the reference to io_setup().

Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get
freed until after percpu_ref_get(). Sure, we'd increment the counter before
ctx can be freed. Now we are out of rcu_read_lock() and there's nothing to
stop freeing of the whole thing. Unfortunately, CPU2 assumes that since it
has grabbed the reference, ctx is *NOT* going away until it gets around to
dropping that reference.

The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss.
It's not costlier than what we currently do in normal case, it's safe to
call since freeing *is* delayed and it closes the race window - either
lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users
won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx()
fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see
the object in question at all.

Cc: stable@kernel.org
Fixes: a6d7cff472e "fs/aio: Add explicit RCU grace period when freeing kioctx"
Signed-off-by: Al Viro
Signed-off-by: Greg Kroah-Hartman
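
The difference between an unconditional get and a "try get if still live" can be sketched in user space with C11 atomics. This is only an analogy: the kernel's percpu_ref relies on per-CPU counters and RCU, while the struct obj_ref and its functions below are hypothetical names built on a single atomic counter plus a dead flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as a kioctx. */
struct obj_ref {
	atomic_long count;	/* number of live references */
	atomic_bool dead;	/* set once teardown has started */
};

void obj_ref_init(struct obj_ref *r)
{
	atomic_init(&r->count, 1);
	atomic_init(&r->dead, false);
}

/* Take a reference only if teardown has not started and the count is
 * still non-zero. A false return is to be treated as a lookup miss. */
bool obj_ref_tryget_live(struct obj_ref *r)
{
	long old = atomic_load(&r->count);

	if (atomic_load(&r->dead))
		return false;
	while (old > 0) {
		/* On failure, old is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Start teardown and drop the initial reference.
 * Returns true when the count reached zero. */
bool obj_ref_kill(struct obj_ref *r)
{
	atomic_store(&r->dead, true);
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```

Either the lookup wins and the count cannot reach zero until the caller drops its reference, or the lookup fails and the count is left untouched, which mirrors the two outcomes listed in the commit message.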
21 Mar, 2018
2 commits
-
commit d0264c01e7587001a8c4608a5d1818dba9a4c11a upstream.
While converting ioctx index from a list to a table, db446a08c23d
("aio: convert the ioctx list to table lookup v3") missed tagging
kioctx_table->table[] as an array of RCU pointers and using the
appropriate RCU accessors. This introduces a small window in the
lookup path where init and access may race.

Mark kioctx_table->table[] with __rcu and use the appropriate RCU
accessors when using the field.

Signed-off-by: Tejun Heo
Reported-by: Jann Horn
Fixes: db446a08c23d ("aio: convert the ioctx list to table lookup v3")
Cc: Benjamin LaHaise
Cc: Linus Torvalds
Cc: stable@vger.kernel.org # v3.12+
Signed-off-by: Greg Kroah-Hartman
-
commit a6d7cff472eea87d96899a20fa718d2bab7109f3 upstream.
While fixing refcounting, e34ecee2ae79 ("aio: Fix a trinity splat")
incorrectly removed explicit RCU grace period before freeing kioctx.
The intention seems to be depending on the internal RCU grace periods
of percpu_ref; however, percpu_ref uses a different flavor of RCU,
sched-RCU. This can lead to kioctx being freed while RCU read
protected dereferences are still in progress.

Fix it by updating free_ioctx() to go through call_rcu() explicitly.
v2: Comment added to explain double bouncing.
Signed-off-by: Tejun Heo
Reported-by: Jann Horn
Fixes: e34ecee2ae79 ("aio: Fix a trinity splat")
Cc: Kent Overstreet
Cc: Linus Torvalds
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Greg Kroah-Hartman
15 Sep, 2017
1 commit
-
Pull nowait read support from Al Viro:
"Support IOCB_NOWAIT for buffered reads and block devices"

* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
block_dev: support RFW_NOWAIT on block device nodes
fs: support RWF_NOWAIT for buffered reads
fs: support IOCB_NOWAIT in generic_file_buffered_read
fs: pass iocb to do_generic_file_read
09 Sep, 2017
1 commit
-
Introduce a new migration mode that allows offloading the copy to a device
DMA engine. This changes the workflow of migration, and not all
address_space migratepage callbacks can support it.

This is intended to be used by migrate_vma(), which itself is used for things
like HMM (see include/linux/hmm.h).

No additional per-filesystem migratepage testing is needed. I disabled
MIGRATE_SYNC_NO_COPY in all problematic migratepage() callbacks and
added comments in those to explain why (part of this patch). The commit
message is unclear; it should say that any callback that wishes to support
this new mode needs to be aware of the difference in the migration flow
from other modes.

Some of these callbacks do extra locking while copying (aio, zsmalloc,
balloon, ...) and for DMA to be effective you want to copy multiple
pages in one DMA operation. But in the problematic case you cannot
easily hold the extra lock across multiple calls to this callback.

Usual flow is:
For each page {
1 - lock page
2 - call migratepage() callback
3 - (extra locking in some migratepage() callback)
4 - migrate page state (freeze refcount, update page cache, buffer
head, ...)
5 - copy page
6 - (unlock any extra lock of migratepage() callback)
7 - return from migratepage() callback
8 - unlock page
}

The new mode MIGRATE_SYNC_NO_COPY:
1 - lock multiple pages
For each page {
2 - call migratepage() callback
3 - abort in all problematic migratepage() callback
4 - migrate page state (freeze refcount, update page cache, buffer
head, ...)
} // finished all calls to migratepage() callback
5 - DMA copy multiple pages
6 - unlock all the pages

To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a
new callback migratepages() (for instance) that deals with multiple
pages in one transaction.

Because the problematic cases are not important for current usage I did
not want to complicate this patchset even more for no good reason.

Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
Signed-off-by: Jérôme Glisse
Cc: Aneesh Kumar
Cc: Balbir Singh
Cc: Benjamin Herrenschmidt
Cc: Dan Williams
Cc: David Nellans
Cc: Evgeny Baskakov
Cc: Johannes Weiner
Cc: John Hubbard
Cc: Kirill A. Shutemov
Cc: Mark Hairgrove
Cc: Michal Hocko
Cc: Paul E. McKenney
Cc: Ross Zwisler
Cc: Sherry Cheung
Cc: Subhash Gutti
Cc: Vladimir Davydov
Cc: Bob Liu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Sep, 2017
1 commit
-
Currently, aio-nr is incremented in steps of 'num_possible_cpus() * 8'
for io_setup(nr_events, ..) with 'nr_events < num_possible_cpus() * 4':

ioctx_alloc()
...
nr_events = max(nr_events, num_possible_cpus() * 4);
nr_events *= 2;
...
ctx->max_reqs = nr_events;
...
aio_nr += ctx->max_reqs;
....

This limits the number of aio contexts actually available to much less
than aio-max-nr, and is increasingly worse with greater number of CPUs.

For example, with 64 CPUs, only 256 aio contexts are actually available
(with aio-max-nr = 65536) because the increment is 512 in that scenario.

Note: 65536 [max aio contexts] / (64*4*2) [increment per aio context]
is 128, but make it 256 (double) as counting against 'aio-max-nr * 2':

ioctx_alloc()
...
if (aio_nr + nr_events > (aio_max_nr * 2UL) ||
...
goto err_ctx;
...

This patch uses the original value of nr_events (from userspace) to
increment aio-nr and count against aio-max-nr, which resolves those.

Signed-off-by: Mauricio Faria de Oliveira
Reported-by: Lekshmi C. Pillai
Tested-by: Lekshmi C. Pillai
Tested-by: Paul Nguyen
Reviewed-by: Jeff Moyer
Signed-off-by: Benjamin LaHaise
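
The arithmetic in the message above can be checked with a small user-space sketch. The function names are hypothetical and the formulas are taken only from the quoted ioctx_alloc() fragments and the patch description. It assumes nr_events is at least 1.

```c
/* Old accounting: value charged to aio-nr for one io_setup() call,
 * i.e. max(nr_events, ncpus * 4) * 2. */
unsigned long old_charge(unsigned long nr_events, unsigned long ncpus)
{
	unsigned long n = nr_events > ncpus * 4 ? nr_events : ncpus * 4;

	return n * 2;
}

/* Contexts that fit under the old check, which compared the running
 * total against aio_max_nr * 2. */
unsigned long old_max_contexts(unsigned long aio_max_nr,
			       unsigned long nr_events, unsigned long ncpus)
{
	return (aio_max_nr * 2) / old_charge(nr_events, ncpus);
}

/* Accounting as described by the patch: charge the value passed from
 * userspace and compare against aio_max_nr itself. */
unsigned long new_max_contexts(unsigned long aio_max_nr,
			       unsigned long nr_events)
{
	return aio_max_nr / nr_events;
}
```

With 64 CPUs, aio-max-nr = 65536 and io_setup(1, ...), the old charge is 512 and only 256 contexts fit, matching the figures in the message. Under the described new accounting the same request is charged 1.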
05 Sep, 2017
1 commit
-
This is based on the old idea and code from Milosz Tanski. With the aio
nowait code it becomes mostly trivial now. Buffered writes continue to
return -EOPNOTSUPP if RWF_NOWAIT is passed.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Al Viro
28 Jun, 2017
1 commit
-
Reviewed-by: Andreas Dilger
Reviewed-by: Martin K. Petersen
Signed-off-by: Jens Axboe
20 Jun, 2017
2 commits
-
RWF_NOWAIT informs kernel to bail out if an AIO request will block
for reasons such as file allocations, or a writeback triggered,
or would block while allocating requests while performing
direct I/O.

RWF_NOWAIT is translated to IOCB_NOWAIT for iocb->ki_flags.
FMODE_AIO_NOWAIT is a flag which identifies the file opened is capable
of returning -EAGAIN if the AIO call will block. This must be set by
supporting filesystems in the ->open() call.

Filesystems xfs, btrfs and ext4 would be supported in the following patches.
Reviewed-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Jens Axboe
-
aio_rw_flags is introduced in struct iocb (using aio_reserved1) which will
carry the RWF_* flags. We cannot use aio_flags because they are not
checked for validity which may break existing applications.

Note, the only place RWF_HIPRI comes in effect is dio_await_one().
All the rest of the locations, aio code return -EIOCBQUEUED before the
checks for RWF_HIPRI.

Reviewed-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Jens Axboe
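
From user space, the aio_rw_flags field described above is filled in before the iocb is handed to io_submit(2). A minimal sketch follows, assuming kernel UAPI headers recent enough to declare aio_rw_flags in struct iocb. The helper name prep_nowait_pread() is hypothetical, and the RWF_NOWAIT fallback define is only there for older headers.

```c
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008	/* UAPI value; fallback for older headers */
#endif

/* Hypothetical helper: prepare an iocb for a read that should fail with
 * -EAGAIN instead of blocking. */
void prep_nowait_pread(struct iocb *cb, int fd, void *buf,
		       size_t len, long long off)
{
	memset(cb, 0, sizeof(*cb));	/* reserved fields must be zero */
	cb->aio_fildes = fd;
	cb->aio_lio_opcode = IOCB_CMD_PREAD;
	cb->aio_buf = (uint64_t)(uintptr_t)buf;
	cb->aio_nbytes = len;
	cb->aio_offset = off;
	cb->aio_rw_flags = RWF_NOWAIT;	/* carried in the former aio_reserved1 */
}
```

The prepared structure would then be passed to io_submit(2). On files or kernels without support, the request is expected to be rejected rather than silently blocking.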
04 Mar, 2017
1 commit
-
Pull sched.h split-up from Ingo Molnar:
"The point of these changes is to significantly reduce the
header footprint, to speed up the kernel build and to
have a cleaner header structure.

After these changes the new <linux/sched.h>'s typical preprocessed
size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
lines), which is around 40% faster to build on typical configs.

Not much changed from the last version (-v2) posted three weeks ago: I
eliminated quirks, backmerged fixes plus I rebased it to an upstream
SHA1 from yesterday that includes most changes queued up in -next plus
all sched.h changes that were pending from Andrew.

I've re-tested the series both on x86 and on cross-arch defconfigs,
and did a bisectability test at a number of random points.

I tried to test as many build configurations as possible, but some
build breakage is probably still left - but it should be mostly
limited to architectures that have no cross-compiler binaries
available on kernel.org, and non-default configurations"

* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
sched/headers: Clean up
sched/headers: Remove #ifdefs from
sched/headers: Remove the include from
sched/headers, hrtimer: Remove the include from
sched/headers, x86/apic: Remove the header inclusion from
sched/headers, timers: Remove the include from
sched/headers: Remove from
sched/headers: Remove from
sched/core: Remove unused prefetch_stack()
sched/headers: Remove from
sched/headers: Remove the 'init_pid_ns' prototype from
sched/headers: Remove from
sched/headers: Remove from
sched/headers: Remove the runqueue_is_locked() prototype
sched/headers: Remove from
sched/headers: Remove from
sched/headers: Remove from
sched/headers: Remove from
sched/headers: Remove the include from
sched/headers: Remove from
...
03 Mar, 2017
1 commit
-
Pull vfs pile two from Al Viro:
- orangefs fix
- series of fs/namei.c cleanups from me
- VFS stuff coming from overlayfs tree
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
orangefs: Use RCU for destroy_inode
vfs: use helper for calling f_op->fsync()
mm: use helper for calling f_op->mmap()
vfs: use helpers for calling f_op->{read,write}_iter()
vfs: pass type instead of fn to do_{loop,iter}_readv_writev()
vfs: extract common parts of {compat_,}do_readv_writev()
vfs: wrap write f_ops with file_{start,end}_write()
vfs: deny copy_file_range() for non regular files
vfs: deny fallocate() on directory
vfs: create vfs helper vfs_tmpfile()
namei.c: split unlazy_walk()
namei.c: fold the check for DCACHE_OP_REVALIDATE into d_revalidate()
lookup_fast(): clean up the logics around the fallback to non-rcu mode
namei: fold unlazy_link() into its sole caller
02 Mar, 2017
2 commits
-
Overlayfs-related series from Miklos and Amir
-
…hed.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
25 Feb, 2017
1 commit
-
When a non-cooperative userfaultfd monitor copies pages in the
background, it may encounter regions that were already unmapped.
Addition of UFFD_EVENT_UNMAP allows the uffd monitor to track precisely
changes in the virtual memory layout.

Since there might be different uffd contexts for the affected VMAs, we
first should create a temporary representation for the unmap event for
each uffd context and then notify them one by one to the appropriate
userfault file descriptors.

The event notification occurs after the mmap_sem has been released.
[arnd@arndb.de: fix nommu build]
Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de
[mhocko@suse.com: fix nommu build]
Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport
Signed-off-by: Michal Hocko
Signed-off-by: Arnd Bergmann
Acked-by: Hillf Danton
Cc: Andrea Arcangeli
Cc: "Dr. David Alan Gilbert"
Cc: Mike Kravetz
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Feb, 2017
1 commit
-
Signed-off-by: Miklos Szeredi
15 Jan, 2017
1 commit
-
lockdep reports a warning. file_start_write/file_end_write only
acquire/release the lock for regular files. So check for regular files
on the aio side too.

[ 453.532141] ------------[ cut here ]------------
[ 453.533011] WARNING: CPU: 1 PID: 1298 at ../kernel/locking/lockdep.c:3514 lock_release+0x434/0x670
[ 453.533011] DEBUG_LOCKS_WARN_ON(depth ] dump_stack+0x67/0x9c
[ 453.533011] [] __warn+0x111/0x130
[ 453.533011] [] warn_slowpath_fmt+0x97/0xb0
[ 453.533011] [] ? __warn+0x130/0x130
[ 453.533011] [] ? blk_finish_plug+0x29/0x60
[ 453.533011] [] lock_release+0x434/0x670
[ 453.533011] [] ? import_single_range+0xd4/0x110
[ 453.533011] [] ? rw_verify_area+0x65/0x140
[ 453.533011] [] ? aio_write+0x1f6/0x280
[ 453.533011] [] aio_write+0x229/0x280
[ 453.533011] [] ? aio_complete+0x640/0x640
[ 453.533011] [] ? debug_check_no_locks_freed+0x1a0/0x1a0
[ 453.533011] [] ? debug_lockdep_rcu_enabled.part.2+0x1a/0x30
[ 453.533011] [] ? debug_lockdep_rcu_enabled+0x35/0x40
[ 453.533011] [] ? __might_fault+0x7e/0xf0
[ 453.533011] [] do_io_submit+0x94c/0xb10
[ 453.533011] [] ? do_io_submit+0x23e/0xb10
[ 453.533011] [] ? SyS_io_destroy+0x270/0x270
[ 453.533011] [] ? mark_held_locks+0x23/0xc0
[ 453.533011] [] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 453.533011] [] SyS_io_submit+0x10/0x20
[ 453.533011] [] entry_SYSCALL_64_fastpath+0x18/0xad
[ 453.533011] [] ? trace_hardirqs_off_caller+0xc0/0x110
[ 453.533011] ---[ end trace b2fbe664d1cc0082 ]---

Cc: Dmitry Monakhov
Cc: Jan Kara
Cc: Christoph Hellwig
Cc: Al Viro
Reviewed-by: Christoph Hellwig
Signed-off-by: Shaohua Li
Signed-off-by: Al Viro
26 Dec, 2016
1 commit
-
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machines and in an endianness optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
became completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.
The conversion was done with coccinelle and some manual mopping up.
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
25 Dec, 2016
1 commit
-
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.
Requested-by: Al Viro
Signed-off-by: Linus Torvalds
23 Dec, 2016
1 commit
-
... and fix the minor buglet in compat io_submit() - native one
kills ioctx as cleanup when put_user() fails. Get rid of
bogus compat_... in !CONFIG_AIO case, while we are at it - they
should simply fail with ENOSYS, same as for native counterparts.

Signed-off-by: Al Viro
05 Dec, 2016
1 commit
-
Signed-off-by: Al Viro
31 Oct, 2016
4 commits
-
Currently we dropped freeze protection of aio writes just after IO was
submitted. Thus aio write could be in flight while the filesystem was
frozen and that could result in unexpected situation like aio completion
wanting to convert extent type on frozen filesystem. Testcase from
Dmitry triggering this is like:

for ((i=0;i
Signed-off-by: Jan Kara
[hch: forward ported on top of various VFS and aio changes]
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
Pass the ABI iocb structure to aio_setup_rw and let it handle the
non-vectored I/O case as well. With that and a new helper for the AIO
return value handling we can now define new aio_read and aio_write
helpers that implement reads and writes in a self-contained way without
duplicating too much code.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
Otherwise we might dereference an already freed file and/or inode
when aio_complete is called before we return from the read_iter or
write_iter method.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
28 Sep, 2016
1 commit
-
Using a local variable we can prevent gcc from reloading
aio_ring_file->f_inode->i_mapping twice, eliminating 2x2 dependent
loads.

Signed-off-by: Rasmus Villemoes
Signed-off-by: Al Viro
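
The effect described above can be illustrated with a reduced sketch. The three structs are hypothetical miniatures of the file, inode and mapping chain, and mapping_add_page() stands in for any callee the compiler cannot see through (for example one defined in another translation unit). None of this is the actual fs/aio.c code.

```c
/* Hypothetical miniature of the file -> inode -> mapping chain. */
struct mapping { unsigned long nrpages; };
struct inode { struct mapping *i_mapping; };
struct file { struct inode *f_inode; };

/* Stand-in for a call that is opaque to the caller's optimizer. */
void mapping_add_page(struct mapping *m)
{
	m->nrpages++;
}

/* The pointer chain is written out for every call. If the callee is
 * opaque, the compiler must assume it may have changed f_inode or
 * i_mapping and redo both dependent loads each time. */
void add_two_reload(struct file *f)
{
	mapping_add_page(f->f_inode->i_mapping);
	mapping_add_page(f->f_inode->i_mapping);
}

/* The chain is walked once and the result kept in a local variable. */
void add_two_cached(struct file *f)
{
	struct mapping *m = f->f_inode->i_mapping;

	mapping_add_page(m);
	mapping_add_page(m);
}
```

Both functions have the same observable result. The second simply tells the compiler that the pointer is not expected to change between the calls.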
16 Sep, 2016
1 commit
-
This ensures that do_mmap() won't implicitly make AIO memory mappings
executable if the READ_IMPLIES_EXEC personality flag is set. Such
behavior is problematic because the security_mmap_file LSM hook doesn't
catch this case, potentially permitting an attacker to bypass a W^X
policy enforced by SELinux.

I have tested the patch on my machine.
To test the behavior, compile and run this:

    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/personality.h>
    #include <linux/aio_abi.h>
    #include <err.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/syscall.h>

    int main(void) {
        personality(READ_IMPLIES_EXEC);
        aio_context_t ctx = 0;
        if (syscall(__NR_io_setup, 1, &ctx))
            err(1, "io_setup");

        /* print the protection bits of the AIO ring mapping */
        char cmd[1000];
        sprintf(cmd, "cat /proc/%d/maps | grep -F '/[aio]'",
            (int)getpid());
        system(cmd);
        return 0;
    }

In the output, "rw-s" is good, "rwxs" is bad.
Signed-off-by: Jann Horn
Signed-off-by: Linus Torvalds
24 May, 2016
1 commit
-
aio_setup_ring waits for mmap_sem in writable mode. If the waiting task
gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving. Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting. This will also expedite the
return to the userspace and do_exit.

Signed-off-by: Michal Hocko
Acked-by: Jeff Moyer
Acked-by: Vlastimil Babka
Cc: Benjamin LaHaise
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Apr, 2016
1 commit
-
the value is never used after that point
Signed-off-by: Al Viro
05 Sep, 2015
1 commit
-
vma->vm_ops->mremap() looks more natural and clean in move_vma(), and this
way ->mremap() can have more users. Say, vdso.

While at it, s/aio_ring_remap/aio_ring_mremap/.

Note: this is the minimal change before ->mremap() finds another user in
file_operations; this method should have more arguments, and it can be
used to kill arch_remap().

Signed-off-by: Oleg Nesterov
Acked-by: Pavel Emelyanov
Acked-by: Kirill A. Shutemov
Cc: David Rientjes
Cc: Benjamin LaHaise
Cc: Hugh Dickins
Cc: Jeff Moyer
Cc: Laurent Dufour
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Apr, 2015
2 commits
-
Pull third hunk of vfs changes from Al Viro:
"This contains the ->direct_IO() changes from Omar + saner
generic_write_checks() + dealing with fcntl()/{read,write}() races
(mirroring O_APPEND/O_DIRECT into iocb->ki_flags and instead of
repeatedly looking at ->f_flags, which can be changed by fcntl(2),
check ->ki_flags - which cannot) + infrastructure bits for dhowells'
d_inode annotations + Christoph's switch of /dev/loop to
vfs_iter_write()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (30 commits)
block: loop: switch to VFS ITER_BVEC
configfs: Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode
VFS: Make pathwalk use d_is_reg() rather than S_ISREG()
VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR()
VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk
NFS: Don't use d_inode as a variable name
VFS: Impose ordering on accesses of d_inode and d_flags
VFS: Add owner-filesystem positive/negative dentry checks
nfs: generic_write_checks() shouldn't be done on swapout...
ocfs2: use __generic_file_write_iter()
mirror O_APPEND and O_DIRECT into iocb->ki_flags
switch generic_write_checks() to iocb and iter
ocfs2: move generic_write_checks() before the alignment checks
ocfs2_file_write_iter: stop messing with ppos
udf_file_write_iter: reorder and simplify
fuse: ->direct_IO() doesn't need generic_write_checks()
ext4_file_write_iter: move generic_write_checks() up
xfs_file_aio_write_checks: switch to iocb/iov_iter
generic_write_checks(): drop isblk argument
blkdev_write_iter: expand generic_file_checks() call in there
...
-
Pull block layer core bits from Jens Axboe:
"This is the core pull request for 4.1. Not a lot of stuff in here for
this round, mostly little fixes or optimizations. This pull request
contains:

- An optimization that speeds up queue runs on blk-mq, especially for
  the case where there's a large difference between nr_cpu_ids and
  the actual mapped software queues on a hardware queue. From Chong
  Yuan.

- Honor node local allocations for requests on legacy devices. From
  David Rientjes.

- Cleanup of blk_mq_rq_to_pdu() from me.

- exit_aio() fixup from me, greatly speeding up exiting multiple IO
  contexts off exit_group(). For my particular test case, fio exit
  took ~6 seconds. A typical case of both exposing RCU grace periods
  to user space, and serializing exit of them.

- Make blk_mq_queue_enter() honor the gfp mask passed in, so we only
  wait if __GFP_WAIT is set. From Keith Busch.

- blk-mq exports and two added helpers from Mike Snitzer, which will
  be used by the dm-mq code.

- Cleanups of blk-mq queue init from Wei Fang and Xiaoguang Wang"
* 'for-4.1/core' of git://git.kernel.dk/linux-block:
blk-mq: reduce unnecessary software queue looping
aio: fix serial draining in exit_aio()
blk-mq: cleanup blk_mq_rq_to_pdu()
blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue()
block: remove redundant check about 'set->nr_hw_queues' in blk_mq_alloc_tag_set()
block: allocate request memory local to request queue
blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set
blk-mq: export blk_mq_run_hw_queues
blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk
16 Apr, 2015
2 commits
-
Pull second vfs update from Al Viro:
"Now that net-next went in... Here's the next big chunk - killing
->aio_read() and ->aio_write().

There'll be one more pile today (direct_IO changes and
generic_write_checks() cleanups/fixes), but I'd prefer to keep that
one separate"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
->aio_read and ->aio_write removed
pcm: another weird API abuse
infinibad: weird APIs switched to ->write_iter()
kill do_sync_read/do_sync_write
fuse: use iov_iter_get_pages() for non-splice path
fuse: switch to ->read_iter/->write_iter
switch drivers/char/mem.c to ->read_iter/->write_iter
make new_sync_{read,write}() static
coredump: accept any write method
switch /dev/loop to vfs_iter_write()
serial2002: switch to __vfs_read/__vfs_write
ashmem: use __vfs_read()
export __vfs_read()
autofs: switch to __vfs_write()
new helper: __vfs_write()
switch hugetlbfs to ->read_iter()
coda: switch to ->read_iter/->write_iter
ncpfs: switch to ->read_iter/->write_iter
net/9p: remove (now-)unused helpers
p9_client_attach(): set fid->uid correctly
...
-
exit_aio() currently serializes killing io contexts. Each context
killing ends up having to do percpu_ref_kill(), which in turn has
to wait for an RCU grace period. This can take a long time, depending
on the number of contexts. And there's no point in doing them serially,
when we could be waiting for all of them in one fell swoop.

This patch makes my fio thread offload test case exit in 0.2s instead
of almost 6s.

Reviewed-by: Jeff Moyer
Signed-off-by: Jens Axboe
15 Apr, 2015
1 commit
-
Pull vfs update from Al Viro:
"Part one:

- struct filename-related cleanups

- saner iov_iter_init() replacements (and switching the syscalls to
  use of those)

- ntfs switch to ->write_iter() (Anton)

- aio cleanups and splitting iocb into common and async parts
  (Christoph)

- assorted fixes (me, bfields, Andrew Elble)

There's a lot more, including the completion of switchover to
->{read,write}_iter(), d_inode/d_backing_inode annotations, f_flags
race fixes, etc, but that goes after #for-davem merge. David has
pulled it, and once it's in I'll send the next vfs pull request"

* 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (35 commits)
sg_start_req(): use import_iovec()
sg_start_req(): make sure that there's not too many elements in iovec
blk_rq_map_user(): use import_single_range()
sg_io(): use import_iovec()
process_vm_access: switch to {compat_,}import_iovec()
switch keyctl_instantiate_key_common() to iov_iter
switch {compat_,}do_readv_writev() to {compat_,}import_iovec()
aio_setup_vectored_rw(): switch to {compat_,}import_iovec()
vmsplice_to_user(): switch to import_iovec()
kill aio_setup_single_vector()
aio: simplify arguments of aio_setup_..._rw()
aio: lift iov_iter_init() into aio_setup_..._rw()
lift iov_iter into {compat_,}do_readv_writev()
NFS: fix BUG() crash in notify_change() with patch to chown_common()
dcache: return -ESTALE not -EBUSY on distributed fs race
NTFS: Version 2.1.32 - Update file write from aio_write to write_iter.
VFS: Add iov_iter_fault_in_multipages_readable()
drop bogus check in file_open_root()
switch security_inode_getattr() to struct path *
constify tomoyo_realpath_from_path()
...
12 Apr, 2015
3 commits
-
... avoiding write_iter/fcntl races.
Signed-off-by: Al Viro
-
no remaining users
Signed-off-by: Al Viro