Eric Lee / smarc-fsl-linux-kernel

21 Dec, 2018

1 commit

5f4610fe2 aio: fix spectre gadget in lookup_ioctx ... Browse Code »

commit a538e3ff9dabcdf6c3f477a373c629213d1c3066 upstream.

Matthew pointed out that the ioctx_table is susceptible to spectre v1,
because the index can be controlled by an attacker. The below patch
should mitigate the attack for all of the aio system calls.

Cc: stable@vger.kernel.org
Reported-by: Matthew Wilcox
Reported-by: Dan Carpenter
Signed-off-by: Jeff Moyer
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

Jeff Moyer
2018-12-21 21:13:04 +0800

05 Jun, 2018

1 commit

6a19487d5 fix io_destroy()/aio_complete() race ... Browse Code »

commit 4faa99965e027cc057c5145ce45fa772caa04e8d upstream.

If io_destroy() gets to cancelling everything that can be cancelled and
gets to kiocb_cancel() calling the function driver has left in ->ki_cancel,
it becomes vulnerable to a race with IO completion. At that point req
is already taken off the list and aio_complete() does *NOT* spin until
we (in free_ioctx_users()) releases ->ctx_lock. As the result, it proceeds
to kiocb_free(), freing req just it gets passed to ->ki_cancel().

Fix is simple - remove from the list after the call of kiocb_cancel(). All
instances of ->ki_cancel() already have to cope with the being called with
iocb still on list - that's what happens in io_cancel(2).

Cc: stable@kernel.org
Fixes: 0460fef2a921 "aio: use cancellation list lazily"
Signed-off-by: Al Viro
Signed-off-by: Greg Kroah-Hartman

Al Viro
2018-06-05 17:41:54 +0800

30 May, 2018

1 commit

fbcede36b aio: fix io_destroy(2) vs. lookup_ioctx() race ... Browse Code »

commit baf10564fbb66ea222cae66fbff11c444590ffd9 upstream.

kill_ioctx() used to have an explicit RCU delay between removing the
reference from ->ioctx_table and percpu_ref_kill() dropping the refcount.
At some point that delay had been removed, on the theory that
percpu_ref_kill() itself contained an RCU delay. Unfortunately, that was
the wrong kind of RCU delay and it didn't care about rcu_read_lock() used
by lookup_ioctx(). As the result, we could get ctx freed right under
lookup_ioctx(). Tejun has fixed that in a6d7cff472e ("fs/aio: Add explicit
RCU grace period when freeing kioctx"); however, that fix is not enough.

Suppose io_destroy() from one thread races with e.g. io_setup() from another;
CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2
has picked it (under rcu_read_lock()). Then CPU1 proceeds to drop the
refcount, getting it to 0 and triggering a call of free_ioctx_users(),
which proceeds to drop the secondary refcount and once that reaches zero
calls free_ioctx_reqs(). That does
INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
queue_rcu_work(system_wq, &ctx->free_rwork);
and schedules freeing the whole thing after RCU delay.

In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the
refcount from 0 to 1 and returned the reference to io_setup().

Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get
freed until after percpu_ref_get(). Sure, we'd increment the counter before
ctx can be freed. Now we are out of rcu_read_lock() and there's nothing to
stop freeing of the whole thing. Unfortunately, CPU2 assumes that since it
has grabbed the reference, ctx is *NOT* going away until it gets around to
dropping that reference.

The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss.
It's not costlier than what we currently do in normal case, it's safe to
call since freeing *is* delayed and it closes the race window - either
lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users
won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx()
fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see
the object in question at all.

Cc: stable@kernel.org
Fixes: a6d7cff472e "fs/aio: Add explicit RCU grace period when freeing kioctx"
Signed-off-by: Al Viro
Signed-off-by: Greg Kroah-Hartman

Al Viro
2018-05-30 13:51:47 +0800

21 Mar, 2018

2 commits

cd21b3400 fs/aio: Use RCU accessors for kioctx_table->table[] ... Browse Code »

commit d0264c01e7587001a8c4608a5d1818dba9a4c11a upstream.

While converting ioctx index from a list to a table, db446a08c23d
("aio: convert the ioctx list to table lookup v3") missed tagging
kioctx_table->table[] as an array of RCU pointers and using the
appropriate RCU accessors. This introduces a small window in the
lookup path where init and access may race.

Mark kioctx_table->table[] with __rcu and use the approriate RCU
accessors when using the field.

Signed-off-by: Tejun Heo
Reported-by: Jann Horn
Fixes: db446a08c23d ("aio: convert the ioctx list to table lookup v3")
Cc: Benjamin LaHaise
Cc: Linus Torvalds
Cc: stable@vger.kernel.org # v3.12+
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2018-03-21 19:06:43 +0800
076c7c068 fs/aio: Add explicit RCU grace period when freeing kioctx ... Browse Code »

commit a6d7cff472eea87d96899a20fa718d2bab7109f3 upstream.

While fixing refcounting, e34ecee2ae79 ("aio: Fix a trinity splat")
incorrectly removed explicit RCU grace period before freeing kioctx.
The intention seems to be depending on the internal RCU grace periods
of percpu_ref; however, percpu_ref uses a different flavor of RCU,
sched-RCU. This can lead to kioctx being freed while RCU read
protected dereferences are still in progress.

Fix it by updating free_ioctx() to go through call_rcu() explicitly.

v2: Comment added to explain double bouncing.

Signed-off-by: Tejun Heo
Reported-by: Jann Horn
Fixes: e34ecee2ae79 ("aio: Fix a trinity splat")
Cc: Kent Overstreet
Cc: Linus Torvalds
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2018-03-21 19:06:43 +0800

15 Sep, 2017

1 commit

e253d98f5 Merge branch 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull nowait read support from Al Viro:
"Support IOCB_NOWAIT for buffered reads and block devices"

* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
block_dev: support RFW_NOWAIT on block device nodes
fs: support RWF_NOWAIT for buffered reads
fs: support IOCB_NOWAIT in generic_file_buffered_read
fs: pass iocb to do_generic_file_read

Linus Torvalds
2017-09-15 10:29:55 +0800

09 Sep, 2017

1 commit

2916ecc0f mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY ... Browse Code »

Introduce a new migration mode that allow to offload the copy to a device
DMA engine. This changes the workflow of migration and not all
address_space migratepage callback can support this.

This is intended to be use by migrate_vma() which itself is use for thing
like HMM (see include/linux/hmm.h).

No additional per-filesystem migratepage testing is needed. I disables
MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i
added comment in those to explain why (part of this patch). The commit
message is unclear it should say that any callback that wish to support
this new mode need to be aware of the difference in the migration flow
from other mode.

Some of these callbacks do extra locking while copying (aio, zsmalloc,
balloon, ...) and for DMA to be effective you want to copy multiple
pages in one DMA operations. But in the problematic case you can not
easily hold the extra lock accross multiple call to this callback.

Usual flow is:

For each page {
1 - lock page
2 - call migratepage() callback
3 - (extra locking in some migratepage() callback)
4 - migrate page state (freeze refcount, update page cache, buffer
head, ...)
5 - copy page
6 - (unlock any extra lock of migratepage() callback)
7 - return from migratepage() callback
8 - unlock page
}

The new mode MIGRATE_SYNC_NO_COPY:
1 - lock multiple pages
For each page {
2 - call migratepage() callback
3 - abort in all problematic migratepage() callback
4 - migrate page state (freeze refcount, update page cache, buffer
head, ...)
} // finished all calls to migratepage() callback
5 - DMA copy multiple pages
6 - unlock all the pages

To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a
new callback migratepages() (for instance) that deals with multiple
pages in one transaction.

Because the problematic cases are not important for current usage I did
not wanted to complexify this patchset even more for no good reason.

Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
Signed-off-by: Jérôme Glisse
Cc: Aneesh Kumar
Cc: Balbir Singh
Cc: Benjamin Herrenschmidt
Cc: Dan Williams
Cc: David Nellans
Cc: Evgeny Baskakov
Cc: Johannes Weiner
Cc: John Hubbard
Cc: Kirill A. Shutemov
Cc: Mark Hairgrove
Cc: Michal Hocko
Cc: Paul E. McKenney
Cc: Ross Zwisler
Cc: Sherry Cheung
Cc: Subhash Gutti
Cc: Vladimir Davydov
Cc: Bob Liu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jérôme Glisse
2017-09-09 09:26:46 +0800

08 Sep, 2017

1 commit

2a8a98673 fs: aio: fix the increment of aio-nr and counting against aio-max-nr ... Browse Code »

Currently, aio-nr is incremented in steps of 'num_possible_cpus() * 8'
for io_setup(nr_events, ..) with 'nr_events < num_possible_cpus() * 4':

ioctx_alloc()
...
nr_events = max(nr_events, num_possible_cpus() * 4);
nr_events *= 2;
...
ctx->max_reqs = nr_events;
...
aio_nr += ctx->max_reqs;
....

This limits the number of aio contexts actually available to much less
than aio-max-nr, and is increasingly worse with greater number of CPUs.

For example, with 64 CPUs, only 256 aio contexts are actually available
(with aio-max-nr = 65536) because the increment is 512 in that scenario.

Note: 65536 [max aio contexts] / (64*4*2) [increment per aio context]
is 128, but make it 256 (double) as counting against 'aio-max-nr * 2':

ioctx_alloc()
...
if (aio_nr + nr_events > (aio_max_nr * 2UL) ||
...
goto err_ctx;
...

This patch uses the original value of nr_events (from userspace) to
increment aio-nr and count against aio-max-nr, which resolves those.

Signed-off-by: Mauricio Faria de Oliveira
Reported-by: Lekshmi C. Pillai
Tested-by: Lekshmi C. Pillai
Tested-by: Paul Nguyen
Reviewed-by: Jeff Moyer
Signed-off-by: Benjamin LaHaise

Mauricio Faria de Oliveira
2017-09-08 00:28:28 +0800

05 Sep, 2017

1 commit

91f9943e1 fs: support RWF_NOWAIT for buffered reads ... Browse Code »

This is based on the old idea and code from Milosz Tanski. With the aio
nowait code it becomes mostly trivial now. Buffered writes continue to
return -EOPNOTSUPP if RWF_NOWAIT is passed.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Al Viro

Christoph Hellwig
2017-09-05 07:04:23 +0800

28 Jun, 2017

1 commit

45d06cf70 fs: add O_DIRECT and aio support for sending down write life time hints ... Browse Code »

Reviewed-by: Andreas Dilger
Reviewed-by: Martin K. Petersen
Signed-off-by: Jens Axboe

Jens Axboe
2017-06-28 02:05:36 +0800

20 Jun, 2017

2 commits

b745fafaf fs: Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT ... Browse Code »

RWF_NOWAIT informs kernel to bail out if an AIO request will block
for reasons such as file allocations, or a writeback triggered,
or would block while allocating requests while performing
direct I/O.

RWF_NOWAIT is translated to IOCB_NOWAIT for iocb->ki_flags.

FMODE_AIO_NOWAIT is a flag which identifies the file opened is capable
of returning -EAGAIN if the AIO call will block. This must be set by
supporting filesystems in the ->open() call.

Filesystems xfs, btrfs and ext4 would be supported in the following patches.

Reviewed-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Jens Axboe

Goldwyn Rodrigues
2017-06-20 21:12:03 +0800
9830f4be1 fs: Use RWF_* flags for AIO operations ... Browse Code »

aio_rw_flags is introduced in struct iocb (using aio_reserved1) which will
carry the RWF_* flags. We cannot use aio_flags because they are not
checked for validity which may break existing applications.

Note, the only place RWF_HIPRI comes in effect is dio_await_one().
All the rest of the locations, aio code return -EIOCBQUEUED before the
checks for RWF_HIPRI.

Reviewed-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Jens Axboe

Goldwyn Rodrigues
2017-06-20 21:12:03 +0800

04 Mar, 2017

1 commit

1827adb11 Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull sched.h split-up from Ingo Molnar:
"The point of these changes is to significantly reduce the
header footprint, to speed up the kernel build and to
have a cleaner header structure.

After these changes the new 's typical preprocessed
size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
lines), which is around 40% faster to build on typical configs.

Not much changed from the last version (-v2) posted three weeks ago: I
eliminated quirks, backmerged fixes plus I rebased it to an upstream
SHA1 from yesterday that includes most changes queued up in -next plus
all sched.h changes that were pending from Andrew.

I've re-tested the series both on x86 and on cross-arch defconfigs,
and did a bisectability test at a number of random points.

I tried to test as many build configurations as possible, but some
build breakage is probably still left - but it should be mostly
limited to architectures that have no cross-compiler binaries
available on kernel.org, and non-default configurations"

* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
sched/headers: Clean up
sched/headers: Remove #ifdefs from
sched/headers: Remove the include from
sched/headers, hrtimer: Remove the include from
sched/headers, x86/apic: Remove the header inclusion from
sched/headers, timers: Remove the include from
sched/headers: Remove from
sched/headers: Remove from
sched/core: Remove unused prefetch_stack()
sched/headers: Remove from
sched/headers: Remove the 'init_pid_ns' prototype from
sched/headers: Remove from
sched/headers: Remove from
sched/headers: Remove the runqueue_is_locked() prototype
sched/headers: Remove from
sched/headers: Remove from
sched/headers: Remove from
sched/headers: Remove from
sched/headers: Remove the include from
sched/headers: Remove from
...

Linus Torvalds
2017-03-04 02:16:38 +0800

03 Mar, 2017

1 commit

94e877d0f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs pile two from Al Viro:

- orangefs fix

- series of fs/namei.c cleanups from me

- VFS stuff coming from overlayfs tree

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
orangefs: Use RCU for destroy_inode
vfs: use helper for calling f_op->fsync()
mm: use helper for calling f_op->mmap()
vfs: use helpers for calling f_op->{read,write}_iter()
vfs: pass type instead of fn to do_{loop,iter}_readv_writev()
vfs: extract common parts of {compat_,}do_readv_writev()
vfs: wrap write f_ops with file_{start,end}_write()
vfs: deny copy_file_range() for non regular files
vfs: deny fallocate() on directory
vfs: create vfs helper vfs_tmpfile()
namei.c: split unlazy_walk()
namei.c: fold the check for DCACHE_OP_REVALIDATE into d_revalidate()
lookup_fast(): clean up the logics around the fallback to non-rcu mode
namei: fold unlazy_link() into its sole caller

Linus Torvalds
2017-03-03 07:20:00 +0800

02 Mar, 2017

2 commits

653a7746f Merge remote-tracking branch 'ovl/for-viro' into for-linus ... Browse Code »

Overlayfs-related series from Miklos and Amir

Al Viro
2017-03-02 19:41:22 +0800
174cd4b1e sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sc… ... Browse Code »

…hed.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2017-03-02 15:42:32 +0800

25 Feb, 2017

1 commit

897ab3e0c userfaultfd: non-cooperative: add event for memory unmaps ... Browse Code »

When a non-cooperative userfaultfd monitor copies pages in the
background, it may encounter regions that were already unmapped.
Addition of UFFD_EVENT_UNMAP allows the uffd monitor to track precisely
changes in the virtual memory layout.

Since there might be different uffd contexts for the affected VMAs, we
first should create a temporary representation for the unmap event for
each uffd context and then notify them one by one to the appropriate
userfault file descriptors.

The event notification occurs after the mmap_sem has been released.

[arnd@arndb.de: fix nommu build]
Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de
[mhocko@suse.com: fix nommu build]
Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport
Signed-off-by: Michal Hocko
Signed-off-by: Arnd Bergmann
Acked-by: Hillf Danton
Cc: Andrea Arcangeli
Cc: "Dr. David Alan Gilbert"
Cc: Mike Kravetz
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Rapoport
2017-02-25 09:46:55 +0800

20 Feb, 2017

1 commit

bb7462b6f vfs: use helpers for calling f_op->{read,write}_iter() ... Browse Code »

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2017-02-20 23:51:23 +0800

15 Jan, 2017

1 commit

a12f1ae61 aio: fix lock dep warning ... Browse Code »

lockdep reports a warnning. file_start_write/file_end_write only
acquire/release the lock for regular files. So checking the files in aio
side too.

[ 453.532141] ------------[ cut here ]------------
[ 453.533011] WARNING: CPU: 1 PID: 1298 at ../kernel/locking/lockdep.c:3514 lock_release+0x434/0x670
[ 453.533011] DEBUG_LOCKS_WARN_ON(depth ] dump_stack+0x67/0x9c
[ 453.533011] [] __warn+0x111/0x130
[ 453.533011] [] warn_slowpath_fmt+0x97/0xb0
[ 453.533011] [] ? __warn+0x130/0x130
[ 453.533011] [] ? blk_finish_plug+0x29/0x60
[ 453.533011] [] lock_release+0x434/0x670
[ 453.533011] [] ? import_single_range+0xd4/0x110
[ 453.533011] [] ? rw_verify_area+0x65/0x140
[ 453.533011] [] ? aio_write+0x1f6/0x280
[ 453.533011] [] aio_write+0x229/0x280
[ 453.533011] [] ? aio_complete+0x640/0x640
[ 453.533011] [] ? debug_check_no_locks_freed+0x1a0/0x1a0
[ 453.533011] [] ? debug_lockdep_rcu_enabled.part.2+0x1a/0x30
[ 453.533011] [] ? debug_lockdep_rcu_enabled+0x35/0x40
[ 453.533011] [] ? __might_fault+0x7e/0xf0
[ 453.533011] [] do_io_submit+0x94c/0xb10
[ 453.533011] [] ? do_io_submit+0x23e/0xb10
[ 453.533011] [] ? SyS_io_destroy+0x270/0x270
[ 453.533011] [] ? mark_held_locks+0x23/0xc0
[ 453.533011] [] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 453.533011] [] SyS_io_submit+0x10/0x20
[ 453.533011] [] entry_SYSCALL_64_fastpath+0x18/0xad
[ 453.533011] [] ? trace_hardirqs_off_caller+0xc0/0x110
[ 453.533011] ---[ end trace b2fbe664d1cc0082 ]---

Cc: Dmitry Monakhov
Cc: Jan Kara
Cc: Christoph Hellwig
Cc: Al Viro
Reviewed-by: Christoph Hellwig
Signed-off-by: Shaohua Li
Signed-off-by: Al Viro

Shaohua Li
2017-01-15 08:31:40 +0800

26 Dec, 2016

1 commit

2456e8553 ktime: Get rid of the union ... Browse Code »

ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra

Thomas Gleixner
2016-12-26 00:21:22 +0800

25 Dec, 2016

1 commit

7c0f6ba68 Replace <asm/uaccess.h> with <linux/uaccess.h> globally ... Browse Code »

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-12-25 03:46:01 +0800

23 Dec, 2016

1 commit

c00d2c7e8 move aio compat to fs/aio.c ... Browse Code »

... and fix the minor buglet in compat io_submit() - native one
kills ioctx as cleanup when put_user() fails. Get rid of
bogus compat_... in !CONFIG_AIO case, while we are at it - they
should simply fail with ENOSYS, same as for native counterparts.

Signed-off-by: Al Viro

Al Viro
2016-12-23 11:58:37 +0800

05 Dec, 2016

1 commit

450630975 don't open-code file_inode() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2016-12-05 07:29:28 +0800

31 Oct, 2016

4 commits

70fe2f481 aio: fix freeze protection of aio writes ... Browse Code »

Currently we dropped freeze protection of aio writes just after IO was
submitted. Thus aio write could be in flight while the filesystem was
frozen and that could result in unexpected situation like aio completion
wanting to convert extent type on frozen filesystem. Testcase from
Dmitry triggering this is like:

for ((i=0;i
Signed-off-by: Jan Kara
[hch: forward ported on top of various VFS and aio changes]
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Jan Kara
2016-10-31 01:09:42 +0800
89319d31d fs: remove aio_run_iocb ... Browse Code »

Pass the ABI iocb structure to aio_setup_rw and let it handle the
non-vectored I/O case as well. With that and a new helper for the AIO
return value handling we can now define new aio_read and aio_write
helpers that implement reads and writes in a self-contained way without
duplicating too much code.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2016-10-31 01:09:42 +0800
723c03847 fs: remove the never implemented aio_fsync file operation ... Browse Code »

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2016-10-31 01:09:42 +0800
0b944d3a4 aio: hold an extra file reference over AIO read/write operations ... Browse Code »

Otherwise we might dereference an already freed file and/or inode
when aio_complete is called before we return from the read_iter or
write_iter method.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2016-10-31 01:09:42 +0800

28 Sep, 2016

1 commit

de04e7693 fs/aio.c: eliminate redundant loads in put_aio_ring_file ... Browse Code »

Using a local variable we can prevent gcc from reloading
aio_ring_file->f_inode->i_mapping twice, eliminating 2x2 dependent
loads.

Signed-off-by: Rasmus Villemoes
Signed-off-by: Al Viro

Rasmus Villemoes
2016-09-28 09:45:46 +0800

16 Sep, 2016

1 commit

22f6b4d34 aio: mark AIO pseudo-fs noexec ... Browse Code »

This ensures that do_mmap() won't implicitly make AIO memory mappings
executable if the READ_IMPLIES_EXEC personality flag is set. Such
behavior is problematic because the security_mmap_file LSM hook doesn't
catch this case, potentially permitting an attacker to bypass a W^X
policy enforced by SELinux.

I have tested the patch on my machine.

To test the behavior, compile and run this:

#define _GNU_SOURCE
#include
#include
#include
#include
#include
#include
#include

int main(void) {
personality(READ_IMPLIES_EXEC);
aio_context_t ctx = 0;
if (syscall(__NR_io_setup, 1, &ctx))
err(1, "io_setup");

char cmd[1000];
sprintf(cmd, "cat /proc/%d/maps | grep -F '/[aio]'",
(int)getpid());
system(cmd);
return 0;
}

In the output, "rw-s" is good, "rwxs" is bad.

Signed-off-by: Jann Horn
Signed-off-by: Linus Torvalds

Jann Horn
2016-09-16 06:49:28 +0800

24 May, 2016

1 commit

013373e8b aio: make aio_setup_ring killable ... Browse Code »

aio_setup_ring waits for mmap_sem in writable mode. If the waiting task
gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving. Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting. This will also expedite the
return to the userspace and do_exit.

Signed-off-by: Michal Hocko
Acked-by: Jeff Moyer
Acked-by: Vlastimil Babka
Cc: Benamin LaHaise
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2016-05-24 08:04:14 +0800

04 Apr, 2016

1 commit

2958ec177 aio: remove a pointless assignment ... Browse Code »

the value is never used after that point

Signed-off-by: Al Viro

Al Viro
2016-04-04 07:51:33 +0800

05 Sep, 2015

1 commit

5477e70a6 mm: move ->mremap() from file_operations to vm_operations_struct ... Browse Code »

vma->vm_ops->mremap() looks more natural and clean in move_vma(), and this
way ->mremap() can have more users. Say, vdso.

While at it, s/aio_ring_remap/aio_ring_mremap/.

Note: this is the minimal change before ->mremap() finds another user in
file_operations; this method should have more arguments, and it can be
used to kill arch_remap().

Signed-off-by: Oleg Nesterov
Acked-by: Pavel Emelyanov
Acked-by: Kirill A. Shutemov
Cc: David Rientjes
Cc: Benjamin LaHaise
Cc: Hugh Dickins
Cc: Jeff Moyer
Cc: Laurent Dufour
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2015-09-05 07:54:41 +0800

17 Apr, 2015

2 commits

4fc8adcfe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull third hunk of vfs changes from Al Viro:
"This contains the ->direct_IO() changes from Omar + saner
generic_write_checks() + dealing with fcntl()/{read,write}() races
(mirroring O_APPEND/O_DIRECT into iocb->ki_flags and instead of
repeatedly looking at ->f_flags, which can be changed by fcntl(2),
check ->ki_flags - which cannot) + infrastructure bits for dhowells'
d_inode annotations + Christophs switch of /dev/loop to
vfs_iter_write()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (30 commits)
block: loop: switch to VFS ITER_BVEC
configfs: Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode
VFS: Make pathwalk use d_is_reg() rather than S_ISREG()
VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR()
VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk
NFS: Don't use d_inode as a variable name
VFS: Impose ordering on accesses of d_inode and d_flags
VFS: Add owner-filesystem positive/negative dentry checks
nfs: generic_write_checks() shouldn't be done on swapout...
ocfs2: use __generic_file_write_iter()
mirror O_APPEND and O_DIRECT into iocb->ki_flags
switch generic_write_checks() to iocb and iter
ocfs2: move generic_write_checks() before the alignment checks
ocfs2_file_write_iter: stop messing with ppos
udf_file_write_iter: reorder and simplify
fuse: ->direct_IO() doesn't need generic_write_checks()
ext4_file_write_iter: move generic_write_checks() up
xfs_file_aio_write_checks: switch to iocb/iov_iter
generic_write_checks(): drop isblk argument
blkdev_write_iter: expand generic_file_checks() call in there
...

Linus Torvalds
2015-04-17 11:27:56 +0800
d82312c80 Merge branch 'for-4.1/core' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block layer core bits from Jens Axboe:
"This is the core pull request for 4.1. Not a lot of stuff in here for
this round, mostly little fixes or optimizations. This pull request
contains:

- An optimization that speeds up queue runs on blk-mq, especially for
the case where there's a large difference between nr_cpu_ids and
the actual mapped software queues on a hardware queue. From Chong
Yuan.

- Honor node local allocations for requests on legacy devices. From
David Rientjes.

- Cleanup of blk_mq_rq_to_pdu() from me.

- exit_aio() fixup from me, greatly speeding up exiting multiple IO
contexts off exit_group(). For my particular test case, fio exit
took ~6 seconds. A typical case of both exposing RCU grace periods
to user space, and serializing exit of them.

- Make blk_mq_queue_enter() honor the gfp mask passed in, so we only
wait if __GFP_WAIT is set. From Keith Busch.

- blk-mq exports and two added helpers from Mike Snitzer, which will
be used by the dm-mq code.

- Cleanups of blk-mq queue init from Wei Fang and Xiaoguang Wang"

* 'for-4.1/core' of git://git.kernel.dk/linux-block:
blk-mq: reduce unnecessary software queue looping
aio: fix serial draining in exit_aio()
blk-mq: cleanup blk_mq_rq_to_pdu()
blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue()
block: remove redundant check about 'set->nr_hw_queues' in blk_mq_alloc_tag_set()
block: allocate request memory local to request queue
blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set
blk-mq: export blk_mq_run_hw_queues
blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk

Linus Torvalds
2015-04-17 09:49:16 +0800

16 Apr, 2015

2 commits

fa927894b Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull second vfs update from Al Viro:
"Now that net-next went in... Here's the next big chunk - killing
->aio_read() and ->aio_write().

There'll be one more pile today (direct_IO changes and
generic_write_checks() cleanups/fixes), but I'd prefer to keep that
one separate"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
->aio_read and ->aio_write removed
pcm: another weird API abuse
infinibad: weird APIs switched to ->write_iter()
kill do_sync_read/do_sync_write
fuse: use iov_iter_get_pages() for non-splice path
fuse: switch to ->read_iter/->write_iter
switch drivers/char/mem.c to ->read_iter/->write_iter
make new_sync_{read,write}() static
coredump: accept any write method
switch /dev/loop to vfs_iter_write()
serial2002: switch to __vfs_read/__vfs_write
ashmem: use __vfs_read()
export __vfs_read()
autofs: switch to __vfs_write()
new helper: __vfs_write()
switch hugetlbfs to ->read_iter()
coda: switch to ->read_iter/->write_iter
ncpfs: switch to ->read_iter/->write_iter
net/9p: remove (now-)unused helpers
p9_client_attach(): set fid->uid correctly
...

Linus Torvalds
2015-04-16 04:22:56 +0800
dc48e56d7 aio: fix serial draining in exit_aio() ... Browse Code »

exit_aio() currently serializes killing io contexts. Each context
killing ends up having to do percpu_ref_kill(), which in turns has
to wait for an RCU grace period. This can take a long time, depending
on the number of contexts. And there's no point in doing them serially,
when we could be waiting for all of them in one fell swoop.

This patches makes my fio thread offload test case exit 0.2s instead
of almost 6s.

Reviewed-by: Jeff Moyer
Signed-off-by: Jens Axboe

Jens Axboe
2015-04-16 01:17:23 +0800

15 Apr, 2015

1 commit

ca2ec3265 Merge branch 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs update from Al Viro:
"Part one:

- struct filename-related cleanups

- saner iov_iter_init() replacements (and switching the syscalls to
use of those)

- ntfs switch to ->write_iter() (Anton)

- aio cleanups and splitting iocb into common and async parts
(Christoph)

- assorted fixes (me, bfields, Andrew Elble)

There's a lot more, including the completion of switchover to
->{read,write}_iter(), d_inode/d_backing_inode annotations, f_flags
race fixes, etc, but that goes after #for-davem merge. David has
pulled it, and once it's in I'll send the next vfs pull request"

* 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (35 commits)
sg_start_req(): use import_iovec()
sg_start_req(): make sure that there's not too many elements in iovec
blk_rq_map_user(): use import_single_range()
sg_io(): use import_iovec()
process_vm_access: switch to {compat_,}import_iovec()
switch keyctl_instantiate_key_common() to iov_iter
switch {compat_,}do_readv_writev() to {compat_,}import_iovec()
aio_setup_vectored_rw(): switch to {compat_,}import_iovec()
vmsplice_to_user(): switch to import_iovec()
kill aio_setup_single_vector()
aio: simplify arguments of aio_setup_..._rw()
aio: lift iov_iter_init() into aio_setup_..._rw()
lift iov_iter into {compat_,}do_readv_writev()
NFS: fix BUG() crash in notify_change() with patch to chown_common()
dcache: return -ESTALE not -EBUSY on distributed fs race
NTFS: Version 2.1.32 - Update file write from aio_write to write_iter.
VFS: Add iov_iter_fault_in_multipages_readable()
drop bogus check in file_open_root()
switch security_inode_getattr() to struct path *
constify tomoyo_realpath_from_path()
...

Linus Torvalds
2015-04-15 06:31:03 +0800

12 Apr, 2015

3 commits

2ba48ce51 mirror O_APPEND and O_DIRECT into iocb->ki_flags ... Browse Code »

... avoiding write_iter/fcntl races.

Signed-off-by: Al Viro

Al Viro
2015-04-12 10:30:22 +0800
dfea93457 Merge branch 'for-linus' into for-next Browse Code »

Al Viro
2015-04-12 10:29:51 +0800
843631820 ->aio_read and ->aio_write removed ... Browse Code »

no remaining users

Signed-off-by: Al Viro

Al Viro
2015-04-12 10:29:43 +0800