18 Dec, 2014

1 commit

  • Pull fuse update from Miklos Szeredi:
    "The first part makes sure we don't hold up umount with pending async
    requests. In addition to being a cleanup, this is a small behavioral
    change (for the better) and unlikely to break anything.

    The second part prepares for a cleanup of the fuse device I/O code by
    adding a helper for simple request submission, with some savings in
    line numbers already realized"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: use file_inode() in fuse_file_fallocate()
    fuse: introduce fuse_simple_request() helper
    fuse: reduce max out args
    fuse: hold inode instead of path after release
    fuse: flush requests on umount
    fuse: don't wake up reserved req in fuse_conn_kill()

    Linus Torvalds
     

12 Dec, 2014

6 commits

  • Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • The following pattern is repeated many times:

    req = fuse_get_req_nopages(fc);
    /* Initialize req->(in|out).args */
    fuse_request_send(fc, req);
    err = req->out.h.error;
    fuse_put_request(fc, req);

    Create a new replacement helper:

    /* Initialize args */
    err = fuse_simple_request(fc, &args);

    In addition to reducing the code size, this will ease moving from the
    complex arg-based to a simpler page-based I/O on the fuse device.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
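
    For orientation, a rough sketch of what a converted call site looks like
    with the new helper. The FUSE_ARGS() macro and the fuse_args field names
    follow the fuse code of this era; the fsync-style request itself is only
    an illustration, not an excerpt from the patch:

    FUSE_ARGS(args);
    struct fuse_fsync_in inarg = { .fh = ff->fh, .fsync_flags = datasync };
    int err;

    args.in.h.opcode = FUSE_FSYNC;
    args.in.h.nodeid = get_node_id(inode);
    args.in.numargs = 1;
    args.in.args[0].size = sizeof(inarg);
    args.in.args[0].value = &inarg;
    /* allocates the request, sends it, waits, and returns out.h.error */
    err = fuse_simple_request(fc, &args);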
     
  • The third out-arg is never actually used.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • path_put() in release could trigger a DESTROY request in fuseblk. The
    possible deadlock was worked around by doing the path_put() with
    schedule_work().

    This complexity isn't needed if we just hold the inode instead of the path.
    Since we now flush all requests before destroying the super block we can be
    sure that all held inodes will be dropped.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Use fuse_abort_conn() instead of fuse_conn_kill() in fuse_put_super().
    This flushes and aborts requests still on any queues. But since we've
    already reset fc->connected, those requests would not be useful anyway and
    would be flushed when the fuse device is closed.

    Next patches will rely on requests being flushed before the superblock is
    destroyed.

    Use fuse_abort_conn() in cuse_process_init_reply() too, since it makes no
    difference there, and we can get rid of fuse_conn_kill().

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Waking up reserved_req_waitq from fuse_conn_kill() doesn't make sense since
    we aren't changing ff->reserved_req here, which is what this waitqueue
    signals.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

20 Nov, 2014

2 commits


09 Oct, 2014

2 commits


27 Sep, 2014

1 commit

  • The third argument of fuse_get_user_pages(), "nbytesp", refers to the number
    of bytes the caller asked to pack into the fuse request. This value may be
    less than the capacity of the fuse request or of the iov_iter, so
    fuse_get_user_pages() must ensure that *nbytesp never grows.

    Now that the iov_iter_get_pages() helper does all the hard work of
    extracting pages from the iov_iter, this can be achieved by passing a
    properly calculated "maxsize" to the helper (sketched below).

    The other caller of iov_iter_get_pages() (dio_refill_pages()) doesn't need
    this capability, so pass LONG_MAX as the maxsize argument here.

    Fixes: c9c37e2e6378 ("fuse: switch to iov_iter_get_pages()")
    Reported-by: Werner Baumann
    Tested-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
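
    A compressed sketch of the idea, approximating the fuse_get_user_pages()
    loop after the fix (field names such as req->pages and req->max_pages
    follow the fuse request structure of this era; this is not the verbatim
    diff):

    while (nbytes < *nbytesp && req->num_pages < req->max_pages) {
        size_t start;
        /* maxsize = *nbytesp - nbytes: never pull in more than the caller
         * asked for, so *nbytesp can only shrink, never grow */
        ssize_t ret = iov_iter_get_pages(ii, &req->pages[req->num_pages],
                                         *nbytesp - nbytes,
                                         req->max_pages - req->num_pages,
                                         &start);
        if (ret < 0)
            return ret;

        iov_iter_advance(ii, ret);
        nbytes += ret;
        /* ... fill in the page descriptors for the pages just pinned ... */
    }
    *nbytesp = nbytes;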
     

08 Aug, 2014

2 commits


22 Jul, 2014

2 commits

  • Here are some additional changes that set a capability flag so that clients
    can detect when it's appropriate to return -ENOSYS from open (sketched below).

    This amends the following commit introduced in 3.14:

    7678ac50615d fuse: support clients that don't implement 'open'

    However we can only add the flag to 3.15 and later since there was no
    protocol version update in 3.14.

    Signed-off-by: Miklos Szeredi
    Cc: # v3.15+

    Andrew Gallagher
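
    For illustration, a userspace filesystem would check the capability during
    the INIT handshake before relying on the -ENOSYS convention. The flag name
    below (FUSE_NO_OPEN_SUPPORT, from the uapi fuse header) is what this series
    adds; the surrounding code is a sketch, not part of the patch:

    #include <linux/fuse.h>

    static int no_open_supported;

    static void handle_init(const struct fuse_init_in *in)
    {
        /* only answer OPEN with -ENOSYS if the kernel understands it */
        no_open_supported = !!(in->flags & FUSE_NO_OPEN_SUPPORT);
    }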
     
  • The default s_time_gran is 1; don't overwrite it if userspace didn't
    explicitly specify one.

    Signed-off-by: Miklos Szeredi
    Cc: # v3.15+

    Miklos Szeredi
     

15 Jul, 2014

1 commit

  • Pull fuse fixes from Miklos Szeredi:
    "This contains miscellaneous fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: replace count*size kzalloc by kcalloc
    fuse: release temporary page if fuse_writepage_locked() failed
    fuse: restructure ->rename2()
    fuse: avoid scheduling while atomic
    fuse: handle large user and group ID
    fuse: inode: drop cast
    fuse: ignore entry-timeout on LOOKUP_REVAL
    fuse: timeout comparison fix

    Linus Torvalds
     

14 Jul, 2014

2 commits


10 Jul, 2014

1 commit


07 Jul, 2014

5 commits

  • As reported by Richard Sharpe, an attempt to use fuse_notify_inval_entry()
    triggers complaints about scheduling while atomic:

    BUG: scheduling while atomic: fuse.hf/13976/0x10000001

    This happens because fuse_notify_inval_entry() attempts to allocate memory
    with GFP_KERNEL, holding "struct fuse_copy_state" mapped by kmap_atomic().

    Introduced by commit 58bda1da4b3c "fuse/dev: use atomic maps"

    Fix by moving the map/unmap to just cover the actual memcpy operation
    (sketched below).

    Original patch from Maxim Patlasov

    Reported-by: Richard Sharpe
    Signed-off-by: Miklos Szeredi
    Cc: # v3.15+

    Miklos Szeredi
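
    Roughly, the shape of the fix in the copy helper (a sketch of
    fuse_copy_do() after the change; details may differ from the actual
    patch):

    static int fuse_copy_do(struct fuse_copy_state *cs, void **val,
                            unsigned *size)
    {
        unsigned ncpy = min(*size, cs->len);

        if (val) {
            /* map only around the memcpy, so nothing that can sleep
             * (like a GFP_KERNEL allocation) runs inside the atomic map */
            void *pgaddr = kmap_atomic(cs->pg);
            void *buf = pgaddr + cs->offset;

            if (cs->write)
                memcpy(buf, *val, ncpy);
            else
                memcpy(*val, buf, ncpy);

            kunmap_atomic(pgaddr);
            *val += ncpy;
        }
        *size -= ncpy;
        cs->len -= ncpy;
        cs->offset += ncpy;
        return ncpy;
    }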
     
  • If the number in the "user_id=N" or "group_id=N" mount option was larger
    than INT_MAX, fuse returned EINVAL.

    Fix this to handle all valid uid/gid values (sketched below).

    Signed-off-by: Miklos Szeredi
    Cc: stable@vger.kernel.org

    Miklos Szeredi
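
    A sketch of the approach (the helper name and the option-parsing fragment
    are illustrative rather than the exact patch): parse the value as an
    unsigned int instead of a signed one, then validate the resulting kuid as
    before.

    static int fuse_match_uint(substring_t *s, unsigned int *res)
    {
        int err = -ENOMEM;
        char *buf = match_strdup(s);

        if (buf) {
            err = kstrtouint(buf, 10, res);
            kfree(buf);
        }
        return err;
    }

    /* ... in the mount option parser ... */
    case OPT_USER_ID:
        if (fuse_match_uint(&args[0], &uv))
            return 0;
        d->user_id = make_kuid(current_user_ns(), uv);
        if (!uid_valid(d->user_id))
            return 0;
        break;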
     
  • This patch removes the cast on data of type void * as it is not needed.
    The following Coccinelle semantic patch was used for making the change:

    @r@
    expression x;
    void* e;
    type T;
    identifier f;
    @@

    (
    *((T *)e)
    |
    ((T *)x)[...]
    |
    ((T *)x)->f
    |
    - (T *)
    e
    )

    Signed-off-by: Himangi Saraogi
    Acked-by: Julia Lawall
    Signed-off-by: Miklos Szeredi

    Himangi Saraogi
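
    The effect of the semantic patch on a made-up example: assigning a void
    pointer to a typed pointer needs no cast in C, so the cast is simply
    dropped (get_context() is hypothetical, used only to produce a void *):

    void *data = get_context();

    struct foo *p1 = (struct foo *)data;  /* before: explicit cast */
    struct foo *p2 = data;                /* after: implicit conversion suffices */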
     
  • The following test case demonstrates the bug:

    sh# mount -t glusterfs localhost:meta-test /mnt/one

    sh# mount -t glusterfs localhost:meta-test /mnt/two

    sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; echo stuff > /mnt/one/file
    bash: /mnt/one/file: Stale file handle

    sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; sleep 1; echo stuff > /mnt/one/file

    On the second open() on /mnt/one, FUSE would have used the old
    nodeid (file handle) to try to re-open it, and Gluster returns
    -ESTALE. The ESTALE propagates back to namei.c:filename_lookup(),
    where the lookup is re-attempted with LOOKUP_REVAL. The right
    behavior now would be for FUSE to ignore the entry-timeout and
    do the up-call revalidation. Instead FUSE ignores LOOKUP_REVAL,
    the revalidation succeeds (because the entry-timeout has not
    passed), open() is retried on the old file handle, and the ESTALE
    finally goes back to the application.

    Fix: if revalidation is happening with LOOKUP_REVAL, then ignore the
    entry-timeout and always do the up-call (see the sketch below).

    Signed-off-by: Anand Avati
    Reviewed-by: Niels de Vos
    Signed-off-by: Miklos Szeredi
    Cc: stable@vger.kernel.org

    Anand Avati
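
    Schematically, the fix makes the revalidation path treat LOOKUP_REVAL like
    an already-expired entry-timeout (a sketch of the condition in
    fuse_dentry_revalidate(), not the verbatim diff):

    if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
        (flags & LOOKUP_REVAL)) {
        /* treat the dentry as stale: send FUSE_LOOKUP to userspace and
         * verify that the nodeid (file handle) is still the same */
    }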
     
  • As suggested by checkpatch.pl, use time_before64() instead of direct
    comparison of jiffies64 values.

    Signed-off-by: Miklos Szeredi
    Cc:

    Miklos Szeredi
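
    Illustrative before/after of the comparison:

    /* before: direct comparison of jiffies64 values */
    if (fuse_dentry_time(entry) < get_jiffies_64())
        goto invalid;

    /* after: wrap-around-safe helper */
    if (time_before64(fuse_dentry_time(entry), get_jiffies_64()))
        goto invalid;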
     

13 Jun, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "This the bunch that sat in -next + lock_parent() fix. This is the
    minimal set; there's more pending stuff.

    In particular, I really hope to get acct.c fixes merged this cycle -
    we need that to deal sanely with delayed-mntput stuff. In the next
    pile, hopefully - that series is fairly short and localized
    (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
    iov_iter work. Most of prereqs for ->splice_write with sane locking
    order are there and Kent's dio rewrite would also fit nicely on top of
    this pile"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
    lock_parent: don't step on stale ->d_parent of all-but-freed one
    kill generic_file_splice_write()
    ceph: switch to iter_file_splice_write()
    shmem: switch to iter_file_splice_write()
    nfs: switch to iter_splice_write_file()
    fs/splice.c: remove unneeded exports
    ocfs2: switch to iter_file_splice_write()
    ->splice_write() via ->write_iter()
    bio_vec-backed iov_iter
    optimize copy_page_{to,from}_iter()
    bury generic_file_aio_{read,write}
    lustre: get rid of messing with iovecs
    ceph: switch to ->write_iter()
    ceph_sync_direct_write: stop poking into iov_iter guts
    ceph_sync_read: stop poking into iov_iter guts
    new helper: copy_page_from_iter()
    fuse: switch to ->write_iter()
    btrfs: switch to ->write_iter()
    ocfs2: switch to ->write_iter()
    xfs: switch to ->write_iter()
    ...

    Linus Torvalds
     

05 Jun, 2014

2 commits

  • aops->write_begin may allocate a new page and make it visible only to have
    mark_page_accessed called almost immediately after. Once the page is
    visible the atomic operations are necessary, which is noticeable overhead
    when writing to an in-memory filesystem like tmpfs but should also be
    noticeable with fast storage. The objective of the patch is to initialise
    the accessed information with non-atomic operations before the page is
    visible.

    The bulk of filesystems directly or indirectly use
    grab_cache_page_write_begin or find_or_create_page for the initial
    allocation of a page cache page. This patch adds an init_page_accessed()
    helper which behaves like the first call to mark_page_accessed() but may be
    called before the page is visible and can be done non-atomically.

    The primary APIs of concern in this case are the following and are used
    by most filesystems.

    find_get_page
    find_lock_page
    find_or_create_page
    grab_cache_page_nowait
    grab_cache_page_write_begin

    All of them are very similar in detail, so the patch creates a core helper,
    pagecache_get_page(), which takes a flags parameter that affects its
    behaviour, such as whether the page should be marked accessed or not. The
    old API is preserved but is basically a thin wrapper around this core
    function.

    Each of the filesystems is then updated to avoid calling
    mark_page_accessed when it is known that the VM interfaces have already
    done the job. There is a slight snag in that the timing of
    mark_page_accessed() has now changed, so in rare cases it's possible a page
    gets to the end of the LRU as PageReferenced whereas previously it might
    have been repromoted. This is expected to be rare but it's worth the
    filesystem people thinking about in case they see a problem with the
    timing change. It is also the case that some filesystems may now be marking
    pages accessed where previously they did not, but it makes sense that
    filesystems have consistent behaviour in this regard.

    The test case used to evaluate this is a simple dd of a large file done
    multiple times with the file deleted on each iteration. The size of the
    file is 1/10th of physical memory to avoid dirty page balancing. In the
    async case it is possible that the workload completes without even
    hitting the disk and will have variable results, but it highlights the
    impact of mark_page_accessed for async IO. The sync results are expected
    to be more stable. The exception is tmpfs, where the normal case is for
    the "IO" to not hit the disk.

    The test machine was single socket and UMA to avoid any scheduling or NUMA
    artifacts. Throughput and wall times are presented for sync IO; only wall
    times are shown for async, as the granularity reported by dd and the
    variability are unsuitable for comparison. As async results were variable
    due to writeback timings, I'm only reporting the maximum figures. The sync
    results were stable enough to make the mean and stddev uninteresting.

    The performance results are reported based on a run with no profiling.
    Profile data is based on a separate run with oprofile running.

    async dd                   3.15.0-rc3          3.15.0-rc3
                                  vanilla         accessed-v2
    ext3   Max elapsed   13.9900 (  0.00%)  11.5900 ( 17.16%)
    tmpfs  Max elapsed    0.5100 (  0.00%)   0.4900 (  3.92%)
    btrfs  Max elapsed   12.8100 (  0.00%)  12.7800 (  0.23%)
    ext4   Max elapsed   18.6000 (  0.00%)  13.3400 ( 28.28%)
    xfs    Max elapsed   12.5600 (  0.00%)   2.0900 ( 83.36%)

    The XFS figure is a bit strange as it managed to avoid a worst case by
    sheer luck but the average figures looked reasonable.

    fs     samples  percentage  kernel image                        symbol
    ext3     86107      0.9783  vmlinux-3.15.0-rc4-vanilla          mark_page_accessed
    ext3     23833      0.2710  vmlinux-3.15.0-rc4-accessed-v3r25   mark_page_accessed
    ext3      5036      0.0573  vmlinux-3.15.0-rc4-accessed-v3r25   init_page_accessed
    ext4     64566      0.8961  vmlinux-3.15.0-rc4-vanilla          mark_page_accessed
    ext4      5322      0.0713  vmlinux-3.15.0-rc4-accessed-v3r25   mark_page_accessed
    ext4      2869      0.0384  vmlinux-3.15.0-rc4-accessed-v3r25   init_page_accessed
    xfs      62126      1.7675  vmlinux-3.15.0-rc4-vanilla          mark_page_accessed
    xfs       1904      0.0554  vmlinux-3.15.0-rc4-accessed-v3r25   init_page_accessed
    xfs        103      0.0030  vmlinux-3.15.0-rc4-accessed-v3r25   mark_page_accessed
    btrfs    10655      0.1338  vmlinux-3.15.0-rc4-vanilla          mark_page_accessed
    btrfs     2020      0.0273  vmlinux-3.15.0-rc4-accessed-v3r25   init_page_accessed
    btrfs      587      0.0079  vmlinux-3.15.0-rc4-accessed-v3r25   mark_page_accessed
    tmpfs    59562      3.2628  vmlinux-3.15.0-rc4-vanilla          mark_page_accessed
    tmpfs     1210      0.0696  vmlinux-3.15.0-rc4-accessed-v3r25   init_page_accessed
    tmpfs       94      0.0054  vmlinux-3.15.0-rc4-accessed-v3r25   mark_page_accessed

    [akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
    Signed-off-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Vlastimil Babka
    Cc: Jan Kara
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Theodore Ts'o
    Cc: "Paul E. McKenney"
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Tested-by: Prabhakar Lad
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
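
    A condensed sketch of the wrapper idea described in the entry above. The
    FGP_* flag names follow the convention this patch introduces, but the
    pagecache_get_page() prototype is simplified here and should not be read
    as the real signature:

    static inline struct page *find_lock_page(struct address_space *mapping,
                                              pgoff_t index)
    {
        /* plain lookup: lock the page if found, no accessed marking here */
        return pagecache_get_page(mapping, index, FGP_LOCK, 0);
    }

    static inline struct page *find_or_create_page(struct address_space *mapping,
                                                   pgoff_t index, gfp_t gfp)
    {
        /* may allocate: a newly allocated page can have its accessed
         * information set with init_page_accessed() before it is visible */
        return pagecache_get_page(mapping, index,
                                  FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
    }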
     
  • cold is a bool, so make it one. Make the likely case the "if" part of the
    block instead of the else, as according to the optimisation manual this is
    preferred (illustrated below).

    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Johannes Weiner
    Cc: Vlastimil Babka
    Cc: Jan Kara
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Theodore Ts'o
    Cc: "Paul E. McKenney"
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
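
    Illustrative only (generic list handling, not the patched mm hot/cold
    path): a bool parameter with the common case in the "if" branch.

    static void add_page_to_list(struct list_head *entry,
                                 struct list_head *head, bool cold)
    {
        if (!cold)                  /* hot pages are the common, likely case */
            list_add(entry, head);
        else
            list_add_tail(entry, head);
    }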
     

02 Jun, 2014

1 commit

  • Currently, the fl_owner isn't set for flock locks. Some filesystems use
    byte-range locks to simulate flock locks and there is a common idiom in
    those that does:

    fl->fl_owner = (fl_owner_t)filp;
    fl->fl_start = 0;
    fl->fl_end = OFFSET_MAX;

    Since flock locks are generally "owned" by the open file description,
    move this into the common flock lock setup code. The fl_start and fl_end
    fields are already set appropriately, so remove the unneeded setting of
    that in flock ops in those filesystems as well.

    Finally, the lease code also sets fl_owner as if leases were owned by
    the process and not by the open file description. This is incorrect, as
    leases have the same ownership semantics as flock locks. Set them the
    same way. The lease code doesn't actually use the fl_owner value for
    anything, so this is more for consistency's sake than a bugfix.

    Reported-by: Trond Myklebust
    Signed-off-by: Jeff Layton
    Acked-by: Greg Kroah-Hartman (Staging portion)
    Acked-by: J. Bruce Fields

    Jeff Layton
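
    Schematically, the common flock setup helper now initializes the lock
    along these lines (a sketch in the spirit of flock_make_lock(); not the
    exact diff):

    fl->fl_file = filp;
    fl->fl_owner = (fl_owner_t)filp;    /* owned by the open file description */
    fl->fl_pid = current->tgid;
    fl->fl_flags = FL_FLOCK;
    fl->fl_type = type;
    fl->fl_end = OFFSET_MAX;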
     

07 May, 2014

11 commits