25 Aug, 2016
1 commit
-
When reading from a loop device backed by a fuse file it deadlocks on
lock_page().This is because the page is already locked by the read() operation done on
the loop device. In this case we don't want to either lock the page or
dirty it.So do what fs/direct-io.c does: only dirty the page for ITER_IOVEC vectors.
Reported-by: Sheng Yang
Fixes: aa4d86163e4e ("block: loop: switch to VFS ITER_BVEC")
Signed-off-by: Miklos Szeredi
Cc: # v4.1+
Reviewed-by: Sheng Yang
Reviewed-by: Ashish Samant
Tested-by: Sheng Yang
Tested-by: Ashish Samant
06 Aug, 2016
1 commit
-
Pull qstr constification updates from Al Viro:
"Fairly self-contained bunch - surprising lot of places passes struct
qstr * as an argument when const struct qstr * would suffice; it
complicates analysis for no good reason.I'd prefer to feed that separately from the assorted fixes (those are
in #for-linus and with somewhat trickier topology)"* 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
qstr: constify instances in adfs
qstr: constify instances in lustre
qstr: constify instances in f2fs
qstr: constify instances in ext2
qstr: constify instances in vfat
qstr: constify instances in procfs
qstr: constify instances in fuse
qstr constify instances in fs/dcache.c
qstr: constify instances in nfs
qstr: constify instances in ocfs2
qstr: constify instances in autofs4
qstr: constify instances in hfs
qstr: constify instances in hfsplus
qstr: constify instances in logfs
qstr: constify dentry_init_security
31 Jul, 2016
1 commit
-
Signed-off-by: Al Viro
30 Jul, 2016
1 commit
-
Pull fuse updates from Miklos Szeredi:
"This fixes error propagation from writeback to fsync/close for
writeback cache mode as well as adding a missing capability flag to
the INIT message. The rest are cleanups.(The commits are recent but all the code actually sat in -next for a
while now. The recommits are due to conflict avoidance and the
addition of Cc: stable@...)"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: use filemap_check_errors()
mm: export filemap_check_errors() to modules
fuse: fix wrong assignment of ->flags in fuse_send_init()
fuse: fuse_flush must check mapping->flags for errors
fuse: fsync() did not return IO errors
fuse: don't mess with blocking signals
new helper: wait_event_killable_exclusive()
fuse: improve aio directIO write performance for size extending writes
29 Jul, 2016
7 commits
-
Signed-off-by: Miklos Szeredi
-
FUSE_HAS_IOCTL_DIR should be assigned to ->flags, it may be a typo.
Signed-off-by: Wei Fang
Signed-off-by: Miklos Szeredi
Fixes: 69fe05c90ed5 ("fuse: add missing INIT flags")
Cc: -
fuse_flush() calls write_inode_now() that triggers writeback, but actual
writeback will happen later, on fuse_sync_writes(). If an error happens,
fuse_writepage_end() will set error bit in mapping->flags. So, we have to
check mapping->flags after fuse_sync_writes().Signed-off-by: Maxim Patlasov
Signed-off-by: Miklos Szeredi
Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on")
Cc: # v3.15+ -
Due to implementation of fuse writeback filemap_write_and_wait_range() does
not catch errors. We have to do this directly after fuse_sync_writes()Signed-off-by: Alexey Kuznetsov
Signed-off-by: Maxim Patlasov
Signed-off-by: Miklos Szeredi
Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on")
Cc: # v3.15+ -
Merge more updates from Andrew Morton:
"The rest of MM"* emailed patches from Andrew Morton : (101 commits)
mm, compaction: simplify contended compaction handling
mm, compaction: introduce direct compaction priority
mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
mm, page_alloc: make THP-specific decisions more generic
mm, page_alloc: restructure direct compaction handling in slowpath
mm, page_alloc: don't retry initial attempt in slowpath
mm, page_alloc: set alloc_flags only once in slowpath
lib/stackdepot.c: use __GFP_NOWARN for stack allocations
mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
mm, kasan: account for object redzone in SLUB's nearest_obj()
mm: fix use-after-free if memory allocation failed in vma_adjust()
zsmalloc: Delete an unnecessary check before the function call "iput"
mm/memblock.c: fix index adjustment error in __next_mem_range_rev()
mem-hotplug: alloc new page from a nearest neighbor node when mem-offline
mm: optimize copy_page_to/from_iter_iovec
mm: add cond_resched() to generic_swapfile_activate()
Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
mm: hwpoison: remove incorrect comments
make __section_nr() more efficient
... -
There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages are
accounted on the zone. This can be coped with to some extent but it's
confusing so this patch moves the relevant file-based accounted. Due to
throttling logic in the page allocator for reliable OOM detection, it is
still necessary to track dirty and writeback pages on a per-zone basis.[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Cc: Hillf Danton
Acked-by: Johannes Weiner
Cc: Joonsoo Kim
Cc: Minchan Kim
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.* merge branch 'salted-string-hash':
fs/dcache.c: Save one 32-bit multiply in dcache lookup
vfs: make the string hashes salt the hash
21 Jul, 2016
1 commit
19 Jul, 2016
1 commit
-
just use wait_event_killable{,_exclusive}().
Signed-off-by: Al Viro
06 Jul, 2016
1 commit
-
->atomic_open() can be given an in-lookup dentry *or* a negative one
found in dcache. Use d_in_lookup() to tell one from another, rather
than d_unhashed().Signed-off-by: Al Viro
30 Jun, 2016
2 commits
-
While sending the blocking directIO in fuse, the write request is broken
into sub-requests, each of default size 128k and all the requests are sent
in non-blocking background mode if async_dio mode is supported by libfuse.
The process which issue the write wait for the completion of all the
sub-requests. Sending multiple requests parallely gives a chance to perform
parallel writes in the user space fuse implementation if it is
multi-threaded and hence improves the performance.When there is a size extending aio dio write, we switch to blocking mode so
that we can properly update the size of the file after completion of the
writes. However, in this situation all the sub-requests are sent in
serialized manner where the next request is sent only after receiving the
reply of the current request. Hence the multi-threaded user space
implementation is not utilized properly.This patch changes the size extending aio dio behavior to exactly follow
blocking dio. For multi threaded fuse implementation having 10 threads and
using buffer size of 64MB to perform async directIO, we are getting double
the speed.Signed-off-by: Ashish Sangwan
Signed-off-by: Miklos Szeredi -
Negotiate with userspace filesystems whether they support parallel readdir
and lookup. Disable parallelism by default for fear of breaking fuse
filesystems.Signed-off-by: Miklos Szeredi
Fixes: 9902af79c01a ("parallel lookups: actual switch to rwsem")
Fixes: d9b3dbdcfd62 ("fuse: switch to ->iterate_shared()")
11 Jun, 2016
1 commit
-
We always mixed in the parent pointer into the dentry name hash, but we
did it late at lookup time. It turns out that we can simplify that
lookup-time action by salting the hash with the parent pointer early
instead of late.A few other users of our string hashes also wanted to mix in their own
pointers into the hash, and those are updated to use the same mechanism.Hash users that don't have any particular initial salt can just use the
NULL pointer as a no-salt.Cc: Vegard Nossum
Cc: George Spelvin
Cc: Al Viro
Signed-off-by: Linus Torvalds
28 May, 2016
1 commit
-
smack ->d_instantiate() uses ->setxattr(), so to be able to call it before
we'd hashed the new dentry and attached it to inode, we need ->setxattr()
instances getting the inode as an explicit argument rather than obtaining
it from dentry.Similar change for ->getxattr() had been done in commit ce23e64. Unlike
->getxattr() (which is used by both selinux and smack instances of
->d_instantiate()) ->setxattr() is used only by smack one and unfortunately
it got missed back then.Reported-by: Seung-Woo Kim
Tested-by: Casey Schaufler
Signed-off-by: Al Viro
18 May, 2016
1 commit
-
Pull vfs cleanups from Al Viro:
"More cleanups from Christoph"* 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nfsd: use RWF_SYNC
fs: add RWF_DSYNC aand RWF_SYNC
ceph: use generic_write_sync
fs: simplify the generic_write_sync prototype
fs: add IOCB_SYNC and IOCB_DSYNC
direct-io: remove the offset argument to dio_complete
direct-io: eliminate the offset argument to ->direct_IO
xfs: eliminate the pos variable in xfs_file_dio_aio_write
filemap: remove the pos argument to generic_file_direct_write
filemap: remove pos variables in generic_file_read_iter
17 May, 2016
1 commit
-
Backmerge to resolve a conflict in ovl_lookup_real();
"ovl_lookup_real(): use lookup_one_len_unlocked()" instead,
but it was too late in the cycle to rebase.
03 May, 2016
2 commits
-
Switch dcache pre-seeding on readdir to d_alloc_parallel();
nothing else is needed.Signed-off-by: Al Viro
-
The rest of work.xattr stuff isn't needed for this branch
02 May, 2016
2 commits
-
Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually
work, so eliminate the superflous argument.Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro -
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
25 Apr, 2016
1 commit
-
fuse_get_user_pages() should return error or 0. Otherwise fuse_direct_io
read will not return 0 to indicate that read has completed.Fixes: 742f992708df ("fuse: return patrial success from fuse_direct_io()")
Signed-off-by: Ashish Samant
Signed-off-by: Seth Forshee
Signed-off-by: Miklos Szeredi
11 Apr, 2016
1 commit
-
Signed-off-by: Al Viro
05 Apr, 2016
1 commit
-
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.Let's stop pretending that pages in page cache are special. They are
not.The changes are pretty straight-forward:
- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;
- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds
16 Mar, 2016
1 commit
-
If a user calls writev/readv in direct io mode with partially valid data
in the iovec array such that any vector other than the first one in the
array contains invalid data, we currently return the error for the invalid
iovec.Instead, we should return the number of bytes already written/read and not
the error as we do in the non direct_io case.Reported-by: Alexey Kodanev
Signed-off-by: Ashish Samant
Signed-off-by: Miklos Szeredi
14 Mar, 2016
2 commits
-
The 'reqs' member of fuse_io_priv serves two purposes. First is to track
the number of oustanding async requests to the server and to signal that
the io request is completed. The second is to be a reference count on the
structure to know when it can be freed.For sync io requests these purposes can be at odds. fuse_direct_IO() wants
to block until the request is done, and since the signal is sent when
'reqs' reaches 0 it cannot keep a reference to the object. Yet it needs to
use the object after the userspace server has completed processing
requests. This leads to some handshaking and special casing that it
needlessly complicated and responsible for at least one race condition.It's much cleaner and safer to maintain a separate reference count for the
object lifecycle and to let 'reqs' just be a count of outstanding requests
to the userspace server. Then we can know for sure when it is safe to free
the object without any handshaking or special cases.The catch here is that most of the time these objects are stack allocated
and should not be freed. Initializing these objects with a single reference
that is never released prevents accidental attempts to free the objects.Fixes: 9d5722b7777e ("fuse: handle synchronous iocbs internally")
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Seth Forshee
Signed-off-by: Miklos Szeredi -
There's a race in fuse_direct_IO(), whereby is_sync_kiocb() is called on an
iocb that could have been freed if async io has already completed. The fix
in this case is simple and obvious: cache the result before starting io.It was discovered by KASan:
kernel: ==================================================================
kernel: BUG: KASan: use after free in fuse_direct_IO+0xb1a/0xcc0 at addr ffff88036c414390Signed-off-by: Robert Doebbelin
Signed-off-by: Miklos Szeredi
Fixes: bcba24ccdc82 ("fuse: enable asynchronous processing direct IO")
Cc: # 3.10+
23 Jan, 2016
1 commit
-
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.Signed-off-by: Al Viro
22 Jan, 2016
1 commit
-
Pull fuse updates from Miklos Szeredi:
"This adds SEEK_HOLE and SEEK_DATA support in lseek"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: add support for SEEK_HOLE and SEEK_DATA in lseek
15 Jan, 2016
1 commit
-
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
Cc: Tejun Heo
Cc: Greg Thelen
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Jan, 2016
1 commit
-
Pull vfs RCU symlink updates from Al Viro:
"Replacement of ->follow_link/->put_link, allowing to stay in RCU mode
even if the symlink is not an embedded one.No changes since the mailbomb on Jan 1"
* 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch ->get_link() to delayed_call, kill ->put_link()
kill free_page_put_link()
teach nfs_get_link() to work in RCU mode
teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode
teach shmem_get_link() to work in RCU mode
teach page_get_link() to work in RCU mode
replace ->follow_link() with new method that could stay in RCU mode
don't put symlink bodies in pagecache into highmem
namei: page_getlink() and page_follow_link_light() are the same thing
ufs: get rid of ->setattr() for symlinks
udf: don't duplicate page_symlink_inode_operations
logfs: don't duplicate page_symlink_inode_operations
switch befs long symlinks to page_symlink_operations
31 Dec, 2015
1 commit
-
Signed-off-by: Al Viro
30 Dec, 2015
1 commit
-
all callers are better off with kfree_put_link()
Signed-off-by: Al Viro
12 Dec, 2015
1 commit
-
Pull fuse fixes from Miklos Szeredi:
"Two bugfixes, both bound for -stable"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: break infinite loop in fuse_fill_write_pages()
cuse: fix memory leak
09 Dec, 2015
1 commit
-
new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.It's a flagday change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.Signed-off-by: Al Viro
10 Nov, 2015
2 commits
-
A useful performance improvement for accessing virtual machine images
via FUSE mount.See https://bugzilla.redhat.com/show_bug.cgi?id=1220173 for a use-case
for glusterFS.Signed-off-by: Ravishankar N
Signed-off-by: Miklos Szeredi -
I got a report about unkillable task eating CPU. Further
investigation shows, that the problem is in the fuse_fill_write_pages()
function. If iov's first segment has zero length, we get an infinite
loop, because we never reach iov_iter_advance() call.Fix this by calling iov_iter_advance() before repeating an attempt to
copy data from userspace.A similar problem is described in 124d3b7041f ("fix writev regression:
pan hanging unkillable and un-straceable"). If zero-length segmend
is followed by segment with invalid address,
iov_iter_fault_in_readable() checks only first segment (zero-length),
iov_iter_copy_from_user_atomic() skips it, fails at second and
returns zero -> goto again without skipping zero-length segment.Patch calls iov_iter_advance() before goto again: we'll skip zero-length
segment at second iteraction and iov_iter_fault_in_readable() will detect
invalid address.Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
description.Cc: Andrew Morton
Cc: Maxim Patlasov
Cc: Konstantin Khlebnikov
Signed-off-by: Roman Gushchin
Signed-off-by: Miklos Szeredi
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Cc: