Eric Lee / smarc-fsl-linux-kernel

14 Jul, 2020

3 commits

b330966f7 fuse: reject options on reconfigure via fsconfig(2) ... Browse Code »

Previous patch changed handling of remount/reconfigure to ignore all
options, including those that are unknown to the fuse kernel fs. This was
done for backward compatibility, but this likely only affects the old
mount(2) API.

The new fsconfig(2) based reconfiguration could possibly be improved. This
would make the new API less of a drop in replacement for the old, OTOH this
is a good chance to get rid of some weirdnesses in the old API.

Several other behaviors might make sense:

1) unknown options are rejected, known options are ignored

2) unknown options are rejected, known options are rejected if the value
is changed, allowed otherwise

3) all options are rejected

Prior to the backward compatibility fix to ignore all options all known
options were accepted (1), even if they change the value of a mount
parameter; fuse_reconfigure() does not look at the config values set by
fuse_parse_param().

To fix that we'd need to verify that the value provided is the same as set
in the initial configuration (2). The major drawback is that this is much
more complex than just rejecting all attempts at changing options (3);
i.e. all options signify initial configuration values and don't make sense
on reconfigure.

This patch opts for (3) with the rationale that no mount options are
reconfigurable in fuse.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2020-07-14 20:45:41 +0800
e8b20a474 fuse: ignore 'data' argument of mount(..., MS_REMOUNT) ... Browse Code »

The command

mount -o remount -o unknownoption /mnt/fuse

succeeds on kernel versions prior to v5.4 and fails on kernel version at or
after. This is because fuse_parse_param() rejects any unrecognised options
in case of FS_CONTEXT_FOR_RECONFIGURE, just as for FS_CONTEXT_FOR_MOUNT.

This causes a regression in case the fuse filesystem is in fstab, since
remount sends all options found there to the kernel; even ones that are
meant for the initial mount and are consumed by the userspace fuse server.

Fix this by ignoring mount options, just as fuse_remount_fs() did prior to
the conversion to the new API.

Reported-by: Stefan Priebe
Fixes: c30da2e981a7 ("fuse: convert to use the new mount API")
Cc: # v5.4
Signed-off-by: Miklos Szeredi

Miklos Szeredi
2020-07-14 20:45:41 +0800
0189a2d36 fuse: use ->reconfigure() instead of ->remount_fs() ... Browse Code »

s_op->remount_fs() is only called from legacy_reconfigure(), which is not
used after being converted to the new API.

Convert to using ->reconfigure(). This restores the previous behavior of
syncing the filesystem and rejecting MS_MANDLOCK on remount.

Fixes: c30da2e981a7 ("fuse: convert to use the new mount API")
Cc: # v5.4
Signed-off-by: Miklos Szeredi

Miklos Szeredi
2020-07-14 20:45:41 +0800

19 May, 2020

2 commits

5ddd9ced9 fuse: update attr_version counter on fuse_notify_inval_inode() ... Browse Code »

A GETATTR request can race with FUSE_NOTIFY_INVAL_INODE, resulting in the
attribute cache being updated with stale information after the
invalidation.

Fix this by bumping the attribute version in fuse_reverse_inval_inode().

Reported-by: Krzysztof Rusek
Signed-off-by: Miklos Szeredi

Miklos Szeredi
2020-05-19 20:50:38 +0800
7fd3abfa8 virtiofs: do not use fuse_fill_super_common() for device installation ... Browse Code »

fuse_fill_super_common() allocates and installs one fuse_device. Hence
virtiofs allocates and install all fuse devices by itself except one.

This makes logic little twisted. There does not seem to be any real need
that why virtiofs can't allocate and install all fuse devices itself.

So opt out of fuse device allocation and installation while calling
fuse_fill_super_common().

Regular fuse still wants fuse_fill_super_common() to install fuse_device.
It needs to prevent against races where two mounters are trying to mount
fuse using same fd. In that case one will succeed while other will get
-EINVAL.

virtiofs does not have this issue because sget_fc() resolves the race
w.r.t multiple mounters and only one instance of virtio_fs_fill_super()
should be in progress for same filesystem.

Signed-off-by: Vivek Goyal
Signed-off-by: Miklos Szeredi

Vivek Goyal
2020-05-19 20:50:37 +0800

09 Feb, 2020

1 commit

c9d35ee04 Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs file system parameter updates from Al Viro:
"Saner fs_parser.c guts and data structures. The system-wide registry
of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
the horror switch() in fs_parse() that would have to grow another case
every time something got added to that system-wide registry.

New syntax types can be added by filesystems easily now, and their
namespace is that of functions - not of system-wide enum members. IOW,
they can be shared or kept private and if some turn out to be widely
useful, we can make them common library helpers, etc., without having
to do anything whatsoever to fs_parse() itself.

And we already get that kind of requests - the thing that finally
pushed me into doing that was "oh, and let's add one for timeouts -
things like 15s or 2h". If some filesystem really wants that, let them
do it. Without somebody having to play gatekeeper for the variants
blessed by direct support in fs_parse(), TYVM.

Quite a bit of boilerplate is gone. And IMO the data structures make a
lot more sense now. -200LoC, while we are at it"

* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
tmpfs: switch to use of invalfc()
cgroup1: switch to use of errorfc() et.al.
procfs: switch to use of invalfc()
hugetlbfs: switch to use of invalfc()
cramfs: switch to use of errofc() et.al.
gfs2: switch to use of errorfc() et.al.
fuse: switch to use errorfc() et.al.
ceph: use errorfc() and friends instead of spelling the prefix out
prefix-handling analogues of errorf() and friends
turn fs_param_is_... into functions
fs_parse: handle optional arguments sanely
fs_parse: fold fs_parameter_desc/fs_parameter_spec
fs_parser: remove fs_parameter_description name field
add prefix to fs_context->log
ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
new primitive: __fs_parse()
switch rbd and libceph to p_log-based primitives
struct p_log, variants of warnf() et.al. taking that one instead
teach logfc() to handle prefices, give it saner calling conventions
get rid of cg_invalf()
...

Linus Torvalds
2020-02-09 05:26:41 +0800

08 Feb, 2020

3 commits

2e28c49ea fuse: switch to use errorfc() et.al. ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2020-02-08 03:48:40 +0800
d7167b149 fs_parse: fold fs_parameter_desc/fs_parameter_spec ... Browse Code »

The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro

Al Viro
2020-02-08 03:48:37 +0800
96cafb9cc fs_parser: remove fs_parameter_description name field ... Browse Code »

Unused now.

Signed-off-by: Eric Sandeen
Acked-by: David Howells
Signed-off-by: Al Viro

Eric Sandeen
2020-02-08 03:48:36 +0800

06 Feb, 2020

1 commit

cabdb4fa2 fuse: use true,false for bool variable ... Browse Code »

Fixes coccicheck warning:

fs/fuse/readdir.c:335:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1398:2-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1400:2-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:454:1-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:455:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:497:2-17: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:504:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:511:2-22: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:518:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:522:2-26: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:526:2-18: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:1000:1-20: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot
Signed-off-by: zhengbin
Signed-off-by: Miklos Szeredi

zhengbin
2020-02-06 23:39:28 +0800

15 Oct, 2019

1 commit

3f22c7467 virtio-fs: don't show mount options ... Browse Code »

Virtio-fs does not accept any mount options, so it's confusing and wrong to
show any in /proc/mounts.

Reported-by: Stefan Hajnoczi
Signed-off-by: Miklos Szeredi

Miklos Szeredi
2019-10-15 22:11:41 +0800

28 Sep, 2019

1 commit

8f744bdee Merge tag 'virtio-fs-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse ... Browse Code »

Pull fuse virtio-fs support from Miklos Szeredi:
"Virtio-fs allows exporting directory trees on the host and mounting
them in guest(s).

This isn't actually a new filesystem, but a glue layer between the
fuse filesystem and a virtio based back-end.

It's similar in functionality to the existing virtio-9p solution, but
significantly faster in benchmarks and has better POSIX compliance.
Further permformance improvements can be achieved by sharing the page
cache between host and guest, allowing for faster I/O and reduced
memory use.

Kata Containers have been including the out-of-tree virtio-fs (with
the shared page cache patches as well) since version 1.7 as an
experimental feature. They have been active in development and plan to
switch from virtio-9p to virtio-fs as their default solution. There
has been interest from other sources as well.

The userspace infrastructure is slated to be merged into qemu once the
kernel part hits mainline.

This was developed by Vivek Goyal, Dave Gilbert and Stefan Hajnoczi"

* tag 'virtio-fs-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
virtio-fs: add virtiofs filesystem
virtio-fs: add Documentation/filesystems/virtiofs.rst
fuse: reserve values for mapping protocol

Linus Torvalds
2019-09-28 06:54:24 +0800

24 Sep, 2019

1 commit

dc69e98c2 fuse: kmemcg account fs data ... Browse Code »

account per-file, dentry, and inode data

blockdev/superblock and temporary per-request data was left alone, as
this usually isn't accounted

Reviewed-by: Shakeel Butt
Signed-off-by: Khazhismel Kumykov
Signed-off-by: Miklos Szeredi

Khazhismel Kumykov
2019-09-24 21:28:01 +0800

19 Sep, 2019

1 commit

a62a8ef9d virtio-fs: add virtiofs filesystem ... Browse Code »

Add a basic file system module for virtio-fs. This does not yet contain
shared data support between host and guest or metadata coherency speedups.
However it is already significantly faster than virtio-9p.

Design Overview
===============

With the goal of designing something with better performance and local file
system semantics, a bunch of ideas were proposed.

- Use fuse protocol (instead of 9p) for communication between guest and
host. Guest kernel will be fuse client and a fuse server will run on
host to serve the requests.

- For data access inside guest, mmap portion of file in QEMU address space
and guest accesses this memory using dax. That way guest page cache is
bypassed and there is only one copy of data (on host). This will also
enable mmap(MAP_SHARED) between guests.

- For metadata coherency, there is a shared memory region which contains
version number associated with metadata and any guest changing metadata
updates version number and other guests refresh metadata on next access.
This is yet to be implemented.

How virtio-fs differs from existing approaches
==============================================

The unique idea behind virtio-fs is to take advantage of the co-location of
the virtual machine and hypervisor to avoid communication (vmexits).

DAX allows file contents to be accessed without communication with the
hypervisor. The shared memory region for metadata avoids communication in
the common case where metadata is unchanged.

By replacing expensive communication with cheaper shared memory accesses,
we expect to achieve better performance than approaches based on network
file system protocols. In addition, this also makes it easier to achieve
local file system semantics (coherency).

These techniques are not applicable to network file system protocols since
the communications channel is bypassed by taking advantage of shared memory
on a local machine. This is why we decided to build virtio-fs rather than
focus on 9P or NFS.

Caching Modes
=============

Like virtio-9p, different caching modes are supported which determine the
coherency level as well. The “cache=FOO” and “writeback” options control
the level of coherence between the guest and host filesystems.

- cache=none
metadata, data and pathname lookup are not cached in guest. They are
always fetched from host and any changes are immediately pushed to host.

- cache=always
metadata, data and pathname lookup are cached in guest and never expire.

- cache=auto
metadata and pathname lookup cache expires after a configured amount of
time (default is 1 second). Data is cached while the file is open
(close to open consistency).

- writeback/no_writeback
These options control the writeback strategy. If writeback is disabled,
then normal writes will immediately be synchronized with the host fs.
If writeback is enabled, then writes may be cached in the guest until
the file is closed or an fsync(2) performed. This option has no effect
on mmap-ed writes or writes going through the DAX mechanism.

Signed-off-by: Stefan Hajnoczi
Signed-off-by: Vivek Goyal
Acked-by: Michael S. Tsirkin
Signed-off-by: Miklos Szeredi

Stefan Hajnoczi
2019-09-19 02:17:50 +0800

12 Sep, 2019

7 commits

15c8e72e8 fuse: allow skipping control interface and forced unmount ... Browse Code »

virtio-fs does not support aborting requests which are being
processed. That is requests which have been sent to fuse daemon on host.

Signed-off-by: Vivek Goyal
Signed-off-by: Miklos Szeredi

Vivek Goyal
2019-09-12 20:59:41 +0800
783863d64 fuse: dissociate DESTROY from fuseblk ... Browse Code »

Allow virtio-fs to also send DESTROY request.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2019-09-12 20:59:41 +0800
0cd1eb9a4 fuse: separate fuse device allocation and installation in fuse_conn ... Browse Code »

As of now fuse_dev_alloc() both allocates a fuse device and installs it in
fuse_conn list. fuse_dev_alloc() can fail if fuse_device allocation fails.

virtio-fs needs to initialize multiple fuse devices (one per virtio queue).
It initializes one fuse device as part of call to fuse_fill_super_common()
and rest of the devices are allocated and installed after that.

But, we can't afford to fail after calling fuse_fill_super_common() as we
don't have a way to undo all the actions done by fuse_fill_super_common().
So to avoid failures after the call to fuse_fill_super_common(),
pre-allocate all fuse devices early and install them into fuse connection
later.

This patch provides two separate helpers for fuse device allocation and
fuse device installation in fuse_conn.

Signed-off-by: Vivek Goyal
Signed-off-by: Miklos Szeredi

Vivek Goyal
2019-09-12 20:59:41 +0800
ae3aad77f fuse: add fuse_iqueue_ops callbacks ... Browse Code »

The /dev/fuse device uses fiq->waitq and fasync to signal that requests are
available. These mechanisms do not apply to virtio-fs. This patch
introduces callbacks so alternative behavior can be used.

Note that queue_interrupt() changes along these lines:

spin_lock(&fiq->waitq.lock);
wake_up_locked(&fiq->waitq);
+ kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
spin_unlock(&fiq->waitq.lock);
- kill_fasync(&fiq->fasync, SIGIO, POLL_IN);

Since queue_request() and queue_forget() also call kill_fasync() inside
the spinlock this should be safe.

Signed-off-by: Stefan Hajnoczi
Signed-off-by: Miklos Szeredi

Stefan Hajnoczi
2019-09-12 20:59:41 +0800
0cc2656cd fuse: extract fuse_fill_super_common() ... Browse Code »

fuse_fill_super() includes code to process the fd= option and link the
struct fuse_dev to the fd's struct file. In virtio-fs there is no file
descriptor because /dev/fuse is not used.

This patch extracts fuse_fill_super_common() so that both classic fuse and
virtio-fs can share the code to initialize a mount.

Signed-off-by: Stefan Hajnoczi
Signed-off-by: Miklos Szeredi

Stefan Hajnoczi
2019-09-12 20:59:40 +0800
95a84cdb1 fuse: export fuse_send_init_request() ... Browse Code »

This will be used by virtio-fs to send init request to fuse server after
initialization of virt queues.

Signed-off-by: Vivek Goyal
Signed-off-by: Miklos Szeredi

Vivek Goyal
2019-09-12 20:59:40 +0800
f22f812d5 fuse: fix request limit ... Browse Code »

The size of struct fuse_req was reduced from 392B to 144B on a non-debug
config, thus the sanitize_global_limit() helper was setting a larger
default limit. This doesn't really reflect reduction in the memory used by
requests, since the fields removed from fuse_req were added to fuse_args
derived structs; e.g. sizeof(struct fuse_writepages_args) is 248B, thus
resulting in slightly more memory being used for writepage requests
overalll (due to using 256B slabs).

Make the calculatation ignore the size of fuse_req and use the old 392B
value.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2019-09-12 20:59:40 +0800

10 Sep, 2019

5 commits

615047eff fuse: convert init to simple api ... Browse Code »

Bypass the fc->initialized check by setting the force flag.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2019-09-10 22:29:49 +0800
1ccd1ea24 fuse: convert destroy to simple api ... Browse Code »

We can use the "force" flag to make sure the DESTROY request is always sent
to userspace. So no need to keep it allocated during the lifetime of the
filesystem.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2019-09-10 22:29:49 +0800
40ac7ab2d fuse: simplify 'nofail' request ... Browse Code »

Instead of complex games with a reserved request, just use __GFP_NOFAIL.

Both calers (flush, readdir) guarantee that connection was already
initialized, so no need to wait for fc->initialized.

Also remove unneeded clearing of FR_BACKGROUND flag.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2019-09-10 22:29:48 +0800
d5b485435 fuse: flatten 'struct fuse_args' ... Browse Code »

...to make future expansion simpler. The hiearachical structure is a
historical thing that does not serve any practical purpose.

The generated code is excatly the same before and after the patch.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2019-09-10 22:29:48 +0800
76e43c8cc fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock ... Browse Code »

When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs
and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock.

This may have to wait for fuse_iqueue::waitq.lock to be released by one
of many places that take it with IRQs enabled. Since the IRQ handler
may take kioctx::ctx_lock, lockdep reports that a deadlock is possible.

Fix it by protecting the state of struct fuse_iqueue with a separate
spinlock, and only accessing fuse_iqueue::waitq using the versions of
the waitqueue functions which do IRQ-safe locking internally.

Reproducer:

#include
#include
#include
#include
#include
#include
#include

int main()
{
char opts[128];
int fd = open("/dev/fuse", O_RDWR);
aio_context_t ctx = 0;
struct iocb cb = { .aio_lio_opcode = IOCB_CMD_POLL, .aio_fildes = fd };
struct iocb *cbp = &cb;

sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd);
mkdir("mnt", 0700);
mount("foo", "mnt", "fuse", 0, opts);
syscall(__NR_io_setup, 1, &ctx);
syscall(__NR_io_submit, ctx, 1, &cbp);
}

Beginning of lockdep output:

=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.3.0-rc5 #9 Not tainted
-----------------------------------------------------
syz_fuse/135 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
000000003590ceda (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
000000003590ceda (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline]
000000003590ceda (&fiq->waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825

and this task is already holding:
0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline]
0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline]
0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825
which would create a new lock dependency:
(&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}

but this new dependency connects a SOFTIRQ-irq-safe lock:
(&(&ctx->ctx_lock)->rlock){..-.}

[...]

Reported-by: syzbot+af05535bb79520f95431@syzkaller.appspotmail.com
Reported-by: syzbot+d86c4426a01f60feddc7@syzkaller.appspotmail.com
Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Cc: # v4.19+
Cc: Christoph Hellwig
Signed-off-by: Eric Biggers
Signed-off-by: Miklos Szeredi

Eric Biggers
2019-09-10 22:29:29 +0800

07 Sep, 2019

2 commits

c7eb68696 vfs: subtype handling moved to fuse ... Browse Code »

The unused vfs code can be removed. Don't pass empty subtype (same as if
->parse callback isn't called).

The bits that are left involve determining whether it's permitted to split the
filesystem type string passed in to mount(2). Consequently, this means that we
cannot get rid of the FS_HAS_SUBTYPE flag unless we define that a type string
with a dot in it always indicates a subtype specification.

Signed-off-by: David Howells
Signed-off-by: Al Viro
Signed-off-by: Miklos Szeredi

David Howells
2019-09-07 03:28:49 +0800
c30da2e98 fuse: convert to use the new mount API ... Browse Code »

Convert the fuse filesystem to the new internal mount API as the old
one will be obsoleted and removed. This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.

See Documentation/filesystems/mount_api.txt for more information.

Signed-off-by: David Howells
Signed-off-by: Al Viro
Signed-off-by: Miklos Szeredi

David Howells
2019-09-07 03:27:09 +0800

14 May, 2019

1 commit

4856118f4 Merge tag 'fuse-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse ... Browse Code »

Pull fuse update from Miklos Szeredi:
"Add more caching controls for userspace filesystems to use, as well as
bug fixes and cleanups"

* tag 'fuse-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: clean up fuse_alloc_inode
fuse: Add ioctl flag for x32 compat ioctl
fuse: Convert fusectl to use the new mount API
fuse: fix changelog entry for protocol 7.9
fuse: fix changelog entry for protocol 7.12
fuse: document fuse_fsync_in.fsync_flags
fuse: Add FOPEN_STREAM to use stream_open()
fuse: require /dev/fuse reads to have enough buffer capacity
fuse: retrieve: cap requested size to negotiated max_write
fuse: allow filesystems to have precise control over data cache
fuse: convert printk -> pr_*
fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
fuse: fix writepages on 32bit

Linus Torvalds
2019-05-14 23:59:14 +0800

08 May, 2019

1 commit

9031a69cf fuse: clean up fuse_alloc_inode ... Browse Code »

This patch cleans up fuse_alloc_inode function, just simply the code, no
logic change.

Signed-off-by: zhangliguang
Signed-off-by: Miklos Szeredi

zhangliguang
2019-05-08 19:58:29 +0800

02 May, 2019

1 commit

9baf28bbf fuse: switch to ->free_inode() ... Browse Code »

fuse_destroy_inode() is gone - sanity checks that need the stack
trace of the caller get moved into ->evict_inode(), the rest joins
the RCU-delayed part which becomes ->free_inode().

While we are at it, don't just pass the address of what happens
to be the first member of structure to kmem_cache_free() -
get_fuse_inode() is there for purpose and it gives the proper
container_of() use. No behaviour change, but verifying correctness
is easier that way.

Signed-off-by: Al Viro

Al Viro
2019-05-02 10:43:26 +0800

24 Apr, 2019

2 commits

ad2ba64dd fuse: allow filesystems to have precise control over data cache ... Browse Code »

On networked filesystems file data can be changed externally. FUSE
provides notification messages for filesystem to inform kernel that
metadata or data region of a file needs to be invalidated in local page
cache. That provides the basis for filesystem implementations to invalidate
kernel cache explicitly based on observed filesystem-specific events.

FUSE has also "automatic" invalidation mode(*) when the kernel
automatically invalidates data cache of a file if it sees mtime change. It
also automatically invalidates whole data cache of a file if it sees file
size being changed.

The automatic mode has corresponding capability - FUSE_AUTO_INVAL_DATA.
However, due to probably historical reason, that capability controls only
whether mtime change should be resulting in automatic invalidation or
not. A change in file size always results in invalidating whole data cache
of a file irregardless of whether FUSE_AUTO_INVAL_DATA was negotiated(+).

The filesystem I write[1] represents data arrays stored in networked
database as local files suitable for mmap. It is read-only filesystem -
changes to data are committed externally via database interfaces and the
filesystem only glues data into contiguous file streams suitable for mmap
and traditional array processing. The files are big - starting from
hundreds gigabytes and more. The files change regularly, and frequently by
data being appended to their end. The size of files thus changes
frequently.

If a file was accessed locally and some part of its data got into page
cache, we want that data to stay cached unless there is memory pressure, or
unless corresponding part of the file was actually changed. However current
FUSE behaviour - when it sees file size change - is to invalidate the whole
file. The data cache of the file is thus completely lost even on small size
change, and despite that the filesystem server is careful to accurately
translate database changes into FUSE invalidation messages to kernel.

Let's fix it: if a filesystem, through new FUSE_EXPLICIT_INVAL_DATA
capability, indicates to kernel that it is fully responsible for data cache
invalidation, then the kernel won't invalidate files data cache on size
change and only truncate that cache to new size in case the size decreased.

(*) see 72d0d248ca "fuse: add FUSE_AUTO_INVAL_DATA init flag",
eed2179efe "fuse: invalidate inode mapping if mtime changes"

(+) in writeback mode the kernel does not invalidate data cache on file
size change, but neither it allows the filesystem to set the size due to
external event (see 8373200b12 "fuse: Trust kernel i_size only")

[1] https://lab.nexedi.com/kirr/wendelin.core/blob/a50f1d9f/wcfs/wcfs.go#L20

Signed-off-by: Kirill Smelkov
Signed-off-by: Miklos Szeredi

Kirill Smelkov
2019-04-24 23:05:06 +0800
f2294482f fuse: convert printk -> pr_* ... Browse Code »

Functions, like pr_err, are a more modern variant of printing compared to
printk. They could be used to denoise sources by using needed level in
the print function name, and by automatically inserting per-driver /
function / ... print prefix as defined by pr_fmt macro. pr_* are also
said to be used in Documentation/process/coding-style.rst and more
recent code - for example overlayfs - uses them instead of printk.

Convert CUSE and FUSE to use the new pr_* functions.

CUSE output stays completely unchanged, while FUSE output is amended a
bit for "trying to steal weird page" warning - the second line now comes
also with "fuse:" prefix. I hope it is ok.

Suggested-by: Kirill Tkhai
Signed-off-by: Kirill Smelkov
Reviewed-by: Kirill Tkhai
Signed-off-by: Miklos Szeredi

Kirill Smelkov
2019-04-24 23:05:06 +0800

13 Mar, 2019

2 commits

dfee9c257 Merge tag 'fuse-update-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse ... Browse Code »

Pull fuse updates from Miklos Szeredi:
"Scalability and performance improvements, as well as minor bug fixes
and cleanups"

* tag 'fuse-update-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
fuse: cache readdir calls if filesystem opts out of opendir
fuse: support clients that don't implement 'opendir'
fuse: lift bad inode checks into callers
fuse: multiplex cached/direct_io file operations
fuse add copy_file_range to direct io fops
fuse: use iov_iter based generic splice helpers
fuse: Switch to using async direct IO for FOPEN_DIRECT_IO
fuse: use atomic64_t for khctr
fuse: clean up aborted
fuse: Protect ff->reserved_req via corresponding fi->lock
fuse: Protect fi->nlookup with fi->lock
fuse: Introduce fi->lock to protect write related fields
fuse: Convert fc->attr_version into atomic64_t
fuse: Add fuse_inode argument to fuse_prepare_release()
fuse: Verify userspace asks to requeue interrupt that we really sent
fuse: Do some refactoring in fuse_dev_do_write()
fuse: Wake up req->waitq of only if not background
fuse: Optimize request_end() by not taking fiq->waitq.lock
fuse: Kill fasync only if interrupt is queued in queue_interrupt()
fuse: Remove stale comment in end_requests()
...

Linus Torvalds
2019-03-13 05:46:26 +0800
b5420237e mm: refactor readahead defines in mm.h ... Browse Code »

All users of VM_MAX_READAHEAD actually convert it to kbytes and then to
pages. Define the macro explicitly as (SZ_128K / PAGE_SIZE). This
simplifies the expression in every filesystem. Also rename the macro to
VM_READAHEAD_PAGES to properly convey its meaning. Finally remove unused
VM_MIN_READAHEAD

[akpm@linux-foundation.org: fix fs/io_uring.c, per Stephen]
Link: http://lkml.kernel.org/r/20181221144053.24318-1-nborisov@suse.com
Signed-off-by: Nikolay Borisov
Reviewed-by: Matthew Wilcox
Reviewed-by: David Hildenbrand
Cc: Jens Axboe
Cc: Eric Van Hensbergen
Cc: Latchesar Ionkov
Cc: Dominique Martinet
Cc: David Howells
Cc: Chris Mason
Cc: Josef Bacik
Cc: David Sterba
Cc: Miklos Szeredi
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nikolay Borisov
2019-03-13 01:04:01 +0800

13 Feb, 2019

5 commits

d9a9ea94f fuse: support clients that don't implement 'opendir' ... Browse Code »

Allow filesystems to return ENOSYS from opendir, preventing the kernel from
sending opendir and releasedir messages in the future. This avoids
userspace transitions when filesystems don't need to keep track of state
per directory handle.

A new capability flag, FUSE_NO_OPENDIR_SUPPORT, parallels
FUSE_NO_OPEN_SUPPORT, indicating the new semantics for returning ENOSYS
from opendir.

Signed-off-by: Chad Austin
Signed-off-by: Miklos Szeredi

Chad Austin
2019-02-13 20:15:15 +0800
75126f550 fuse: use atomic64_t for khctr ... Browse Code »

...to get rid of one more fc->lock use.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2019-02-13 20:15:14 +0800
eb98e3bdf fuse: clean up aborted ... Browse Code »

The only caller that needs fc->aborted set is fuse_conn_abort_write().
Setting fc->aborted is now racy (fuse_abort_conn() may already be in
progress or finished) but there's no reason to care.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2019-02-13 20:15:14 +0800
c9d8f5f06 fuse: Protect fi->nlookup with fi->lock ... Browse Code »

This continues previous patch and introduces the same protection for
nlookup field.

Signed-off-by: Kirill Tkhai
Signed-off-by: Miklos Szeredi

Kirill Tkhai
2019-02-13 20:15:14 +0800
f15ecfef0 fuse: Introduce fi->lock to protect write related fields ... Browse Code »

To minimize contention of fc->lock, this patch introduces a new spinlock
for protection fuse_inode metadata:

fuse_inode:
writectr
writepages
write_files
queued_writes
attr_version

inode:
i_size
i_nlink
i_mtime
i_ctime

Also, it protects the fields changed in fuse_change_attributes_common()
(too many to list).

Signed-off-by: Kirill Tkhai
Signed-off-by: Miklos Szeredi

Kirill Tkhai
2019-02-13 20:15:14 +0800