Eric Lee / smarc-fsl-linux-kernel

11 Nov, 2020

3 commits

9b8523423 vfs: move __sb_{start,end}_write* to fs.h ... Browse Code »

Now that we've straightened out the callers, move these three functions
to fs.h since they're fairly trivial.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Jan Kara

Darrick J. Wong
2020-11-11 08:53:11 +0800
8a3c84b64 vfs: separate __sb_start_write into blocking and non-blocking helpers ... Browse Code »

Break this function into two helpers so that it's obvious that the
trylock versions return a value that must be checked, and the blocking
versions don't require that. While we're at it, clean up the return
type mismatch.

Signed-off-by: Darrick J. Wong
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-11-11 08:53:07 +0800
22843291e vfs: remove lockdep bogosity in __sb_start_write ... Browse Code »

__sb_start_write has some weird looking lockdep code that claims to
exist to handle nested freeze locking requests from xfs. The code as
written seems broken -- if we think we hold a read lock on any of the
higher freeze levels (e.g. we hold SB_FREEZE_WRITE and are trying to
lock SB_FREEZE_PAGEFAULT), it converts a blocking lock attempt into a
trylock.

However, it's not correct to downgrade a blocking lock attempt to a
trylock unless the downgrading code or the callers are prepared to deal
with that situation. Neither __sb_start_write nor its callers handle
this at all. For example:

sb_start_pagefault ignores the return value completely, with the result
that if xfs_filemap_fault loses a race with a different thread trying to
fsfreeze, it will proceed without pagefault freeze protection (thereby
breaking locking rules) and then unlocks the pagefault freeze lock that
it doesn't own on its way out (thereby corrupting the lock state), which
leads to a system hang shortly afterwards.

Normally, this won't happen because our ownership of a read lock on a
higher freeze protection level blocks fsfreeze from grabbing a write
lock on that higher level. *However*, if lockdep is offline,
lock_is_held_type unconditionally returns 1, which means that
percpu_rwsem_is_held returns 1, which means that __sb_start_write
unconditionally converts blocking freeze lock attempts into trylocks,
even when we *don't* hold anything that would block a fsfreeze.

Apparently this all held together until 5.10-rc1, when bugs in lockdep
caused lockdep to shut itself off early in an fstests run, and once
fstests gets to the "race writes with freezer" tests, kaboom. This
might explain the long trail of vanishingly infrequent livelocks in
fstests after lockdep goes offline that I've never been able to
diagnose.

We could fix it by spinning on the trylock if wait==true, but AFAICT the
locking works fine if lockdep is not built at all (and I didn't see any
complaints running fstests overnight), so remove this snippet entirely.

NOTE: Commit f4b554af9931 in 2015 created the current weird logic (which
used to exist in a different form in commit 5accdf82ba25c from 2012) in
__sb_start_write. XFS solved this whole problem in the late 2.6 era by
creating a variant of transactions (XFS_TRANS_NO_WRITECOUNT) that don't
grab intwrite freeze protection, thus making lockdep's solution
unnecessary. The commit claims that Dave Chinner explained that the
trylock hack + comment could be removed, but nobody ever did.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Jan Kara

Darrick J. Wong
2020-11-11 08:49:29 +0800

25 Sep, 2020

1 commit

1cb039f3d bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag ... Browse Code »

The BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangling the dependency replace it with a queue flag and a
superblock flag derived from it. This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.

One downside is that we an't support the stable_pages_required bdi
attribute in sysfs anymore. It is replaced with a queue attribute which
also is writable for easier testing.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-09-25 03:43:39 +0800

11 Jun, 2020

1 commit

4dbb29fe9 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs fixes from Al Viro:
"A couple of trivial patches that fell through the cracks last cycle"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: fix indentation in deactivate_super()
vfs: Remove duplicated d_mountpoint check in __is_local_mountpoint

Linus Torvalds
2020-06-11 07:09:11 +0800

03 Jun, 2020

1 commit

750a02ab8 Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block updates from Jens Axboe:
"Core block changes that have been queued up for this release:

- Remove dead blk-throttle and blk-wbt code (Guoqing)

- Include pid in blktrace note traces (Jan)

- Don't spew I/O errors on wouldblock termination (me)

- Zone append addition (Johannes, Keith, Damien)

- IO accounting improvements (Konstantin, Christoph)

- blk-mq hardware map update improvements (Ming)

- Scheduler dispatch improvement (Salman)

- Inline block encryption support (Satya)

- Request map fixes and improvements (Weiping)

- blk-iocost tweaks (Tejun)

- Fix for timeout failing with error injection (Keith)

- Queue re-run fixes (Douglas)

- CPU hotplug improvements (Christoph)

- Queue entry/exit improvements (Christoph)

- Move DMA drain handling to the few drivers that use it (Christoph)

- Partition handling cleanups (Christoph)"

* tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
block: mark bio_wouldblock_error() bio with BIO_QUIET
blk-wbt: rename __wbt_update_limits to wbt_update_limits
blk-wbt: remove wbt_update_limits
blk-throttle: remove tg_drain_bios
blk-throttle: remove blk_throtl_drain
null_blk: force complete for timeout request
blk-mq: drain I/O when all CPUs in a hctx are offline
blk-mq: add blk_mq_all_tag_iter
blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
blk-mq: use BLK_MQ_NO_TAG in more places
blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
blk-mq: move more request initialization to blk_mq_rq_ctx_init
blk-mq: simplify the blk_mq_get_request calling convention
blk-mq: remove the bio argument to ->prepare_request
nvme: force complete cancelled requests
blk-mq: blk-mq: provide forced completion method
block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
block: blk-crypto-fallback: remove redundant initialization of variable err
block: reduce part_stat_lock() scope
block: use __this_cpu_add() instead of access by smp_processor_id()
...

Linus Torvalds
2020-06-03 06:29:19 +0800

29 May, 2020

1 commit

cc23402c1 fs: fix indentation in deactivate_super() ... Browse Code »

Fix the breaked indent in deactive_super().

Signed-off-by: Yufen Yu
Signed-off-by: Al Viro

Yufen Yu
2020-05-29 22:35:25 +0800

10 May, 2020

2 commits

1cd925d58 bdi: remove the name field in struct backing_dev_info ... Browse Code »

The name is only printed for a not registered bdi in writeback. Use the
device name there as is more useful anyway for the unlike case that the
warning triggers.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Reviewed-by: Greg Kroah-Hartman
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-05-10 06:15:13 +0800
aef33c2ff bdi: simplify bdi_alloc ... Browse Code »

Merge the _node vs normal version and drop the superflous gfp_t argument.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Reviewed-by: Greg Kroah-Hartman
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-05-10 06:15:13 +0800

29 Apr, 2020

1 commit

dd7bc8158 Fix use after free in get_tree_bdev() ... Browse Code »

Commit 6fcf0c72e4b9, a fix to get_tree_bdev() put a missing blkdev_put() in
the wrong place, before a warnf() that displays the bdev under
consideration rather after it.

This results in a silent lockup in printk("%pg") called via warnf() from
get_tree_bdev() under some circumstances when there's a race with the
blockdev being frozen. This can be caused by xfstests/tests/generic/085 in
combination with Lukas Czerner's ext4 mount API conversion patchset. It
looks like it ought to occur with other users of get_tree_bdev() such as
XFS, but apparently doesn't.

Fix this by switching the order of the lines.

Fixes: 6fcf0c72e4b9 ("vfs: add missing blkdev_put() in get_tree_bdev()")
Reported-by: Lukas Czerner
Signed-off-by: David Howells
cc: Ian Kent
cc: Al Viro
Signed-off-by: Linus Torvalds

David Howells
2020-04-29 05:37:40 +0800

18 Dec, 2019

1 commit

1edc8eb2e fs: call fsnotify_sb_delete after evict_inodes ... Browse Code »

When a filesystem is unmounted, we currently call fsnotify_sb_delete()
before evict_inodes(), which means that fsnotify_unmount_inodes()
must iterate over all inodes on the superblock looking for any inodes
with watches. This is inefficient and can lead to livelocks as it
iterates over many unwatched inodes.

At this point, SB_ACTIVE is gone and dropping refcount to zero kicks
the inode out out immediately, so anything processed by
fsnotify_sb_delete / fsnotify_unmount_inodes gets evicted in that loop.

After that, the call to evict_inodes will evict everything else with a
zero refcount.

This should speed things up overall, and avoid livelocks in
fsnotify_unmount_inodes().

Signed-off-by: Eric Sandeen
Reviewed-by: Jan Kara
Signed-off-by: Al Viro

Eric Sandeen
2019-12-18 13:03:01 +0800

10 Oct, 2019

2 commits

015c21ba5 Merge branch 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull mount fixes from Al Viro:
"A couple of regressions from the mount series"

* 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: add missing blkdev_put() in get_tree_bdev()
shmem: fix LSM options parsing

Linus Torvalds
2019-10-10 23:16:44 +0800
6fcf0c72e vfs: add missing blkdev_put() in get_tree_bdev() ... Browse Code »

Is there are a couple of missing blkdev_put() in get_tree_bdev()?

Signed-off-by: Al Viro

Ian Kent
2019-10-10 10:53:57 +0800

26 Sep, 2019

1 commit

7b1373dd6 Merge tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse ... Browse Code »

Pull fuse updates from Miklos Szeredi:

- Continue separating the transport (user/kernel communication) and the
filesystem layers of fuse. Getting rid of most layering violations
will allow for easier cleanup and optimization later on.

- Prepare for the addition of the virtio-fs filesystem. The actual
filesystem will be introduced by a separate pull request.

- Convert to new mount API.

- Various fixes, optimizations and cleanups.

* tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (55 commits)
fuse: Make fuse_args_to_req static
fuse: fix memleak in cuse_channel_open
fuse: fix beyond-end-of-page access in fuse_parse_cache()
fuse: unexport fuse_put_request
fuse: kmemcg account fs data
fuse: on 64-bit store time in d_fsdata directly
fuse: fix missing unlock_page in fuse_writepage()
fuse: reserve byteswapped init opcodes
fuse: allow skipping control interface and forced unmount
fuse: dissociate DESTROY from fuseblk
fuse: delete dentry if timeout is zero
fuse: separate fuse device allocation and installation in fuse_conn
fuse: add fuse_iqueue_ops callbacks
fuse: extract fuse_fill_super_common()
fuse: export fuse_dequeue_forget() function
fuse: export fuse_get_unique()
fuse: export fuse_send_init_request()
fuse: export fuse_len_args()
fuse: export fuse_end_request()
fuse: fix request limit
...

Linus Torvalds
2019-09-26 00:55:59 +0800

20 Sep, 2019

2 commits

bc7d9aee3 Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc mount API conversions from Al Viro:
"Conversions to new API for shmem and friends and for mount_mtd()-using
filesystems.

As for the rest of the mount API conversions in -next, some of them
belong in the individual trees (e.g. binderfs one should definitely go
through android folks, after getting redone on top of their changes).
I'm going to drop those and send the rest (trivial ones + stuff ACKed
by maintainers) in a separate series - by that point they are
independent from each other.

Some stuff has already migrated into individual trees (NFS conversion,
for example, or FUSE stuff, etc.); those presumably will go through
the regular merges from corresponding trees."

* 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Make fs_parse() handle fs_param_is_fd-type params better
vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API
shmem_parse_one(): switch to use of fs_parse()
shmem_parse_options(): take handling a single option into a helper
shmem_parse_options(): don't bother with mpol in separate variable
shmem_parse_options(): use a separate structure to keep the results
make shmem_fill_super() static
make ramfs_fill_super() static
devtmpfs: don't mix {ramfs,shmem}_fill_super() with mount_single()
vfs: Convert squashfs to use the new mount API
mtd: Kill mount_mtd()
vfs: Convert jffs2 to use the new mount API
vfs: Convert cramfs to use the new mount API
vfs: Convert romfs to use the new mount API
vfs: Add a single-or-reconfig keying to vfs_get_super()

Linus Torvalds
2019-09-20 01:06:57 +0800
cfb82e1df Merge tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground ... Browse Code »

Pull y2038 vfs updates from Arnd Bergmann:
"Add inode timestamp clamping.

This series from Deepa Dinamani adds a per-superblock minimum/maximum
timestamp limit for a file system, and clamps timestamps as they are
written, to avoid random behavior from integer overflow as well as
having different time stamps on disk vs in memory.

At mount time, a warning is now printed for any file system that can
represent current timestamps but not future timestamps more than 30
years into the future, similar to the arbitrary 30 year limit that was
added to settimeofday().

This was picked as a compromise to warn users to migrate to other file
systems (e.g. ext4 instead of ext3) when they need the file system to
survive beyond 2038 (or similar limits in other file systems), but not
get in the way of normal usage"

* tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
ext4: Reduce ext4 timestamp warnings
isofs: Initialize filesystem timestamp ranges
pstore: fs superblock limits
fs: omfs: Initialize filesystem timestamp ranges
fs: hpfs: Initialize filesystem timestamp ranges
fs: ceph: Initialize filesystem timestamp ranges
fs: sysv: Initialize filesystem timestamp ranges
fs: affs: Initialize filesystem timestamp ranges
fs: fat: Initialize filesystem timestamp ranges
fs: cifs: Initialize filesystem timestamp ranges
fs: nfs: Initialize filesystem timestamp ranges
ext4: Initialize timestamps limits
9p: Fill min and max timestamps in sb
fs: Fill in max and min timestamps in superblock
utimes: Clamp the timestamps before update
mount: Add mount warning for impending timestamp expiry
timestamp_truncate: Replace users of timespec64_trunc
vfs: Add timestamp_truncate() api
vfs: Add file timestamp range support

Linus Torvalds
2019-09-20 00:42:37 +0800

19 Sep, 2019

1 commit

734d1ed83 Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt ... Browse Code »

Pull fscrypt updates from Eric Biggers:
"This is a large update to fs/crypto/ which includes:

- Add ioctls that add/remove encryption keys to/from a
filesystem-level keyring.

These fix user-reported issues where e.g. an encrypted home
directory can break NetworkManager, sshd, Docker, etc. because they
don't get access to the needed keyring. These ioctls also provide a
way to lock encrypted directories that doesn't use the
vm.drop_caches sysctl, so is faster, more reliable, and doesn't
always need root.

- Add a new encryption policy version ("v2") which switches to a more
standard, secure, and flexible key derivation function, and starts
verifying that the correct key was supplied before using it.

The key derivation improvement is needed for its own sake as well
as for ongoing feature work for which the current way is too
inflexible.

Work is in progress to update both Android and the 'fscrypt' userspace
tool to use both these features. (Working patches are available and
just need to be reviewed+merged.) Chrome OS will likely use them too.

This has also been tested on ext4, f2fs, and ubifs with xfstests --
both the existing encryption tests, and the new tests for this. This
has also been in linux-next since Aug 16 with no reported issues. I'm
also using an fscrypt v2-encrypted home directory on my personal
desktop"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: (27 commits)
ext4 crypto: fix to check feature status before get policy
fscrypt: document the new ioctls and policy version
ubifs: wire up new fscrypt ioctls
f2fs: wire up new fscrypt ioctls
ext4: wire up new fscrypt ioctls
fscrypt: require that key be added when setting a v2 encryption policy
fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS ioctl
fscrypt: allow unprivileged users to add/remove keys for v2 policies
fscrypt: v2 encryption policy support
fscrypt: add an HKDF-SHA512 implementation
fscrypt: add FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl
fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl
fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl
fscrypt: rename keyinfo.c to keysetup.c
fscrypt: move v1 policy key setup to keysetup_v1.c
fscrypt: refactor key setup code in preparation for v2 policies
fscrypt: rename fscrypt_master_key to fscrypt_direct_key
fscrypt: add ->ci_inode to fscrypt_info
fscrypt: use FSCRYPT_* definitions, not FS_*
fscrypt: use FSCRYPT_ prefix for uapi constants
...

Linus Torvalds
2019-09-19 07:08:52 +0800

07 Sep, 2019

1 commit

c7eb68696 vfs: subtype handling moved to fuse ... Browse Code »

The unused vfs code can be removed. Don't pass empty subtype (same as if
->parse callback isn't called).

The bits that are left involve determining whether it's permitted to split the
filesystem type string passed in to mount(2). Consequently, this means that we
cannot get rid of the FS_HAS_SUBTYPE flag unless we define that a type string
with a dot in it always indicates a subtype specification.

Signed-off-by: David Howells
Signed-off-by: Al Viro
Signed-off-by: Miklos Szeredi

David Howells
2019-09-07 03:28:49 +0800

06 Sep, 2019

3 commits

43ce4c1fe vfs: Add a single-or-reconfig keying to vfs_get_super() ... Browse Code »

Add an additional keying mode to vfs_get_super() to indicate that only a
single superblock should exist in the system, and that, if it does, further
mounts should invoke reconfiguration upon it.

This allows mount_single() to be replaced.

[Fix by Eric Biggers folded in]

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2019-09-06 02:34:23 +0800
fe62c3a4e vfs: Create fs_context-aware mount_bdev() replacement ... Browse Code »

Create a function, get_tree_bdev(), that is fs_context-aware and a
->get_tree() counterpart of mount_bdev().

It caches the block device pointer in the fs_context struct so that this
information can be passed into sget_fc()'s test and set functions.

Signed-off-by: David Howells
cc: Jens Axboe
cc: linux-block@vger.kernel.org
Signed-off-by: Al Viro

David Howells
2019-09-06 02:34:22 +0800
533770cc0 new helper: get_tree_keyed() ... Browse Code »

For vfs_get_keyed_super users.

Signed-off-by: Al Viro

Al Viro
2019-09-06 02:34:22 +0800

30 Aug, 2019

1 commit

188d20bcd vfs: Add file timestamp range support ... Browse Code »

Add fields to the superblock to track the min and max
timestamps supported by filesystems.

Initially, when a superblock is allocated, initialize
it to the max and min values the fields can hold.
Individual filesystems override these to match their
actual limits.

Pseudo filesystems are assumed to always support the
min and max allowable values for the fields.

Signed-off-by: Deepa Dinamani
Acked-by: Jeff Layton

Deepa Dinamani
2019-08-30 22:27:17 +0800

13 Aug, 2019

1 commit

22d94f493 fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl ... Browse Code »

Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY. This ioctl adds an
encryption key to the filesystem's fscrypt keyring ->s_master_keys,
making any files encrypted with that key appear "unlocked".

Why we need this
~~~~~~~~~~~~~~~~

The main problem is that the "locked/unlocked" (ciphertext/plaintext)
status of encrypted files is global, but the fscrypt keys are not.
fscrypt only looks for keys in the keyring(s) the process accessing the
filesystem is subscribed to: the thread keyring, process keyring, and
session keyring, where the session keyring may contain the user keyring.

Therefore, userspace has to put fscrypt keys in the keyrings for
individual users or sessions. But this means that when a process with a
different keyring tries to access encrypted files, whether they appear
"unlocked" or not is nondeterministic. This is because it depends on
whether the files are currently present in the inode cache.

Fixing this by consistently providing each process its own view of the
filesystem depending on whether it has the key or not isn't feasible due
to how the VFS caches work. Furthermore, while sometimes users expect
this behavior, it is misguided for two reasons. First, it would be an
OS-level access control mechanism largely redundant with existing access
control mechanisms such as UNIX file permissions, ACLs, LSMs, etc.
Encryption is actually for protecting the data at rest.

Second, almost all users of fscrypt actually do need the keys to be
global. The largest users of fscrypt, Android and Chromium OS, achieve
this by having PID 1 create a "session keyring" that is inherited by
every process. This works, but it isn't scalable because it prevents
session keyrings from being used for any other purpose.

On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't
similarly abuse the session keyring, so to make 'sudo' work on all
systems it has to link all the user keyrings into root's user keyring
[2]. This is ugly and raises security concerns. Moreover it can't make
the keys available to system services, such as sshd trying to access the
user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to
read certificates from the user's home directory (see [5]); or to Docker
containers (see [6], [7]).

By having an API to add a key to the *filesystem* we'll be able to fix
the above bugs, remove userspace workarounds, and clearly express the
intended semantics: the locked/unlocked status of an encrypted directory
is global, and encryption is orthogonal to OS-level access control.

Why not use the add_key() syscall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We use an ioctl for this API rather than the existing add_key() system
call because the ioctl gives us the flexibility needed to implement
fscrypt-specific semantics that will be introduced in later patches:

- Supporting key removal with the semantics such that the secret is
removed immediately and any unused inodes using the key are evicted;
also, the eviction of any in-use inodes can be retried.

- Calculating a key-dependent cryptographic identifier and returning it
to userspace.

- Allowing keys to be added and removed by non-root users, but only keys
for v2 encryption policies; and to prevent denial-of-service attacks,
users can only remove keys they themselves have added, and a key is
only really removed after all users who added it have removed it.

Trying to shoehorn these semantics into the keyrings syscalls would be
very difficult, whereas the ioctls make things much easier.

However, to reuse code the implementation still uses the keyrings
service internally. Thus we get lockless RCU-mode key lookups without
having to re-implement it, and the keys automatically show up in
/proc/keys for debugging purposes.

References:

[1] https://github.com/google/fscrypt
[2] https://goo.gl/55cCrI#heading=h.vf09isp98isb
[3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939
[4] https://github.com/google/fscrypt/issues/116
[5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715
[6] https://github.com/google/fscrypt/issues/128
[7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem

Reviewed-by: Theodore Ts'o
Signed-off-by: Eric Biggers

Eric Biggers
2019-08-13 10:06:13 +0800

01 Aug, 2019

1 commit

c2c44ec20 Unbreak mount_capable() ... Browse Code »

In "consolidate the capability checks in sget_{fc,userns}())" the
wrong argument had been passed to mount_capable() by sget_fc().
That mistake had been further obscured later, when switching
mount_capable() to fs_context has moved the calculation of
bogus argument from sget_fc() to mount_capable() itself. It
should've been fc->user_ns all along.

Screwed-up-by: Al Viro
Reported-by: Christian Brauner
Tested-by: Christian Brauner
Reviewed-by: David Howells
Signed-off-by: Al Viro

Al Viro
2019-08-01 00:22:32 +0800

05 Jul, 2019

2 commits

c23a0bbab convenience helper: get_tree_single() ... Browse Code »

counterpart of mount_single(); switch fusectl to it

Signed-off-by: Al Viro

Al Viro
2019-07-05 10:01:58 +0800
2ac295d4f convenience helper get_tree_nodev() ... Browse Code »

counterpart of mount_nodev(). Switch hugetlb and pseudo to it.

Signed-off-by: Al Viro

Al Viro
2019-07-05 10:01:58 +0800

26 May, 2019

9 commits

023d066a0 vfs: Kill sget_userns() ... Browse Code »

Kill sget_userns(), folding it into sget() as that's the only remaining
user.

Signed-off-by: David Howells
cc: linux-fsdevel@vger.kernel.org

David Howells
2019-05-26 06:06:17 +0800
c80fa7c83 vfs: Provide sb->s_iflags settings in fs_context struct ... Browse Code »

Provide a field in the fs_context struct through which bits in the
sb->s_iflags superblock field can be set.

Signed-off-by: David Howells
cc: linux-fsdevel@vger.kernel.org

David Howells
2019-05-26 06:00:03 +0800
c3aabf078 move mount_capable() further out ... Browse Code »

Call graph of vfs_get_tree():
vfs_fsconfig_locked() # neither kernmount, nor submount
do_new_mount() # neither kernmount, nor submount
fc_mount()
afs_mntpt_do_automount() # submount
mount_one_hugetlbfs() # kernmount
pid_ns_prepare_proc() # kernmount
mq_create_mount() # kernmount
vfs_kern_mount()
simple_pin_fs() # kernmount
vfs_submount() # submount
kern_mount() # kernmount
init_mount_tree()
btrfs_mount()
nfs_do_root_mount()

The first two need the check (unconditionally).
init_mount_tree() is setting rootfs up; any capability
checks make zero sense for that one. And btrfs_mount()/
nfs_do_root_mount() have the checks already done in their
callers.

IOW, we can shift mount_capable() handling into
the two callers - one in the normal case of mount(2),
another - in fsconfig(2) handling of FSCONFIG_CMD_CREATE.
I.e. the syscalls that set a new filesystem up.

Signed-off-by: Al Viro

Al Viro
2019-05-26 06:00:02 +0800
059338aae move mount_capable() calls to vfs_get_tree() ... Browse Code »

sget_fc() is called only from ->get_tree() instances and
the only instance not calling it is legacy_get_tree(),
which calls mount_capable() directly.

In all sget_fc() callers the checks could be moved to the
very beginning of ->get_tree() - ->user_ns is not changed
in between. So lifting the checks to the only caller of
->get_tree() is OK.

Signed-off-by: Al Viro

Al Viro
2019-05-26 06:00:01 +0800
20284ab74 switch mount_capable() to fs_context ... Browse Code »

now both callers of mount_capable() have access to fs_context;
the only difference is that for sget_fc() we have the possibility
of fc->global being true, while for legacy_get_tree() it's guaranteed
to be impossible. Unify to more generic variant...

Signed-off-by: Al Viro

Al Viro
2019-05-26 05:59:59 +0800
2527b284d move the capability checks from sget_userns() to legacy_get_tree() ... Browse Code »

1) all call chains leading to sget_userns() pass through ->mount()
instances.
2) none of ->mount() instances is ever called directly - the only
call site is legacy_get_tree()
3) all remaining ->mount() instances end up calling sget_userns()

IOW, we might as well do the capability checks just before calling
->mount(). As for the arguments passed to mount_capable(),
in case of call chains to sget_userns() going through sget(),
we either don't call mount_capable() at all, or pass current_user_ns()
to it. The call chains going through mount_pseudo_xattr() don't
call mount_capable() at all (SB_KERNMOUNT in flags on those).

That could've been split into smaller steps (lifting the checks
into sget(), then callers of sget(), then all the way to the
entries of every ->mount() out there, then to the sole caller),
but that would be too much churn for little benefit...

Signed-off-by: Al Viro

Al Viro
2019-05-26 05:59:58 +0800
bb7b6b2bb vfs: Kill mount_ns() ... Browse Code »

Kill mount_ns() as it has been replaced by vfs_get_super() in the new mount
API.

Signed-off-by: David Howells
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro

David Howells
2019-05-26 05:59:57 +0800
0ce0cf12f consolidate the capability checks in sget_{fc,userns}() ... Browse Code »

... into a common helper - mount_capable(type, userns)

Signed-off-by: Al Viro

Al Viro
2019-05-26 05:59:56 +0800
feb8ae43a start massaging the checks in sget_...(): move to sget_userns() ... Browse Code »

there are 3 remaining callers of sget_userns() - sget(), mount_ns()
and mount_pseudo_xattr(). Extra check in sget() is conditional
upon mount being neither KERNMOUNT nor SUBMOUNT, the identical one
in mount_ns() - upon being not KERNMOUNT; mount_pseudo_xattr()
has no such checks at all.

However, mount_ns() is never used with SUBMOUNT and mount_pseudo_xattr()
is used only for KERNMOUNT, so both would be fine with the same logics
as currently done in sget(), allowing to consolidate the entire thing
in sget_userns() itself.

That's not where these checks will end up in the long run, though -
the whole reason why they'd been done so deep in the bowels of
mount(2) was that there had been no way for a filesystem to specify
which userns to look at until it has entered ->mount().

Now there is a place where filesystem could override the defaults -
->init_fs_context(). Which allows to pull the checks out into
the callers of vfs_get_tree(). That'll take quite a bit of
massage, but that mess is possible to tease apart.

Signed-off-by: Al Viro

Al Viro
2019-05-26 05:59:55 +0800

29 Apr, 2019

1 commit

ee948837d [fix] get rid of checking for absent device name in vfs_get_tree() ... Browse Code »

It has no business being there, it's checked by relevant ->get_tree()
as it is *and* it returns the wrong error for no reason whatsoever.

Fixes: f3a09c92018a "introduce fs_context methods"
Signed-off-by: Al Viro

Al Viro
2019-04-29 09:34:21 +0800

28 Feb, 2019

2 commits

06a2ae56b vfs: Add some logging to the core users of the fs_context log ... Browse Code »

Add some logging to the core users of the fs_context log so that
information can be extracted from them as to the reason for failure.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2019-02-28 16:29:38 +0800
cb50b348c convenience helpers: vfs_get_super() and sget_fc() ... Browse Code »

the former is an analogue of mount_{single,nodev} for use in
->get_tree() instances, the latter - analogue of sget() for the
same.

These are fairly similar to the originals, but the callback signature
for sget_fc() is different from sget() ones, so getting bits and
pieces shared would be too convoluted; we might get around to that
later, but for now let's just remember to keep them in sync. They
do live next to each other, and changes in either won't be hard
to spot.

Signed-off-by: Al Viro

Al Viro
2019-02-28 16:29:26 +0800

31 Jan, 2019

2 commits

f3a09c920 introduce fs_context methods ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2019-01-31 06:44:27 +0800
8d0347f6c convert do_remount_sb() to fs_context ... Browse Code »

Replace do_remount_sb() with a function, reconfigure_super(), that's
fs_context aware. The fs_context is expected to be parameterised already
and have ->root pointing to the superblock to be reconfigured.

A legacy wrapper is provided that is intended to be called from the
fs_context ops when those appear, but for now is called directly from
reconfigure_super(). This wrapper invokes the ->remount_fs() superblock op
for the moment. It is intended that the remount_fs() op will be phased
out.

The fs_context->purpose is set to FS_CONTEXT_FOR_RECONFIGURE to indicate
that the context is being used for reconfiguration.

do_umount_root() is provided to consolidate remount-to-R/O for umount and
emergency remount by creating a context and invoking reconfiguration.

do_remount(), do_umount() and do_emergency_remount_callback() are switched
to use the new process.

[AV -- fold UMOUNT and EMERGENCY_REMOUNT in; fixes the
umount / bug, gets rid of pointless complexity]
[AV -- set ->net_ns in all cases; nfs remount will need that]
[AV -- shift security_sb_remount() call into reconfigure_super(); the callers
that didn't do security_sb_remount() have NULL fc->security anyway, so it's
a no-op for them]

Signed-off-by: David Howells
Co-developed-by: Al Viro
Signed-off-by: Al Viro

David Howells
2019-01-31 06:44:26 +0800