Eric Lee / smarc-fsl-linux-kernel

09 Feb, 2020

1 commit

236f45329 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc vfs updates from Al Viro:

- bmap series from cmaiolino

- getting rid of convolutions in copy_mount_options() (use a couple of
copy_from_user() instead of the __get_user() crap)

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
saner copy_mount_options()
fibmap: Reject negative block numbers
fibmap: Use bmap instead of ->bmap method in ioctl_fibmap
ecryptfs: drop direct calls to ->bmap
cachefiles: drop direct usage of ->bmap method.
fs: Enable bmap() function to properly return errors

Linus Torvalds
2020-02-09 05:04:49 +0800

05 Feb, 2020

1 commit

bddea11b1 Merge branch 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs timestamp updates from Al Viro:
"More 64bit timestamp work"

* 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
kernfs: don't bother with timestamp truncation
fs: Do not overload update_time
fs: Delete timespec64_trunc()
fs: ubifs: Eliminate timespec64_trunc() usage
fs: ceph: Delete timespec64_trunc() usage
fs: cifs: Delete usage of timespec64_trunc
fs: fat: Eliminate timespec64_trunc() usage
utimes: Clamp the timestamps in notify_change()

Linus Torvalds
2020-02-05 13:02:42 +0800

04 Feb, 2020

1 commit

45586c707 treewide: remove redundant IS_ERR() before error code check

'PTR_ERR(p) == -E*' is a stronger condition than IS_ERR(p).
Hence, IS_ERR(p) is unneeded.

The semantic patch that generates this commit is as follows:

//
@@
expression ptr;
constant error_code;
@@
-IS_ERR(ptr) && (PTR_ERR(ptr) == - error_code)
+PTR_ERR(ptr) == - error_code
//
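
As a rough illustration of why the IS_ERR() test is redundant, the following userspace sketch re-creates the helpers. The definitions are stand-ins modeled on the kernel's include/linux/err.h, not the kernel code itself, and is_enoent_old(), is_enoent_new() and check_forms_agree() are hypothetical names used only here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins modeled on the kernel's include/linux/err.h helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Form matched by the '-' line of the semantic patch (hypothetical helper). */
static inline bool is_enoent_old(const void *p)
{
	return IS_ERR(p) && PTR_ERR(p) == -ENOENT;
}

/* Form produced by the '+' line (hypothetical helper). */
static inline bool is_enoent_new(const void *p)
{
	return PTR_ERR(p) == -ENOENT;
}

/* Both forms give the same answer for any pointer value. */
static inline void check_forms_agree(const void *p)
{
	assert(is_enoent_old(p) == is_enoent_new(p));
}
```

Any pointer whose PTR_ERR() value equals a specific negative errno already lies in the error range, so dropping the IS_ERR() term does not change the result.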

Link: http://lkml.kernel.org/r/20200106045833.1725-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada
Cc: Julia Lawall
Acked-by: Stephen Boyd [drivers/clk/clk.c]
Acked-by: Bartosz Golaszewski [GPIO]
Acked-by: Wolfram Sang [drivers/i2c]
Acked-by: Rafael J. Wysocki [acpi/scan.c]
Acked-by: Rob Herring
Cc: Eric Biggers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masahiro Yamada
2020-02-04 11:05:27 +0800

03 Feb, 2020

1 commit

30460e1ea fs: Enable bmap() function to properly return errors

Until now, bmap() returned either the physical block number related to
the requested file offset, or 0 if an error occurred or the requested
offset mapped into a hole.
This patch makes the needed changes to enable bmap() to properly return
errors, using the return value as an error return. A pointer must now
be passed to bmap() to be filled with the mapped physical block.

It will change the behavior of bmap() on return:

- negative value in case of error
- zero on success or map fell into a hole

In case of a hole, the *block will be zero too

Since this is a prep patch, for now the only error return is -EINVAL if
->bmap doesn't exist.
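
The new calling convention can be sketched in userspace roughly as follows. This is only a model under stated assumptions: struct model_inode, model_bmap() and demo_map() are hypothetical names, and passing the logical block in through *block is an assumption about the interface rather than something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-in for an inode; a NULL callback models a missing ->bmap. */
struct model_inode {
	sector_t (*bmap_op)(sector_t logical);
};

/*
 * Status goes in the return value, the mapping goes through *block.
 * Assumption: *block carries the logical block in and the physical block out.
 */
static inline int model_bmap(const struct model_inode *inode, sector_t *block)
{
	assert(inode != NULL && block != NULL);
	if (inode->bmap_op == NULL)
		return -EINVAL;		/* the only error in this prep step */
	*block = inode->bmap_op(*block);	/* 0 means the offset is in a hole */
	return 0;
}

/* Example mapping: logical blocks 0..7 live at 100..107, the rest are holes. */
static inline sector_t demo_map(sector_t logical)
{
	return logical < 8 ? logical + 100 : 0;
}
```

With this shape a caller can tell "no ->bmap" (negative return) apart from "hole" (return 0 with *block == 0), which the old single return value could not express.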

Reviewed-by: Christoph Hellwig
Signed-off-by: Carlos Maiolino
Signed-off-by: Al Viro

Carlos Maiolino
2020-02-03 21:05:37 +0800

31 Jan, 2020

1 commit

6e135baed Merge tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"In this series, we've implemented transparent compression
experimentally. It supports LZO and LZ4, but will add more later as we
investigate in the field more.

At this point, the feature doesn't expose compressed space to user
directly in order to guarantee potential data updates later to the
space. Instead, the main goal is to reduce data writes to flash disk
as much as possible, resulting in extending disk life time as well as
relaxing IO congestion.

Alternatively, we're also considering to add ioctl() to reclaim
compressed space and show it to user after putting the immutable bit.

Enhancements:
- add compression support
- avoid unnecessary locks in quota ops
- harden power-cut scenario for zoned block devices
- use private bio_set to avoid IO congestion
- replace GC mutex with rwsem to serialize callers

Bug fixes:
- fix dentry consistency and memory corruption in rename()'s error case
- fix wrong swap extent reports
- fix casefolding bugs
- change lock coverage to avoid deadlock
- avoid GFP_KERNEL under f2fs_lock_op

And, we've cleaned up sysfs entries to prepare no debugfs"

* tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (31 commits)
f2fs: fix race conditions in ->d_compare() and ->d_hash()
f2fs: fix dcache lookup of !casefolded directories
f2fs: Add f2fs stats to sysfs
f2fs: delete duplicate information on sysfs nodes
f2fs: change to use rwsem for gc_mutex
f2fs: update f2fs document regarding to fsync_mode
f2fs: add a way to turn off ipu bio cache
f2fs: code cleanup for f2fs_statfs_project()
f2fs: fix miscounted block limit in f2fs_statfs_project()
f2fs: show the CP_PAUSE reason in checkpoint traces
f2fs: fix deadlock allocating bio_post_read_ctx from mempool
f2fs: remove unneeded check for error allocating bio_post_read_ctx
f2fs: convert inline_dir early before starting rename
f2fs: fix memleak of kobject
f2fs: fix to add swap extent correctly
f2fs: run fsck when getting bad inode during GC
f2fs: support data compression
f2fs: free sysfs kobject
f2fs: declare nested quota_sem and remove unnecessary sems
f2fs: don't put new_page twice in f2fs_rename
...

Linus Torvalds
2020-01-31 07:39:24 +0800

29 Jan, 2020

1 commit

c8994374d Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fsverity updates from Eric Biggers:

- Optimize fs-verity sequential read performance by implementing
readahead of Merkle tree pages. This allows the Merkle tree to be
read in larger chunks.

- Optimize FS_IOC_ENABLE_VERITY performance in the uncached case by
implementing readahead of data pages.

- Allocate the hash requests from a mempool in order to eliminate the
possibility of allocation failures during I/O.

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fs-verity: use u64_to_user_ptr()
fs-verity: use mempool for hash requests
fs-verity: implement readahead of Merkle tree pages
fs-verity: implement readahead for FS_IOC_ENABLE_VERITY

Linus Torvalds
2020-01-29 07:31:03 +0800

25 Jan, 2020

2 commits

80f2388af f2fs: fix race conditions in ->d_compare() and ->d_hash()

Since ->d_compare() and ->d_hash() can be called in RCU-walk mode,
->d_parent and ->d_inode can be concurrently modified, and in
particular, ->d_inode may be changed to NULL. For f2fs_d_hash() this
resulted in a reproducible NULL dereference if a lookup is done in a
directory being deleted, e.g. with:

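#include <sys/stat.h>
#include <unistd.h>

int main()
{
	if (fork()) {
		/* parent: keep creating and removing the directory */
		for (;;) {
			mkdir("subdir", 0700);
			rmdir("subdir");
		}
	} else {
		/* child: keep looking up a name inside it */
		for (;;)
			access("subdir/file", 0);
	}
}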

... or by running the 't_encrypted_d_revalidate' program from xfstests.
Both repros work in any directory on a filesystem with the encoding
feature, even if the directory doesn't actually have the casefold flag.

I couldn't reproduce a crash in f2fs_d_compare(), but it appears that a
similar crash is possible there.

Fix these bugs by reading ->d_parent and ->d_inode using READ_ONCE() and
falling back to the case sensitive behavior if the inode is NULL.

Reported-by: Al Viro
Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups")
Cc: # v5.4+
Signed-off-by: Eric Biggers
Signed-off-by: Jaegeuk Kim

Eric Biggers
2020-01-25 02:04:09 +0800
5515eae64 f2fs: fix dcache lookup of !casefolded directories

Do the name comparison for non-casefolded directories correctly.

This is analogous to ext4's commit 66883da1eee8 ("ext4: fix dcache
lookup of !casefolded directories").

Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups")
Cc: # v5.4+
Signed-off-by: Eric Biggers
Signed-off-by: Jaegeuk Kim

Eric Biggers
2020-01-25 01:53:02 +0800

24 Jan, 2020

1 commit

fc7100ea2 f2fs: Add f2fs stats to sysfs

Currently f2fs stats are only available from /d/f2fs/status. This patch
adds some of the f2fs stats to sysfs so that they are accessible even
when debugfs is not mounted.

The following sysfs nodes are added:
-/sys/fs/f2fs//free_segments
-/sys/fs/f2fs//cp_foreground_calls
-/sys/fs/f2fs//cp_background_calls
-/sys/fs/f2fs//gc_foreground_calls
-/sys/fs/f2fs//gc_background_calls
-/sys/fs/f2fs//moved_blocks_foreground
-/sys/fs/f2fs//moved_blocks_background
-/sys/fs/f2fs//avg_vblocks

Signed-off-by: Hridya Valsaraju
[Jaegeuk Kim: allow STAT_FS without DEBUG_FS]
Signed-off-by: Jaegeuk Kim

Hridya Valsaraju
2020-01-24 01:24:25 +0800

18 Jan, 2020

12 commits

fb24fea75 f2fs: change to use rwsem for gc_mutex

A mutex lock won't serialize callers, so in order to avoid starving an
unlucky caller, let's use an rwsem lock instead.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-01-18 08:48:44 +0800
d7b0a23d8 f2fs: update f2fs document regarding to fsync_mode

This patch adds missing fsync_mode entry in f2fs document.

Fixes: 04485987f053 ("f2fs: introduce async IPU policy")
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-01-18 08:48:44 +0800
0e7f41974 f2fs: add a way to turn off ipu bio cache

Setting 0x40 in /sys/fs/f2fs/dev/ipu_policy gives a way to turn off the
bio cache, which is useful to check whether a block layer using a hardware
encryption engine merges IOs correctly.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-01-18 08:48:43 +0800
bf2cbd3c5 f2fs: code cleanup for f2fs_statfs_project()

Call min_not_zero() to simplify the complicated prjquota
limit comparison in f2fs_statfs_project().
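
The effect of min_not_zero() on a soft/hard limit pair can be approximated in userspace as below. This is a sketch only: min_not_zero_u64() and example_limits() are hypothetical names, and the function mirrors what the kernel macro is understood to do (ignore a zero operand, otherwise take the smaller value).

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two limits, where 0 means "no limit set" (hypothetical helper). */
static inline uint64_t min_not_zero_u64(uint64_t x, uint64_t y)
{
	if (x == 0)
		return y;
	if (y == 0)
		return x;
	return x < y ? x : y;
}

/* Usage: pick the effective project quota limit from a soft/hard pair. */
static inline void example_limits(void)
{
	assert(min_not_zero_u64(10240, 0) == 10240);	 /* soft limit only */
	assert(min_not_zero_u64(0, 20480) == 20480);	 /* hard limit only */
	assert(min_not_zero_u64(20480, 10240) == 10240); /* soft above hard */
	assert(min_not_zero_u64(0, 0) == 0);		 /* no limit at all */
}
```

A single call replaces the nested comparisons that would otherwise be needed to handle "unset" limits separately from "smaller" limits.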

Signed-off-by: Chengguang Xu
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chengguang Xu
2020-01-18 08:48:43 +0800
acdf21721 f2fs: fix miscounted block limit in f2fs_statfs_project()

statfs calculates Total/Used/Avail disk space in units of blocks,
so we should translate the soft/hard prjquota limits to units of
blocks as well.

The test results below show that the block/inode numbers of
Total/Used/Avail from the df command are all correct after
applying this patch.

[root@localhost quota-tools]# ./repquota -P /dev/sdb1
*** Report for project quotas on device /dev/sdb1
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
Project         used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
#0        --       4       0       0              1     0     0
#101      --       0       0       0              2     0     0
#102      --       0   10240       0              2    10     0
#103      --       0       0   20480              2     0    20
#104      --       0   10240   20480              2    10    20
#105      --       0   20480   10240              2    20    10

[root@localhost sdb1]# lsattr -p t{1,2,3,4,5}
101 ----------------N-- t1/a1
102 ----------------N-- t2/a2
103 ----------------N-- t3/a3
104 ----------------N-- t4/a4
105 ----------------N-- t5/a5

[root@localhost sdb1]# df -hi t{1,2,3,4,5}
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/sdb1        2.4M    21  2.4M    1% /mnt/sdb1
/dev/sdb1          10     2     8   20% /mnt/sdb1
/dev/sdb1          20     2    18   10% /mnt/sdb1
/dev/sdb1          10     2     8   20% /mnt/sdb1
/dev/sdb1          10     2     8   20% /mnt/sdb1

[root@localhost sdb1]# df -h t{1,2,3,4,5}
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        10G  489M  9.6G   5% /mnt/sdb1
/dev/sdb1        10M     0   10M   0% /mnt/sdb1
/dev/sdb1        20M     0   20M   0% /mnt/sdb1
/dev/sdb1        10M     0   10M   0% /mnt/sdb1
/dev/sdb1        10M     0   10M   0% /mnt/sdb1

Fixes: 909110c060f2 ("f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()")
Signed-off-by: Chengguang Xu
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chengguang Xu
2020-01-18 08:48:43 +0800
644c8c92a f2fs: fix deadlock allocating bio_post_read_ctx from mempool

Without any form of coordination, any case where multiple allocations
from the same mempool are needed at a time to make forward progress can
deadlock under memory pressure.

This is the case for struct bio_post_read_ctx, as one can be allocated
to decrypt a Merkle tree page during fsverity_verify_bio(), which itself
is running from a post-read callback for a data bio which has its own
struct bio_post_read_ctx.

Fix this by freeing the first bio_post_read_ctx before calling
fsverity_verify_bio(). This works because verity (if enabled) is always
the last post-read step.

This deadlock can be reproduced by trying to read from an encrypted
verity file after reducing NUM_PREALLOC_POST_READ_CTXS to 1 and patching
mempool_alloc() to pretend that pool->alloc() always fails.
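
The ordering problem and the fix can be modeled with a toy pool that holds a single reserved element, matching the reproduction setup above. This is a userspace sketch, not f2fs code: struct toy_pool, toy_alloc(), toy_free(), post_read_old() and post_read_fixed() are hypothetical names, and "would sleep forever" is modeled as an allocation failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy pool that only counts its reserved elements (hypothetical model). */
struct toy_pool {
	int free_elems;
};

static inline bool toy_alloc(struct toy_pool *p)
{
	if (p->free_elems == 0)
		return false;	/* a blocking allocator would sleep here */
	p->free_elems--;
	return true;
}

static inline void toy_free(struct toy_pool *p)
{
	p->free_elems++;
}

/* Old order: the verity step allocates while the data bio's ctx is held. */
static inline bool post_read_old(struct toy_pool *p)
{
	bool ok;

	if (!toy_alloc(p))	/* ctx for the data bio */
		return false;
	ok = toy_alloc(p);	/* ctx for the Merkle tree page read */
	if (ok)
		toy_free(p);
	toy_free(p);
	return ok;
}

/* Fixed order: release the first ctx, then run the verity step. */
static inline bool post_read_fixed(struct toy_pool *p)
{
	bool ok;

	if (!toy_alloc(p))
		return false;
	toy_free(p);		/* verity is the last post-read step */
	ok = toy_alloc(p);
	if (ok)
		toy_free(p);
	return ok;
}
```

With one reserved element, the old order cannot make progress on the nested allocation, while the fixed order never needs more than one element at a time.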

Note that since NUM_PREALLOC_POST_READ_CTXS is actually 128, to actually
hit this bug in practice would require reading from lots of encrypted
verity files at the same time. But it's theoretically possible, as N
available objects doesn't guarantee forward progress when > N/2 threads
each need 2 objects at a time.

Fixes: 95ae251fe828 ("f2fs: add fs-verity support")
Signed-off-by: Eric Biggers
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Eric Biggers
2020-01-18 08:48:43 +0800
e8ce5749d f2fs: remove unneeded check for error allocating bio_post_read_ctx

Since allocating an object from a mempool never fails when
__GFP_DIRECT_RECLAIM (which is included in GFP_NOFS) is set, the check
for failure to allocate a bio_post_read_ctx is unnecessary. Remove it.

Signed-off-by: Eric Biggers
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Eric Biggers
2020-01-18 08:48:42 +0800
b06af2aff f2fs: convert inline_dir early before starting rename

If we hit an error during rename, we'll get two dentries in different
directories.

Chao adds to check the room in inline_dir which can avoid needless
inversion. This should be done by inode_lock(&old_dir).

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-01-18 08:48:42 +0800
fe396ad8e f2fs: fix memleak of kobject

If kobject_init_and_add() fails, the caller needs to invoke kobject_put()
to release the kobject explicitly.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-01-18 08:48:42 +0800
3e5e479a3 f2fs: fix to add swap extent correctly

As Youling reported in mailing list:

https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/

https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/

There is a test case that can corrupt an f2fs image:
- dd if=/dev/zero of=/swapfile bs=1M count=4096
- chmod 600 /swapfile
- mkswap /swapfile
- swapon --discard /swapfile

The root cause is that f2fs_swap_activate() intends to return zero
to setup_swap_extents() to enable SWP_FS mode (swap file goes through
the fs). In this flow, setup_swap_extents() sets up the swap extent with
the wrong block address range, resulting in discard_swap() erasing an
incorrect address range.

Because f2fs_swap_activate() has pinned the swapfile, its data block
addresses will not change, so it is safe to let swap handle IO through
the raw device. We can therefore get rid of SWP_FS mode and initialize
the swap extents inside f2fs_swap_activate(); this way, a later
discard_swap() can trim the right address range.

Fixes: 4969c06a0d83 ("f2fs: support swap file w/ DIO")
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-01-18 08:48:42 +0800
4eea93e3f f2fs: run fsck when getting bad inode during GC

This is to avoid infinite GC when trying to disable checkpoint.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-01-18 08:48:42 +0800
4c8ff7095 f2fs: support data compression

This patch tries to support compression in f2fs.

- A new term, cluster, is defined as the basic unit of compression; a file
can be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, the compression size is also the cluster size, and
each cluster can be compressed or not.

- In the cluster metadata layout, one special flag indicates whether a
cluster is a compressed one or a normal one. For a compressed cluster, the
following metadata maps the cluster to [1, (4 << n) - 1] physical blocks, in
which f2fs stores data including the compress header and compressed data.

- In order to eliminate write amplification during overwrite, F2FS only
supports compression on write-once files; data can be compressed only when
all logical blocks in file are valid and the cluster compress ratio is lower
than the specified threshold.

- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
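
The cluster arithmetic described above can be sketched as follows. The helper names are hypothetical and this is not the f2fs implementation; it also assumes the block range is meant as [1, (4 << n) - 1], since in C the unparenthesized expression would parse as 4 << (n - 1).

```c
#include <assert.h>
#include <stdint.h>

/* Pages per cluster for a given n: 4 << n (hypothetical helper). */
static inline uint32_t cluster_pages(unsigned n)
{
	assert(n < 28);		/* keep the shift well inside 32 bits */
	return 4u << n;
}

/* Index of the cluster that contains a logical page. */
static inline uint64_t cluster_index(uint64_t page_index, unsigned n)
{
	return page_index >> (2 + n);	/* divide by 4 << n */
}

/* Position of the page inside its cluster. */
static inline uint32_t page_in_cluster(uint64_t page_index, unsigned n)
{
	return (uint32_t)(page_index & (cluster_pages(n) - 1));
}

/* Upper bound on physical blocks used by a compressed cluster. */
static inline uint32_t max_compressed_blocks(unsigned n)
{
	return cluster_pages(n) - 1;	/* at least one block must be saved */
}
```

For example, with n = 1 a cluster spans 8 pages, so logical page 37 falls in cluster 4 at position 5.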

Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+

Changelog:

20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().

20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().

20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().

20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.

20190402
- don't preallocate blocks for compressed file.

- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.

20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR

One cluster contains 4 blocks

before overwrite after overwrite

- VVVV -> CVNN
- CVNN -> VVVV

- CVNN -> CVNN
- CVNN -> CVVV

- CVVV -> CVNN
- CVVV -> CVVV

20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.

20191101
- apply fixes from Jaegeuk

20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity

20191216
- apply fixes from Jaegeuk

20200117
- fix to avoid NULL pointer dereference

[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks

Reported-by:
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-01-18 08:48:07 +0800

16 Jan, 2020

9 commits

820d36673 f2fs: free sysfs kobject

Detected kmemleak.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-01-16 05:43:49 +0800
2c4e0c528 f2fs: declare nested quota_sem and remove unnecessary sems

1.
f2fs_quota_sync
 -> down_read(&sbi->quota_sem)
 -> dquot_writeback_dquots
  -> f2fs_dquot_commit
   -> down_read(&sbi->quota_sem)

2.
f2fs_quota_sync
 -> down_read(&sbi->quota_sem)
 -> f2fs_write_data_pages
  -> f2fs_write_single_data_page
   -> down_write(&F2FS_I(inode)->i_sem)

f2fs_mkdir
 -> f2fs_do_add_link
  -> down_write(&F2FS_I(inode)->i_sem)
  -> f2fs_init_inode_metadata
   -> f2fs_new_node_page
    -> dquot_alloc_inode
     -> f2fs_dquot_mark_dquot_dirty
      -> down_read(&sbi->quota_sem)

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-01-16 05:43:49 +0800
762e4db54 f2fs: don't put new_page twice in f2fs_rename

In f2fs_rename(), new_page is gone after f2fs_set_link(), but the function
tries to put it again when whiteout fails and it jumps to put_out_dir.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-01-16 05:43:49 +0800
5b1dbb082 f2fs: set I_LINKABLE early to avoid wrong access by vfs

This patch moves setting I_LINKABLE early in rename2(whiteout) to avoid the
below warning.

[ 3189.163385] WARNING: CPU: 3 PID: 59523 at fs/inode.c:358 inc_nlink+0x32/0x40
[ 3189.246979] Call Trace:
[ 3189.248707] f2fs_init_inode_metadata+0x2d6/0x440 [f2fs]
[ 3189.251399] f2fs_add_inline_entry+0x162/0x8c0 [f2fs]
[ 3189.254010] f2fs_add_dentry+0x69/0xe0 [f2fs]
[ 3189.256353] f2fs_do_add_link+0xc5/0x100 [f2fs]
[ 3189.258774] f2fs_rename2+0xabf/0x1010 [f2fs]
[ 3189.261079] vfs_rename+0x3f8/0xaa0
[ 3189.263056] ? tomoyo_path_rename+0x44/0x60
[ 3189.265283] ? do_renameat2+0x49b/0x550
[ 3189.267324] do_renameat2+0x49b/0x550
[ 3189.269316] __x64_sys_renameat2+0x20/0x30
[ 3189.271441] do_syscall_64+0x5a/0x230
[ 3189.273410] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3189.275848] RIP: 0033:0x7f270b4d9a49

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2020-01-16 05:43:48 +0800
542989b67 f2fs: don't keep META_MAPPING pages used for moving verity file blocks

META_MAPPING is used to move blocks for both encrypted and verity files.
So the META_MAPPING invalidation condition in do_checkpoint() should
consider verity too, not just encrypt.

Signed-off-by: Eric Biggers
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Eric Biggers
2020-01-16 05:43:48 +0800
f543805fc f2fs: introduce private bioset

In low memory scenario, we can allocate multiple bios without
submitting any of them.

- f2fs_write_checkpoint()
 - block_operations()
  - f2fs_sync_node_pages()
   step 1) flush cold nodes, allocate new bio from mempool
   - bio_alloc()
    - mempool_alloc()
   step 2) flush hot nodes, allocate a bio from mempool
   - bio_alloc()
    - mempool_alloc()
   step 3) flush warm nodes, be stuck in below call path
   - bio_alloc()
    - mempool_alloc()
     - loop to wait mempool element release, as we only
       reserved memory for two bio allocation, however above
       allocated two bios may never be submitted.

So we need to avoid using the default bioset. This patch introduces a
private bioset in which the mempool element count is enlarged to the
total number of log headers, so that we can make sure we have a large
enough backup memory pool in the scenario of allocating/holding
multiple bios.

Signed-off-by: Gao Xiang
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2020-01-16 05:43:48 +0800
0e6d01643 f2fs: cleanup duplicate stats for atomic files

Remove the duplicate sbi->aw_cnt stats counter that tracks
the number of atomic files currently opened (it also shows
an incorrect value sometimes). Use the more relevant label
sbi->atomic_files to show in the stats.

Signed-off-by: Sahitya Tummala
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Sahitya Tummala
2020-01-16 05:43:48 +0800
d508c94e4 f2fs: Check write pointer consistency of non-open zones

To catch f2fs bugs in write pointer handling code for zoned block
devices, check write pointers of non-open zones that current segments do
not point to. Do this check at mount time, after the fsync data recovery
and current segments' write pointer consistency fix. Or when fsync data
recovery is disabled by mount option, do the check when there is no fsync
data.

Check two items, comparing write pointers with the valid block maps in SIT.
The first item is a check for zones with no valid blocks. When there are no
valid blocks in a zone, the write pointer should be at the start of the
zone. If not, the next write operation to the zone will cause an unaligned
write error. If the write pointer is not at the zone start, reset the write
pointer to place it at the zone start.

The second item is a check between the write pointer position and the last
valid block in the zone. It is unexpected that the last valid block
position is beyond the write pointer. In such a case, report it as a bug.
A fix is not required for such a zone, because the zone is not selected for
the next write operation until the zone gets discarded.
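
The two checks can be summarized as a small decision function. This is an illustrative sketch, not the f2fs code: enum zone_action, check_non_open_zone() and its parameters are hypothetical, offsets are counted in blocks from the zone start, and "beyond the write pointer" is assumed to mean an offset at or after it.

```c
#include <assert.h>
#include <stdint.h>

enum zone_action {
	ZONE_OK,		/* nothing to do */
	ZONE_RESET_WP,		/* empty zone whose write pointer has moved */
	ZONE_REPORT_BUG		/* valid block found beyond the write pointer */
};

/*
 * wp_offset:    write pointer position, in blocks from the zone start
 * valid_blocks: number of valid blocks in the zone according to SIT
 * last_valid:   offset of the last valid block (ignored for empty zones)
 */
static inline enum zone_action check_non_open_zone(uint32_t wp_offset,
						   uint32_t valid_blocks,
						   uint32_t last_valid)
{
	if (valid_blocks == 0)
		return wp_offset == 0 ? ZONE_OK : ZONE_RESET_WP;
	if (last_valid >= wp_offset)
		return ZONE_REPORT_BUG;
	return ZONE_OK;
}

/* Usage: the three outcomes described in the commit message. */
static inline void example_zone_checks(void)
{
	assert(check_non_open_zone(0, 0, 0) == ZONE_OK);
	assert(check_non_open_zone(16, 0, 0) == ZONE_RESET_WP);
	assert(check_non_open_zone(8, 4, 12) == ZONE_REPORT_BUG);
}
```

Only the empty-zone case leads to a repair; the second case is reported and left alone, as described above.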

Signed-off-by: Shin'ichiro Kawasaki
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Shin'ichiro Kawasaki
2020-01-16 05:43:48 +0800
c426d9912 f2fs: Check write pointer consistency of open zones

On sudden f2fs shutdown, write pointers of zoned block devices can go
further but f2fs meta data keeps current segments at positions before the
write operations. After remounting the f2fs, this inconsistency causes
write operations not at write pointers and "Unaligned write command"
error is reported.

To avoid the error, compare current segments with write pointers of open
zones the current segments point to, during mount operation. If the write
pointer position is not aligned with the current segment position, assign
a new zone to the current segment. Also check the newly assigned zone has
write pointer at zone start. If not, reset write pointer of the zone.

Perform the consistency check during fsync recovery. So as not to lose the
fsync data, do the check after fsync data gets restored and before the
checkpoint commit, which flushes data at current segment positions. So as
not to conflict with kworker's dirty data/node flush, do the fix within
SBI_POR_DOING protection.

Signed-off-by: Shin'ichiro Kawasaki
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Shin'ichiro Kawasaki
2020-01-16 05:42:14 +0800

15 Jan, 2020

1 commit

fd39073db fs-verity: implement readahead of Merkle tree pages

When fs-verity verifies data pages, currently it reads each Merkle tree
page synchronously using read_mapping_page().

Therefore, when the Merkle tree pages aren't already cached, fs-verity
causes an extra 4 KiB I/O request for every 512 KiB of data (assuming
that the Merkle tree uses SHA-256 and 4 KiB blocks). This results in
more I/O requests and performance loss than is strictly necessary.

Therefore, implement readahead of the Merkle tree pages.

For simplicity, we take advantage of the fact that the kernel already
does readahead of the file's *data*, just like it does for any other
file. Due to this, we don't really need a separate readahead state
(struct file_ra_state) just for the Merkle tree, but rather we just need
to piggy-back on the existing data readahead requests.

We also only really need to bother with the first level of the Merkle
tree, since the usual fan-out factor is 128, so normally over 99% of
Merkle tree I/O requests are for the first level.
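
The figures quoted above (one 4 KiB tree block per 512 KiB of data, fan-out of 128) follow from simple arithmetic, sketched here in userspace. The macro and function names are hypothetical and the code is not taken from fs/verity.

```c
#include <assert.h>
#include <stdint.h>

#define MERKLE_BLOCK_SIZE 4096u	/* 4 KiB tree and data blocks */
#define SHA256_DIGEST_LEN 32u	/* bytes per SHA-256 hash */
#define HASHES_PER_BLOCK (MERKLE_BLOCK_SIZE / SHA256_DIGEST_LEN)	/* 128 */

/* Bytes of file data covered by one first-level Merkle tree block. */
static inline uint64_t data_bytes_per_tree_block(void)
{
	return (uint64_t)HASHES_PER_BLOCK * MERKLE_BLOCK_SIZE;
}

/* Number of tree blocks at a level; level 1 hashes the data blocks directly. */
static inline uint64_t tree_blocks_at_level(uint64_t data_blocks, unsigned level)
{
	uint64_t n = data_blocks;

	assert(level >= 1);
	while (level-- > 0)
		n = (n + HASHES_PER_BLOCK - 1) / HASHES_PER_BLOCK;
	return n;
}

/* Readahead cap described above: at most 1/4 of the pages in the bio. */
static inline unsigned max_merkle_ra_pages(unsigned bio_pages)
{
	return bio_pages / 4;
}
```

As a worked case, a file of 64000 data blocks (250 MiB) has 500 first-level tree blocks, 4 second-level blocks and 1 root block, so about 99% of the tree blocks sit in the first level.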

Therefore, make fsverity_verify_bio() enable readahead of the first
Merkle tree level, for up to 1/4 the number of pages in the bio, when it
sees that the REQ_RAHEAD flag is set on the bio. The readahead size is
then passed down to ->read_merkle_tree_page() for the filesystem to
(optionally) implement if it sees that the requested page is uncached.

While we're at it, also make build_merkle_tree_level() set the Merkle
tree readahead size, since it's easy to do there.

However, for now don't set the readahead size in fsverity_verify_page(),
since currently it's only used to verify holes on ext4 and f2fs, and it
would need parameters added to know how much to read ahead.

This patch significantly improves fs-verity sequential read performance.
Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches:

On an ARM64 phone (using sha256-ce):
Before: 217 MB/s
After: 263 MB/s
(compare to sha256sum of non-verity file: 357 MB/s)

In an x86_64 VM (using sha256-avx2):
Before: 173 MB/s
After: 215 MB/s
(compare to sha256sum of non-verity file: 223 MB/s)

Link: https://lore.kernel.org/r/20200106205533.137005-1-ebiggers@kernel.org
Reviewed-by: Theodore Ts'o
Signed-off-by: Eric Biggers

Eric Biggers
2020-01-15 05:27:32 +0800

01 Jan, 2020

2 commits

ede7a09fc fscrypt: Allow modular crypto algorithms

The commit 643fa9612bf1 ("fscrypt: remove filesystem specific
build config option") removed modular support for fs/crypto. This
causes the Crypto API to be built-in whenever fscrypt is enabled.
This makes it very difficult for me to test modular builds of
the Crypto API without disabling fscrypt which is a pain.

As fscrypt is still evolving and it's developing new ties with the
fs layer, it's hard to build it as a module for now.

However, the actual algorithms are not required until a filesystem
is mounted. Therefore we can allow them to be built as modules.

Signed-off-by: Herbert Xu
Link: https://lore.kernel.org/r/20191227024700.7vrzuux32uyfdgum@gondor.apana.org.au
Signed-off-by: Eric Biggers

Herbert Xu
2020-01-01 00:33:51 +0800
3b1ada55b fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info()

fscrypt_get_encryption_info() returns 0 if the encryption key is
unavailable; it never returns ENOKEY. So remove checks for ENOKEY.

Link: https://lore.kernel.org/r/20191209212348.243331-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers

Eric Biggers
2020-01-01 00:33:51 +0800

13 Dec, 2019

3 commits

dd973007b f2fs: set GFP_NOFS when moving inline dentries

Otherwise, it can cause circular locking dependency reported by mm.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2019-12-13 05:24:34 +0800
4f4460c08 f2fs: should avoid recursive filesystem ops

We need to use GFP_NOFS, since we did f2fs_lock_op().

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2019-12-13 05:24:34 +0800
3f188c23d f2fs: keep quota data on write_begin failure

This patch avoids some unnecessary locks for quota files when write_begin
fails.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2019-12-13 05:24:34 +0800

11 Dec, 2019

1 commit

bdf032992 f2fs: call f2fs_balance_fs outside of locked page

Otherwise, we can hit deadlock by waiting for the locked page in
move_data_block in GC.

Thread A                     Thread B
- do_page_mkwrite
 - f2fs_vm_page_mkwrite
  - lock_page
                             - f2fs_balance_fs
                              - mutex_lock(gc_mutex)
                             - f2fs_gc
                              - do_garbage_collect
                               - ra_data_block
                                - grab_cache_page
  - f2fs_balance_fs
   - mutex_lock(gc_mutex)

Fixes: 39a8695824510 ("f2fs: refactor ->page_mkwrite() flow")
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2019-12-11 08:03:55 +0800

10 Dec, 2019

1 commit

47501f87c f2fs: preallocate DIO blocks when forcing buffered_io

The previous preallocation and DIO decision was as below.

                             allow_outplace_dio           !allow_outplace_dio
f2fs_force_buffered_io   (*) No_Prealloc / Buffered_IO    Prealloc / Buffered_IO
!f2fs_force_buffered_io      No_Prealloc / DIO            Prealloc / DIO

But, Javier reported Case (*) where zoned device bypassed preallocation but
fell back to buffered writes in f2fs_direct_IO(), resulting in stale data
being read.

In order to fix the issue, actually we need to preallocate blocks whenever
we fall back to buffered IO like this. No change is made in the other cases.

                             allow_outplace_dio           !allow_outplace_dio
f2fs_force_buffered_io   (*) Prealloc / Buffered_IO       Prealloc / Buffered_IO
!f2fs_force_buffered_io      No_Prealloc / DIO            Prealloc / DIO
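
The two decision tables can be read as boolean expressions, sketched below. The function names are hypothetical and only restate the tables; they are not the f2fs helpers themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Previous table: preallocation depended only on allow_outplace_dio. */
static inline bool prealloc_before(bool force_buffered_io, bool allow_outplace_dio)
{
	(void)force_buffered_io;	/* not consulted in the old decision */
	return !allow_outplace_dio;
}

/* New table: also preallocate whenever we fall back to buffered IO. */
static inline bool prealloc_after(bool force_buffered_io, bool allow_outplace_dio)
{
	return force_buffered_io || !allow_outplace_dio;
}

/* The IO path is unchanged: direct IO unless buffered IO is forced. */
static inline bool uses_dio(bool force_buffered_io)
{
	return !force_buffered_io;
}

/* Usage: only the case marked (*) changes between the two tables. */
static inline void example_prealloc_tables(void)
{
	assert(!prealloc_before(true, true) && prealloc_after(true, true));
	assert(prealloc_before(true, false) == prealloc_after(true, false));
	assert(prealloc_before(false, true) == prealloc_after(false, true));
	assert(prealloc_before(false, false) == prealloc_after(false, false));
}
```

Written this way, the fix amounts to adding the force_buffered_io term to the preallocation condition.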

Reported-and-tested-by: Javier Gonzalez
Signed-off-by: Damien Le Moal
Tested-by: Shin'ichiro Kawasaki
Reviewed-by: Chao Yu
Reviewed-by: Javier González
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2019-12-10 07:57:45 +0800

09 Dec, 2019

1 commit

eb31e2f63 utimes: Clamp the timestamps in notify_change()

Push clamping timestamps into notify_change(), so in-kernel
callers like nfsd and overlayfs will get similar timestamp
set behavior as utimes.

AV: get rid of clamping in ->setattr() instances; we don't need
to bother with that there, with notify_change() doing normalization
in all cases now (it already did for implicit case, since current_time()
clamps).
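
As a rough userspace model of what clamping a timestamp to a filesystem's supported range means, consider the sketch below. struct ts64, struct fs_time_range and clamp_timestamp() are hypothetical names, and zeroing the nanoseconds at the range boundaries is an assumption modeled loosely on the kernel's timestamp handling, not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a 64-bit timestamp and a per-filesystem range. */
struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

struct fs_time_range {
	int64_t time_min;
	int64_t time_max;
};

/* Clamp seconds into [time_min, time_max]; drop nanoseconds at the edges. */
static inline struct ts64 clamp_timestamp(struct ts64 t, struct fs_time_range r)
{
	assert(r.time_min <= r.time_max);
	if (t.tv_sec < r.time_min)
		t.tv_sec = r.time_min;
	else if (t.tv_sec > r.time_max)
		t.tv_sec = r.time_max;
	/* Assumption: a timestamp pinned to a boundary carries no sub-second part. */
	if (t.tv_sec == r.time_min || t.tv_sec == r.time_max)
		t.tv_nsec = 0;
	return t;
}
```

Doing this once in a common path means every caller, in-kernel or via a system call, ends up with the same normalized value.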

Suggested-by: Miklos Szeredi
Fixes: 42e729b9ddbb ("utimes: Clamp the timestamps before update")
Cc: stable@vger.kernel.org # v5.4
Cc: Deepa Dinamani
Cc: Jeff Layton
Signed-off-by: Amir Goldstein
Signed-off-by: Al Viro

Amir Goldstein
2019-12-09 08:10:50 +0800

02 Dec, 2019

1 commit

0da522107 Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
"As part of the cleanup of some remaining y2038 issues, I came to
fs/compat_ioctl.c, which still has a couple of commands that need
support for time64_t.

In completely unrelated work, I spent time on cleaning up parts of
this file in the past, moving things out into drivers instead.

After Al Viro reviewed an earlier version of this series and did a lot
more of that cleanup, I decided to try to completely eliminate the
rest of it and move it all into drivers.

This series incorporates some of Al's work and many patches of my own,
but in the end stops short of actually removing the last part, which
is the scsi ioctl handlers. I have patches for those as well, but they
need more testing or possibly a rewrite"

* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
scsi: sd: enable compat ioctls for sed-opal
pktcdvd: add compat_ioctl handler
compat_ioctl: move SG_GET_REQUEST_TABLE handling
compat_ioctl: ppp: move simple commands into ppp_generic.c
compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
compat_ioctl: unify copy-in of ppp filters
tty: handle compat PPP ioctls
compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
compat_ioctl: handle SIOCOUTQNSD
af_unix: add compat_ioctl support
compat_ioctl: reimplement SG_IO handling
compat_ioctl: move WDIOC handling into wdt drivers
fs: compat_ioctl: move FITRIM emulation into file systems
gfs2: add compat_ioctl support
compat_ioctl: remove unused convert_in_user macro
compat_ioctl: remove last RAID handling code
compat_ioctl: remove /dev/raw ioctl translation
compat_ioctl: remove PCI ioctl translation
compat_ioctl: remove joystick ioctl translation
...

Linus Torvalds
2019-12-02 05:46:15 +0800