30 May, 2020

1 commit

  • Under heavy fsstress, we may trigger a panic while issuing discard,
    because __check_sit_bitmap() detects that a discard command may erase
    valid data blocks. The root cause is the race described in the stack
    below: since we removed the lock when flushing quota data, quota data
    writeback may race with write_checkpoint(), causing an inconsistency
    between the cached discard entry and the segment bitmap.

    - f2fs_write_checkpoint
     - block_operations
      - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
     - f2fs_flush_sit_entries
      - add_discard_addrs
       - __set_bit_le(i, (void *)de->discard_map);
                                        - f2fs_write_data_pages
                                         - f2fs_write_single_data_page
                                           : inode is quota one, cp_rwsem won't be locked
                                          - f2fs_do_write_data_page
                                           - f2fs_allocate_data_block
                                            - f2fs_wait_discard_bio
                                              : discard entry has not been added yet.
                                            - update_sit_entry
     - f2fs_clear_prefree_segments
      - f2fs_issue_discard
        : add discard entry

    In order to fix this, this patch uses node_write to serialize
    f2fs_allocate_data_block and checkpoint.

    Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

18 Apr, 2020

3 commits

  • When a discard_cmd needs to be split due to dpolicy->max_requests,
    the remaining length is either merged into another cmd or turned
    into a new discard_cmd. In this case, the remaining length gets
    accounted into dcc->undiscard_blks twice, which shows up as an
    incorrect value in the stats.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • In case a discard_cmd is split into several bios, the dc->error
    must not be overwritten once an error is reported by a bio. Also,
    move it under dc->lock.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • F2FS already has a default timeout of 5 secs for discards that
    can be issued during umount, but it can take more than the 5 sec
    timeout if the underlying UFS device queue is already full and there
    are no more available free tags to be used. Fix this by submitting a
    small batch of discard requests so that it won't cause the device
    queue to be full at any time and thus doesn't incur its wait time
    in the umount context.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     

31 Mar, 2020

1 commit

  • Data flush can generate heavy IO and cause long latency during
    flush, so it's not appropriate to trigger it in foreground
    operation.

    And also, we may face below potential deadlock during data flush:
    - f2fs_write_multi_pages
    - f2fs_write_raw_pages
    - f2fs_write_single_data_page
    - f2fs_balance_fs
    - f2fs_balance_fs_bg
    - f2fs_sync_dirty_inodes
    - filemap_fdatawrite -- stuck on flushing the same cluster

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

20 Mar, 2020

4 commits

  • Add an "f2fs_" prefix to slab cache names in order to avoid
    polluting the global slab cache namespace.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Geert Uytterhoeven reported:

    for the parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

    On some platforms HZ can be less than 50, in which case integer
    division yields an unexpected timeout of 0 jiffies in
    congestion_wait().

    This patch introduces a macro DEFAULT_IO_TIMEOUT wrapping the
    determinate value msecs_to_jiffies(20) to replace HZ/50 and avoid
    the issue.

    Quoted from Geert Uytterhoeven:

    "A timeout of HZ means 1 second.
    HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

    If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
    as that takes care of the special cases, and never returns 0."

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch removes the F2FS_MOUNT_ADAPTIVE and F2FS_MOUNT_LFS mount
    options, and adds F2FS_OPTION.fs_mode with the two states below to
    indicate the filesystem mode.

    enum {
        FS_MODE_ADAPTIVE,	/* use both lfs/ssr allocation */
        FS_MODE_LFS,		/* use lfs allocation only */
    };

    It can enhance code readability and fs mode's scalability.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Let's show mounted time.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

18 Jan, 2020

3 commits

  • A mutex lock won't serialize callers; to avoid starving an unlucky
    caller, let's use an rwsem lock instead.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Setting 0x40 in /sys/fs/f2fs/dev/ipu_policy gives a way to turn off
    the bio cache, which is useful for checking whether a block layer
    using a hardware encryption engine merges IOs correctly.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch tries to support compression in f2fs.

    - A new term, cluster, is defined as the basic unit of compression;
    a file can be divided into multiple clusters logically. One cluster
    includes 4 << n (n >= 0) logical pages, the compression size equals
    the cluster size, and each cluster can be compressed or left
    uncompressed.

    - In the cluster metadata layout, one special flag is used to
    indicate whether a cluster is a compressed one or a normal one; for
    a compressed cluster, the following metadata maps the cluster to
    [1, 4 << n - 1] physical blocks, in which f2fs stores data including
    the compress header and the compressed data.

    - In order to eliminate write amplification during overwrite, F2FS
    only supports compression on write-once files: data can be
    compressed only when all logical blocks in the file are valid and
    the cluster's compress ratio is lower than the specified threshold.

    - To enable compression on regular inode, there are three ways:
    * chattr +c file
    * chattr +c dir; touch dir/file
    * mount w/ -o compress_extension=ext; touch file.ext

    Compress metadata layout:
                                 [Dnode Structure]
                 +-----------------------------------------------+
                 | cluster 1 | cluster 2 | ......... | cluster N |
                 +-----------------------------------------------+
                 .           .                       .           .
           .                     .                 .                    .
      .       Compressed Cluster      .       .        Normal Cluster        .
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
    |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
               .                             .
             .                                      .
           .                                               .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+

    Changelog:

    20190326:
    - fix error handling of read_end_io().
    - remove unneeded comments in f2fs_encrypt_one_page().

    20190327:
    - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
    - don't jump into loop directly to avoid uninitialized variables.
    - add TODO tag in error path of f2fs_write_cache_pages().

    20190328:
    - fix wrong merge condition in f2fs_read_multi_pages().
    - check compressed file in f2fs_post_read_required().

    20190401
    - allow overwrite on non-compressed cluster.
    - check cluster meta before writing compressed data.

    20190402
    - don't preallocate blocks for compressed file.

    - add lz4 compress algorithm
    - process multiple post read works in one workqueue
    f2fs currently processes post read work in multiple workqueues,
    which shows low performance due to the scheduling overhead of
    multiple workqueues executing in order.

    20190921
    - compress: support buffered overwrite
    C: compress cluster flag
    V: valid block address
    N: NEW_ADDR

    One cluster contains 4 blocks:

        before overwrite -> after overwrite

        - VVVV -> CVNN
        - CVNN -> VVVV

        - CVNN -> CVNN
        - CVNN -> CVVV

        - CVVV -> CVNN
        - CVVV -> CVVV

    20191029
    - add kconfig F2FS_FS_COMPRESSION to isolate compression related
    codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
    note: the lzo backend will be removed if Jaegeuk agrees with that too.
    - update codes according to Eric's comments.

    20191101
    - apply fixes from Jaegeuk

    20191113
    - apply fixes from Jaegeuk
    - split workqueue for fsverity

    20191216
    - apply fixes from Jaegeuk

    20200117
    - fix to avoid NULL pointer dereference

    [Jaegeuk Kim]
    - add tracepoint for f2fs_{,de}compress_pages()
    - fix many bugs and add some compression stats
    - fix overwrite/mmap bugs
    - address 32bit build error, reported by Geert.
    - bug fixes when handling errors and i_compressed_blocks

    Reported-by:
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

16 Jan, 2020

3 commits

  • Remove the duplicate sbi->aw_cnt stats counter that tracks the
    number of atomic files currently opened (it also shows an incorrect
    value sometimes). Use the more reliable sbi->atomic_files to show in
    the stats.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • To catch f2fs bugs in write pointer handling code for zoned block
    devices, check write pointers of non-open zones that current segments do
    not point to. Do this check at mount time, after the fsync data recovery
    and current segments' write pointer consistency fix. Or when fsync data
    recovery is disabled by mount option, do the check when there is no fsync
    data.

    Two items are checked, comparing write pointers with the valid block
    maps in SIT. The first is a check for zones with no valid blocks:
    when a zone has no valid blocks, the write pointer should be at the
    start of the zone; if not, the next write to the zone will cause an
    unaligned write error, so reset the write pointer to the zone start.

    The second is a check between the write pointer position and the
    last valid block in the zone. The last valid block position is not
    expected to be beyond the write pointer; in such a case, report it
    as a bug. No fix is required for such a zone, because the zone is
    not selected for the next write operation until it gets discarded.

    Signed-off-by: Shin'ichiro Kawasaki
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Shin'ichiro Kawasaki
     
  • On sudden f2fs shutdown, write pointers of zoned block devices can
    go further, while f2fs metadata keeps current segments at positions
    from before the write operations. After remounting f2fs, this
    inconsistency causes writes that land away from the write pointers,
    and an "Unaligned write command" error is reported.

    To avoid the error, compare current segments with write pointers of open
    zones the current segments point to, during mount operation. If the write
    pointer position is not aligned with the current segment position, assign
    a new zone to the current segment. Also check the newly assigned zone has
    write pointer at zone start. If not, reset write pointer of the zone.

    Perform the consistency check during fsync recovery. So as not to
    lose the fsync data, do the check after the fsync data is restored
    and before the checkpoint commit, which flushes data at the current
    segment positions. To avoid conflict with kworker's dirty data/node
    flush, do the fix within SBI_POR_DOING protection.

    Signed-off-by: Shin'ichiro Kawasaki
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Shin'ichiro Kawasaki
     

01 Dec, 2019

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've introduced fairly small number of patches as below.

    Enhancements:
    - improve the in-place-update IO flow
    - allocate segment to guarantee no GC for pinned files

    Bug fixes:
    - fix updatetime in lazytime mode
    - potential memory leak in f2fs_listxattr
    - record parent inode number in rename2 correctly
    - fix deadlock in f2fs_gc along with atomic writes
    - avoid needless data migration in GC"

    * tag 'f2fs-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
    f2fs: stop GC when the victim becomes fully valid
    f2fs: expose main_blkaddr in sysfs
    f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
    f2fs: Fix deadlock in f2fs_gc() context during atomic files handling
    f2fs: show f2fs instance in printk_ratelimited
    f2fs: fix potential overflow
    f2fs: fix to update dir's i_pino during cross_rename
    f2fs: support aligned pinned file
    f2fs: avoid kernel panic on corruption test
    f2fs: fix wrong description in document
    f2fs: cache global IPU bio
    f2fs: fix to avoid memory leakage in f2fs_listxattr
    f2fs: check total_segments from devices in raw_super
    f2fs: update multi-dev metadata in resize_fs
    f2fs: mark recovery flag correctly in read_raw_super_block()
    f2fs: fix to update time in lazytime mode

    Linus Torvalds
     

20 Nov, 2019

2 commits

  • The FS got stuck in the below stack when the storage is almost
    full/dirty (i.e. when FG_GC is being done):

    schedule_timeout
    io_schedule_timeout
    congestion_wait
    f2fs_drop_inmem_pages_all
    f2fs_gc
    f2fs_balance_fs
    __write_node_page
    f2fs_fsync_node_pages
    f2fs_do_sync_file
    f2fs_ioctl

    The root cause of this issue is a potential infinite loop in
    f2fs_drop_inmem_pages_all() for the case where gc_failure is true
    and there is an inode whose i_gc_failures[GC_FAILURE_ATOMIC] is not
    set. Fix this by keeping track of the total number of atomic files
    currently opened and using that to exit from this condition.

    Fix-suggested-by: Chao Yu
    Signed-off-by: Chao Yu
    Signed-off-by: Sahitya Tummala
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • As Eric mentioned, bare printk{,_ratelimited} won't show which
    filesystem instance these messages come from; this patch shows the
    fs instance with the sb->s_id field in all the places we missed
    before.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

08 Nov, 2019

1 commit

  • This patch supports 2MB-aligned pinned file, which can guarantee no GC at all
    by allocating fully valid 2MB segment.

    Check free segments by has_not_enough_free_secs() with large budget.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

07 Nov, 2019

1 commit

  • Zoned block devices (ZBC and ZAC devices) allow an explicit control
    over the condition (state) of zones. The operations allowed are:
    * Open a zone: Transition to open condition to indicate that a zone will
    actively be written
    * Close a zone: Transition to closed condition to release the drive
    resources used for writing to a zone
    * Finish a zone: Transition an open or closed zone to the full
    condition to prevent write operations

    To enable this control for in-kernel zoned block device users, define
    the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE
    and REQ_OP_ZONE_FINISH as well as the generic function
    blkdev_zone_mgmt() for submitting these operations on a range of zones.
    This results in blkdev_reset_zones() removal and replacement with this
    new zone magement function. Users of blkdev_reset_zones() (f2fs and
    dm-zoned) are updated accordingly.

    Contains contributions from Matias Bjorling, Hans Holmberg,
    Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.

    Reviewed-by: Javier González
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ajay Joshi
    Signed-off-by: Matias Bjorling
    Signed-off-by: Hans Holmberg
    Signed-off-by: Dmitry Fomichev
    Signed-off-by: Keith Busch
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Ajay Joshi
     

26 Oct, 2019

1 commit

  • In commit 8648de2c581e ("f2fs: add bio cache for IPU"), we added
    f2fs_submit_ipu_bio() in __write_data_page() as below:

    __write_data_page()

        if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
            f2fs_submit_ipu_bio(sbi, bio, page);
            ....
        }

    in order to avoid below deadlock:

    Thread A                                Thread B
    - __write_data_page (inode x, page y)
     - f2fs_do_write_data_page
      - set_page_writeback      ---- set writeback flag in page y
      - f2fs_inplace_write_data
                                            - f2fs_balance_fs
                                             - lock gc_mutex
       - f2fs_balance_fs
        - lock gc_mutex
                                             - f2fs_gc
                                              - do_garbage_collect
                                               - gc_data_segment
                                                - move_data_page
                                                 - f2fs_wait_on_page_writeback
                                                  - wait_on_page_writeback
                                                    --- wait writeback of page y

    However, the bio submission breaks the merge of IPU IOs.

    So in this patch let's add a global bio cache for merged IPU pages;
    then f2fs_wait_on_page_writeback() is able to submit the bio if a
    page under writeback is cached in the global bio cache.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

16 Sep, 2019

2 commits

  • In f2fs_allocate_data_block(), we will reset fio.retry for IO
    alignment feature instead of IO serialization feature.

    In addition, spread F2FS_IO_ALIGNED() to check IO alignment
    feature status explicitly.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • If committing atomic pages fails in f2fs_do_sync_file(), we can end
    up with committed pages but with atomic_file still set, like:

    - inmem: 0, atomic IO: 4 (Max. 10), volatile IO: 0 (Max. 0)

    If GC selects this block, we can get an infinite loop like this:

    f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
    f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
    f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
    f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
    f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
    f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
    f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
    f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c

    In that moment, we can observe:

    [Before]
    Try to move 5084219 blocks (BG: 384508)
    - data blocks : 4962373 (274483)
    - node blocks : 121846 (110025)
    Skipped : atomic write 4534686 (10)

    [After]
    Try to move 5088973 blocks (BG: 384508)
    - data blocks : 4967127 (274483)
    - node blocks : 121846 (110025)
    Skipped : atomic write 4539440 (10)

    So, refactor the atomic_write flow like this:

    1. start_atomic_write
       - add to inmem_list and set atomic_file

    2. write()
       - register it in inmem_pages

    3. commit_atomic_write
       - if no error, f2fs_drop_inmem_pages()
       - if f2fs_commit_inmem_pages() failed
         : __revoke_inmem_pages() was done
       - if f2fs_do_sync_file failed
         : abort_atomic_write later

    4. abort_atomic_write
       - f2fs_drop_inmem_pages

    5. f2fs_drop_inmem_pages
       - clear atomic_file
       - remove from inmem_list

    Based on this change, when GC fails to move block in atomic_file,
    f2fs_drop_inmem_pages_all() can call f2fs_drop_inmem_pages().

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

07 Sep, 2019

1 commit

  • This patch changes the semantics of f2fs_is_checkpoint_ready()'s
    return value: return true when the checkpoint is ready, otherwise
    return false. This improves the readability of the conditions below.

    f2fs_submit_page_write()
    ...
        if (is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) ||
                !f2fs_is_checkpoint_ready(sbi))
            __submit_merged_bio(io);

    f2fs_balance_fs()
    ...
        if (!f2fs_is_checkpoint_ready(sbi))
            return;

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

23 Aug, 2019

6 commits

  • Policy - Foreground GC, LFS and greedy GC mode.

    Under this policy, f2fs_gc() loops forever as it doesn't have enough
    free segments to proceed, and thus keeps calling gc_more for the
    same victim segment. This can happen if the selected victim segment
    could not be GC'd due to a failed blkaddr validity check, i.e.
    is_alive() returns false for the blocks set in the current validity
    map.

    Fix this by keeping track of such invalid segments and skipping them
    during selection in get_victim_by_default(), to avoid an endless GC
    loop under such error scenarios. Currently, add this logic under
    CONFIG_F2FS_CHECK_FS to be able to root-cause the issue in debug
    builds.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    [Jaegeuk Kim: fix wrong bitmap size]
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • build_sit_info() allocates all bitmaps for each segment one by one,
    which is quite inefficient. This patch changes it to allocate one
    large contiguous region at a time, then divide it up and assign the
    pieces to each segment's bitmaps. For large images, this should
    improve mount speed.

    Signed-off-by: Chen Gong
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • There is one case that can cause data corruption.

    - write 4k to fileA
    - fsync fileA; the 4k of data is written back to lbaA
    - write 4k to fileA
    - kworker flushes the 4k to lbaB; the dnode containing lbaB hasn't
      been persisted yet
    - write 4k to fileB
    - kworker flushes the 4k to lbaA due to SSR
    - SPOR -> the dnode with lbaA will be recovered; however, lbaA now
      contains fileB's data

    One solution is to track every fsynced file's block history and
    disallow SSR overwrites of newly invalidated blocks of that file.

    However, during recovery, no matter whether the dnode was flushed or
    fsynced, all previous dnodes up to the last fsynced one in the node
    chain can be recovered; that means we would need to record every
    block change in flushed dnodes, which would be too costly. So let's
    just use the simple fix of forbidding SSR overwrite directly.

    Fixes: 5b6c6be2d878 ("f2fs: use SSR for warm node as well")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Pavel Machek reported:

    "We normally use -EUCLEAN to signal filesystem corruption. Plus, it is
    good idea to report it to the syslog and mark filesystem as "needing
    fsck" if filesystem can do that."

    Still, we would need to improve the original patch with:
    - use of the unlikely keyword
    - a message print
    - returning EUCLEAN

    However, after rethinking this patch, I don't think we should add
    such a condition check here, for the below reasons:
    - We have already checked the field in f2fs_sanity_check_ckpt().
    - If there is fs corruption or a security vulnerability, nothing
      guarantees the field stays intact after the check, unless we do
      the check before each of its uses; however, no filesystem does
      that.
    - We only have a similar check for bitmaps, which was added because
      bitmap corruption had happened at f2fs runtime in a product.
    - There are many key fields in SB/CP/NAT that don't have such a
      check after f2fs_sanity_check_{sb,cp,..}.

    So I propose to revert this unneeded check.

    This reverts commit 56f3ce675103e3fb9e631cfb4131fc768bc23e9a.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • We do not need to set the SBI_NEED_FSCK flag in the error paths: if
    we return an error here, we will not update the checkpoint flag, so
    the code is useless; just remove it.

    Signed-off-by: Lihong Kou
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Lihong Kou
     
  • =============================================================================
    BUG discard_cmd (Tainted: G B OE ): Objects remaining in discard_cmd on __kmem_cache_shutdown()
    -----------------------------------------------------------------------------

    INFO: Slab 0xffffe1ac481d22c0 objects=36 used=2 fp=0xffff936b4748bf50 flags=0x2ffff0000000100
    Call Trace:
    dump_stack+0x63/0x87
    slab_err+0xa1/0xb0
    __kmem_cache_shutdown+0x183/0x390
    shutdown_cache+0x14/0x110
    kmem_cache_destroy+0x195/0x1c0
    f2fs_destroy_segment_manager_caches+0x21/0x40 [f2fs]
    exit_f2fs_fs+0x35/0x641 [f2fs]
    SyS_delete_module+0x155/0x230
    ? vtime_user_exit+0x29/0x70
    do_syscall_64+0x6e/0x160
    entry_SYSCALL64_slow_path+0x25/0x25

    INFO: Object 0xffff936b4748b000 @offset=0
    INFO: Object 0xffff936b4748b070 @offset=112
    kmem_cache_destroy discard_cmd: Slab cache still has objects
    Call Trace:
    dump_stack+0x63/0x87
    kmem_cache_destroy+0x1b4/0x1c0
    f2fs_destroy_segment_manager_caches+0x21/0x40 [f2fs]
    exit_f2fs_fs+0x35/0x641 [f2fs]
    SyS_delete_module+0x155/0x230
    do_syscall_64+0x6e/0x160
    entry_SYSCALL64_slow_path+0x25/0x25

    Recovery can cache discard commands, so in the error path of
    fill_super() we need to give them a chance to be handled; otherwise
    it leads to a leak of the discard_cmd slab cache.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

11 Jul, 2019

2 commits

  • blkoff_off might exceed 512 due to fs corruption or a security
    vulnerability, and should be checked before being used.

    Use ENTRIES_IN_SUM to guard against invalid values in
    cur_data_blkoff.

    Signed-off-by: Ocean Chen
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Ocean Chen
     
  • On umount, we give a constant amount of time to handle pending
    discards. Previously, __issue_discard_cmd() missed checking the
    timeout condition in its loop, resulting in a long delay; fix it.

    Signed-off-by: Heng Xiao
    [Chao Yu: add commit message]
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Heng Xiao
     

03 Jul, 2019

4 commits

  • f2fs has always used EFAULT as the error number to indicate that the
    filesystem is corrupted, but generic filesystems use EUCLEAN for
    that condition, so we should change f2fs to follow the others.

    This patch adds two new macros as below to wrap more generic error
    code macros, and spread them in code.

    #define EFSBADCRC	EBADMSG		/* Bad CRC detected */
    #define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */

    Reported-by: Pavel Machek
    Signed-off-by: Chao Yu
    Acked-by: Pavel Machek
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Pavel reported, once we detect filesystem inconsistency in
    f2fs_inplace_write_data(), it will be better to print kernel message as
    we did in other places.

    Reported-by: Pavel Machek
    Signed-off-by: Chao Yu
    Acked-by: Pavel Machek
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • - Add and use f2fs_ macros
    - Convert f2fs_msg to f2fs_printk
    - Remove level from f2fs_printk and embed the level in the format
    - Coalesce formats and align multi-line arguments
    - Remove unnecessary duplicate extern f2fs_msg f2fs.h

    Signed-off-by: Joe Perches
    Signed-off-by: Chao Yu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Joe Perches
     
  • This ioctl shrinks a given length (aligned to sections) from end of the
    main area. Any cursegs and valid blocks will be moved out before
    invalidating the range.

    This feature can be used for adjusting partition sizes online.

    History of the patch:

    Sahitya Tummala:
    - Add this ioctl for f2fs_compat_ioctl() as well.
    - Fix debugfs status to reflect the online resize changes.
    - Fix potential race between online resize path and allocate new data
    block path or gc path.

    Others:
    - Rename some identifiers.
    - Add some error handling branches.
    - Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
    - Implement this interface like ext4's, and change the parameter
    from bytes shrunk to the new block count of F2FS.
    - During resizing, force to empty sit_journal and forbid adding new
    entries to it, in order to avoid invalid segno in journal after resize.
    - Reduce sbi->user_block_count before resize starts.
    - Commit the updated superblock first, and then update in-memory metadata
    only when the former succeeds.
    - Target block count must align to sections.
    - Write checkpoint before and after committing the new superblock, w/o
    CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
    resize fails after the new superblock is committed.
    - In free_segment_range(), reduce granularity of gc_mutex.
    - Add protection on curseg migration.
    - Add freeze_bdev() and thaw_bdev() for resize fs.
    - Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
    - Recover super_block and FS metadata when resize fails.
    - No need to clear CP_FSCK_FLAG in update_ckpt_flags().
    - Clean up the sb and fs metadata update functions for resize_fs.

    Geert Uytterhoeven:
    - Use div_u64*() for 64-bit divisions

    Arnd Bergmann:
    - Not all architectures support get_user() with a 64-bit argument:
    ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
    Use copy_from_user() here, this will always work.

    Signed-off-by: Qiuyang Sun
    Signed-off-by: Chao Yu
    Signed-off-by: Sahitya Tummala
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Qiuyang Sun
     

04 Jun, 2019

2 commits

  • This extends the checkpoint option to allow checkpoint=disable:%u[%].
    This lets you specify how much of the disk you are willing to lose
    access to while mounting with checkpoint=disable. If the amount lost
    would be higher, the mount will return -EAGAIN. The value can be
    given as a percent of total space, or in blocks.

    Currently, we need to run garbage collection until the amount of holes
    is smaller than the OVP space. With the new option, f2fs can mark
    space as unusable up front instead of requiring garbage collection until
    the number of holes is small enough.

    Signed-off-by: Daniel Rosenberg
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Daniel Rosenberg
     
  • The existing threshold for allowable holes at checkpoint=disable time is
    too high. The OVP space contains reserved segments, which are always in
    the form of free segments. These must be subtracted from the OVP value.

    The current threshold is meant to be the maximum value of holes of a
    single type we can have and still guarantee that we can fill the disk
    without failing to find space for a block of a given type.

    If the disk is full, ignoring current reserved, which only helps us,
    the amount of unused blocks is equal to the OVP area. Of that, there
    are reserved segments, which must be free segments, and the rest of the
    ovp area, which can come from either free segments or holes. The maximum
    possible amount of holes is OVP-reserved.

    Now, consider the disk when mounting with checkpoint=disable.
    We must be able to fill all available free space with either data or
    node blocks. When we start with checkpoint=disable, holes are locked to
    their current type. Say we have H of one type of hole, and H+X of the
    other. We can fill H of that space with arbitrary typed blocks via SSR.
    For the remaining H+X blocks, we may not have any of a given block type
    left at all. For instance, if we were to fill the disk entirely with
    blocks of the type with fewer holes, the H+X blocks of the opposite type
    would not be used. If H+X > OVP-reserved, there would be more holes than
    could possibly exist, and we would have failed to find a suitable block
    earlier on, leading to a crash in update_sit_entry.

    If H+X
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Daniel Rosenberg