22 Dec, 2020

3 commits

  • Add a new "compress_mode" mount option to control the file compression
    mode. It supports "fs" and "user". In "fs" mode (the default), f2fs
    does automatic compression on compression-enabled files. In "user"
    mode, f2fs disables automatic compression and gives the user
    discretion over choosing the target file and the timing, i.e. the
    user can do manual compression/decompression on compression-enabled
    files using ioctls.
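
    A minimal userspace sketch of the "user" flow, for a file on an f2fs
    partition mounted with -o compress_mode=user; the ioctl magic and
    command numbers below are assumptions based on this patch series, so
    verify them against your kernel's f2fs headers before use:

    /* compress_file.c - manually compress one compression-enabled file */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>

    #define F2FS_IOCTL_MAGIC         0xf5                      /* assumed */
    #define F2FS_IOC_DECOMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 23) /* assumed */
    #define F2FS_IOC_COMPRESS_FILE   _IO(F2FS_IOCTL_MAGIC, 24) /* assumed */

    int main(int argc, char **argv)
    {
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        fd = open(argv[1], O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        /* the user, not f2fs, picks the target file and the timing */
        if (ioctl(fd, F2FS_IOC_COMPRESS_FILE) < 0)
            perror("F2FS_IOC_COMPRESS_FILE");
        close(fd);
        return 0;
    }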

    Signed-off-by: Daeho Jeong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Daeho Jeong
     
  • section is dirty, but dirty_secmap may not be set

    Reported-by: Jia Yang
    Fixes: da52f8ade40b ("f2fs: get the right gc victim section when section has several segments")
    Cc:
    Signed-off-by: Jack Qiu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jack Qiu
     
  • Lei Li reported an issue: if foreground operations are frequent, the
    background checkpoint may always be skipped due to the check below,
    resulting in losing more data after a sudden power-cut.

    f2fs_balance_fs_bg()
    ...
        if (!is_idle(sbi, REQ_TIME) &&
            (!excess_dirty_nats(sbi) && !excess_dirty_nodes(sbi)))
            return;

    E.g:
    cp_interval = 5 second
    idle_interval = 2 second
    foreground operation interval = 1 second (append 1 byte per second into file)

    In such a case, no matter when f2fs_balance_fs_bg() is called,
    is_idle(, REQ_TIME) returns false, resulting in skipping the
    background checkpoint.

    This patch changes the trigger condition as below to make it more
    reasonable (a hedged sketch follows this list):
    - trigger sync_fs() if dirty_{nats,nodes} and prefree segs exceed the
    threshold;
    - skip triggering sync_fs() if there is any background inflight IO, or
    if there was a foreground operation recently while cp_rwsem is being
    held by someone.
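
    A hedged sketch of the reworked f2fs_balance_fs_bg() tail;
    is_inflight_io() and f2fs_time_over() follow the description above and
    the helpers this patch introduces, but the exact final form may differ:

    /* background inflight IO, or a recent foreground op while someone
     * holds cp_rwsem: don't trigger sync_fs() */
    if (is_inflight_io(sbi, REQ_TIME) ||
        (!f2fs_time_over(sbi, REQ_TIME) &&
         rwsem_is_contended(&sbi->cp_rwsem)))
        return;

    /* dirty nats/nodes or prefree segments exceed the threshold, or the
     * periodic checkpoint timeout expired: trigger sync_fs() */
    if (excess_dirty_nats(sbi) || excess_dirty_nodes(sbi) ||
        excess_prefree_segs(sbi) || f2fs_time_over(sbi, CP_TIME))
        f2fs_sync_fs(sbi->sb, true);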

    Reported-by: Lei Li
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

14 Oct, 2020

2 commits

  • This patch changes f2fs_flush_device_cache() to skip issuing flush for
    nobarrier case.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • First problem is we hit BUG_ON() in f2fs_get_sum_page given EIO on
    f2fs_get_meta_page_nofail().

    The quick fix was not to return any error and loop infinitely, but
    syzbot caught a case where it enters that loop from a fuzzed image. It
    turned out we abused f2fs_get_meta_page_nofail() as in the call stack
    below.

    - f2fs_fill_super
     - f2fs_build_segment_manager
      - build_sit_entries
       - get_current_sit_page

    INFO: task syz-executor178:6870 can't die for more than 143 seconds.
    task:syz-executor178 state:R
    stack:26960 pid: 6870 ppid: 6869 flags:0x00004006
    Call Trace:

    Showing all locks held in the system:
    1 lock held by khungtaskd/1179:
    #0: ffffffff8a554da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6242
    1 lock held by systemd-journal/3920:
    1 lock held by in:imklog/6769:
    #0: ffff88809eebc130 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930
    1 lock held by syz-executor178/6870:
    #0: ffff8880925120e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0x201/0xaf0 fs/super.c:229

    Actually, we didn't have to use _nofail in this case, since we could
    already return the error to mount(2) through the error handler.

    As a result, this patch tries to 1) remove _nofail callers as much as
    possible, and 2) deal with the error case in the last remaining
    caller, f2fs_get_sum_page().
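
    A hedged sketch of the resulting error handling at a former _nofail
    call site: the page lookup now returns an ERR_PTR that propagates back
    to mount(2) instead of looping or hitting BUG_ON():

    page = f2fs_get_meta_page(sbi, GET_SUM_BLOCK(sbi, segno));
    if (IS_ERR(page))
        return PTR_ERR(page);   /* e.g. -EIO reaches the mount error path */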

    Reported-by: syzbot+ee250ac8137be41d7b13@syzkaller.appspotmail.com
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

15 Sep, 2020

1 commit

  • After commit 0b6d4ca04a86 ("f2fs: don't return vmalloc() memory from
    f2fs_kmalloc()"), f2fs_k{m,z}alloc() will never return vmalloc()'ed
    memory, so clean up to use kfree() instead of kvfree() to free the
    memory allocated by them.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

12 Sep, 2020

1 commit

  • There are several issues in the current background GC algorithm:
    - the valid block count is one of the key factors in the cost overhead
    calculation, so even if a segment's age is young or it is located in a
    hot area, the CB algorithm will still choose a segment with few valid
    blocks as the victim, which is not appropriate;
    - GCed data/node blocks go to the existing logs no matter whether the
    contained data's update frequency is the same or not, so they may mix
    hot and cold data again;
    - the GC allocator mainly uses LFS-type segments, so it consumes free
    segments more quickly.

    This patch introduces a new algorithm named age-threshold-based
    garbage collection to solve the above issues. There are three main
    steps (a standalone sketch of the selection logic follows the steps):

    1. select a source victim:
    - set an age threshold and select candidates based on that threshold:
    e.g. with 0 meaning youngest and 100 meaning oldest, an age threshold
    of 80 selects the dirty segments whose age is in the range [80, 100]
    as candidates;
    - set a candidate_ratio threshold and select candidates based on the
    ratio, so that we can shrink the candidates to the oldest segments;
    - select the segment with the fewest valid blocks as the source, in
    order to migrate blocks with minimum cost;

    2. select a target victim:
    - select candidates based on the age threshold;
    - set a candidate_radius threshold and search for candidates whose age
    is around the source victim's; the searching radius should be less
    than the radius threshold;
    - select the segment with the most valid blocks as the target, in
    order to avoid the target segment itself being migrated soon.

    3. merge valid blocks from the source victim into the target victim
    with the SSR allocator.
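
    As referenced above, a self-contained userspace sketch of the step 1
    source-victim selection (age threshold, candidate_ratio, then fewest
    valid blocks); all values and names are illustrative, not the
    kernel's:

    #include <stdio.h>
    #include <stdlib.h>

    struct seg { unsigned age; unsigned valid_blocks; };

    static int by_age_desc(const void *a, const void *b)
    {
        const struct seg *x = a, *y = b;
        return (int)y->age - (int)x->age;   /* oldest first */
    }

    int main(void)
    {
        struct seg dirty[] = {
            { 95, 300 }, { 88, 40 }, { 82, 500 }, { 60, 10 }, { 91, 128 },
        };
        int n = sizeof(dirty) / sizeof(dirty[0]), cand = 0;
        unsigned age_threshold = 80, candidate_ratio = 50;
        struct seg *victim;

        /* candidates: segments with age in [age_threshold, 100] */
        qsort(dirty, n, sizeof(dirty[0]), by_age_desc);
        while (cand < n && dirty[cand].age >= age_threshold)
            cand++;
        /* shrink to the oldest candidate_ratio% of the candidates */
        cand = cand * candidate_ratio / 100;
        if (cand == 0)
            cand = 1;
        /* fewest valid blocks == minimum migration cost */
        victim = &dirty[0];
        for (int i = 1; i < cand; i++)
            if (dirty[i].valid_blocks < victim->valid_blocks)
                victim = &dirty[i];
        printf("source victim: age=%u, valid_blocks=%u\n",
               victim->age, victim->valid_blocks);
        return 0;
    }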

    Test steps:
    - create 160 dirty segments:
    * half of them have 128 valid blocks per segment
    * the rest have 384 valid blocks per segment
    - run background GC

    Benefit: both the GC count and the block movement count decrease
    noticeably:

    - Before:
    - Valid: 86
    - Dirty: 1
    - Prefree: 11
    - Free: 6001 (6001)

    GC calls: 162 (BG: 220)
    - data segments : 160 (160)
    - node segments : 2 (2)
    Try to move 41454 blocks (BG: 41454)
    - data blocks : 40960 (40960)
    - node blocks : 494 (494)

    IPU: 0 blocks
    SSR: 0 blocks in 0 segments
    LFS: 41364 blocks in 81 segments

    - After:

    - Valid: 87
    - Dirty: 0
    - Prefree: 4
    - Free: 6008 (6008)

    GC calls: 75 (BG: 76)
    - data segments : 74 (74)
    - node segments : 1 (1)
    Try to move 12813 blocks (BG: 12813)
    - data blocks : 12544 (12544)
    - node blocks : 269 (269)

    IPU: 0 blocks
    SSR: 12032 blocks in 77 segments
    LFS: 855 blocks in 2 segments

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

11 Sep, 2020

6 commits

  • Then we can add a specified entry into the rb-tree with its 64-bit
    segment time as the key.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Don't let f2fs inner GC ruin the original aging degree of segments.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Previously, once we updated one block in a segment, we updated the
    segment's mtime to the latest time, making an aged segment become the
    freshest and causing GC with the cost-benefit algorithm to miss such
    segments. So this patch changes to record mtime as the average block
    update time instead of the last update time.

    There is no need to reset mtime for a prefree segment: since
    se->valid_blocks is zero, the old se->mtime won't carry any weight in
    the calculation below:

    se->mtime = div_u64(se->mtime * se->valid_blocks + mtime,
                        se->valid_blocks + 1);
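
    A standalone illustration of the weighted average (and of why prefree
    segments need no reset); the helper is a userspace rewrite of the
    kernel expression above, not the kernel function itself:

    #include <stdio.h>

    /* each already-counted valid block carries the old mtime; the newly
     * written block carries "now" */
    static unsigned long long avg_mtime(unsigned long long mtime,
                                        unsigned long long valid_blocks,
                                        unsigned long long now)
    {
        return (mtime * valid_blocks + now) / (valid_blocks + 1);
    }

    int main(void)
    {
        /* a segment averaged at t=100 with 3 valid blocks absorbs a block
         * written at t=200: (100 * 3 + 200) / 4 = 125 */
        printf("%llu\n", avg_mtime(100, 3, 200));
        /* prefree: valid_blocks == 0, the old mtime carries no weight */
        printf("%llu\n", avg_mtime(100, 0, 200));   /* prints 200 */
        return 0;
    }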

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • The previous implementation of aligned pinfile allocation would:
    - allocate a new segment on the cold data log no matter whether the
    last used segment is partially used or not, which makes IOs more
    random;
    - force concurrent cold data/GCed IO into the warm data area, which
    can have a bad effect on hot/cold data separation.

    In this patch, we introduce a new type of log named 'inmem curseg';
    the differences from a normal curseg are:
    - it reuses existing segment types (CURSEG_XXX_NODE/DATA);
    - it only exists in memory; its segno, blkofs and summary will not be
    persisted into the checkpoint area.

    With this new feature, we can enhance the scalability of logs, and
    special allocators can be created for specific purposes:
    - a pure LFS allocator for aligned pinfile allocation or file
    defragmentation
    - a pure SSR allocator for a later feature

    So let's update aligned pinfile allocation to use this new inmem
    curseg framework (a hedged sketch follows).
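
    A hedged sketch of how the in-memory log types can sit next to the
    persisted ones; the PINNED/ATGC names match what f2fs later uses, but
    treat the exact layout as illustrative:

    enum {
        CURSEG_HOT_DATA = 0,
        CURSEG_WARM_DATA,
        CURSEG_COLD_DATA,
        CURSEG_HOT_NODE,
        CURSEG_WARM_NODE,
        CURSEG_COLD_NODE,
        NR_PERSISTENT_LOG,      /* the above go into the checkpoint area */
        /* inmem cursegs: segno, blkofs, summary stay in memory only */
        CURSEG_COLD_DATA_PINNED = NR_PERSISTENT_LOG, /* pure LFS, pinfile */
        CURSEG_ALL_DATA_ATGC,                        /* pure SSR, later */
        NR_CURSEG_TYPE,
    };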

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Since DUMMY_WRITTEN_PAGE and ATOMIC_WRITTEN_PAGE have already been
    converted to unsigned long type, we don't need to do type casting
    again.

    Signed-off-by: Xiaojun Wang
    Reported-by: Jack Qiu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Xiaojun Wang
     
  • NVMe Zoned Namespace devices can have a zone-capacity less than the
    zone-size. Zone-capacity indicates the maximum number of sectors that
    are usable in a zone, beginning from the first sector of the zone.
    This makes the sectors after the zone-capacity up to the zone-size
    unusable. This patch set tracks zone-size and zone-capacity in zoned
    devices and calculates the usable blocks per segment and the usable
    segments per section.

    If zone-capacity is less than zone-size, mark only those segments
    which start before zone-capacity as free segments. All segments at and
    beyond zone-capacity are treated as permanently used segments. In
    cases where zone-capacity does not align with the segment size, the
    last segment will start before zone-capacity and end beyond it. For
    such spanning segments, only sectors within the zone-capacity are
    used.

    During writes and GC, manage the usable segments in a section and the
    usable blocks per segment. Segments which are beyond zone-capacity are
    never allocated and do not need to be garbage collected; only the
    segments before zone-capacity need to be garbage collected. For
    spanning segments, write to blocks only up to zone-capacity, based on
    the number of usable blocks in that segment.

    Zone-capacity is device specific and cannot be configured by the user.
    Since NVMe ZNS device zones are sequential-write-only, a block device
    with conventional zones or any normal block device is needed alongside
    the ZNS device for the metadata operations of f2fs.

    A typical nvme-cli output of a zoned device shows zone start and capacity
    and write pointer as below:

    SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
    SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
    SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ

    Here the zone size is 64MB and the capacity is 49MB; WP is at the zone
    start as the zones are in the EMPTY state. For each zone, only zone
    start + 49MB is usable area; any lba/sector after 49MB cannot be read
    or written to, and the drive will fail any attempt to do so. So the
    second zone starts at 64MB and is usable till 113MB (64 + 49), the
    range between 113MB and 128MB is again unusable, the next zone starts
    at 128MB, and so on.
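
    A standalone arithmetic check of the example above, assuming f2fs's
    default 2MB segment (4096 sectors of 512B); values are taken from the
    nvme-cli output:

    #include <stdio.h>

    int main(void)
    {
        unsigned long long zone_size = 0x20000; /* 64MB in 512B sectors */
        unsigned long long zone_cap  = 0x18800; /* 49MB in 512B sectors */
        unsigned long long seg_secs  = 4096;    /* 2MB f2fs segment */

        /* segments fully below zone-capacity stay free/allocatable */
        printf("usable segments per zone: %llu\n", zone_cap / seg_secs);
        /* a spanning segment only uses its sectors below zone-capacity */
        printf("usable sectors in spanning segment: %llu\n",
               zone_cap % seg_secs);
        /* everything from zone-capacity to zone-size is unusable */
        printf("unusable sectors per zone: %llu\n", zone_size - zone_cap);
        return 0;
    }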

    Signed-off-by: Aravind Ramesh
    Signed-off-by: Damien Le Moal
    Signed-off-by: Niklas Cassel
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Aravind Ramesh
     

09 Sep, 2020

1 commit

  • Commit da52f8ade40b ("f2fs: get the right gc victim section when
    section has several segments") added code to count the blocks of each
    section using variables of type 'unsigned short', which is 2 bytes on
    many systems. However, the counts can be larger than the 2-byte range,
    and the type conversion results in wrong values. Especially when an
    f2fs section has as many blocks as USHRT_MAX + 1, the count is handled
    as 0. This triggers an eternal loop in init_dirty_segmap() at mount
    time. Fix this by changing the type of the variables to block_t.
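
    A short standalone demonstration of the truncation that was fixed:

    #include <stdio.h>

    int main(void)
    {
        /* e.g. a section of 128 segments * 512 blocks = USHRT_MAX + 1 */
        unsigned int blocks = 65536;
        unsigned short count = (unsigned short)blocks;  /* wraps to 0 */

        printf("%u blocks counted as %hu\n", blocks, count);
        return 0;   /* a zero count made init_dirty_segmap() loop forever */
    }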

    Fixes: da52f8ade40b ("f2fs: get the right gc victim section when section has several segments")
    Signed-off-by: Shin'ichiro Kawasaki
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Shin'ichiro Kawasaki
     

08 Jul, 2020

6 commits

  • Added a new gc_urgent mode, GC_URGENT_LOW, in which F2FS lowers the
    bar for idle checking in order to process outstanding discard commands
    and do GC a bit more aggressively.

    Signed-off-by: Daeho Jeong
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Daeho Jeong
     
  • Split segment allocation into two independent functions:
    - f2fs_allocate_new_segment() for allocating a segment of a specified
    type
    - f2fs_allocate_new_segments() for allocating all data-type segments

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • When f2fs_ioc_gc_range performs GC on multiple segments, the return
    value of f2fs_ioc_gc_range is determined by the GC of the last
    segment. If that fails, f2fs_ioc_gc_range is considered to have failed
    even though some of the previous segments' GC succeeded. Therefore, we
    fix this by redefining the return value of the get-victim ops and
    adding exception handling to f2fs_gc. In particular: 1) if the target
    has no valid blocks, it will go on; 2) if the target section has valid
    block(s) but is the current section, we will remind the caller.

    Signed-off-by: Qilong Zhang
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Qilong Zhang
     
  • Use the validity of @fio to indicate whether the caller wants to
    serialize IOs in io.io_list; @add_list then becomes redundant, so
    remove it.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • - To avoid a race between checkpoint and quota file writeback, we only
    need to hold the read lock of node_write in the writeback path.
    - The node_write lock currently covers all LFS data write paths, which
    is not necessary; we only need to hold node_write on the write path of
    a quota file.

    This refactors commit ca7f76e68074 ("f2fs: fix wrong discard space").

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • To avoid polluting the global symbol namespace.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

19 Jun, 2020

1 commit

  • Assume each section has 4 segments:
    .___________________________.
    |_Segment0_|_..._|_Segment3_|
    . .
    . .
    .__________.
    |_section0_|

    Segments 0~2 have 0 valid blocks; segment 3 has 512 valid blocks.
    GC will fail in this scenario if we want to GC section0, because
    none of the 4 segments in section0 is dirty. So we should use a dirty
    section bitmap instead of the dirty segment bitmap to get the right
    victim section (a standalone illustration follows).
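
    The section below is a perfectly good victim (512 of 2048 blocks
    valid), yet no per-segment dirty bit is set, so a victim search over
    the segment bitmap misses it:

    #include <stdio.h>

    int main(void)
    {
        unsigned valid[4] = { 0, 0, 0, 512 };   /* 512-block segments */
        unsigned sec_valid = 0;
        int seg_dirty = 0;

        for (int i = 0; i < 4; i++) {
            sec_valid += valid[i];
            /* a segment is dirty only when partially valid */
            if (valid[i] > 0 && valid[i] < 512)
                seg_dirty = 1;
        }
        printf("section: %u/2048 valid, dirty segment bits: %s\n",
               sec_valid, seg_dirty ? "set" : "none");
        return 0;
    }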

    Signed-off-by: Jack Qiu
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jack Qiu
     

30 May, 2020

1 commit

  • Under heavy fsstress, we may trigger a panic while issuing discard,
    because __check_sit_bitmap() detects that a discard command may erase
    valid data blocks. The root cause is the race described in the stack
    below: since we removed the lock when flushing quota data, quota data
    writeback may race with write_checkpoint(), causing inconsistency
    between the cached discard entry and the segment bitmap.

    - f2fs_write_checkpoint
     - block_operations
      - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
     - f2fs_flush_sit_entries
      - add_discard_addrs
       - __set_bit_le(i, (void *)de->discard_map);
                            - f2fs_write_data_pages
                             - f2fs_write_single_data_page
                               : inode is quota one, cp_rwsem won't be locked
                              - f2fs_do_write_data_page
                               - f2fs_allocate_data_block
                                - f2fs_wait_discard_bio
                                  : discard entry has not been added yet.
                                - update_sit_entry
     - f2fs_clear_prefree_segments
      - f2fs_issue_discard
        : add discard entry

    In order to fix this, this patch uses node_write to serialize
    f2fs_allocate_data_block and checkpoint.
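
    A hedged sketch of the serialization; allocate_block_for_quota() is a
    hypothetical stand-in for the allocation path, not an f2fs function:

    /* writeback side: take node_write shared around block allocation */
    down_read(&sbi->node_write);
    allocate_block_for_quota(sbi);      /* updates SIT under the lock */
    up_read(&sbi->node_write);

    /* checkpoint side: block_operations() takes node_write exclusively,
     * so discard entries and the segment bitmap can no longer diverge */
    down_write(&sbi->node_write);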

    Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

18 Apr, 2020

3 commits

  • When a discard_cmd needs to be split due to dpolicy->max_requests, the
    remaining length will either be merged into another cmd or a new
    discard_cmd will be created. In this case, dcc->undiscard_blks is
    double-accounted for the remaining len, which makes the stats show an
    incorrect value.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • In case a discard_cmd is split into several bios, the dc->error
    must not be overwritten once an error is reported by a bio. Also,
    move it under dc->lock.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • F2FS already has a default timeout of 5 secs for discards that
    can be issued during umount, but it can take more than the 5 sec
    timeout if the underlying UFS device queue is already full and there
    are no more available free tags to be used. Fix this by submitting a
    small batch of discard requests so that it won't cause the device
    queue to be full at any time and thus doesn't incur its wait time
    in the umount context.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     

31 Mar, 2020

1 commit

  • Data flush can generate heavy IO and cause long latency during the
    flush, so it's not appropriate to trigger it in a foreground
    operation.

    And also, we may face below potential deadlock during data flush:
    - f2fs_write_multi_pages
     - f2fs_write_raw_pages
      - f2fs_write_single_data_page
       - f2fs_balance_fs
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - filemap_fdatawrite  -- stuck on flushing the same cluster

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

20 Mar, 2020

4 commits

  • In order to avoid polluting the global slab cache namespace.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • As Geert Uytterhoeven reported regarding the parameter HZ/50 in:

    congestion_wait(BLK_RW_ASYNC, HZ/50);

    on some platforms, HZ can be less than 50; an unexpected timeout of 0
    jiffies will then be set in congestion_wait().

    This patch introduces a macro, DEFAULT_IO_TIMEOUT, wrapping the
    determinate value msecs_to_jiffies(20), to replace HZ/50 and avoid
    this issue.

    Quoted from Geert Uytterhoeven:

    "A timeout of HZ means 1 second.
    HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

    If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
    as that takes care of the special cases, and never returns 0."
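
    A standalone demonstration of the integer-division pitfall:

    #include <stdio.h>

    int main(void)
    {
        /* congestion_wait(BLK_RW_ASYNC, HZ/50) truncates to 0 if HZ < 50 */
        int hz_values[] = { 1000, 250, 100, 32 };

        for (int i = 0; i < 4; i++)
            printf("HZ=%4d -> HZ/50 = %d jiffies\n",
                   hz_values[i], hz_values[i] / 50);
        /* msecs_to_jiffies(20) rounds up instead and never returns 0 */
        return 0;
    }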

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch removes the F2FS_MOUNT_ADAPTIVE and F2FS_MOUNT_LFS mount
    options and adds F2FS_OPTION.fs_mode with the two statuses below to
    indicate the filesystem mode:

    enum {
        FS_MODE_ADAPTIVE,   /* use both lfs/ssr allocation */
        FS_MODE_LFS,        /* use lfs allocation only */
    };

    It can enhance code readability and the fs mode's scalability.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Let's show mounted time.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

18 Jan, 2020

3 commits

  • A mutex lock won't serialize callers fairly; in order to avoid
    starving an unlucky caller, let's use an rwsem lock instead.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Setting 0x40 in /sys/fs/f2fs/dev/ipu_policy gives a way to turn off
    the bio cache, which is useful to check whether the block layer using
    a hardware encryption engine merges IOs correctly.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch tries to support compression in f2fs.

    - A new term named cluster is defined as the basic unit of
    compression; a file can be divided into multiple clusters logically.
    One cluster includes 4 << n (n >= 0) logical pages, the compression
    size is also the cluster size, and each cluster can be compressed or
    not.

    - In the cluster metadata layout, one special flag is used to indicate
    whether a cluster is a compressed or a normal one; for a compressed
    cluster, the following metadata maps the cluster to [1, 4 << n - 1]
    physical blocks, in which f2fs stores data including the compress
    header and the compressed data.

    - In order to eliminate write amplification during overwrite, F2FS
    only supports compression on write-once files: data can be compressed
    only when all logical blocks in the file are valid and the cluster
    compress ratio is lower than a specified threshold.

    - To enable compression on regular inode, there are three ways:
    * chattr +c file
    * chattr +c dir; touch dir/file
    * mount w/ -o compress_extension=ext; touch file.ext

    Compress metadata layout:
    [Dnode Structure]
    +-----------------------------------------------+
    | cluster 1 | cluster 2 | ......... | cluster N |
    +-----------------------------------------------+
    . . . .
    . . . .
    . Compressed Cluster . . Normal Cluster .
    +----------+---------+---------+---------+ +---------+---------+---------+---------+
    |compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
    +----------+---------+---------+---------+ +---------+---------+---------+---------+
    . .
    . .
    . .
    +-------------+-------------+----------+----------------------------+
    | data length | data chksum | reserved | compressed data |
    +-------------+-------------+----------+----------------------------+
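
    A hedged sketch of the on-disk header implied by the last row of the
    layout above; the field names mirror f2fs's struct compress_data, but
    verify against the actual source before relying on the layout:

    struct compress_data {
        __le32 clen;            /* data length */
        __le32 chksum;          /* data chksum */
        __le32 reserved[4];     /* reserved */
        u8 cdata[];             /* compressed data */
    };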

    Changelog:

    20190326:
    - fix error handling of read_end_io().
    - remove unneeded comments in f2fs_encrypt_one_page().

    20190327:
    - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
    - don't jump into loop directly to avoid uninitialized variables.
    - add TODO tag in error path of f2fs_write_cache_pages().

    20190328:
    - fix wrong merge condition in f2fs_read_multi_pages().
    - check compressed file in f2fs_post_read_required().

    20190401
    - allow overwrite on non-compressed cluster.
    - check cluster meta before writing compressed data.

    20190402
    - don't preallocate blocks for compressed file.

    - add lz4 compress algorithm
    - process multiple post-read works in one workqueue;
    f2fs currently processes post-read work in multiple workqueues, which
    shows low performance due to the scheduling overhead of multiple
    workqueues executing in order.

    20190921
    - compress: support buffered overwrite
    C: compress cluster flag
    V: valid block address
    N: NEW_ADDR

    One cluster contains 4 blocks:

    before overwrite    after overwrite

    - VVVV          ->  CVNN
    - CVNN          ->  VVVV

    - CVNN          ->  CVNN
    - CVNN          ->  CVVV

    - CVVV          ->  CVNN
    - CVVV          ->  CVVV

    20191029
    - add kconfig F2FS_FS_COMPRESSION to isolate compression-related
    code, and kconfig F2FS_FS_{LZO,LZ4} to cover the backend algorithms.
    note: will remove the lzo backend if Jaegeuk agrees to that too.
    - update codes according to Eric's comments.

    20191101
    - apply fixes from Jaegeuk

    20191113
    - apply fixes from Jaegeuk
    - split workqueue for fsverity

    20191216
    - apply fixes from Jaegeuk

    20200117
    - fix to avoid NULL pointer dereference

    [Jaegeuk Kim]
    - add tracepoint for f2fs_{,de}compress_pages()
    - fix many bugs and add some compression stats
    - fix overwrite/mmap bugs
    - address 32bit build error, reported by Geert.
    - bug fixes when handling errors and i_compressed_blocks

    Reported-by:
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

16 Jan, 2020

3 commits

  • Remove the duplicate sbi->aw_cnt stats counter that tracks the number
    of atomic files currently open (it also shows an incorrect value
    sometimes). Use the more reliable sbi->atomic_files to show in the
    stats.

    Signed-off-by: Sahitya Tummala
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Sahitya Tummala
     
  • To catch f2fs bugs in write pointer handling code for zoned block
    devices, check write pointers of non-open zones that current segments do
    not point to. Do this check at mount time, after the fsync data recovery
    and current segments' write pointer consistency fix. Or when fsync data
    recovery is disabled by mount option, do the check when there is no fsync
    data.

    Check two items, comparing write pointers with the valid block maps in
    SIT. The first item checks zones with no valid blocks: when there are
    no valid blocks in a zone, the write pointer should be at the start of
    the zone; if not, the next write operation to the zone will cause an
    unaligned write error, so reset the write pointer to the zone start.

    The second item checks the write pointer position against the last
    valid block in the zone. It is unexpected for the last valid block
    position to be beyond the write pointer; in such a case, report it as
    a bug. No fix is required for such a zone, because the zone is not
    selected for the next write operation until it gets discarded.

    Signed-off-by: Shin'ichiro Kawasaki
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Shin'ichiro Kawasaki
     
  • On sudden f2fs shutdown, write pointers of zoned block devices can go
    further, but f2fs metadata keeps the current segments at positions
    before those write operations. After remounting f2fs, this
    inconsistency causes write operations away from the write pointers,
    and an "Unaligned write command" error is reported.

    To avoid the error, during the mount operation compare the current
    segments with the write pointers of the open zones they point to. If a
    write pointer position is not aligned with the current segment
    position, assign a new zone to the current segment. Also check that
    the newly assigned zone has its write pointer at the zone start; if
    not, reset the write pointer of the zone.

    Perform the consistency check during fsync recovery. So as not to lose
    fsync data, do the check after the fsync data is restored and before
    the checkpoint commit that flushes data at the current segment
    positions. So as not to conflict with the kworker's dirty data/node
    flush, do the fix within SBI_POR_DOING protection.

    Signed-off-by: Shin'ichiro Kawasaki
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Shin'ichiro Kawasaki
     

01 Dec, 2019

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've introduced fairly small number of patches as below.

    Enhancements:
    - improve the in-place-update IO flow
    - allocate segment to guarantee no GC for pinned files

    Bug fixes:
    - fix updatetime in lazytime mode
    - potential memory leak in f2fs_listxattr
    - record parent inode number in rename2 correctly
    - fix deadlock in f2fs_gc along with atomic writes
    - avoid needless data migration in GC"

    * tag 'f2fs-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
    f2fs: stop GC when the victim becomes fully valid
    f2fs: expose main_blkaddr in sysfs
    f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
    f2fs: Fix deadlock in f2fs_gc() context during atomic files handling
    f2fs: show f2fs instance in printk_ratelimited
    f2fs: fix potential overflow
    f2fs: fix to update dir's i_pino during cross_rename
    f2fs: support aligned pinned file
    f2fs: avoid kernel panic on corruption test
    f2fs: fix wrong description in document
    f2fs: cache global IPU bio
    f2fs: fix to avoid memory leakage in f2fs_listxattr
    f2fs: check total_segments from devices in raw_super
    f2fs: update multi-dev metadata in resize_fs
    f2fs: mark recovery flag correctly in read_raw_super_block()
    f2fs: fix to update time in lazytime mode

    Linus Torvalds