13 Oct, 2016
1 commit
-
This patch fixes the use of a wrong pointer for sum_page in f2fs_gc.
Signed-off-by: Jaegeuk Kim
01 Oct, 2016
6 commits
-
Signed-off-by: Sheng Yong
Acked-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
This patch adds support for checkpoint error injection in f2fs for testing
fatal error tolerance. This is useful because f2fs can then simulate an abnormal
power-off by itself, instead of relying on running apps to call the godown ioctl.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
Previously, we only supported a global fault injection configuration, so
configuring the type/rate of fault injection through sysfs or a mount option
affected every f2fs partition in use.
This does not make sense: it is inconvenient for a developer who wants to test
separate partitions with different fault injection rates/types simultaneously,
and it is impossible to enable fault injection on one partition while disabling
it on another.
From now on, we move the global fault injection configuration from the module
into the per-superblock structure, so injection testing can be more flexible.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
Otherwise, we can hit:
f2fs_bug_on(sbi, !PageUptodate(sum_page));
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
We should call put_page for preloaded summary pages in do_garbage_collect.
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
This patch adds a return value of write_checkpoint for f2fs_gc.
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
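The per-superblock fault injection change in this batch can be modeled in user space as below. This is a minimal sketch, not the f2fs code: struct fault_info, struct sb_info, should_inject_fault() and the two fault types are hypothetical names, and "inject on every Nth eligible call" is an assumed policy chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fault types; not the real f2fs enum. */
enum fault_type { FAULT_KMALLOC, FAULT_CHECKPOINT, FAULT_MAX };

/* Fault injection state kept per superblock instead of per module. */
struct fault_info {
	unsigned int inject_rate; /* inject on every Nth eligible call, 0 = off */
	unsigned int inject_type; /* bitmask of enabled fault types */
	unsigned int inject_ops;  /* eligible calls since the last injection */
};

/* Hypothetical stand-in for a per-partition superblock. */
struct sb_info {
	struct fault_info fault;
};

/* Return true if a fault of @type should be injected on this partition. */
bool should_inject_fault(struct sb_info *sbi, enum fault_type type)
{
	struct fault_info *fi = &sbi->fault;

	/* Disabled on this partition, or this fault type is not selected. */
	if (!fi->inject_rate || !(fi->inject_type & (1u << type)))
		return false;
	if (++fi->inject_ops >= fi->inject_rate) {
		fi->inject_ops = 0;
		return true;
	}
	return false;
}
```

Because the settings and counter live in each sb_info, one partition can inject checkpoint faults while another keeps injection disabled.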
13 Sep, 2016
1 commit
-
Fix a wrong condition check for defragmentation of a file.
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
08 Sep, 2016
1 commit
-
In order to enhance performance, we try to read ahead node pages during
GC, but before loading a node page we must get its block address, which
is stored in the NAT table, so a synchronous read of a single NAT page
blocks our readahead flow:
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE
This patch adds a phase that reads ahead NAT pages in batch before
reading ahead node pages, for better efficiency:
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
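The batched NAT readahead phase can be illustrated by a small helper that turns the node IDs found in a victim segment into a sorted, de-duplicated list of NAT blocks, which can then be read ahead in one go before the node pages. The helper name is hypothetical, and the figure of 455 entries per NAT block (4 KiB blocks of 9-byte entries) is an assumption of this sketch rather than something stated in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB NAT blocks holding 9-byte packed entries -> 455 each. */
#define NAT_ENTRIES_PER_BLOCK 455u

/*
 * Hypothetical helper: collect the distinct NAT block indices needed to
 * translate @nids, so they can be read ahead in one batch before the
 * node pages. Writes at most @max indices into @out in ascending order
 * and returns how many were written.
 */
size_t collect_nat_blocks(const unsigned int *nids, size_t n,
			  unsigned int *out, size_t max)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int blk = nids[i] / NAT_ENTRIES_PER_BLOCK;
		size_t j = 0;

		/* Find the insertion point; skip blocks already queued. */
		while (j < count && out[j] < blk)
			j++;
		if (j < count && out[j] == blk)
			continue;
		if (count == max)
			break;
		/* Shift larger entries up to keep @out sorted. */
		for (size_t k = count; k > j; k--)
			out[k] = out[k - 1];
		out[j] = blk;
		count++;
	}
	return count;
}
```

Sorting the NAT block indices lets the batched reads go out in address order, and de-duplication avoids reading the same NAT block once per node.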
30 Aug, 2016
1 commit
-
To clean up the foreground GC code, this patch checks the valid block count
of the GCed section directly, instead of checking the count in each segment
of the section one by one.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
28 Jul, 2016
1 commit
-
Pull f2fs updates from Jaegeuk Kim:
"The major change in this version is mitigating cpu overheads on write
paths by replacing redundant inode page updates with mark_inode_dirty
calls. And we tried to reduce lock contentions as well to improve
filesystem scalability. Another feature is setting F2FS automatically
when detecting host-managed SMR.
Enhancements:
- ioctl to move a range of data between files
- inject orphan inode errors
- avoid flush commands congestion
- support lazytime
Bug fixes:
- return proper results for some dentry operations
- fix deadlock in add_link failure
- disable extent_cache for fcollapse/finsert"
* tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
f2fs: clean up coding style and redundancy
f2fs: get victim segment again after new cp
f2fs: handle error case with f2fs_bug_on
f2fs: avoid data race when deciding checkpoin in f2fs_sync_file
f2fs: support an ioctl to move a range of data blocks
f2fs: fix to report error number of f2fs_find_entry
f2fs: avoid memory allocation failure due to a long length
f2fs: reset default idle interval value
f2fs: use blk_plug in all the possible paths
f2fs: fix to avoid data update racing between GC and DIO
f2fs: add maximum prefree segments
f2fs: disable extent_cache for fcollapse/finsert inodes
f2fs: refactor __exchange_data_block for speed up
f2fs: fix ERR_PTR returned by bio
f2fs: avoid mark_inode_dirty
f2fs: move i_size_write in f2fs_write_end
f2fs: fix to avoid redundant discard during fstrim
f2fs: avoid mismatching block range for discard
f2fs: fix incorrect f_bfree calculation in ->statfs
f2fs: use percpu_rw_semaphore
...
23 Jul, 2016
1 commit
-
A previously selected segment may become free after write_checkpoint. If we
do garbage collection on this segment and new_curseg then happens to reuse
it, we may hit the f2fs_bug_on below:
panic+0x154/0x29c
do_garbage_collect+0x15c/0xaf4
f2fs_gc+0x2dc/0x444
f2fs_balance_fs.part.22+0xcc/0x14c
f2fs_balance_fs+0x28/0x34
f2fs_map_blocks+0x5ec/0x790
f2fs_preallocate_blocks+0xe0/0x100
f2fs_file_write_iter+0x64/0x11c
new_sync_write+0xac/0x11c
vfs_write+0x144/0x1e4
SyS_write+0x60/0xc0
Here, we might check the SIT and SSA types during reset_curseg. So we check
whether the segment is stale, and select a new victim to avoid this.
Signed-off-by: Yunlei He
Signed-off-by: Jaegeuk Kim
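A toy model of why the victim is selected again after the new checkpoint: once write_checkpoint may have freed the previously chosen segment, the old choice is dropped and selection runs again. struct toy_sit, select_victim(), checkpoint_frees_seg1(), pick_victim_around_checkpoint() and TOY_NULL_SEGNO are hypothetical, and the greedy "fewest valid blocks" policy is an assumption for illustration; the patch's real control flow is not reproduced here.

```c
#include <assert.h>

#define TOY_NSEGS 4
#define TOY_NULL_SEGNO (-1) /* hypothetical "no victim" marker */

/* Toy segment table: number of valid blocks in each segment. */
struct toy_sit {
	unsigned int valid[TOY_NSEGS];
};

/* Greedy selection: the non-free segment with the fewest valid blocks. */
int select_victim(const struct toy_sit *sit)
{
	int best = TOY_NULL_SEGNO;

	for (int s = 0; s < TOY_NSEGS; s++) {
		if (sit->valid[s] == 0)
			continue; /* free segment: nothing to collect */
		if (best == TOY_NULL_SEGNO || sit->valid[s] < sit->valid[best])
			best = s;
	}
	return best;
}

/* Hypothetical checkpoint for the example: segment 1 becomes free. */
void checkpoint_frees_seg1(struct toy_sit *sit)
{
	sit->valid[1] = 0;
}

/*
 * If a checkpoint runs after a victim was picked, the victim may have
 * become free (and reusable by new_curseg), so forget it and pick again.
 */
int pick_victim_around_checkpoint(struct toy_sit *sit,
				  void (*checkpoint)(struct toy_sit *))
{
	int segno = select_victim(sit);

	if (segno != TOY_NULL_SEGNO && checkpoint) {
		checkpoint(sit);
		segno = TOY_NULL_SEGNO; /* the earlier choice may now be stale */
	}
	if (segno == TOY_NULL_SEGNO)
		segno = select_victim(sit);
	return segno;
}
```

Without the reset, the sketch would return segment 1 even though the checkpoint has just made it free.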
21 Jul, 2016
1 commit
-
These two are confusing leftovers of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces. For callers that don't
special-case them, we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense. Any check for READA is replaced with an
explicit check for REQ_RAHEAD. Also remove the READA alias for
REQ_RAHEAD.
Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Reviewed-by: Mike Christie
Signed-off-by: Jens Axboe
16 Jul, 2016
2 commits
-
This patch reverts 19a5f5e2ef37 (f2fs: drop any block plugging),
and additionally adds blk_plug in write paths. The main reason is that
blk_start_plug can be used to wake up from low-power mode before
submitting further bios.
Signed-off-by: Jaegeuk Kim
-
Data in a file can be operated on by GC and DIO simultaneously, so we can
face the following races.
For write case:
Thread A                                Thread B
- generic_file_direct_write
- invalidate_inode_pages2_range
- f2fs_direct_IO
- do_blockdev_direct_IO
- do_direct_IO
- get_more_blocks
                                        - f2fs_gc
                                        - do_garbage_collect
                                        - gc_data_segment
                                        - move_data_page
                                        - do_write_data_page
                                        migrate data block to new block address
- dio_bio_submit
update user data to old block address
For read case:
Thread A                                Thread B
- generic_file_direct_write
- invalidate_inode_pages2_range
- f2fs_direct_IO
- do_blockdev_direct_IO
- do_direct_IO
- get_more_blocks
                                        - f2fs_balance_fs
                                        - f2fs_gc
                                        - do_garbage_collect
                                        - gc_data_segment
                                        - move_data_page
                                        - do_write_data_page
                                        migrate data block to new block address
                                        - write_checkpoint
                                        - do_checkpoint
                                        - clear_prefree_segments
                                        - f2fs_issue_discard
                                        discard old block address
- dio_bio_submit
update user buffer from obsolete block address
To fix this, DIO and GC should exclude each other on a per-file basis.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
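The per-file exclusion between DIO and GC can be sketched with POSIX threads. This is a user-space model under stated assumptions: the toy disk, struct toy_file, dio_write(), gc_migrate() and run_dio_and_gc() are hypothetical, and a single pthread mutex stands in for whatever per-inode lock the kernel patch actually uses. The point it shows is that the DIO path maps the block address and submits its I/O while holding the lock, so GC cannot migrate (and discard) the block in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_NBLOCKS 4

/* Hypothetical toy file with one data block on a toy "disk". */
struct toy_file {
	char disk[TOY_NBLOCKS];
	unsigned int blkaddr;    /* current on-disk address of the data */
	pthread_mutex_t io_lock; /* per-file lock shared by DIO and GC */
};

/* DIO write: map the address and submit the write under one lock hold. */
void *dio_write(void *arg)
{
	struct toy_file *f = arg;

	pthread_mutex_lock(&f->io_lock);
	f->disk[f->blkaddr] = 'N'; /* new user data lands at the mapped address */
	pthread_mutex_unlock(&f->io_lock);
	return NULL;
}

/* GC: migrate the block to a new address and discard the old one. */
void *gc_migrate(void *arg)
{
	struct toy_file *f = arg;

	pthread_mutex_lock(&f->io_lock);
	unsigned int old_addr = f->blkaddr;
	unsigned int new_addr = (old_addr + 1) % TOY_NBLOCKS;
	f->disk[new_addr] = f->disk[old_addr];
	f->blkaddr = new_addr;
	f->disk[old_addr] = 0; /* stands in for discarding the old block */
	pthread_mutex_unlock(&f->io_lock);
	return NULL;
}

/* Run DIO and GC concurrently; return the byte at the final address. */
char run_dio_and_gc(void)
{
	struct toy_file f = { .disk = { 'O' }, .blkaddr = 0 };
	pthread_t dio, gc;
	char result;

	pthread_mutex_init(&f.io_lock, NULL);
	pthread_create(&dio, NULL, dio_write, &f);
	pthread_create(&gc, NULL, gc_migrate, &f);
	pthread_join(dio, NULL);
	pthread_join(gc, NULL);
	result = f.disk[f.blkaddr];
	pthread_mutex_destroy(&f.io_lock);
	return result;
}
```

Whichever thread takes the lock first, the new data ends up in the block the file finally points to. Build with -pthread on toolchains that need it.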
09 Jul, 2016
2 commits
-
If we fail to move a data page during foreground GC, we should give that page,
which was previously set dirty by the writer, another chance to be written back.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
During a synchronous read, after sending out the read request, the reader
tries to lock the page, waiting for the device to finish the read and unlock
the page. Meanwhile, a truncater may race with the reader, so after the reader
gets the page lock, it should check the page's mapping to detect whether
someone has truncated the page in the meantime. If truncation happened, the
reader can retry; otherwise the read can fail due to the previous condition
check.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
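The lock-then-recheck pattern described above, reduced to a user-space sketch. struct toy_mapping, struct toy_page, read_page_sync(), truncate_on_first_attempt() and the race hook are hypothetical; the real code uses lock_page() and the page cache, which are only indicated in comments here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical address space and page; mapping is cleared on truncation. */
struct toy_mapping {
	int id;
};

struct toy_page {
	struct toy_mapping *mapping;
};

/* Hypothetical hook that simulates a truncater racing with the read. */
typedef void (*truncate_race_fn)(struct toy_page *page, int attempt);

/* Example racer: truncates the page during the first attempt only. */
void truncate_on_first_attempt(struct toy_page *page, int attempt)
{
	if (attempt == 1)
		page->mapping = NULL;
}

/*
 * Synchronous read sketch: after the read is submitted, the reader would
 * take the page lock (waiting for I/O completion) and then re-check
 * page->mapping. A mismatch means the page was truncated meanwhile, so
 * the reader retries with a fresh page instead of failing.
 * Returns the number of attempts needed.
 */
int read_page_sync(struct toy_mapping *mapping, truncate_race_fn race)
{
	for (int attempt = 1;; attempt++) {
		struct toy_page page = { .mapping = mapping };

		/* Read submitted; a truncater may run before we get the lock. */
		if (race)
			race(&page, attempt);

		/* In the kernel, lock_page() would be taken at this point. */
		if (page.mapping != mapping)
			continue; /* truncated under us: retry */
		return attempt;
	}
}
```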
09 Jun, 2016
2 commits
-
If a segment in a section is clean or prefree, we don't need to read its
summary or do GC on it.
Signed-off-by: Jaegeuk Kim
-
In f2fs, we don't need to keep block plugging for NODE and DATA writes, since
we already merged bios as much as possible.
Signed-off-by: Jaegeuk Kim
08 Jun, 2016
1 commit
-
Separate the op from the rq_flag_bits and have f2fs
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
03 Jun, 2016
1 commit
-
This patch refactors set_inode_flag and clear_inode_flag to take an inode
pointer.
Signed-off-by: Jaegeuk Kim
08 May, 2016
1 commit
-
This patch adds f2fs_kmalloc.
Signed-off-by: Jaegeuk Kim
28 Apr, 2016
1 commit
-
For foreground GC, we cache node blocks of the victim section and set them
dirty, then call sync_node_pages to flush these node pages. Meanwhile, node
pages that are not located in the victim section get flushed together with
them, which consumes more bandwidth and more contiguous free space.
So in this case, it's better to leave those unrelated node pages in the cache
for later write hits, and let CP or VM flush them afterward.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
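A sketch of the flush policy described above: during foreground GC, only the dirty node pages from the victim section are written, and everything else stays dirty in the cache. struct toy_node_page and flush_victim_node_pages() are hypothetical names, and mapping a segment to its section by dividing by segs_per_sec is an assumption of the sketch; it models the policy only, not the kernel's writeback mechanics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of a dirty node page waiting in the page cache. */
struct toy_node_page {
	unsigned int segno; /* segment holding the node block on disk */
	bool written;
};

/*
 * Write back only dirty node pages whose blocks live in the victim
 * section; unrelated dirty node pages stay cached so later updates can
 * hit them, leaving their writeback to CP or VM.
 * Returns the number of pages written.
 */
size_t flush_victim_node_pages(struct toy_node_page *pages, size_t n,
			       unsigned int victim_secno,
			       unsigned int segs_per_sec)
{
	size_t written = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].segno / segs_per_sec != victim_secno)
			continue; /* not from the victim section: leave it dirty */
		pages[i].written = true;
		written++;
	}
	return written;
}
```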
27 Apr, 2016
1 commit
-
This patch splits the existing sync_node_pages into (f)sync_node_pages.
fsync_node_pages is used only by f2fs_sync_file.
Acked-by: Chao Yu
Signed-off-by: Jaegeuk Kim
27 Feb, 2016
2 commits
-
Add a new helper, f2fs_update_data_blkaddr, to clean up redundant code.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
For now, the flow of GCing an encrypted data page is:
1) try to grab a meta page in the meta inode's mapping, indexed by the old
block address of that data page
2) load the ciphertext into the meta page
3) allocate a new block address
4) write the meta page to the new block address
5) update the block address pointer in the direct node page.
Other readers/writers use f2fs_wait_on_encrypted_page_writeback to check for,
and wait on, writeback of GCed encrypted data cached in the meta page, in
order to avoid inconsistency among the data page cache, the meta page cache
and the on-disk data when updating.
However, they use the new block address updated in step 5) as the index to
look up the meta page in the inner bio buffer. That is wrong: we will never
find the GCing meta page, since step 1) indexed that page by the old block
address.
This patch fixes the issue by swapping the order of step 1) and step 3), so
that step 1) grabs the page with the index generated in step 3).
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
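The indexing problem and its fix can be shown with a toy meta page cache keyed by block address. This is a sketch with hypothetical names (struct toy_meta_cache, meta_grab(), meta_lookup(), gc_move_encrypted()); it models only which address the meta page is indexed by, not encryption or I/O.

```c
#include <assert.h>
#include <stdbool.h>

#define META_SLOTS 8

/* Hypothetical meta-inode page cache: pages are indexed by block address. */
struct toy_meta_cache {
	unsigned int index[META_SLOTS];
	bool used[META_SLOTS];
};

/* Grab (insert) a meta page for @blkaddr; a full cache is ignored here. */
void meta_grab(struct toy_meta_cache *c, unsigned int blkaddr)
{
	for (int i = 0; i < META_SLOTS; i++) {
		if (!c->used[i]) {
			c->used[i] = true;
			c->index[i] = blkaddr;
			return;
		}
	}
}

/* What a waiter does: look up the GCing meta page by block address. */
bool meta_lookup(const struct toy_meta_cache *c, unsigned int blkaddr)
{
	for (int i = 0; i < META_SLOTS; i++)
		if (c->used[i] && c->index[i] == blkaddr)
			return true;
	return false;
}

/*
 * Move one encrypted block. With @fixed, the new address is allocated
 * (step 3) before the meta page is grabbed (step 1), and the page is
 * indexed by that new address; otherwise it is indexed by the old one.
 * Loading and writing the ciphertext (steps 2 and 4) are omitted.
 * Returns the address stored in the direct node page (step 5).
 */
unsigned int gc_move_encrypted(struct toy_meta_cache *c, unsigned int old_addr,
			       unsigned int *next_free, bool fixed)
{
	unsigned int new_addr;

	if (fixed) {
		new_addr = (*next_free)++; /* step 3 first */
		meta_grab(c, new_addr);    /* then step 1, keyed by new address */
	} else {
		meta_grab(c, old_addr);    /* step 1, keyed by old address */
		new_addr = (*next_free)++; /* step 3 */
	}
	return new_addr;               /* step 5: node page now points here */
}
```

With the old order, a waiter looking up the new address misses the page indexed by the old one; with the fixed order, the lookup hits.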
23 Feb, 2016
8 commits
-
This patch enables tracing of the old block address of a CoWed page for
better debugging.
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
On a partition formatted with multiple segments per section, we counted GC
calls incorrectly. For example, for a partition with segs_per_sec = 4:
cat /sys/kernel/debug/f2fs/status
GC calls: 208 (BG: 7)
- data segments : 104 (52)
- node segments : 104 (24)
The GC call count should be (104 (data segs) + 104 (node segs)) / 4 = 52,
rather than 208. Fix it.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
This patch avoids keeping an inefficient victim segment number selected
during the victim search.
For example, if all the dirty segments have the same number of valid blocks,
we can end up getting victim segments in descending order because the wrong
last segment number is kept.
Signed-off-by: Jaegeuk Kim
-
There are redundant pointer conversions in the following call stack:
- at position a, inode is converted to f2fs_inode_info.
- at position b, f2fs_inode_info is converted back to inode.
- truncate_blocks(inode,..)
 - fi = F2FS_I(inode)                ---a
 - ADDRS_PER_PAGE(node_page, fi)
  - addrs_per_inode(fi)
   - inode = &fi->vfs_inode          ---b
   - f2fs_has_inline_xattr(inode)
    - fi = F2FS_I(inode)
    - is_inode_flag_set(fi,..)
To avoid the unneeded conversions, change ADDRS_PER_PAGE and
addrs_per_inode to accept an inode pointer.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
The variable nsearched in get_victim_by_default() indicates the number of
dirty segments we have already checked. There are two problems with the way
it is updated:
1. When p.ofs_unit is greater than 1, the victim we find consists
of multiple segments, possibly more than one of them dirty.
But nsearched always increases by 1.
2. If segments have been found but not chosen, nsearched does not
increase. So even when we have checked all dirty segments, nsearched
may still be less than p.max_search.
Both problems can cause unnecessary searching after all dirty
segments have already been checked.
Signed-off-by: Fan li
Signed-off-by: Jaegeuk Kim
-
In write_begin, unless the storage requires stable pages, we don't need to
wait for writeback before updating the page contents.
This patch uses wait_for_stable_page instead of wait_on_page_writeback.
Signed-off-by: Jaegeuk Kim
-
If we configure a section to consist of multiple segments, foreground GC
does garbage collection with the following approach:
for each segment in victim section
    blk_start_plug
    for each valid block in segment
        write out by OPU method
    submit bio cache
Signed-off-by: Jaegeuk Kim
-
The scenario is:
1. create lots of node blocks
2. sync
3. write lots of inline_data
-> got panic due to no free space
In that case, we should flush node blocks when writing inline_data in #3,
and trigger GC as well.
Signed-off-by: Jaegeuk Kim
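For the nsearched fix in get_victim_by_default() earlier in this batch, the counting rule can be illustrated with a simplified, linear victim search. This is not the kernel function, which walks the dirty segment bitmap starting from the last victim; pick_victim_unit(), the greedy cost (valid blocks per unit) and the plain arrays are assumptions of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical victim search over units of @ofs_unit segments, with
 * greedy cost = valid blocks in the unit. nsearched counts every dirty
 * segment in each examined unit, whether or not that unit is chosen, so
 * the search stops once @max_search dirty segments have been seen.
 * Returns the first segment of the chosen unit, or -1 if none; the
 * number of dirty units examined is stored in *visited.
 */
int pick_victim_unit(const bool *dirty, const unsigned int *valid,
		     unsigned int nsegs, unsigned int ofs_unit,
		     unsigned int max_search, unsigned int *visited)
{
	unsigned int nsearched = 0, best_cost = UINT_MAX;
	int victim = -1;

	*visited = 0;
	for (unsigned int start = 0; start < nsegs; start += ofs_unit) {
		unsigned int ndirty = 0, cost = 0;

		for (unsigned int s = start; s < start + ofs_unit && s < nsegs; s++) {
			ndirty += dirty[s];
			cost += valid[s];
		}
		if (ndirty == 0)
			continue; /* clean unit: not a candidate */
		(*visited)++;
		if (cost < best_cost) {
			best_cost = cost;
			victim = (int)start;
		}
		/* Fixes 1 and 2: count every dirty segment in the unit,
		 * even when this unit is not chosen. */
		nsearched += ndirty;
		if (nsearched >= max_search)
			break;
	}
	return victim;
}
```

With the old rule (one increment per chosen candidate), a unit holding two dirty segments would be undercounted and the loop would keep searching after every dirty segment had been seen.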
12 Jan, 2016
1 commit
-
This patch records the last time the user requested a filesystem operation.
This information is later used to detect whether the system is idle.
Signed-off-by: Jaegeuk Kim
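The idle check that this timestamp enables can be sketched as follows. struct idle_state, note_fs_request(), fs_is_idle() and the use of seconds are hypothetical; the kernel tracks time in its own units and with its own field names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-superblock idle tracking; times are in seconds. */
struct idle_state {
	unsigned long last_req_time; /* time of the last user fs request */
	unsigned long idle_interval; /* quiet period before we call it idle */
};

/* Record that the user just requested a filesystem operation. */
void note_fs_request(struct idle_state *st, unsigned long now)
{
	st->last_req_time = now;
}

/*
 * Idle once idle_interval has elapsed without any request.
 * Assumes @now never goes backwards.
 */
bool fs_is_idle(const struct idle_state *st, unsigned long now)
{
	return now - st->last_req_time >= st->idle_interval;
}
```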
31 Dec, 2015
1 commit
-
Sometimes we stay silent when an I/O error occurs in the lower-layer device,
so the user does not receive any error return value for an operation that
actually did not succeed.
This should be avoided, so this patch reports such errors to the user.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
05 Dec, 2015
1 commit
-
Use sbi->blocks_per_seg directly to avoid the unnecessary calculation of
1 << sbi->log_blocks_per_seg.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
14 Oct, 2015
2 commits
-
Once f2fs_gc is done, wait_ms is changed once more, so its tracepoint
should be placed after that.
Reported-by: He YunLei
Signed-off-by: Jaegeuk Kim
-
different competitors
Since we use different page caches (normally the inode's page cache for R/W
and the meta inode's page cache for GC) to cache the same physical block
belonging to an encrypted inode, writeback of these two page caches should
be mutually exclusive. But we currently don't handle the writeback state
well, so there is a potential race:
a)
kworker:                                f2fs_gc:
- f2fs_write_data_pages
- f2fs_write_data_page
- do_write_data_page
- write_data_page
- f2fs_submit_page_mbio
(page#1 in inode's page cache was queued
in f2fs bio cache, and is ready to write
to new blkaddr)
                                        - gc_data_segment
                                        - move_encrypted_block
                                        - pagecache_get_page
                                        (page#2 in meta inode's page cache
                                        was cached with the invalid data
                                        of physical block located in new
                                        blkaddr)
                                        - f2fs_submit_page_mbio
                                        (page#1 was submitted, later, page#2
                                        with invalid data will be submitted)
b)
f2fs_gc:
- gc_data_segment
- move_encrypted_block
- f2fs_submit_page_mbio
(page#1 in meta inode's page cache was
queued in f2fs bio cache, and is ready
to write to new blkaddr)
user thread:
- f2fs_write_begin
- f2fs_submit_page_bio
(we submit the request to block layer
to update page#2 in inode's page cache
with physical block located in new
blkaddr, so here we may read garbage
data from new blkaddr since GC hasn't
written back page#1 yet)
This patch fixes the above potential races for encrypted inodes.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
13 Oct, 2015
1 commit
-
Currently, we use ra_meta_pages to read as many contiguous physical blocks as
possible to improve the performance of subsequent reads. However,
ra_meta_pages uses a synchronous readahead approach, submitting bios with
READ. Since READ has high priority, it should not be used for preloading
blocks, especially as it is uncertain when these readahead pages will be used.
This patch supports asynchronous readahead in ra_meta_pages by tagging bios
with the READA flag in order to allow preloading.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim