05 Sep, 2013
2 commits
-
This patch improves the gc efficiency by optimizing the victim
selection policy. With this optimization, the random re-write
performance could increase up to 20%.

For f2fs, when the disk is short of free space, gc selects
dirty segments and moves valid blocks around for making more space
available. The gc cost of a segment is determined by the valid blocks
in the segment. The fewer the valid blocks, the higher the efficiency.
The ideal victim segment is the one that has the most garbage blocks.

Currently, it searches up to 20 dirty segments for a victim segment.
The selected victim is unlikely to be the best victim for gc when there
are many more dirty segments. Why not search more dirty segments
for a better victim? The cost of searching dirty segments is
negligible in comparison to moving blocks.

This patch enlarges MAX_VICTIM_SEARCH to 4096 to make
the search more aggressive in finding a possibly better victim. Since
it also applies to victim selection for SSR, it will likely improve
the SSR efficiency as well.

The test case is simple. It creates files until the disk is full.
The size for each file is 32KB. Then it writes as many as 100000
records of 4KB size to random offsets of random files in sync mode.
The testing was done on a 2GB partition of an SDHC card. Let's see the
test result of f2fs without and with the patch.

---------------------------------------
2GB partition, SDHC
create 52023 files of size 32768 bytes
random re-write 100000 records of 4KB
---------------------------------------
           | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
[no patch]   341                 4227               1174       174840
[patched]    324                 2958               645        106682

It's obvious that, with the patch, f2fs finishes the test in 20+% less
time than without the patch. Internally, it also does much less gc, with
higher efficiency than before.

Since the performance improvement is related to gc, it might not be so
obvious for other tests that do not trigger gc as often as this one
(this is because f2fs selects dirty segments for SSR use most of the
time when free space is in shortage). The well-known iozone test tool
was not used for benchmarking the patch because it does not seem to have
a test case that performs random re-write on a full disk.

This patch is the revised version based on the suggestion from
Jaegeuk Kim.

Signed-off-by: Jin Xu
[Jaegeuk Kim: suggested simpler solution]
Reviewed-by: Jaegeuk Kim
Signed-off-by: Jaegeuk Kim
-
Previously, we experienced the following bio traces when running a simple
sequential write test.

f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500104928, size = 4K
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499922208, size = 368K
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499914752, size = 140K

-> total 512K

The first one is to write an indirect node block, and the others are to write
direct node blocks.

The reason why there are two separate bios for direct node blocks is:
0. initial state
 ------------------    ------------------
 |                |    |xxxxxxxx        |
 ------------------    ------------------

1. write 368K
 ------------------    ------------------
 |                |    |xxxxxxxxWWWWWWWW|
 ------------------    ------------------

2. write 140K
 ------------------    ------------------
 |WWWWWWW         |    |xxxxxxxxWWWWWWWW|
 ------------------    ------------------

This is because f2fs_write_node_pages tries to write just 512K in total, so
we can lose the chance to merge more bios nicely.

After this patch is applied, we can get the following bio traces.
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500103168, size = 8K
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500111368, size = 4K
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500107272, size = 512K
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500108296, size = 512K
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500109320, size = 500K

And finally, we can improve the sequential write performance,
from 458.775 MB/s to 479.945 MB/s on SSD.

Signed-off-by: Jaegeuk Kim
03 Sep, 2013
2 commits
-
f2fs currently stores all the block counts as 32-bit numbers, which is able to
cover about 15TB volume.

But when calculating utilization, f2fs multiplies the count by 100, which can
overflow.
This patch fixes this.

Signed-off-by: Jaegeuk Kim
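The overflow described above can be reproduced in user space. The following is a minimal sketch, assuming a utilization formula of the form valid * 100 / total; the function names utilization_32 and utilization_64 are hypothetical and are not taken from the f2fs source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: multiplies in 32 bits, so valid * 100 wraps
 * once valid exceeds UINT32_MAX / 100 (about 42.9 million blocks). */
uint32_t utilization_32(uint32_t valid, uint32_t total)
{
    assert(total != 0);
    return valid * 100u / total;
}

/* Hypothetical fix: widen to 64 bits before multiplying. */
uint32_t utilization_64(uint32_t valid, uint32_t total)
{
    assert(total != 0);
    return (uint32_t)((uint64_t)valid * 100u / total);
}
```

For small counts both helpers agree; for large counts only the widened version returns the expected percentage.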
-
Previously, f2fs conducted SSR when free_sections() < overprovision_sections.
But that check could choose SSR even though there were a lot of prefree segments.
So, let's consider the number of prefree segments too for triggering SSR.

Signed-off-by: Jaegeuk Kim
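The revised trigger can be sketched as follows. This is an illustration only: the struct, its fields, and both functions are hypothetical, and the way prefree segments are folded into the comparison (converted to sections and added to the free count) is an assumption, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the counters the commit message mentions. */
struct space_info {
    unsigned int free_sections;
    unsigned int prefree_segments;       /* not yet free, but about to be */
    unsigned int overprovision_sections;
    unsigned int segs_per_sec;           /* segments per section */
};

/* Old rule as described: only free sections are compared. */
bool need_ssr_old(const struct space_info *s)
{
    return s->free_sections < s->overprovision_sections;
}

/* Assumed new rule: prefree segments count toward available space. */
bool need_ssr_new(const struct space_info *s)
{
    assert(s->segs_per_sec != 0);
    unsigned int prefree_sections = s->prefree_segments / s->segs_per_sec;

    return s->free_sections + prefree_sections < s->overprovision_sections;
}
```

Under this assumption, a volume with few free sections but many prefree segments no longer falls back to SSR.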
27 Aug, 2013
2 commits
-
Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim
-
f2fs_set_link updates the parent inode number, so we should sync this to
the inode block.
Otherwise, the data can be lost after a sudden power-off.

Signed-off-by: Jaegeuk Kim
26 Aug, 2013
6 commits
-
0. modified inode structure
--------------------------------------
metadata (e.g., i_mtime, i_ctime, etc)
--------------------------------------
direct pointers [0 ~ 873]

inline xattrs (200 bytes by default)

indirect pointers [0 ~ 4]
--------------------------------------
node footer
--------------------------------------

1. setxattr flow
- read_all_xattrs copies all the xattrs from inline and xattr node block.
- handle xattr entries
- write_all_xattrs copies modified xattrs into inline and xattr node block.

2. getxattr flow
- read_all_xattrs copies all the xattrs from inline and xattr node block.
- check target entries

3. Usage
 # mount -t f2fs -o inline_xattr $DEV $MNT

Once mounted with the inline_xattr option, f2fs marks all the newly created
files to reserve an amount of inline xattr space explicitly inside the inode
block. Without the mount option, f2fs touches neither existing files nor
newly created files.

Signed-off-by: Jaegeuk Kim
-
The truncate_xattr_node function will be used by inline xattr.
Signed-off-by: Jaegeuk Kim
-
__find_xattr searches for the wanted xattr entry starting from
base_addr.

If it is not found, the returned entry is the last empty xattr entry, which can be
newly allocated.

Signed-off-by: Jaegeuk Kim
-
This patch enables the number of direct pointers inside the on-disk inode block to
be changed dynamically according to the size of the inline xattr space.

The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file
has the inline xattr flag.

The number of direct pointers that will be used by inline xattrs is defined as
F2FS_INLINE_XATTR_ADDRS.
The current patch sets F2FS_INLINE_XATTR_ADDRS to 0 temporarily.

Signed-off-by: Jaegeuk Kim
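The dynamic pointer count might be sketched like this. The default of 923 direct pointers and the helper name addrs_per_inode are assumptions made for illustration; the reserved slot count is passed in as a parameter so that both the temporary value 0 and a later non-zero value can be tried.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_ADDRS_PER_INODE 923u   /* assumed default pointer count */

/* Hypothetical helper: direct pointers left after reserving
 * inline_xattr_addrs slots; only files with the flag give any up. */
unsigned int addrs_per_inode(bool has_inline_xattr,
                             unsigned int inline_xattr_addrs)
{
    assert(inline_xattr_addrs <= DEF_ADDRS_PER_INODE);
    if (!has_inline_xattr)
        return DEF_ADDRS_PER_INODE;
    return DEF_ADDRS_PER_INODE - inline_xattr_addrs;
}
```

With the temporary value 0 from this patch, flagged and unflagged files keep the same number of direct pointers.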
-
This patch adds basic inode flags for inline xattrs, F2FS_INLINE_XATTR,
and adds a mount option, inline_xattr, which is enabled when xattr is set.

If the mount option is enabled, all the files are marked with the inline_xattrs
flag.

Signed-off-by: Jaegeuk Kim
-
Fix to return -ENOMEM in the kset create and add error handling
case instead of 0, as done elsewhere in this function.

Introduced by commit b59d0bae6ca30c496f298881616258f9cde0d9c6.
(f2fs: add sysfs support for controlling the gc_thread)

Signed-off-by: Wei Yongjun
Acked-by: Namjae Jeon
[Jaegeuk Kim: merge the patch with previous modification]
Signed-off-by: Jaegeuk Kim
20 Aug, 2013
2 commits
-
This patch removes a falsely triggered BUG_ON.
The previous BUG_ON condition didn't cover the following true scenario.

In f2fs_add_link, 1) get_new_data_page gives an uptodate page successfully,
and then, 2) init_inode_metadata returns -ENOSPC.
At this moment, a new clean data page remains in the page cache, but its
block address still indicates NEW_ADDR.
After that, even if sync is called, this clean data page cannot be written to
the disk due to its clean state.

So this means that get_lock_data_page should make a new empty page when its
block address is NEW_ADDR and its page is not uptodate.

Signed-off-by: Jaegeuk Kim
-
When the creation of any cache fails in init_f2fs_fs(), the other caches that were
created successfully should be freed.

Signed-off-by: Zhao Hongjiang
Signed-off-by: Jaegeuk Kim
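The cleanup rule above is the usual goto-unwind pattern. Below is a user-space sketch of it, not the init_f2fs_fs() code: the struct, the three cache names, create_cache(), and the fail_at knob are all hypothetical, and malloc/free stand in for cache creation and destruction.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the caches created at module init. */
struct caches {
    void *inode_cache;
    void *node_cache;
    void *gc_cache;
};

/* Hypothetical allocator; fail_at forces the creation with that index to fail. */
static void *create_cache(int index, int fail_at)
{
    return index == fail_at ? NULL : malloc(64);
}

/* Returns 0 on success, -1 on failure; on failure nothing stays allocated. */
int init_caches(struct caches *c, int fail_at)
{
    assert(c != NULL);
    c->inode_cache = c->node_cache = c->gc_cache = NULL;

    c->inode_cache = create_cache(0, fail_at);
    if (!c->inode_cache)
        goto fail;
    c->node_cache = create_cache(1, fail_at);
    if (!c->node_cache)
        goto free_inode;
    c->gc_cache = create_cache(2, fail_at);
    if (!c->gc_cache)
        goto free_node;
    return 0;

    /* Unwind in reverse order of creation. */
free_node:
    free(c->node_cache);
    c->node_cache = NULL;
free_inode:
    free(c->inode_cache);
    c->inode_cache = NULL;
fail:
    return -1;
}
```

Each label releases exactly what was created before the failing step, so a failure at any point leaves no cache behind.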
19 Aug, 2013
3 commits
-
An error "label at end of compound statement" will occur if CONFIG_F2FS_STAT_FS
is disabled.
fs/f2fs/segment.c:556:1: error: label at end of compound statement
So clean up the 'out' label to fix it.

Reported-by: Fengguang Wu
Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim
-
In f2fs_write_inode, updating the inode after f2fs_balance_fs is not
optimal in the case where f2fs_gc is performed first. The
inode page will be unnecessarily written out twice, once
in f2fs_gc->...->sync_node_pages and once in
update_inode_page.

Let's update the inode page prior to f2fs_balance_fs to avoid
this.

To reproduce it,
$ touch file (before this step, should make the device need f2fs_gc)
$ sync (or wait the bdi to write dirty inode)

Signed-off-by: Jin Xu
Signed-off-by: Jaegeuk Kim
-
alloc_page() returns NULL on failure; it never returns an ERR_PTR.

Signed-off-by: Dan Carpenter
Signed-off-by: Jaegeuk Kim
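Why the distinction matters can be shown with a user-space imitation of the kernel's error-pointer convention. The helpers err_ptr and is_err below are simplified stand-ins modeled on ERR_PTR() and IS_ERR(), written for this illustration; they are not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer in the top MAX_ERRNO addresses. */
void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* True only for pointers in the error range; NULL is not in that range. */
bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

Because is_err(NULL) is false, an IS_ERR()-style check on a function that reports failure as NULL never fires; the result has to be compared against NULL instead.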
12 Aug, 2013
3 commits
-
Previously, f2fs_setxattr assigned i_xattr_nid in the inode page inconsistently.
The scenario is:
= Thread 1 = = Thread 2 = = fi->i_xattr_nid = = on-disk nid =
f2fs_setxattr 0 0
new_node_page X 0
sync_inode_page X X
checkpoint X X -.
grab_cache_page X X |
--> allocate a new xattr node block or -ENOSPC -
Let's check the free space prior to the main process of allocating a new node
page.

Signed-off-by: Jaegeuk Kim
-
Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim
09 Aug, 2013
3 commits
-
This patch introduces a new inline function, cur_cp_version, to reduce redundant
code.

Signed-off-by: Jaegeuk Kim
-
Previously, xattr node blocks were stored in the COLD_NODE log, which means that
our roll-forward mechanism doesn't recover the xattr node blocks at all.
Only the direct node blocks in the WARM_NODE log can be recovered.

So, let's resolve the issue simply by conducting a checkpoint during fsync when a
file has a modified xattr node block.

This approach can degrade performance, but normally the checkpoint
overhead appears only at the first fsync call after the xattr entry changes.
Once the checkpoint is done, no additional overhead occurs.

Signed-off-by: Jaegeuk Kim
-
This patch fixes the use of XATTR_NODE_OFFSET.

o The offset should not use several MSB bits which are used by marking node
  blocks.

o IS_DNODE should handle XATTR_NODE_OFFSET to avoid potential abnormality
  during the fsync call.

Signed-off-by: Jaegeuk Kim
08 Aug, 2013
1 commit
-
This patch should resolve the following error reported by kbuild test robot.

All error/warnings:
In file included from fs/f2fs/dir.c:13:0:
>> fs/f2fs/f2fs.h:435:17: error: field 's_kobj' has incomplete type
struct kobject s_kobj;

The failure was caused by missing the kobject header file in dir.c.
So, this patch moves the header file to the right location, f2fs.h.

CC: Namjae Jeon
Signed-off-by: Jaegeuk Kim
06 Aug, 2013
4 commits
-
This patch fixes a deadlock bug that occurs quite often when there are
concurrent write and fsync on the same file.

Following is the simplified call trace when tasks get hung.
fsync thread:
- f2fs_sync_file
...
- f2fs_write_data_pages
...
- update_extent_cache
...
- update_inode
- wait_on_page_writeback

bdi writeback thread
- __writeback_single_inode
- f2fs_write_data_pages
- mutex_lock(sbi->writepages)

The deadlock happens when the fsync thread waits on an inode page that has
been added to f2fs' cached bio sbi->bio[NODE], and unfortunately,
no one else is able to submit the cached bio to the block layer for
writeback. This is because the fsync thread already holds sbi->fs_lock and
the sbi->writepages lock, causing the bdi thread to block when it attempts
to write data pages for the same inode. At the same time, the f2fs_gc thread
does not notice the situation and cannot help. Even the sync syscall
gets blocked.

To fix it, we could submit the cached bio first before waiting on an inode page
that is being written back.

Signed-off-by: Jin Xu
[Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
Signed-off-by: Jaegeuk Kim
-
This code was used for the nobh_write_end() function.
But since the f2fs_write_end function has now been added,
there is no need for this code.

Signed-off-by: Namjae Jeon
Signed-off-by: Pankaj Kumar
Signed-off-by: Jaegeuk Kim
-
Add a sysfs entry, gc_idle, to control the gc policy.
gc_idle = 1 corresponds to selecting a cost-benefit approach,
while gc_idle = 2 corresponds to selecting a greedy approach
to garbage collection. The selections are mutually exclusive: only one
approach is in effect at any point. If gc_idle = 0, then this
option is disabled.

Cc: Gu Zheng
Signed-off-by: Namjae Jeon
Signed-off-by: Pankaj Kumar
Reviewed-by: Gu Zheng
[Jaegeuk Kim: change the select_gc_type() flow slightly]
Signed-off-by: Jaegeuk Kim
-
Add sysfs entries to control the timing parameters for
f2fs gc thread.

Various Sysfs options introduced are:
gc_min_sleep_time: Min Sleep time for GC in ms
gc_max_sleep_time: Max Sleep time for GC in ms
gc_no_gc_sleep_time: Default Sleep time for GC in ms

Cc: Gu Zheng
Signed-off-by: Namjae Jeon
Signed-off-by: Pankaj Kumar
Reviewed-by: Gu Zheng
[Jaegeuk Kim: fix an umount bug and some minor changes]
Signed-off-by: Jaegeuk Kim
31 Jul, 2013
1 commit
-
This kfree() is no longer needed after a79dc083d7 "f2fs: move
bio_private allocation out of f2fs_bio_alloc()". The "bio->bi_private"
is NULL here so it's a no-op.

Signed-off-by: Dan Carpenter
Signed-off-by: Jaegeuk Kim
30 Jul, 2013
10 commits
-
This patch fixes mishandling of the sbi->n_orphans variable.
If users request lots of f2fs_unlink(), check_orphan_space() could be contended.
In such a case, sbi->n_orphans can be read incorrectly so that f2fs_unlink()
would fall into the wrong state, which results in the failure of
add_orphan_inode().

So, let's increment sbi->n_orphans virtually prior to the actual orphan inode
work. After that, let's release sbi->n_orphans by calling release_orphan_inode
or remove_orphan_inode.

Signed-off-by: Jaegeuk Kim
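The reserve-first idea can be sketched in user space. This sketch swaps in C11 atomics for whatever locking the kernel code uses, and the struct, the max_orphans limit, and both function names are hypothetical; it only illustrates checking and incrementing the counter as one step so that concurrent callers cannot both claim the last slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical orphan bookkeeping; max_orphans is an illustrative limit. */
struct orphan_ctl {
    atomic_uint n_orphans;
    unsigned int max_orphans;
};

/* Reserve a slot up front; returns false when no slot is left. */
bool acquire_orphan_slot(struct orphan_ctl *c)
{
    unsigned int cur = atomic_load(&c->n_orphans);

    while (cur < c->max_orphans) {
        /* On failure, cur is refreshed with the current counter value. */
        if (atomic_compare_exchange_weak(&c->n_orphans, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Give the slot back when the operation fails or the orphan is removed. */
void release_orphan_slot(struct orphan_ctl *c)
{
    unsigned int prev = atomic_fetch_sub(&c->n_orphans, 1);

    assert(prev > 0);
    (void)prev;
}
```

A caller would reserve a slot before starting the unlink work and release it on any error path, so the later step that records the orphan cannot run out of space.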
-
bio->bi_private is not always needed. In the read path,
end_read_io does not use bio_private, so move the
bio_private allocation out of f2fs_bio_alloc(). Allocate it in
submit_write_page(), and ignore it in f2fs_readpage().

Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim
-
Since we remove only the single target node, list_for_each is enough; to
clean up the code, we use list_for_each_entry instead.

Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim
-
For strings without format specifiers, use seq_puts()/seq_putc()
instead of seq_printf().

Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim
-
Similar to the i_pino fix, i_name also should be fixed when i_nlink is 1.
The erroneous scenario is like this.
1. touch test1
2. link test1 test2
3. unlink test2
4. fsync test1

After this, i_name should be test1.

CC: Al Viro
Signed-off-by: Jaegeuk Kim
-
The error is reproducible by:
0. mkfs.f2fs /dev/sdb1 & mount
1. touch test1
2. touch test2
3. mv test1 test2
4. umount
5. dump.f2fs -i 4 /dev/sdb1

After this, when we retrieve the inode->i_name of test2 by dump.f2fs, we get
test1 instead of test2.
This is because f2fs didn't update the file name during the f2fs_rename.

So, this patch fixes that.

Signed-off-by: Jaegeuk Kim
-
Introduce a helper function, F2FS_NODE(), to simplify the conversion of node_page to
f2fs_node.

Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim
-
Add a helper function, F2FS_STAT(), to get the f2fs_stat_info.

Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim
-
You can monitor valid block counts of whole segments in:
/proc/fs/f2fs/sdb1/segment_info.

Signed-off-by: Jaegeuk Kim
-
In order to support SQLite, which uses fdatasync instead of fsync, we should
guarantee the data requested by fdatasync can be recovered after sudden
power-off.

So, let's remove the fdatasync condition in f2fs_sync_file.
Otherwise, we cannot restore the data after sudden power-off due to the nonexistence
of any fsync mark'ed node blocks.

Signed-off-by: Jaegeuk Kim
08 Jul, 2013
1 commit
-
In Al Viro's previous readdir patch set, a bug occurs when
running xfstest 006, as follows.

[Error output]
alpha size = 4, name length = 6, total files = 4096, nproc=1
1023 files created
rm: cannot remove `/mnt/f2fs/permname.15150/a': Directory not empty

[Correct output]
alpha size = 4, name length = 6, total files = 4096, nproc=1
4097 files created

This bug is due to the misupdate of directory position in ctx.
So, this patch fixes this.

[AV: fixed a braino]
CC: Al Viro
Signed-off-by: Jaegeuk Kim
Signed-off-by: Al Viro