Doug / smarc-fsl-linux-kernel | Embedian Git Server

05 Sep, 2013

2 commits

a26b7c8a0 f2fs: optimize gc for better performance

This patch improves the gc efficiency by optimizing the victim
selection policy. With this optimization, the random re-write
performance could increase up to 20%.

For f2fs, when the disk is short of free space, gc selects dirty
segments and moves their valid blocks elsewhere to make more space
available. The gc cost of a segment is determined by the number of
valid blocks in it: the fewer the valid blocks, the higher the
efficiency. The ideal victim segment is the one that has the most
garbage blocks.

Currently, gc searches up to 20 dirty segments for a victim segment.
The selected victim is unlikely to be the best one when there are many
more dirty segments. Why not search more dirty segments for a better
victim? The cost of searching dirty segments is negligible compared
with the cost of moving blocks.

This patch raises MAX_VICTIM_SEARCH to 4096 so that the search looks
more aggressively for a better victim. Since the same limit applies to
victim selection for SSR, it will likely improve SSR efficiency as
well.
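
As a rough illustration of why a larger search window helps, here is a
minimal userspace sketch of a greedy victim search with a scan limit. It
is not the kernel code: the struct, the function name select_victim and
the flat array of valid-block counts are all hypothetical stand-ins.

```c
#include <stddef.h>

/* Hypothetical, simplified model: a dirty segment is described only by
 * its number of valid blocks. These are not the kernel's structures. */
struct victim {
	int segno;          /* index of the chosen segment, -1 if none */
	unsigned int cost;  /* valid blocks that gc would have to move */
};

/* Scan at most max_search dirty segments, starting at 'start' and
 * wrapping around, and return the one with the fewest valid blocks. */
static struct victim select_victim(const unsigned int *valid_blocks,
				   size_t nsegs, size_t start,
				   size_t max_search)
{
	struct victim best = { -1, (unsigned int)-1 };
	size_t limit = max_search < nsegs ? max_search : nsegs;

	for (size_t i = 0; i < limit; i++) {
		size_t segno = (start + i) % nsegs;

		if (valid_blocks[segno] < best.cost) {
			best.segno = (int)segno;
			best.cost = valid_blocks[segno];
		}
	}
	return best;
}
```

With a small limit the scan can stop before it ever reaches the segment
with the least valid data; with a limit larger than the number of dirty
segments it always finds the cheapest one.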

The test case is simple. It creates files of 32KB each until the disk
is full. Then it writes 100000 records of 4KB each to random offsets
of random files in sync mode. The test was run on a 2GB partition of
an SDHC card. Here are the results for f2fs without and with the patch.

---------------------------------------
2GB partition, SDHC
create 52023 files of size 32768 bytes
random re-write 100000 records of 4KB
---------------------------------------
           | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
[no patch] |               341 |             4227 |     1174 |            174840 |
[patched]  |               324 |             2958 |      645 |            106682 |

It's obvious that, with the patch, f2fs finishes the test in 20+% less
time than without the patch. And internally it does much less gc with
higher efficiency than before.

Since the performance improvement is related to gc, it might not be so
obvious in other tests that do not trigger gc as often as this one.
(This is because f2fs selects dirty segments for SSR use most of the
time when free space is short.) The well-known iozone test tool was not
used for benchmarking the patch because it does not seem to have a test
case that performs random re-writes on a full disk.

This patch is the revised version based on the suggestion from
Jaegeuk Kim.

Signed-off-by: Jin Xu
[Jaegeuk Kim: suggested simpler solution]
Reviewed-by: Jaegeuk Kim
Signed-off-by: Jaegeuk Kim

Jin Xu
2013-09-05 12:50:32 +0800
423e95ccb f2fs: merge more bios of node block writes

Previously, we saw the following bio traces when running a simple sequential
write test.

f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500104928, size = 4K
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499922208, size = 368K
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499914752, size = 140K

-> total 512K

The first one is to write an indirect node block, and the others are to write
direct node blocks.

The reason why there are two separate bios for direct node blocks is:
0. initial state
 ------------------  ------------------
 |                |  |xxxxxxxx        |
 ------------------  ------------------

1. write 368K
 ------------------  ------------------
 |                |  |xxxxxxxxWWWWWWWW|
 ------------------  ------------------

2. write 140K
 ------------------  ------------------
 |WWWWWWW         |  |xxxxxxxxWWWWWWWW|
 ------------------  ------------------

This is because f2fs_write_node_pages tries to write only 512K in total, so
we can lose the chance to merge more bios nicely.

After this patch is applied, we can get the following bio traces.

f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500103168, size = 8K
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500111368, size = 4K
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500107272, size = 512K
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500108296, size = 512K
f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500109320, size = 500K

And finally, we can improve the sequential write performance,
from 458.775 MB/s to 479.945 MB/s on SSD.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-09-05 09:17:19 +0800

03 Sep, 2013

2 commits

222cbdc48 f2fs: avoid an overflow during utilization calculation

The current f2fs stores all block counts as 32-bit numbers, which can cover a
volume of about 15TB.

But when calculating utilization, f2fs multiplies the count by 100, which can
cause an overflow.
This patch fixes this.
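
The overflow can be reproduced in userspace with a short sketch. The
function names below are hypothetical and the fix shown (widening to 64
bits before multiplying) is one common way to avoid the wrap, not
necessarily the exact change made by the patch.

```c
#include <stdint.h>

/* Buggy variant: valid * 100 is evaluated in 32 bits and wraps around
 * once valid exceeds UINT32_MAX / 100 (42949672 blocks).
 * The caller must pass a nonzero total. */
static uint32_t utilization_32(uint32_t valid, uint32_t total)
{
	return valid * 100 / total;
}

/* Safer variant: widen to 64 bits before multiplying, so the
 * intermediate product cannot wrap for any 32-bit block count. */
static uint32_t utilization_64(uint32_t valid, uint32_t total)
{
	return (uint32_t)((uint64_t)valid * 100 / total);
}
```

For example, with 50000000 valid blocks out of 100000000, the 32-bit
variant reports 7 percent while the widened variant reports 50 percent.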

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-09-03 12:41:37 +0800
c34e333fd f2fs: trigger GC when there are prefree segments

Previously, f2fs conducted SSR when free_sections() < overprovision_sections.
But even when there were a lot of prefree segments, it could consider SSR only.
So, let's consider the number of prefree segments too for triggering SSR.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-09-03 09:11:20 +0800

27 Aug, 2013

2 commits

749ebfd17 f2fs: use strncasecmp() simplify the string comparison

Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim

Gu Zheng
2013-08-27 20:50:12 +0800
8cb826880 f2fs: fix omitting to update inode page

The f2fs_set_link updates its parent inode number, so we should sync this to
the inode block.
Otherwise, the data can be lost after a sudden power-off.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-08-27 20:49:04 +0800

26 Aug, 2013

6 commits

65985d935 f2fs: support the inline xattrs

0. modified inode structure
--------------------------------------
metadata (e.g., i_mtime, i_ctime, etc)
--------------------------------------
direct pointers [0 ~ 873]

inline xattrs (200 bytes by default)

indirect pointers [0 ~ 4]
--------------------------------------
node footer
--------------------------------------

1. setxattr flow
- read_all_xattrs copies all the xattrs from inline and xattr node block.
- handle xattr entries
- write_all_xattrs copies modified xattrs into inline and xattr node block.

2. getxattr flow
- read_all_xattrs copies all the xattrs from inline and xattr node block.
- check target entries

3. Usage
# mount -t f2fs -o inline_xattr $DEV $MNT

Once mounted with the inline_xattr option, f2fs marks all newly created
files to reserve an amount of inline xattr space explicitly inside the inode
block. Without the mount option, f2fs touches neither existing files nor
newly created ones.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-08-26 19:15:23 +0800
4f16fb0f9 f2fs: add the truncate_xattr_node function

The truncate_xattr_node function will be used by inline xattr.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-08-26 19:15:06 +0800
dd9cfe236 f2fs: introduce __find_xattr for readability

__find_xattr searches for the wanted xattr entry, starting from
base_addr.

If the entry is not found, it returns the last empty xattr entry, which can
be used for a new allocation.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-08-26 19:15:06 +0800
de93653fe f2fs: reserve the xattr space dynamically

This patch enables the number of direct pointers inside on-disk inode block to
be changed dynamically according to the size of inline xattr space.

The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file
has inline xattr flag.

The number of direct pointers that will be used by inline xattrs is defined as
F2FS_INLINE_XATTR_ADDRS.
The current patch temporarily sets F2FS_INLINE_XATTR_ADDRS to 0.
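
The arithmetic behind a dynamic pointer count can be sketched as follows.
The base count of 923 direct pointers and the 4-byte address size are
assumptions made for illustration, and the function names are
hypothetical; only the idea (pointer slots given up in exchange for
inline xattr bytes) comes from the description above.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define DEF_ADDRS_PER_INODE 923u                          /* base direct pointers */
#define ADDR_SIZE ((unsigned int)sizeof(uint32_t))        /* bytes per pointer */

/* Number of pointer slots consumed by an inline xattr area of the
 * given size, rounded up to whole slots. */
static unsigned int inline_xattr_addrs(unsigned int xattr_bytes)
{
	return (xattr_bytes + ADDR_SIZE - 1) / ADDR_SIZE;
}

/* Direct pointers left in the inode block. The count only shrinks when
 * the file carries the inline xattr flag. */
static unsigned int addrs_per_inode(int has_inline_xattr,
				    unsigned int xattr_bytes)
{
	if (!has_inline_xattr)
		return DEF_ADDRS_PER_INODE;
	return DEF_ADDRS_PER_INODE - inline_xattr_addrs(xattr_bytes);
}
```

Under these assumptions a 200-byte inline xattr area takes 50 slots,
and a reservation of 0 bytes leaves the pointer count unchanged, which
matches the temporary setting in this patch.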

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-08-26 19:15:01 +0800
444c580f7 f2fs: add flags for inline xattrs

This patch adds basic inode flags for inline xattrs, F2FS_INLINE_XATTR,
and adds a mount option, inline_xattr, which is enabled when xattr is set.

If the mount option is enabled, all the files are marked with the inline_xattrs
flag.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-08-26 19:02:12 +0800
6e6b978c3 f2fs: fix error return code in init_f2fs_fs()

Fix to return -ENOMEM in the kset create and add error handling
case instead of 0, as done elsewhere in this function.

Introduced by commit b59d0bae6ca30c496f298881616258f9cde0d9c6.
(f2fs: add sysfs support for controlling the gc_thread)

Signed-off-by: Wei Yongjun
Acked-by: Namjae Jeon
[Jaegeuk Kim: merge the patch with previous modification]
Signed-off-by: Jaegeuk Kim

Wei Yongjun
2013-08-26 18:36:46 +0800

20 Aug, 2013

2 commits

d59ff4df7 f2fs: fix wrong BUG_ON condition

This patch removes a false-alarm BUG_ON.
The previous BUG_ON condition didn't cover the following real scenario.

In f2fs_add_link, 1) get_new_data_page gives an uptodate page successfully,
and then, 2) init_inode_metadata returns -ENOSPC.
At this moment, a new clean data page remains in the page cache, but its
block address still indicates NEW_ADDR.
After that, even if sync is called, this clean data page cannot be written to
the disk because it is clean.

So this means that get_lock_data_page should make a new empty page when its
block address is NEW_ADDR and its page is not uptodate.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-08-20 18:32:48 +0800
9890ff3f2 f2fs: fix memory leak when init f2fs filesystem fail

When creating any of the caches fails in init_f2fs_fs(), the caches that were
created successfully should be freed.
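
The usual shape of such a fix is a goto-based unwind that frees, in
reverse order, whatever was set up before the failing step and returns
a negative error code. The following userspace sketch uses malloc/free
in place of the kernel cache API; the struct, the function names and the
fail_at test hook are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the caches created at module init. */
struct caches {
	void *node;
	void *segment;
	void *gc;
};

/* fail_at is a test hook: the index of the allocation that should
 * fail, or -1 for no injected failure. */
static void *create_cache(int idx, int fail_at)
{
	return idx == fail_at ? NULL : malloc(64);
}

static int init_caches(struct caches *c, int fail_at)
{
	c->node = c->segment = c->gc = NULL;

	c->node = create_cache(0, fail_at);
	if (!c->node)
		goto fail;
	c->segment = create_cache(1, fail_at);
	if (!c->segment)
		goto free_node;
	c->gc = create_cache(2, fail_at);
	if (!c->gc)
		goto free_segment;
	return 0;

	/* Unwind in reverse order so nothing created earlier leaks. */
free_segment:
	free(c->segment);
	c->segment = NULL;
free_node:
	free(c->node);
	c->node = NULL;
fail:
	return -ENOMEM;
}
```

Each label releases exactly the resources acquired before the jump, so
a failure at any step leaves nothing allocated and never reports 0.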

Signed-off-by: Zhao Hongjiang
Signed-off-by: Jaegeuk Kim

Zhao Hongjiang
2013-08-20 17:58:44 +0800

19 Aug, 2013

3 commits

7b4052750 f2fs: fix a compound statement label error

An error "label at end of compound statement" will occur if CONFIG_F2FS_STAT_FS
is disabled.
fs/f2fs/segment.c:556:1: error: label at end of compound statement
So clean up the 'out' label to fix it.

Reported-by: Fengguang Wu
Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim

Gu Zheng
2013-08-19 10:51:08 +0800
92c4342fb f2fs: avoid writing inode redundantly when creating a file

In f2fs_write_inode, updating the inode after f2fs_balance_fs is not
optimal when f2fs_gc is performed beforehand. The inode page will be
unnecessarily written out twice: once in
f2fs_gc->...->sync_node_pages and once in update_inode_page.

Let's update the inode page prior to f2fs_balance_fs to avoid
this.

To reproduce it,
$ touch file (before this step, make sure the device needs f2fs_gc)
$ sync (or wait for the bdi to write the dirty inode)

Signed-off-by: Jin Xu
Signed-off-by: Jaegeuk Kim

Jin Xu
2013-08-19 08:43:25 +0800
e27dae4d6 f2fs: alloc_page() doesn't return an ERR_PTR

alloc_page() returns NULL on failure; it never returns an ERR_PTR.
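
The reason this matters is that an error-pointer check does not treat
NULL as a failure. The userspace sketch below mimics the kernel's
error-pointer convention (errno values encoded in the top 4095
addresses); err_ptr, is_err and the two check functions are local,
hypothetical helpers rather than the kernel's own macros.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the top MAX_ERRNO addresses. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Wrong check for an allocator that signals failure with NULL:
 * NULL is not an error pointer, so the failure goes unnoticed. */
static bool failed_wrong_check(const void *page)
{
	return is_err(page);
}

/* Right check for such an allocator. */
static bool failed_right_check(const void *page)
{
	return page == NULL;
}
```

A NULL result passes the error-pointer test as if it were a valid page,
which is why the check has to compare against NULL instead.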

Signed-off-by: Dan Carpenter
Signed-off-by: Jaegeuk Kim

Dan Carpenter
2013-08-19 08:42:29 +0800

12 Aug, 2013

3 commits

479bd73ac f2fs: should cover i_xattr_nid with its xattr node page lock

Previously, f2fs_setxattr assigned i_xattr_nid in the inode page inconsistently.

The scenario is:

= Thread 1 =       = Thread 2 =   = fi->i_xattr_nid =   = on-disk nid =

f2fs_setxattr                             0                   0
new_node_page                             X                   0
sync_inode_page                           X                   X
                   checkpoint             X                   X  -.
grab_cache_page                           X                   X   |
--> allocate a new xattr node block or -ENOSPC

Jaegeuk Kim
2013-08-12 15:04:53 +0800
9c02740c0 f2fs: check the free space first in new_node_page

Let's check the free space prior to the main process of allocating a new node
page.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-08-12 15:00:46 +0800
41dfde135 f2fs: clean up the needless end 'return' of void function

Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim

Gu Zheng
2013-08-12 10:49:22 +0800

09 Aug, 2013

3 commits

d71b5564c f2fs: introduce cur_cp_version function to reduce code size

This patch introduces a new inline function, cur_cp_version, to reduce redundant
code.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-08-09 14:25:37 +0800
e518ff81c f2fs: fix inconsistency between xattr node blocks and its inode

Previously, xattr node blocks were stored in the COLD_NODE log, which means that
our roll-forward mechanism doesn't recover the xattr node blocks at all.
Only the direct node blocks in the WARM_NODE log can be recovered.

So, let's resolve the issue simply by conducting a checkpoint during fsync when a
file has a modified xattr node block.

This approach can degrade performance, but normally the checkpoint overhead
appears only at the first fsync call after the xattr entry changes.
Once the checkpoint is done, no additional overhead occurs.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-08-09 14:25:24 +0800
dbe6a5ff4 f2fs: fix the use of XATTR_NODE_OFFSET

This patch fixes the use of XATTR_NODE_OFFSET.

o The offset should not use the several MSB bits that are used for marking node
blocks.

o IS_DNODE should handle XATTR_NODE_OFFSET to avoid potential abnormality
during the fsync call.
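
The first point can be illustrated with a small sketch of an offset
that shares a 32-bit word with marker bits. The two-bit width, the
layout and every name below are assumptions for illustration; the
actual on-disk encoding may differ.

```c
#include <stdint.h>

/* Assumed layout: the low FLAG_BITS bits hold marker flags and the
 * offset is stored shifted above them, so the offset's own top
 * FLAG_BITS bits fall off the end of the 32-bit word. */
#define FLAG_BITS    2u
#define FLAG_MASK    ((1u << FLAG_BITS) - 1u)
#define OFFSET_MAX   (UINT32_MAX >> FLAG_BITS)

/* A special offset value must stay within OFFSET_MAX to survive. */
#define SPECIAL_OFFSET OFFSET_MAX

static uint32_t pack(uint32_t offset, uint32_t flags)
{
	return (offset << FLAG_BITS) | (flags & FLAG_MASK);
}

static uint32_t unpack_offset(uint32_t packed)
{
	return packed >> FLAG_BITS;
}

static uint32_t unpack_flags(uint32_t packed)
{
	return packed & FLAG_MASK;
}
```

An all-ones offset does not round-trip through this layout because its
upper bits are shifted out, whereas a value limited to the remaining
bits comes back unchanged alongside the flags.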

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-08-09 13:57:56 +0800

08 Aug, 2013

1 commit

c2d715d14 f2fs: fix a build failure due to missing the kobject header

This patch should resolve the following error reported by kbuild test robot.

All error/warnings:

In file included from fs/f2fs/dir.c:13:0:
>> fs/f2fs/f2fs.h:435:17: error: field 's_kobj' has incomplete type
struct kobject s_kobj;

The failure was caused by the missing kobject header file in dir.c.
So, this patch moves the header file to the right location, f2fs.h.

CC: Namjae Jeon
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-08-08 14:56:49 +0800

06 Aug, 2013

4 commits

a569469e9 f2fs: fix a deadlock in fsync

This patch fixes a deadlock bug that occurs quite often when there are
concurrent write and fsync on the same file.

Following is the simplified call trace when tasks get hung.

fsync thread:
- f2fs_sync_file
...
- f2fs_write_data_pages
...
- update_extent_cache
...
- update_inode
- wait_on_page_writeback

bdi writeback thread
- __writeback_single_inode
- f2fs_write_data_pages
- mutex_lock(sbi->writepages)

The deadlock happens when the fsync thread waits on an inode page that has
been added to f2fs' cached bio sbi->bio[NODE], and unfortunately,
no one else is able to submit the cached bio to the block layer for
writeback. This is because the fsync thread already holds an sbi->fs_lock and
the sbi->writepages lock, causing the bdi thread to block when it attempts
to write data pages for the same inode. At the same time, the f2fs_gc thread
does not notice the situation and cannot help. Even the sync syscall
gets blocked.

To fix it, we submit the cached bio first before waiting on an inode page
that is being written back.

Signed-off-by: Jin Xu
[Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
Signed-off-by: Jaegeuk Kim

Jin Xu
2013-08-06 21:00:36 +0800
df273efc3 f2fs: remove redundant code from f2fs_write_begin

This code was used for the nobh_write_end() function.
But now that the f2fs_write_end function has been added,
there is no need for this code.

Signed-off-by: Namjae Jeon
Signed-off-by: Pankaj Kumar
Signed-off-by: Jaegeuk Kim

Namjae Jeon
2013-08-06 21:00:35 +0800
d2dc095f4 f2fs: add sysfs entries to select the gc policy

Add a sysfs entry, gc_idle, to control the gc policy:
gc_idle = 1 selects a cost-benefit approach, while
gc_idle = 2 selects a greedy approach to garbage
collection. The selections are mutually exclusive: only one
approach is in effect at any point. If gc_idle = 0, this
option is disabled.
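
The mapping described above can be written down as a small lookup. The
enum and function names are hypothetical, and treating values other
than 0, 1 and 2 like 0 is an assumption of this sketch, not something
the commit message specifies.

```c
/* Hypothetical policy names for illustration. */
enum gc_policy {
	POLICY_DEFAULT,       /* gc_idle = 0: option disabled */
	POLICY_COST_BENEFIT,  /* gc_idle = 1 */
	POLICY_GREEDY         /* gc_idle = 2 */
};

/* Translate the sysfs value into exactly one policy. */
static enum gc_policy policy_from_gc_idle(unsigned int gc_idle)
{
	switch (gc_idle) {
	case 1:
		return POLICY_COST_BENEFIT;
	case 2:
		return POLICY_GREEDY;
	default:
		/* 0 disables the option; other values are handled the
		 * same way here by assumption. */
		return POLICY_DEFAULT;
	}
}
```

Because the value is a single integer rather than a bit mask, the two
approaches cannot be enabled at the same time.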

Cc: Gu Zheng
Signed-off-by: Namjae Jeon
Signed-off-by: Pankaj Kumar
Reviewed-by: Gu Zheng
[Jaegeuk Kim: change the select_gc_type() flow slightly]
Signed-off-by: Jaegeuk Kim

Namjae Jeon
2013-08-06 21:00:18 +0800
b59d0bae6 f2fs: add sysfs support for controlling the gc_thread

Add sysfs entries to control the timing parameters for
f2fs gc thread.

Various Sysfs options introduced are:
gc_min_sleep_time: Min Sleep time for GC in ms
gc_max_sleep_time: Max Sleep time for GC in ms
gc_no_gc_sleep_time: Default Sleep time for GC in ms

Cc: Gu Zheng
Signed-off-by: Namjae Jeon
Signed-off-by: Pankaj Kumar
Reviewed-by: Gu Zheng
[Jaegeuk Kim: fix an umount bug and some minor changes]
Signed-off-by: Jaegeuk Kim

Namjae Jeon
2013-08-06 20:53:34 +0800

31 Jul, 2013

1 commit

f0c5e565b f2fs: remove an unneeded kfree(NULL)

This kfree() is no longer needed after a79dc083d7 "f2fs: move
bio_private allocation out of f2fs_bio_alloc()". The "bio->bi_private"
is NULL here so it's a no-op.

Signed-off-by: Dan Carpenter
Signed-off-by: Jaegeuk Kim

Dan Carpenter
2013-07-31 18:07:01 +0800

30 Jul, 2013

10 commits

cbd56e7d2 f2fs: fix handling orphan inodes

This patch fixes mishandling of the sbi->n_orphans variable.

If users request lots of f2fs_unlink(), check_orphan_space() could be contended.
In such a case, sbi->n_orphans can be read incorrectly, so that f2fs_unlink()
falls into a wrong state that results in the failure of
add_orphan_inode().

So, let's increment sbi->n_orphans virtually prior to the actual orphan inode
handling. After that, let's release sbi->n_orphans by calling release_orphan_inode
or remove_orphan_inode.
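
The pattern is "reserve first, then do the work, and give the
reservation back on failure". The sketch below shows that shape in
userspace with hypothetical names; it omits the locking that a
concurrent implementation would need around the counter.

```c
#include <errno.h>

/* Hypothetical counter standing in for the orphan bookkeeping.
 * A real implementation would protect it with a lock. */
struct orphan_counter {
	unsigned int n_orphans;
	unsigned int max_orphans;
};

/* Check the limit and take a slot in one step, so a later step can
 * never find the space gone. */
static int reserve_orphan_slot(struct orphan_counter *c)
{
	if (c->n_orphans >= c->max_orphans)
		return -ENOSPC;
	c->n_orphans++;
	return 0;
}

/* Give the slot back when the operation does not go through. */
static void release_orphan_slot(struct orphan_counter *c)
{
	c->n_orphans--;
}

/* Simplified unlink flow: fail_midway simulates an error that occurs
 * after the reservation but before the orphan entry is recorded. */
static int unlink_file(struct orphan_counter *c, int fail_midway)
{
	int err = reserve_orphan_slot(c);

	if (err)
		return err;
	if (fail_midway) {
		release_orphan_slot(c);
		return -EIO;
	}
	/* Success: the slot stays accounted until the orphan is removed. */
	return 0;
}
```

Since the check and the increment happen together, two callers cannot
both pass the space check and then compete for the last slot.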

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-07-30 14:17:03 +0800
d8207f695 f2fs: move bio_private allocation out of f2fs_bio_alloc()

bio->bi_private is not always needed. In the data read path,
end_read_io does not use bio_private, so move the bio_private
allocation out of f2fs_bio_alloc(). Allocate it in
submit_write_page(), and skip it in f2fs_readpage().

Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim

Gu Zheng
2013-07-30 14:17:03 +0800
60ed9a0f5 f2fs: use list_for_each rather than list_for_each_safe, in remove_orphan_inode()

Since we remove only the single target node, list_for_each is enough. To
clean up the code, we use list_for_each_entry instead.

Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim

Gu Zheng
2013-07-30 14:17:03 +0800
2d219c518 f2fs: use seq_puts()/seq_putc() rather than seq_printf() where possible

For strings without format specifiers, use seq_puts()/seq_putc()
instead of seq_printf().

Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim

Gu Zheng
2013-07-30 14:17:03 +0800
f0947e5cc f2fs: fix i_name during f2fs_sync_file

Similar to the i_pino fix, i_name should also be fixed when i_nlink is 1.

The erroneous scenario is like this.

1. touch test1
2. link test1 test2
3. unlink test2
4. fsync test1

After this, i_name should be test1.

CC: Al Viro
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-07-30 14:17:03 +0800
1cd14cafc f2fs: update file name in the inode block during f2fs_rename

The error is reproducible by:
0. mkfs.f2fs /dev/sdb1 & mount
1. touch test1
2. touch test2
3. mv test1 test2
4. umount
5. dump.f2fs -i 4 /dev/sdb1

After this, when we retrieve the inode->i_name of test2 by dump.f2fs, we get
test1 instead of test2.
This is because f2fs didn't update the file name during the f2fs_rename.

So, this patch fixes that.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-07-30 14:17:03 +0800
455907106 f2fs: introduce help function F2FS_NODE()

Introduce the helper function F2FS_NODE() to simplify the conversion of node_page to
f2fs_node.

Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim

Gu Zheng
2013-07-30 14:17:02 +0800
963d4f7d7 f2fs: add a help func F2FS_STAT() to get the f2fs_stat_info

Add a helper function, F2FS_STAT(), to get the f2fs_stat_info.

Signed-off-by: Gu Zheng
Signed-off-by: Jaegeuk Kim

Gu Zheng
2013-07-30 14:17:02 +0800
5e176d54a f2fs: add proc entry to monitor current usage of segments

You can monitor valid block counts of whole segments in:
/proc/fs/f2fs/sdb1/segment_info.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-07-30 14:17:02 +0800
e5d2385ed f2fs: recover date requested by fdatasync

In order to support SQLite, which uses fdatasync instead of fsync, we should
guarantee that the data requested by fdatasync can be recovered after a sudden
power-off.

So, let's remove the fdatasync condition in f2fs_sync_file.
Otherwise, we cannot restore the data after a sudden power-off because no
fsync-marked node blocks exist.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-07-30 14:17:02 +0800

08 Jul, 2013

1 commit

99b072bb3 f2fs: fix readdir incorrectness

Al Viro's previous readdir patch set introduced a bug that shows up when
running xfstest 006, as follows.

[Error output]
alpha size = 4, name length = 6, total files = 4096, nproc=1
1023 files created
rm: cannot remove `/mnt/f2fs/permname.15150/a': Directory not empty

[Correct output]
alpha size = 4, name length = 6, total files = 4096, nproc=1
4097 files created

This bug is due to an incorrect update of the directory position in ctx.
So, this patch fixes this.

[AV: fixed a braino]

CC: Al Viro
Signed-off-by: Jaegeuk Kim
Signed-off-by: Al Viro

Jaegeuk Kim
2013-07-08 17:35:48 +0800