08 Oct, 2014

1 commit

  • This patch adds support for volatile writes which keep data pages in memory
    until f2fs_evict_inode is called by iput.

    For instance, we can use this feature for the sqlite database as follows.
    While supporting atomic writes for the main database file, we can keep its
    journal data temporarily in the page cache by the following sequence (a
    sketch in C follows the sequence).

    1. open
    -> ioctl(F2FS_IOC_START_VOLATILE_WRITE);
    2. writes
    : keep all the data in the page cache.
    3. flush to the database file with atomic writes
    a. ioctl(F2FS_IOC_START_ATOMIC_WRITE);
    b. writes
    c. ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
    4. close
    -> drop the cached data
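
    A minimal userspace sketch of this sequence, assuming the ioctl numbers
    from fs/f2fs/f2fs.h of this era (magic 0xf5, commands 1-3); verify them
    against your kernel headers. The file names "journal" and "database" are
    placeholders:

        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/ioctl.h>

        #define F2FS_IOCTL_MAGIC              0xf5
        #define F2FS_IOC_START_ATOMIC_WRITE   _IO(F2FS_IOCTL_MAGIC, 1)
        #define F2FS_IOC_COMMIT_ATOMIC_WRITE  _IO(F2FS_IOCTL_MAGIC, 2)
        #define F2FS_IOC_START_VOLATILE_WRITE _IO(F2FS_IOCTL_MAGIC, 3)

        int main(void)
        {
                char buf[4096] = { 0 };

                /* 1. open the journal and mark it volatile: its pages stay
                 *    in the page cache until f2fs_evict_inode on last iput */
                int jfd = open("journal", O_RDWR | O_CREAT, 0644);
                if (jfd < 0 || ioctl(jfd, F2FS_IOC_START_VOLATILE_WRITE) < 0)
                        return 1;

                /* 2. journal writes stay in memory, not forced to disk */
                write(jfd, buf, sizeof(buf));

                /* 3. flush to the database file with atomic writes */
                int dfd = open("database", O_RDWR | O_CREAT, 0644);
                if (dfd < 0 || ioctl(dfd, F2FS_IOC_START_ATOMIC_WRITE) < 0)
                        return 1;
                write(dfd, buf, sizeof(buf));
                ioctl(dfd, F2FS_IOC_COMMIT_ATOMIC_WRITE);

                /* 4. close: the cached journal data is dropped */
                close(dfd);
                close(jfd);
                return 0;
        }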

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

07 Oct, 2014

1 commit

  • This patch introduces very limited functionality for atomic write support.
    In order to support atomic write, this patch adds two ioctls:
    o F2FS_IOC_START_ATOMIC_WRITE
    o F2FS_IOC_COMMIT_ATOMIC_WRITE

    The database engine should be aware of the following sequence.
    1. open
    -> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
    2. writes
    : all the written data will be treated as atomic pages.
    3. commit
    -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
    : this flushes all the data blocks to the disk, which will be shown as all
    or nothing by the f2fs recovery procedure.
    4. repeat from #2.

    The IO patterns should be:

         ,- START_ATOMIC_WRITE           ,- COMMIT_ATOMIC_WRITE
    CP | D D D D D D | FSYNC | D D D D | FSYNC ...
                       `- COMMIT_ATOMIC_WRITE
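
    A minimal sketch of this loop in C, assuming the same ioctl definitions as
    in the previous sketch (magic 0xf5, commands 1 and 2; verify against your
    kernel headers):

        #include <unistd.h>
        #include <sys/ioctl.h>

        #define F2FS_IOCTL_MAGIC             0xf5
        #define F2FS_IOC_START_ATOMIC_WRITE  _IO(F2FS_IOCTL_MAGIC, 1)
        #define F2FS_IOC_COMMIT_ATOMIC_WRITE _IO(F2FS_IOCTL_MAGIC, 2)

        /* perform `rounds` all-or-nothing updates on an open f2fs fd */
        int atomic_updates(int fd, const char *buf, size_t len, int rounds)
        {
                /* 1. mark the file as an atomic-write target */
                if (ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) < 0)
                        return -1;

                while (rounds--) {
                        /* 2. written data is held as atomic pages */
                        if (write(fd, buf, len) != (ssize_t)len)
                                return -1;
                        /* 3. commit (the FSYNC in the diagram): flush the
                         *    D blocks so recovery sees them all or nothing */
                        if (ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) < 0)
                                return -1;
                        /* 4. the loop repeats from step 2 */
                }
                return 0;
        }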

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

01 Oct, 2014

4 commits


24 Sep, 2014

3 commits

  • If the same data is updated multiple times, we don't need to redo the whole
    set of operations.
    Let's just apply the latest one.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch revisits all the recovery information used during f2fs_sync_file.

    In this patch, three pieces of information drive the decision.

    a) IS_CHECKPOINTED, /* is it checkpointed before? */
    b) HAS_FSYNCED_INODE, /* is the inode fsynced before? */
    c) HAS_LAST_FSYNC, /* has the latest node fsync mark? */

    And, the scenarios for our rule are based on:

    [Term] F: fsync_mark, D: dentry_mark

    1. inode(x) | CP | inode(x) | dnode(F)
    2. inode(x) | CP | inode(F) | dnode(F)
    3. inode(x) | CP | dnode(F) | inode(x) | inode(F)
    4. inode(x) | CP | dnode(F) | inode(F)
    5. CP | inode(x) | dnode(F) | inode(DF)
    6. CP | inode(DF) | dnode(F)
    7. CP | dnode(F) | inode(DF)
    8. CP | dnode(F) | inode(x) | inode(DF)

    For example, #3, the three conditions should be changed as follows.

       inode(x) | CP | dnode(F) | inode(x) | inode(F)
    a)    x       o       o          o          o
    b)    x       x       x          x          o
    c)    x       o       o          x          o

    If f2fs_sync_file stops  --------^,
    it should write inode(F) -------------------^

    So, need_inode_block_update should return true, since
    c) get_nat_flag(e, HAS_LAST_FSYNC) is false.

    For example, #8,
       CP | alloc | dnode(F) | inode(x) | inode(DF)
    a) o      x        x          x           x
    b) x               x          x           o
    c) o               o          x           o

    If f2fs_sync_file stops  -----^,
    it should write inode(DF) ----------------^

    Note that the roll-forward policy should follow this rule, which means that
    if there are any missing blocks, we don't need to recover that inode.
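
    A hedged sketch of how the three flags combine in this decision follows
    (simplified kernel-style C; the nat cache lookup helper and locking are
    assumed, and this is not the exact kernel function):

        /* fsync must append an inode block unless the node carries the
         * latest fsync mark AND the inode was either checkpointed or
         * fsynced before */
        bool need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino)
        {
                struct nat_entry *e = __lookup_nat_cache(NM_I(sbi), ino);

                if (e && get_nat_flag(e, HAS_LAST_FSYNC) &&
                    (get_nat_flag(e, IS_CHECKPOINTED) ||
                     get_nat_flag(e, HAS_FSYNCED_INODE)))
                        return false;
                return true;
        }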

    Signed-off-by: Huang Ying
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Previously, all the dnode pages had to be read during roll-forward recovery.
    Even worse, the whole chain was traversed twice.
    This patch removes those redundant and costly read operations by using the
    page cache of meta_inode and the readahead function as well.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

16 Sep, 2014

2 commits

  • If the user writes F2FS_IPU_FSYNC:4 to /sys/fs/f2fs/ipu_policy, in-place
    updates are attempted only by f2fs_sync_file.
    Then, if the number of dirty pages exceeds /sys/fs/f2fs/min_fsync_blocks,
    fsync keeps the normal out-of-order (out-of-place update) manner;
    otherwise, it triggers in-place updates.

    This may be used by storage showing very high random write performance.

    For example, it can be used when,

    Seq. writes (Data) + wait + Seq. writes (Node)

    is much slower than,

    Rand. writes (Data)
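
    A sketch of selecting this policy from userspace (the value 4 corresponds
    to F2FS_IPU_FSYNC; on some kernel versions the knob lives in a per-device
    directory, e.g. /sys/fs/f2fs/<dev>/ipu_policy, so treat the path as an
    assumption):

        #include <stdio.h>

        int main(void)
        {
                /* 4 == F2FS_IPU_FSYNC; path may be per-device */
                FILE *f = fopen("/sys/fs/f2fs/ipu_policy", "w");

                if (!f)
                        return 1;
                fprintf(f, "4\n");
                fclose(f);
                return 0;
        }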

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Previously, f2fs only counted dirty dentry pages, but there is no reason not
    to expand the scope.

    This patch changes the names used in dirty page management and also counts
    dirty pages in each inode's info.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

10 Sep, 2014

4 commits

  • We use flush cmd control to collect many flush commands and flush them
    together. Previously, two lists managed the flush commands (collect and
    dispatch), and one spinlock was used to protect them.
    In fact, the lock-less list (llist) is very well suited to this case, so we
    use it to simplify this routine (a sketch of the pattern follows the
    change log below).

    v2:
    - use llist_for_each_entry_safe to fix a possible use-after-free issue.
    - remove the unused field from struct flush_cmd.
    Thanks to Yu for the suggestion.
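
    A kernel-style sketch of the llist pattern described above, with struct
    flush_cmd reduced to what the pattern needs (an assumption, not the full
    f2fs structure):

        #include <linux/llist.h>
        #include <linux/completion.h>

        struct flush_cmd {
                struct completion wait;
                struct llist_node llnode;
                int ret;
        };

        static LLIST_HEAD(issue_list);

        /* producer: queue a command without taking any lock */
        static void issue_flush(struct flush_cmd *cmd)
        {
                init_completion(&cmd->wait);
                llist_add(&cmd->llnode, &issue_list);
        }

        /* consumer: detach the whole batch at once, then complete each
         * command; the _safe variant is needed because the waiter may
         * free cmd as soon as it is completed */
        static void dispatch_flushes(int err)
        {
                struct llist_node *batch = llist_del_all(&issue_list);
                struct flush_cmd *cmd, *next;

                llist_for_each_entry_safe(cmd, next, batch, llnode) {
                        cmd->ret = err;
                        complete(&cmd->wait);
                }
        }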

    Signed-off-by: Gu Zheng
    Signed-off-by: Jaegeuk Kim

    Gu Zheng
     
  • In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
    writes"), we described the issue as follows:

    "Although building the NAT journal in cursum reduces the read/write work for
    the NAT block, the previous design left us with lower performance when
    writing checkpoints frequently, in these cases:
    1. if the journal in cursum is already full, it's a bit of a waste that we
    flush all nat entries to the page for persistence, but do not cache any
    entries.
    2. if the journal in cursum is not full, we fill nat entries into the journal
    until the journal is full, then flush the remaining dirty entries to disk
    without merging the journaled entries, so these journaled entries may be
    flushed to disk at the next checkpoint, having lost the chance to be flushed
    last time."

    Actually, we have the same problem in using SIT journal area.

    In this patch, we first update the SIT journal with as many dirty entries as
    possible. Second, if there is no space left in the SIT journal, we remove
    all entries from the journal and walk through the whole dirty entry bitmap
    of the SIT, accounting the dirty SIT entries located in the same SIT block
    to a SIT entry set. All entry sets are linked to the list sit_entry_set in
    sm_info, sorted in ascending order by the count of entries in each set.
    Later we flush the entries of the sets that have the fewest entries into the
    journal, as many as we can, and then flush the dense sets with merged
    entries to disk.

    In this way we can use the SIT journal area more effectively, and we also
    reduce SIT updates, resulting in a performance gain and a longer lifetime
    for the flash device. A sketch of the set bookkeeping follows.
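
    A hedged sketch of the set bookkeeping (simplified kernel-style C; the
    struct layout and helper are illustrative, not the exact kernel code):

        #include <linux/list.h>

        struct sit_entry_set {
                struct list_head set_list;  /* must stay the first member */
                unsigned int start_segno;   /* first segno of the SIT block */
                unsigned int entry_cnt;     /* dirty entries in this set */
        };

        /* re-sort one set after its entry_cnt was just incremented: walk
         * forward past sets with fewer entries and reinsert before the
         * first set that has at least as many */
        static void adjust_sit_entry_set(struct sit_entry_set *ses,
                                         struct list_head *head)
        {
                struct sit_entry_set *next = ses;

                if (list_is_last(&ses->set_list, head))
                        return;

                list_for_each_entry_continue(next, head, set_list)
                        if (ses->entry_cnt <= next->entry_cnt)
                                break;

                /* with no break, &next->set_list is `head` itself; that is
                 * safe only because set_list is the first struct member */
                list_move_tail(&ses->set_list, &next->set_list);
        }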

    In my testing environment, this patch helps to reduce SIT block updates
    noticeably.

    virtual machine + hard disk:
    fsstress -p 20 -n 400 -l 5

                 sit page num   cp count   sit pages/cp
    based           2006.50     1349.75      1.486
    patched         1566.25     1463.25      1.070

    The latency of the merging op is small when handling a great number of
    dirty SIT entries in flush_sit_entries:

    latency(ns)   dirty sit count
      36038            2151
      49168            2123
      37174            2232

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • If any f2fs_bug_on is triggered, fsck.f2fs is needed.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch adds sbi->need_fsck, which indicates that fsck.f2fs should be run
    later.
    This flag can only be cleared by fsck.f2fs.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

04 Sep, 2014

1 commit


22 Aug, 2014

5 commits


20 Aug, 2014

4 commits


02 Aug, 2014

1 commit


31 Jul, 2014

5 commits


29 Jul, 2014

4 commits


12 Jul, 2014

1 commit


10 Jul, 2014

4 commits

  • Signed-off-by: Gu Zheng
    Signed-off-by: Jaegeuk Kim

    Gu Zheng
     
  • Although building the NAT journal in cursum reduces the read/write work for
    the NAT block, the previous design left us with lower performance when
    writing checkpoints frequently, in these cases:
    1. if the journal in cursum is already full, it's a bit of a waste that we
    flush all nat entries to the page for persistence, but do not cache any
    entries.
    2. if the journal in cursum is not full, we fill nat entries into the journal
    until the journal is full, then flush the remaining dirty entries to disk
    without merging the journaled entries, so these journaled entries may be
    flushed to disk at the next checkpoint, having lost the chance to be flushed
    last time.

    In this patch we merge the dirty entries located in the same NAT block into
    a nat entry set, and link all sets to a list sorted in ascending order by
    each set's entry count. Later we flush the entries of the sparse sets into
    the journal, as many as we can, and then flush the merged entries to disk.
    In this way we can not only gain performance, but also save the lifetime of
    the flash device. A sketch of the grouping follows.
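
    A hedged sketch of the grouping (simplified kernel-style C; the helper name
    is illustrative, and 455 assumes a 4KB NAT block of 9-byte entries, so
    verify it against NAT_ENTRY_PER_BLOCK in f2fs.h):

        #include <linux/list.h>

        #define NAT_ENTRY_PER_BLOCK 455  /* assumed: 4096 / 9-byte entry */

        struct nat_entry_set {
                struct list_head set_list;   /* link in the sorted set list */
                struct list_head entry_list; /* dirty nat entries of one block */
                unsigned int set;            /* which NAT block the set covers */
                unsigned int entry_cnt;      /* dirty entries in this set */
        };

        /* every dirty nid maps to one set, so flushing a set journals or
         * writes exactly one NAT block */
        static unsigned int nat_block_of(unsigned int nid)
        {
                return nid / NAT_ENTRY_PER_BLOCK;
        }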

    In my testing environment, this patch helps to reduce NAT block writes
    noticeably. In the hard disk test case, the elapsed time of fsstress is
    stably reduced by about 5%.

    1. virtual machine + hard disk:
    fsstress -p 20 -n 200 -l 5

               node num   cp count   nodes/cp
    based       4599.6     1803.0     2.551
    patched     2714.6     1829.6     1.483

    2. virtual machine + 32g micro SD card:
    fsstress -p 20 -n 200 -l 1 -w -f chown=0 -f creat=4 -f dwrite=0
    -f fdatasync=4 -f fsync=4 -f link=0 -f mkdir=4 -f mknod=4 -f rename=5
    -f rmdir=5 -f symlink=0 -f truncate=4 -f unlink=5 -f write=0 -S

               node num   cp count   nodes/cp
    based         84.5       43.7     1.933
    patched       49.2       40.0     1.23

    The latency of the merging op is not bad when handling extreme cases, such
    as merging a great number of dirty nats:

    latency(ns)   dirty nat count
      3089219          24922
      5129423          27422
      4000250          24523

    change log from v1:
    o fix wrong logic in add_nat_entry when grabbing a new nat entry set.
    o switch to creating the slab cache in create_node_manager_caches.
    o use GFP_ATOMIC instead of GFP_NOFS to avoid potential long latency.

    change log from v2:
    o make the comment position more appropriate, as suggested by Jaegeuk Kim.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • This patch cleans up some simple, unnecessary code.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch adds f2fs_do_tmpfile to eliminate the redundant init_inode_metadata
    flow.
    Through this, we can provide consistent lock usage, e.g., fi->i_sem, and this
    will enable better debugging.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim