08 Oct, 2014

1 commit

  • This patch adds support for volatile writes which keep data pages in memory
    until f2fs_evict_inode is called by iput.

    For instance, we can use this feature for the sqlite database as follows.
    While supporting atomic writes for the main database file, we can keep its
    journal data temporarily in the page cache by the following sequence; a
    minimal userspace sketch follows it.

    1. open
    -> ioctl(F2FS_IOC_START_VOLATILE_WRITE);
    2. writes
    : keep all the data in the page cache.
    3. flush to the database file with atomic writes
    a. ioctl(F2FS_IOC_START_ATOMIC_WRITE);
    b. writes
    c. ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
    4. close
    -> drop the cached data
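
    The sketch below covers the journal-file side of this sequence (steps 1, 2,
    and 4). The ioctl request numbers are assumptions based on f2fs's 0xf5 ioctl
    magic, not copied from kernel headers; prefer your kernel's f2fs header if
    it exports them. Error handling is reduced.

    #include <fcntl.h>
    #include <linux/ioctl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    #define F2FS_IOCTL_MAGIC              0xf5  /* assumed */
    #define F2FS_IOC_START_VOLATILE_WRITE _IO(F2FS_IOCTL_MAGIC, 3)

    int write_journal(const char *path, const void *buf, size_t len)
    {
        /* 1. open the journal file */
        int fd = open(path, O_WRONLY | O_CREAT, 0600);

        if (fd < 0)
            return -1;

        /* mark it volatile: data pages stay in the page cache */
        if (ioctl(fd, F2FS_IOC_START_VOLATILE_WRITE) < 0) {
            close(fd);
            return -1;
        }

        /* 2. writes are kept in memory only */
        if (write(fd, buf, len) < 0) {
            close(fd);
            return -1;
        }

        /* 4. close -> iput -> f2fs_evict_inode drops the cached data */
        close(fd);
        return 0;
    }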

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

07 Oct, 2014

1 commit

  • This patch introduces a very limited functionality for atomic write support.
    In order to support atomic write, this patch adds two ioctls:
    o F2FS_IOC_START_ATOMIC_WRITE
    o F2FS_IOC_COMMIT_ATOMIC_WRITE

    The database engine should be aware of the following sequence.
    1. open
    -> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
    2. writes
    : all the written data will be treated as atomic pages.
    3. commit
    -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
    : this flushes all the data blocks to disk, which will be shown as
    all-or-nothing by the f2fs recovery procedure.
    4. repeat from #2, as sketched after the IO pattern below.

    The IO patterns should be:

         ,- START_ATOMIC_WRITE         ,- COMMIT_ATOMIC_WRITE
    CP | D D D D D D | FSYNC | D D D D | FSYNC ...
                     `- COMMIT_ATOMIC_WRITE
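
    A hedged sketch of the commit loop this pattern implies; the ioctl numbers
    are assumptions mirroring f2fs's ioctl numbering, and error handling is
    omitted for brevity.

    #include <fcntl.h>
    #include <linux/ioctl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    #define F2FS_IOCTL_MAGIC             0xf5  /* assumed */
    #define F2FS_IOC_START_ATOMIC_WRITE  _IO(F2FS_IOCTL_MAGIC, 1)
    #define F2FS_IOC_COMMIT_ATOMIC_WRITE _IO(F2FS_IOCTL_MAGIC, 2)

    /* One transaction: everything written between START and COMMIT becomes
     * visible all-or-nothing across a crash. */
    void atomic_update(int fd, const void *buf, size_t len, off_t off)
    {
        ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE);   /* 1. begin */
        pwrite(fd, buf, len, off);                /* 2. atomic pages */
        ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE);  /* 3. flush + fsync */
        /* 4. call again for the next transaction */
    }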

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

06 Oct, 2014

1 commit


01 Oct, 2014

7 commits

  • This patch cleans up f2fs_ioctl functions for better readability.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • My static checker complains that segment is a u64 but only the lower 31
    bits can be used before we hit a shift wrapping bug.
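
    For illustration, this is the class of bug the checker points at (a generic
    sketch, not the patched f2fs code): if any intermediate of the shift is only
    32 bits wide, segment numbers above the low bits wrap.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t segment = UINT64_C(1) << 31;  /* past the safe lower bits */
        unsigned int log_blocks_per_seg = 9;   /* 512 blocks per segment */

        /* wraps: the shift is done in 32-bit arithmetic */
        uint32_t bad = (uint32_t)segment << log_blocks_per_seg;
        /* fine: the whole computation stays in 64 bits */
        uint64_t good = segment << log_blocks_per_seg;

        printf("bad=%u good=%llu\n", bad, (unsigned long long)good);
        return 0;
    }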

    Signed-off-by: Dan Carpenter
    Signed-off-by: Jaegeuk Kim

    Dan Carpenter
     
    This patch relocates f2fs_unlock_op in every directory operation so that it
    is called after any error has been processed.
    Otherwise, the checkpoint can be entered with valid node ids without their
    dentries when -ENOSPC occurs.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch cleans up the existing and new macros for readability.

    The rule is like this.

                 ,-----------------------------------------> MAX_BLKADDR -,
                 |  ,------------- TOTAL_BLKS ----------------------------,
                 |  |                                                     |
                 |  ,- seg0_blkaddr   ,----- sit/nat/ssa/main blkaddress  |
    block        |  | (SEG0_BLKADDR)  | | | | (e.g., MAIN_BLKADDR)        |
    address      0..x................ a b c d .............................
                    |                                                     |
    global seg#     0...................... m .............................
                    |                       |                             |
                    |                       `------- MAIN_SEGS -----------'
                    `-------------- TOTAL_SEGS ---------------------------'
                                            |                             |
    seg#                                    0..........xx..................

    = Note =
    o GET_SEGNO_FROM_SEG0 : blk address -> global segno
    o GET_SEGNO : blk address -> segno
    o START_BLOCK : segno -> starting block address
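
    Hedged sketches of what the noted macros compute, in terms of the diagram
    above (illustrative only, not the exact kernel definitions):

    /* blk address -> global segno, counted from seg0_blkaddr */
    #define GET_SEGNO_FROM_SEG0(sbi, blk_addr) \
        (((blk_addr) - SEG0_BLKADDR(sbi)) >> (sbi)->log_blocks_per_seg)

    /* blk address -> segno inside the main area */
    #define GET_SEGNO(sbi, blk_addr) \
        (GET_SEGNO_FROM_SEG0(sbi, blk_addr) - \
         GET_SEGNO_FROM_SEG0(sbi, MAIN_BLKADDR(sbi)))

    /* segno -> starting block address of that segment
     * (block_t is f2fs's block address type) */
    #define START_BLOCK(sbi, segno) \
        (MAIN_BLKADDR(sbi) + ((block_t)(segno) << (sbi)->log_blocks_per_seg))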

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    Previously, f2fs tried to reorganize the dirty nat entries into multiple sets
    according to their nid ranges. This can improve the flushing of nat pages;
    however, if there are a lot of cached nat entries, it becomes a bottleneck.

    This patch introduces a new set management flow by removing dirty nat list and
    adding a series of set operations when the nat entry becomes dirty.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    This patch introduces FITRIM in f2fs_ioctl.
    In this case, f2fs will issue as many small discards and prefree discards as
    possible for the given area.
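
    FITRIM is the generic VFS trim ioctl; a minimal userspace sketch (the mount
    point path is hypothetical):

    #include <fcntl.h>
    #include <linux/fs.h>      /* FITRIM, struct fstrim_range */
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        struct fstrim_range range = {
            .start  = 0,
            .len    = (__u64)-1,  /* trim the whole filesystem */
            .minlen = 0,
        };
        int fd = open("/mnt/f2fs", O_RDONLY);  /* hypothetical mount point */

        if (fd < 0 || ioctl(fd, FITRIM, &range) < 0) {
            perror("FITRIM");
            return 1;
        }
        /* on return, range.len holds the number of bytes trimmed */
        printf("trimmed %llu bytes\n", (unsigned long long)range.len);
        close(fd);
        return 0;
    }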

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    This patch adds a new data structure to control checkpoint parameters.
    Currently, it carries the reason for the checkpoint, such as is_umount and
    normal sync.
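
    A hedged sketch of such a structure (the names are assumptions, not
    necessarily the exact kernel definitions):

    enum cp_reason_type {
        CP_UMOUNT,   /* checkpoint taken at unmount */
        CP_SYNC,     /* normal sync checkpoint */
    };

    struct cp_control {
        int reason;
        /* room for future checkpoint parameters */
    };

    /* callers pass it down, e.g.: write_checkpoint(sbi, &cpc); */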

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

24 Sep, 2014

15 commits

    Previously, f2fs activates SSR when the # of free segments reaches the # of
    overprovisioned segments.
    In this case, SSR starts to use dirty segments only, so the overprovisioned
    space cannot be selected for new data.
    This means that we have no chance to utilize the overprovisioned space at all.

    This patch fixes that by allowing LFS allocations until the # of free segments
    reaches the last threshold, the reserved space.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch changes the ipu_policy setting to use any combination of orthogonal policies.

    Signed-off-by: Changman Lee
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    In ->get_victim we read max_search from dirty_i->nr_dirty without holding
    seglist_lock; after that, nr_dirty can be increased or decreased before we
    take the lock.
    Then, in the main loop, we attempt to traverse all dirty sections once to find
    a victim section, but max_search is not an accurate total loop count: we might
    skip some sections, or check sections redundantly, if nr_dirty was changed in
    the meantime.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
    The mount manual describes remount as below:

    "mount -o remount,rw /dev/foo /dir
    After this call all old mount options are replaced and arbitrary stuff from
    fstab is ignored, except the loop= option which is internally generated and
    maintained by the mount command."

    Previously f2fs did not clear up old mount options on remount_fs, so we had no
    chance to disable a previously set option (e.g. flush_merge). Fix it.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
    Punching a hole in a directory is not supported in f2fs, so let's limit the
    file type in punch_hole().

    In addition, in punch_hole, if the offset exceeds the file size, we should
    skip punching the hole.
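
    From userspace, punching a hole goes through fallocate(2); a minimal sketch:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/falloc.h>  /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */

    /* Deallocate [off, off + len) of a regular file; i_size is unchanged
     * because PUNCH_HOLE must be combined with KEEP_SIZE. */
    int punch(int fd, off_t off, off_t len)
    {
        return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         off, len);
    }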

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
    The block size in f2fs is 4096 bytes, so theoretically f2fs can support a
    4096-byte sector device at maximum. But f2fs previously supported only
    512-byte sectors, so a block device such as zRAM, which uses the page cache
    as its block storage space, could not be mounted because of the mismatch
    between the sector size of zRAM and the sector size f2fs supported.

    With this patch we support large sector sizes in f2fs, so block devices with
    sector sizes of 512/1024/2048/4096 bytes can be supported.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
    By using FALLOC_FL_KEEP_SIZE in ->fallocate of f2fs, we can fallocate blocks
    past EOF without changing the i_size of the inode. These blocks past EOF will
    not be truncated in ->setattr, as we truncate them only when changing the file
    size.

    We should give setattr() a chance to truncate blocks beyond the file size.
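
    For reference, this is the preallocation mode in question, sketched from
    userspace:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/falloc.h>  /* FALLOC_FL_KEEP_SIZE */

    /* Preallocate len bytes at off without growing i_size; the blocks past
     * EOF stay allocated until the file size itself changes. */
    int prealloc_keep_size(int fd, off_t off, off_t len)
    {
        return fallocate(fd, FALLOC_FL_KEEP_SIZE, off, len);
    }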

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
    f2fs_direct_IO uses __allocate_data_block, and inside the allocation path we
    should update i_size at the time it changes so that its inode page is updated
    as well. Otherwise, we can get a wrong i_size after roll-forward recovery.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch cleans up a simple macro.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    If the same data is updated multiple times, we don't need to redo the whole
    set of operations.
    Let's just update the latest one.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    In f2fs_sync_file, if there are no appended writes, it skips writing its node
    blocks.
    But if there is an up-to-date inode page, we should write it to update its
    metadata during roll-forward recovery.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    We can summarize the roll-forward recovery scenarios as follows.

    [Term] F: fsync_mark, D: dentry_mark

    1. inode(x) | CP | inode(x) | dnode(F)
    -> Update the latest inode(x).

    2. inode(x) | CP | inode(F) | dnode(F)
    -> No problem.

    3. inode(x) | CP | dnode(F) | inode(x)
    -> Recover to the latest dnode(F), and drop the last inode(x)

    4. inode(x) | CP | dnode(F) | inode(F)
    -> No problem.

    5. CP | inode(x) | dnode(F)
    -> The inode(DF) was missing. Should drop this dnode(F).

    6. CP | inode(DF) | dnode(F)
    -> No problem.

    7. CP | dnode(F) | inode(DF)
    -> If f2fs_iget fails, then goto next to find inode(DF).

    8. CP | dnode(F) | inode(x)
    -> If f2fs_iget fails, then goto next to find inode(DF).
    But it will fail due to no inode(DF).

    So, this patch adds some missing points such as #1, #5, #7, and #8.

    Signed-off-by: Huang Ying
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    This patch revisits all the recovery information handled during
    f2fs_sync_file.

    In this patch, three pieces of information drive the decision:

    a) IS_CHECKPOINTED, /* is it checkpointed before? */
    b) HAS_FSYNCED_INODE, /* is the inode fsynced before? */
    c) HAS_LAST_FSYNC, /* has the latest node fsync mark? */

    And, the scenarios for our rule are based on:

    [Term] F: fsync_mark, D: dentry_mark

    1. inode(x) | CP | inode(x) | dnode(F)
    2. inode(x) | CP | inode(F) | dnode(F)
    3. inode(x) | CP | dnode(F) | inode(x) | inode(F)
    4. inode(x) | CP | dnode(F) | inode(F)
    5. CP | inode(x) | dnode(F) | inode(DF)
    6. CP | inode(DF) | dnode(F)
    7. CP | dnode(F) | inode(DF)
    8. CP | dnode(F) | inode(x) | inode(DF)

    For example, in #3, the three conditions change as follows.

       inode(x) | CP | dnode(F) | inode(x) | inode(F)
    a)    x       o       o          o          o
    b)    x       x       x          x          o
    c)    x       o       o          x          o

    If f2fs_sync_file stops ---------^,
    it should write inode(F) -------------------^

    So need_inode_block_update should return true, since
    c) get_nat_flag(e, HAS_LAST_FSYNC) is false.

    For example, in #8:

       CP | alloc | dnode(F) | inode(x) | inode(DF)
    a)  o      x        x          x           x
    b)  x      x        x          o
    c)  o      o        x          o

    If f2fs_sync_file stops ------^,
    it should write inode(DF) ----------------^

    Note that the roll-forward policy should follow this rule, which means that
    if there are any missing blocks, we don't need to recover that inode.

    Signed-off-by: Huang Ying
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch introduces a flag in the nat entry structure to merge various
    information such as checkpointed and fsync_done marks.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    Previously, all the dnode pages had to be read during roll-forward recovery.
    Even worse, the whole chain was traversed twice.
    This patch removes those redundant and costly read operations by using the
    page cache of meta_inode along with the readahead function.

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

16 Sep, 2014

5 commits

    If the inode is the same and its data index needs to be truncated, we can
    fall into a double lock on its inode page via get_dnode_of_data.

    Error case is like this.

    1. write data 1, 2, 3, 4, 5 in inode #4.
    2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
    3. sync
    4. update data 100->106 in dnode #6.
    5. fsync inode #4.
    6. power-cut

    -> Then,
    1. go back to #3's checkpoint
    2. in do_recover_data, get_dnode_of_data() gets inode #4.
    3. detect 100->106 in dnode #6.
    4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
    5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
    6. detect *kernel hang*

    This patch should resolve that bug.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    The nm_i->fcnt check is executed before taking the spin_lock, so if another
    thread deletes the last free_nid from the list, a wrong nid may be returned.
    Fix the race condition by moving the nm_i->fcnt check inside the spin_lock.
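
    The fix pattern, shown as a hedged userspace analogue (a pthread mutex
    standing in for the kernel spinlock; the names are illustrative):

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t free_nid_lock = PTHREAD_MUTEX_INITIALIZER;
    static int fcnt;                /* number of cached free nids */
    static unsigned int next_free;  /* stand-in for the free_nid list */

    /* The emptiness check happens under the same lock that protects the
     * list, so the check and the removal form one critical section. */
    bool alloc_nid(unsigned int *nid)
    {
        bool ok = false;

        pthread_mutex_lock(&free_nid_lock);
        if (fcnt > 0) {             /* check moved inside the lock */
            *nid = next_free++;     /* take one entry */
            fcnt--;
            ok = true;
        }
        pthread_mutex_unlock(&free_nid_lock);
        return ok;
    }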

    Signed-off-by: Huang, Ying
    Signed-off-by: Jaegeuk Kim

    Huang Ying
     
    Now, if there is no free nid in nm_i->free_nid_list, 0 may be saved into
    next_free_nid of the checkpoint, which may cause useless scanning at the next
    mount. nm_i->next_scan_nid should be a better default value than 0.

    Signed-off-by: Huang, Ying
    Signed-off-by: Jaegeuk Kim

    Huang Ying
     
    If the user writes F2FS_IPU_FSYNC:4 to /sys/fs/f2fs/ipu_policy, only
    f2fs_sync_file starts to try in-place updates.
    And if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
    keeps doing out-of-place updates. Otherwise, it triggers in-place updates.

    This may be used by storage showing very high random write performance.

    For example, it can be used when

    Seq. writes (Data) + wait + Seq. writes (Node)

    is much slower than

    Rand. writes (Data)

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    Previously f2fs only counted dirty dentry pages, but there is no reason not
    to expand the scope.

    This patch renames the dirty page management fields and counts dirty pages
    in each inode info as well.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

11 Sep, 2014

1 commit


10 Sep, 2014

9 commits

    If an application passes a negative offset to lseek with SEEK_DATA|SEEK_HOLE,
    f2fs previously hit a BUG_ON in get_dnode_of_data, as reported by Tommi
    Rantala.

    He could reproduce it with this simple call:
    lseek(fd, -17595150933902LL, SEEK_DATA);

    This patch resolves that bug.

    Reported-by: Tommi Rantala
    [Jaegeuk Kim: relocate the condition as suggested by Chao]
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    In gc_node_segment, if node page GC runs concurrently with node page
    writeback, and check_valid_map and get_node_page run after the page is
    locked but before cur_valid_map is updated, as below, the page can be
    written twice unnecessarily.

    GC thread (gc_node_segment)    writeback thread
                                   sync_node_pages
                                     try_lock_page
                                     ...
    check_valid_map                  f2fs_write_node_page
    ...                                write_node_page
                                         do_write_page
                                           allocate_data_block
                                             ...
                                             refresh_sit_entry /* update cur_valid_map */
                                     ...
                                     unlock_page
    get_node_page
    ...
    set_page_dirty
    ...
    f2fs_put_page
      unlock_page

    This can be solved by calling check_valid_map again after get_node_page.

    Signed-off-by: Huang, Ying
    Signed-off-by: Jaegeuk Kim

    Huang Ying
     
    We use the flush cmd control to collect many flush cmds and flush them
    together. Previously, two lists were used to manage the flush cmds (collect
    and dispatch), with one spin lock protecting them.
    In fact, the lock-less list (llist) is very well suited to this case, so we
    use it to simplify the routine.

    v2:
    - use llist_for_each_entry_safe to fix a possible use-after-free issue.
    - remove the unused field from struct flush_cmd.
    Thanks to Yu for the suggestion.
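
    A hedged kernel-style sketch of the resulting scheme (the details are
    illustrative; only the llist calls are the real API):

    #include <linux/completion.h>
    #include <linux/llist.h>

    struct flush_cmd {
        struct completion wait;
        struct llist_node llnode;
        int ret;
    };

    static LLIST_HEAD(issue_list);

    /* issuer side: queue a command without taking any spinlock */
    static void queue_flush(struct flush_cmd *cmd)
    {
        init_completion(&cmd->wait);
        llist_add(&cmd->llnode, &issue_list);
        /* wake the flush thread, then wait_for_completion(&cmd->wait) */
    }

    /* flusher side: detach all pending commands in one atomic operation,
     * issue a single device flush, then complete every waiter */
    static void dispatch_flush(int ret)
    {
        struct llist_node *list = llist_del_all(&issue_list);
        struct flush_cmd *cmd, *next;

        llist_for_each_entry_safe(cmd, next, list, llnode) {
            cmd->ret = ret;
            complete(&cmd->wait);
        }
    }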

    Signed-off-by: Gu Zheng
    Signed-off-by: Jaegeuk Kim

    Gu Zheng
     
    In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing
    NAT writes"), we described the issue as below:

    "Although building the NAT journal in cursum reduces the read/write work for
    the NAT block, the previous design leaves us with lower performance when
    writing checkpoints frequently, for these cases:
    1. if the journal in cursum is already full, it's a bit of a waste that we
    flush all nat entries to pages for persistence, but cache no entries.
    2. if the journal in cursum is not full, we fill nat entries into the journal
    until the journal is full, then flush the remaining dirty entries to disk
    without merging the journaled entries, so these journaled entries may be
    flushed to disk at the next checkpoint but lost the chance to be flushed last
    time."

    Actually, we have the same problem in using the SIT journal area.

    In this patch, we first update the sit journal with as many dirty entries as
    possible. Second, if there is no space left in the sit journal, we remove all
    entries from the journal and walk through the whole dirty entry bitmap of
    sit, accounting each dirty sit entry located in the same SIT block to a sit
    entry set. All entry sets are linked into the sit_entry_set list in sm_info,
    sorted in ascending order by the number of entries in the set. Later we flush
    the sets with the fewest entries into the journal, as many as we can, and
    then flush the dense sets with merged entries to disk.

    In this way we can use the sit journal area more effectively, and we reduce
    SIT updates, resulting in a performance gain and a longer lifetime for the
    flash device.

    In my testing environment, this patch clearly reduces SIT block updates.

    virtual machine + hard disk:
    fsstress -p 20 -n 400 -l 5

                sit page num   cp count   sit pages/cp
    based       2006.50        1349.75    1.486
    patched     1566.25        1463.25    1.070

    The latency of the merge operation is small even when handling a great
    number of dirty SIT entries in flush_sit_entries:

    latency(ns)   dirty sit count
    36038         2151
    49168         2123
    37174         2232

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
    sit_i in the macros SIT_BLOCK_OFFSET/START_SEGNO is not used; remove it.
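
    A hedged before/after sketch of this kind of cleanup (the exact definitions
    may differ):

    /* before: sit_i is accepted but never referenced */
    #define SIT_BLOCK_OFFSET(sit_i, segno)  \
        ((segno) / SIT_ENTRY_PER_BLOCK)

    /* after: the dead parameter is dropped */
    #define SIT_BLOCK_OFFSET(segno)  ((segno) / SIT_ENTRY_PER_BLOCK)
    #define START_SEGNO(segno)       (SIT_BLOCK_OFFSET(segno) * SIT_ENTRY_PER_BLOCK)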

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
    If roll-forward recovery fails, we'd better run fsck.f2fs.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    This patch adds handling of buggy corner cases for fsck.f2fs.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
    This patch replaces BUG cases with f2fs_bug_on to retain information for
    fsck.f2fs. And it implements some void functions to initiate fsck.f2fs too.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • If any f2fs_bug_on is triggered, fsck.f2fs is needed.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim