Eric Lee / smarc-fsl-linux-kernel

22 Aug, 2015

6 commits

029e13cc3 f2fs: adjust showing of extent cache stat

This patch replaces the total hit stat with the rbtree hit stat,
and then adjusts how the extent cache stat is shown:

Hit Count:
L1-1: for largest node hit count;
L1-2: for last cached node hit count;
L2: for extent node hit after looking up in the rbtree.

Hit Ratio:
ratio (hit count / total lookup count)

Inner Struct Count:
tree count, node count.

Before:
Extent Hit Ratio: 0 / 2

Extent Tree Count: 3

Extent Node Count: 2

Patched:
Extent Cache:
- Hit Count: L1-1:4871 L1-2:2074 L2:208
- Hit Ratio: 1% (7153 / 550751)
- Inner Struct Count: tree: 26560, node: 11824
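
The hit ratio line above can be reproduced with a few lines of C. This is only a sketch of the arithmetic (hit count / total lookup count); the struct and function names are hypothetical and are not the f2fs debugfs code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counters; the field names are illustrative only. */
struct extent_stat {
    unsigned long long hit_largest; /* L1-1: largest node hits */
    unsigned long long hit_cached;  /* L1-2: last cached node hits */
    unsigned long long hit_rbtree;  /* L2: hits after rbtree lookup */
    unsigned long long total;       /* total lookup count */
};

/* Sum of all three hit counters. */
static unsigned long long total_hits(const struct extent_stat *s)
{
    return s->hit_largest + s->hit_cached + s->hit_rbtree;
}

/* Integer percentage, truncated; 0 when nothing was looked up yet. */
static unsigned long long hit_ratio_percent(const struct extent_stat *s)
{
    assert(total_hits(s) <= s->total);
    return s->total ? total_hits(s) * 100 / s->total : 0;
}

/* Prints the ratio in the same shape as the "Patched" output. */
static void print_hit_ratio(const struct extent_stat *s)
{
    printf("- Hit Ratio: %llu%% (%llu / %llu)\n",
           hit_ratio_percent(s), total_hits(s), s->total);
}
```

With the counters from the patched output (4871, 2074, 208 hits over 550751 lookups), the three hit counts sum to 7153 and the truncated percentage is 1.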

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-22 13:45:16 +0800
91c481fff f2fs: add largest/cached stat in extent cache

This patch adds stats for the hit count of the largest/cached node, to be
shown in debugfs.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-22 13:45:15 +0800
e2b4e2bc8 f2fs: fix incorrect mapping for bmap

The test steps are as follows:
1. touch file
2. truncate -s $((1024*1024)) file
3. fallocate -o 0 -l $((1024*1024)) file
4. fibmap.f2fs file

The fibmap.f2fs result shown below is not correct:

file_pos start_blk end_blk blks
0 -937166132 -937166132 1
4096 -937166132 -937166132 1
8192 -937166132 -937166132 1
12288 -937166132 -937166132 1
16384 -937166132 -937166132 1
20480 -937166132 -937166132 1
...
1040384 -937166132 -937166132 1
1044480 -937166132 -937166132 1

This is because f2fs_map_blocks returns with no error when it meets
a hole or preallocated block, and the caller __get_data_block then maps the
uninitialized variable value to bh->b_blocknr.

Unfortunately, generic_block_bmap checks neither the return value of
get_block() nor the mapping info of the buffer_head, resulting in
a random block address being returned.

After fixing the issue, our result is correct:

file_pos start_blk end_blk blks
0 0 0 256
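
The failure mode can be sketched in plain C. This is a simplified model, not the f2fs code: `struct block_map`, `map_block` and `lookup_block` are hypothetical names, and the single mapped block is made up for illustration. The point is that a mapping routine may return success for a hole without filling in an address, so the caller must start from a known value and check a "mapped" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mapping result. */
struct block_map {
    bool mapped;          /* true only when blkaddr is valid */
    unsigned int blkaddr; /* on-disk block address */
};

/* Returns 0 (no error) for holes too, leaving *m untouched in that case. */
static int map_block(unsigned int file_block, struct block_map *m)
{
    assert(m != NULL);
    /* Illustrative layout: only file block 3 is backed by disk block 1000. */
    if (file_block == 3) {
        m->mapped = true;
        m->blkaddr = 1000;
    }
    return 0;
}

/* bmap-style lookup: report 0 for anything that is not really mapped. */
static unsigned int lookup_block(unsigned int file_block)
{
    /* Initialize explicitly so a hole can never expose stack garbage. */
    struct block_map m = { .mapped = false, .blkaddr = 0 };

    if (map_block(file_block, &m) != 0 || !m.mapped)
        return 0;
    return m.blkaddr;
}
```

In this model a hole always reports block 0, which matches the corrected fibmap output above.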

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-22 13:45:14 +0800
c031f6a90 f2fs: add annotation for space utilization of regular/inline dentry

Add annotation to let us know more clearly about space utilization
information of regular dentry and inline dentry.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-22 13:45:13 +0800
f8b703da2 f2fs: fix to update cached_en of extent tree properly

In f2fs_lookup_extent_tree, et->cached_en was read and updated with only
the read lock held. This could cause __lookup_extent_tree to return an
entirely wrong extent_node if another thread updates et->cached_en just
before __lookup_extent_tree returns.

However, there are two things about this patch that need to be noticed:
1. It does no good to arrange the order of concurrent reads/writes; the
result would still be random in such a case.
2. It's built on this assumption: mixing reads and writes on a single
pointer would not make the pointer partially wrong at any time. Please let
me know if I'm wrong, thx.

Signed-off-by: Fan li
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Fan Li
2015-08-22 13:45:06 +0800
217940d4f f2fs: fix typo

Fix typo.

Signed-off-by: Junesung Lee
Signed-off-by: Jaegeuk Kim

Junesung Lee
2015-08-22 13:43:32 +0800

21 Aug, 2015

12 commits

24928634f f2fs: check the node block address of newly allocated nid

This patch adds a routine which checks the block address of a newly allocated nid.
If a nid has already been allocated by another thread due to subtle data races, it
will result in filesystem corruption.
So, it needs to check whether its block address was already allocated or not
prior to nid allocation, as the last chance.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-21 00:00:14 +0800
a21c20f0c f2fs: go out for insert_inode_locked failure

We should not call unlock_new_inode when insert_inode_locked failed.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-21 00:00:13 +0800
5ee5293c3 f2fs: retry gc if one section is not successfully reclaimed

If FG_GC failed to reclaim one section, let's retry with another section
from the start, since we can get another good candidate.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-21 00:00:12 +0800
2286c0205 f2fs: fix to cover lock_op for update_inode_page

Previously, update_inode_page was not called under f2fs_lock_op.
Instead, we should call it via f2fs_write_inode.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-21 00:00:11 +0800
268344664 f2fs: reuse nids more aggressively

If we can reuse as many nids as possible, we can mitigate producing obsolete
node pages in the page cache.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-21 00:00:11 +0800
26d585997 f2fs: avoid garbage collecting already moved node blocks

If node blocks were already moved, we don't need to move them again.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-21 00:00:10 +0800
740432f83 f2fs: handle failed bio allocation

Given the comment of bio_alloc_bioset quoted below, and since f2fs can allocate
multiple bios at the same time, we can't guarantee that a bio is always allocated.

"
* When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be
* able to allocate a bio. This is due to the mempool guarantees. To make this
* work, callers must never allocate more than 1 bio at a time from this pool.
* Callers that need to allocate more than 1 bio must always submit the
* previously allocated bio for IO before attempting to allocate a new one.
* Failure to do so can cause deadlocks under memory pressure.
"

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-21 00:00:09 +0800
a6db67f06 f2fs: increase the number of max hard links

This patch increases the number of maximum hard links for one file.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-21 00:00:08 +0800
798c1b16d f2fs: skip checkpoint if there is no dirty and prefree segments

We should avoid needless checkpoints when there are no dirty or prefree segments.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-21 00:00:07 +0800
31696580b f2fs: shrink free_nids entries

This patch introduces __count_free_nids/try_to_free_nids and registers
them in slab shrinker for shrinking under memory pressure.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-21 00:00:06 +0800
206e61be2 f2fs: avoid clear valid page

In f2fs_delete_entry, if the last dirent is removed from the dentry page,
we will try to punch that page since it has no valid data in it.

But truncate_hole, which is used for punching, could fail because of
no memory or an IO error. If that happens, we'd better skip clearing
this valid dentry page.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-21 00:00:06 +0800
7b2a246b8 MAINTAINERS: add myself as a dedicated reviewer of f2fs

I volunteer to be a dedicated reviewer of f2fs, and add my email address to
the maintainership entry of f2fs.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-21 00:00:05 +0800

20 Aug, 2015

1 commit

315df8398 f2fs: do not write any node pages related to orphan inodes

We should not write node pages when deleting orphan inodes.
In order to do that, we can easily set the POR_DOING flag earlier, before entering
the orphan inode routine.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-20 23:59:42 +0800

15 Aug, 2015

3 commits

4c278394b f2fs: avoid a build warning

If F2FS_CHECK_FS is turned off, we can get a build warning for an unused variable.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-15 07:02:15 +0800
8c14bfade f2fs: handle error of f2fs_iget correctly

In recover_orphan_inode, whenever f2fs_iget fails, we make the kernel panic,
but that's not reasonable, because f2fs_iget can fail for a lot of reasons
including out of memory.

So we change the error handling method as below:
a) when finding no entry for the orphan inode, bug_on for catching bugs;
b) for other reasons, report it to the caller.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-15 07:02:14 +0800
47e70ca46 f2fs: do not assign a new segment for dio under space shortage

If there are not enough free segments, we should not assign a new segment
explicitly. Otherwise, we can run out of free segments.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-15 07:02:13 +0800

12 Aug, 2015

2 commits

decd36b6c f2fs: remove inmem radix tree

Previously, we used a radix tree to index all registered page entries for
an atomic file, but now we only use the radix tree to see whether the current page
is indexed or not, since the other user of the radix tree is gone in commit
042b7816aaeb ("f2fs: remove unnecessary call to invalidate inmemory pages").

So in this patch, we try to use a more efficient way:
introduce a macro ATOMIC_WRITTEN_PAGE, and set it as the page private
value to indicate page indexing status. This way, we can save
memory and lookup time.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-12 02:31:14 +0800
c15e8599f f2fs: report EINVAL for unalignment direct IO

We ran the ltp testcase with f2fs and obtained a TFAIL in diotest4; the
detailed result is as follows:

dio04

<<<test_start>>>
tag=dio04 stime=1432278894
cmdline="diotest4"
contacts=""
analysis=exit
<<<test_output>>>
diotest4 1 TPASS : Negative Offset
diotest4 2 TPASS : removed
diotest4 3 TFAIL : diotest4.c:129: write allows odd count.returns 1: Success
diotest4 4 TFAIL : diotest4.c:183: Odd count of read and write
diotest4 5 TPASS : Read beyond the file size
......

The result of ext4 in the same environment:

dio04

<<<test_start>>>
tag=dio04 stime=1432259643
cmdline="diotest4"
contacts=""
analysis=exit
<<<test_output>>>
diotest4 1 TPASS : Negative Offset
diotest4 2 TPASS : removed
diotest4 3 TPASS : Odd count of read and write
diotest4 4 TPASS : Read beyond the file size
......

The reason is that when triggering DIO in f2fs, we return zero
in ->direct_IO if the writer's buffer offset, file offset or transfer size is
not aligned to the block size of the filesystem, which results in falling back to
buffered write instead of returning -EINVAL.

This patch fixes that problem by returning the correct error number for the above
case, and by removing the judgement condition in check_direct_IO to make sure
the verification is enabled for direct readers too.

Besides, Jaegeuk Kim pointed out that there are exceptional cases where we should
always make direct IO fall back to buffered write, such as dio in an
encrypted file.
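
The alignment rule described above can be sketched as a single mask test. This is a userspace illustration under the assumption that the block size is a power of two; `check_dio_alignment` is a hypothetical helper, not the kernel's check_direct_IO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 0 when the file offset, the buffer address and the transfer size
 * are all multiples of block_size, and -EINVAL otherwise.
 * block_size is assumed to be a non-zero power of two.
 */
static int check_dio_alignment(uint64_t file_offset, uintptr_t buf_addr,
                               size_t len, unsigned int block_size)
{
    uint64_t mask = (uint64_t)block_size - 1;

    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    /* OR the three values together: any stray low bit means misalignment. */
    if ((file_offset | (uint64_t)buf_addr | (uint64_t)len) & mask)
        return -EINVAL;
    return 0;
}
```

An odd transfer size, as in the failing diotest4 case, is rejected here with -EINVAL instead of silently succeeding.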

Signed-off-by: Yunlei He
[Chao Yu make small change and add detail description in commit message]
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-12 02:30:24 +0800

11 Aug, 2015

1 commit

6394328ab f2fs: report error of fill_zero

fill_zero can fail for a lot of reasons, but previously we did not handle
its return value, so its callers such as punch_hole/f2fs_zero_range may
report success even though they actually failed because of an error inside
fill_zero.

This patch fixes this by reporting the correct return value of fill_zero.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-11 03:26:34 +0800

06 Aug, 2015

2 commits

12a8343e9 f2fs: recover invalid/reserved block address for fsynced file

When testing with generic/101 in xfstests, the error message is output as below:

--- tests/generic/101.out
+++ results//generic/101.out.bad
@@ -10,10 +10,14 @@
File foo content after log replay:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
-0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
*
0372000
...
(Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)

The test flow is as follows:
1. pwrite foo -S 0xaa 0 64K
2. pwrite foo -S 0xbb 64K 61K
3. sync
4. truncate foo 64K
5. truncate foo 125K
6. fsync foo
7. flakey drop writes
8. umount

After this test, we expect the recovered file to have its first
64k filled with value 0xaa and the next 61k filled with
value 0x00, because we fsynced it before dropping writes in dm.
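
The expected content can be checked with a small in-memory model of steps 1, 2, 4 and 5 of the test flow. This is only a sketch: `struct mem_file`, `mf_pwrite` and `mf_truncate` are hypothetical helpers that mimic pwrite and truncate semantics on a heap buffer, and they do not model fsync, dm-flakey or recovery.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define KB 1024u

/* Hypothetical in-memory file: a growable byte buffer plus its size. */
struct mem_file {
    unsigned char *data;
    size_t size;
};

/* Fill [off, off + len) with `byte`, growing the file if needed. */
static void mf_pwrite(struct mem_file *f, unsigned char byte,
                      size_t off, size_t len)
{
    if (off + len > f->size) {
        unsigned char *p = realloc(f->data, off + len);
        assert(p != NULL);
        /* Newly exposed bytes read as zero until written. */
        memset(p + f->size, 0, off + len - f->size);
        f->data = p;
        f->size = off + len;
    }
    memset(f->data + off, byte, len);
}

/* Shrink or extend the file; extension exposes zero bytes. */
static void mf_truncate(struct mem_file *f, size_t new_size)
{
    unsigned char *p = realloc(f->data, new_size ? new_size : 1);
    assert(p != NULL);
    if (new_size > f->size)
        memset(p + f->size, 0, new_size - f->size);
    f->data = p;
    f->size = new_size;
}

/* Replays the write and truncate steps and returns the resulting file. */
static struct mem_file replay_generic_101(void)
{
    struct mem_file f = { NULL, 0 };

    mf_pwrite(&f, 0xaa, 0, 64 * KB);       /* step 1 */
    mf_pwrite(&f, 0xbb, 64 * KB, 61 * KB); /* step 2 */
    mf_truncate(&f, 64 * KB);              /* step 4: drops the 0xbb data */
    mf_truncate(&f, 125 * KB);             /* step 5: extends with zeros */
    return f;
}
```

In this model the second truncate can only expose zeros, because the 0xbb bytes were discarded by the first truncate. That is the content a correct recovery has to reproduce.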

In f2fs, during recovery, we only recover the valid block addresses
in a direct node page if it is marked as a fsynced dnode, but block addresses
which mean invalid/reserved (with value NULL_ADDR/NEW_ADDR) are not
recovered. So, the recovered file shows incorrect data 0xbb in the range
[64k, 125k].

This patch fixes that by recovering invalid/reserved blocks during the recovery flow.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-06 13:16:41 +0800
759af1c9c f2fs: use extent cache to optimize f2fs_reserve_block

In some cases, we only need the block address when we call
f2fs_reserve_block; other fields of struct dnode_of_data aren't necessary.
We can try the extent cache first for such cases in order to speed up the
process.

Signed-off-by: Fan li
Signed-off-by: Jaegeuk Kim

Fan Li
2015-08-06 13:15:42 +0800

05 Aug, 2015

13 commits

e90c2d285 f2fs: invalidate temporary meta page

To avoid meeting garbage data in the next free node block at the end of the warm
node chain when doing recovery, we will try to zero out that invalid block.

If the device does not support discard, our way of zeroing out the block is:
grab a temporary zeroed page in the meta inode, then issue a write request
with this page.

But we forgot to release that temporary page, so our memory usage will
increase without gaining any hit ratio benefit, so it's better to free it
to save memory.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-05 23:19:21 +0800
470f00e96 f2fs: fix to release inode page correctly

In the following call path, we pass a locked and referenced ipage
pointer to get_new_data_page:
- init_inode_metadata
- make_empty_dir
- get_new_data_page

There are two exit paths in get_new_data_page when an error occurs:
1) grab_cache_page fails, ipage will not be released;
2) f2fs_reserve_block fails, ipage will be released in callee.

So, error handling in get_new_data_page is not consistent.

For f2fs_reserve_block, it's not very easy to change the rule
of error handling, since it's already complicated.

Here we decide to choose an easy way to fix this issue:
if any error occurs in get_new_data_page, we ensure that
ipage is released in this function.

The same issue is in f2fs_convert_inline_dir, fix that too.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-05 23:08:23 +0800
7a04f64d4 f2fs: unify f2fs_bug_on when check blocks and segment

Replace BUG_ON with f2fs_bug_on to handle
failed block and segment validity checks.

Signed-off-by: Xue Liu
Signed-off-by: Jaegeuk Kim

Liu Xue
2015-08-05 23:08:18 +0800
f3f338caa f2fs: freeze filesystem when fail to update meta page due to IO error

In get_meta_page, we guarantee no failure for the returned page,
but sometimes an IO error from the device will result in returning a
non-updated page.

Then, we still use this page as an updated one, and exceptions could happen
when using this kind of page.

So in this condition, we'd better freeze the fs by making it readonly
and stop doing checkpoint.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-05 23:08:17 +0800
5768dcdd7 f2fs: change the timing of f2fs_wait_on_page_writeback

Some backing devices need pages to be stable during writeback. It doesn't
matter if the page is completely overwritten or already uptodate, it needs
to wait before write.

Signed-off-by: Fan li
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Fan Li
2015-08-05 23:08:16 +0800
edb27deea f2fs: handle error cases in commit_inmem_pages

This patch handles error cases in commit_inmem_pages.
If an error occurs, it stops writing the pages and returns the error right
away.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2015-08-05 23:08:15 +0800
a6d494b6d f2fs: fix to build free nids from readaheaded nat pages

When there are not enough free nids in the free nid cache, we try to
readahead FREE_NID_PAGES:4 nat pages into the page cache of meta_inode,
then read the nat entries in each nat page to add free nids to the free nid
cache.

But when traversing all the nat pages we readaheaded in a loop,
our exit condition is not set right: one more nat page will be scanned
without readahead, resulting in worse read performance.

This patch fixes this by reading the correct number of nat pages to avoid bad
performance.
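
The off-by-one described above can be illustrated with two loop shapes. This is a generic sketch, not the f2fs source: the exact original exit condition is an assumption, and `READAHEAD_PAGES` only mirrors the FREE_NID_PAGES:4 value from the message.

```c
#include <assert.h>

#define READAHEAD_PAGES 4 /* mirrors FREE_NID_PAGES:4 from the text */

/*
 * Assumed buggy shape: the counter is compared before it is incremented,
 * so the loop body runs once more than the number of readahead pages.
 */
static int pages_scanned_post_increment(void)
{
    int i = 0, scanned = 0;

    while (1) {
        scanned++; /* scan one nat page */
        if (i++ == READAHEAD_PAGES)
            break;
    }
    return scanned;
}

/* Fixed shape: increment first, then stop once the quota is reached. */
static int pages_scanned_pre_increment(void)
{
    int i = 0, scanned = 0;

    while (1) {
        scanned++; /* scan one nat page */
        if (++i >= READAHEAD_PAGES)
            break;
    }
    return scanned;
}
```

The first loop touches five pages while only four were read ahead; the second stops after exactly four.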

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-05 23:08:14 +0800
e4e762723 f2fs: fix inline data/dentry stat number leak

If we clear the inline data/dentry flag in handle_failed_inode, we will fail
to decrement the stat count of inline data/dentry in f2fs_evict_inode because
the inode no longer has the flag. So remove the wrong clearing.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-05 23:08:14 +0800
f4c9c743a f2fs: convert inline data before set atomic/volatile flag

In f2fs_ioc_start_{atomic,volatile}_write, if we fail to convert
inline data, we report the error to the user but still leave the atomic/volatile
flag set in the inode, which will impact further writes to this file. Fix it.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-05 23:08:13 +0800
a5f64b6aa f2fs: fix to wait all atomic written pages writeback

This patch fixes the incorrect range (0, LONG_MAX) which is used
in ranged fsync. If we use LONG_MAX as the parameter for indicating
the end of file we want to synchronize, on 32-bit architecture
machines, data after the 4GB offset may not be persisted in
storage after ->fsync returns.

Here, we alter LONG_MAX to LLONG_MAX to fix this issue.
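
The range problem can be shown without a 32-bit machine by modelling the two limits with fixed-width types. This is a sketch under that assumption: `LONG_MAX_32BIT` stands in for LONG_MAX where long is 32 bits wide, and `offset_covered` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Value of LONG_MAX on a platform where long is 32 bits wide. */
#define LONG_MAX_32BIT  INT32_MAX
/* Value of LLONG_MAX, which is 64 bits wide on such platforms too. */
#define LLONG_MAX_64BIT INT64_MAX

/* True when byte offset `pos` lies inside the sync range [0, end]. */
static int offset_covered(int64_t pos, int64_t end)
{
    assert(pos >= 0);
    return pos <= end;
}
```

A byte at the 5 GiB offset falls outside a range that ends at the 32-bit limit, but inside one that ends at the 64-bit limit.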

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-05 23:08:12 +0800
6a2905443 f2fs: skip writing in ->writepages when no dirty pages exist

When flushing comes from background, if there is no dirty page in the
mapping of the inode, we'd better skip seeking dirty pages from the mapping
for writeback.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-05 23:08:11 +0800
737f18992 f2fs: optimize f2fs_write_cache_pages

The statement "goto continue_unlock" is exactly the same in each
if branch, and which branch is taken depends on whether the values of
"step" and "is_cold_data(page)" are 0 or 1. That means the jump is
taken whenever the value of "step" equals "is_cold_data(page)", so a
single if condition with one "goto continue_unlock" is enough, which
removes the duplicated code.
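
The simplification can be checked exhaustively, since both inputs only take the values 0 and 1. This is a sketch of the logic only: the two-branch form is an assumed reconstruction of the pre-patch code, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed pre-patch shape: two branches that end in the same jump. */
static bool skip_two_branches(int step, int cold)
{
    if (step == 0 && cold == 0)
        return true; /* stands for "goto continue_unlock" */
    if (step == 1 && cold == 1)
        return true; /* same jump again */
    return false;
}

/* Post-patch shape: one comparison covers both branches. */
static bool skip_single_test(int step, int cold)
{
    return step == cold;
}
```

For all four combinations of 0 and 1 the two functions agree, which is why the duplicated branch can be dropped.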

Signed-off-by: Tiezhu Yang
Signed-off-by: Jaegeuk Kim

Tiezhu Yang
2015-08-05 23:08:10 +0800
55f57d2c4 f2fs: fix double lock in handle_failed_inode

In handle_failed_inode, there is a potential deadlock which can happen
in the call path below:

- f2fs_create
- f2fs_lock_op down_read(cp_rwsem)
- f2fs_add_link
- __f2fs_add_link
- init_inode_metadata
- f2fs_init_security failed
- truncate_blocks failed
- handle_failed_inode
- f2fs_truncate
- truncate_blocks(..,true)
- write_checkpoint
- block_operations
- f2fs_lock_all down_write(cp_rwsem)
- f2fs_lock_op down_read(cp_rwsem)

So in this path, we pass a parameter to f2fs_truncate to make sure
cp_rwsem in truncate_blocks will not be locked again.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2015-08-05 23:08:09 +0800