14 Jan, 2016
1 commit
-
Pull f2fs updates from Jaegeuk Kim:
"This series adds two ioctls to control cached data and fragmented
files. Most of the rest fixes missing error cases and bugs that we
have not covered so far. Summary:
Enhancements:
- support an ioctl to execute online file defragmentation
- support an ioctl to flush cached data
- speed up shrinking of extent_cache entries
- handle broken superblock
- refactor dirty inode management infra
- revisit f2fs_map_blocks to handle more cases
- reduce global lock coverage
- add detecting user's idle time
Major bug fixes:
- fix data race condition on cached nat entries
- fix error cases of volatile and atomic writes"
* tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (87 commits)
f2fs: should unset atomic flag after successful commit
f2fs: fix wrong memory condition check
f2fs: monitor the number of background checkpoint
f2fs: detect idle time depending on user behavior
f2fs: introduce time and interval facility
f2fs: skip releasing nodes in chindless extent tree
f2fs: use atomic type for node count in extent tree
f2fs: recognize encrypted data in f2fs_fiemap
f2fs: clean up f2fs_balance_fs
f2fs: remove redundant calls
f2fs: avoid unnecessary f2fs_balance_fs calls
f2fs: check the page status filled from disk
f2fs: introduce __get_node_page to reuse common code
f2fs: check node id earily when readaheading node page
f2fs: read isize while holding i_mutex in fiemap
Revert "f2fs: check the node block address of newly allocated nid"
f2fs: cover more area with nat_tree_lock
f2fs: introduce max_file_blocks in sbi
f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink
f2fs: introduce zombie list for fast shrinking extent trees
...
09 Jan, 2016
2 commits
-
This patch adds one parameter to clean up all the callers of f2fs_balance_fs.
Signed-off-by: Jaegeuk Kim
-
Only when a node page is newly dirtied do we need to check whether we need to
run f2fs_gc.
Signed-off-by: Jaegeuk Kim
31 Dec, 2015
1 commit
-
Otherwise, we can get mismatched largest extent information.
One example is:
1. mount f2fs w/ extent_cache
2. make a small extent
3. umount
4. mount f2fs w/o extent_cache
5. update the largest extent
6. umount
7. mount f2fs w/ extent_cache
8. get the old extent made by #2
Signed-off-by: Jaegeuk Kim
17 Dec, 2015
1 commit
-
Maintain regular/symlink inodes which have dirty pages in the global dirty list
and record their total dirty page count, in the same way directory inodes are
handled.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
16 Dec, 2015
1 commit
-
remove_dirty_dir_inode will be renamed to remove_dirty_inode in a following
patch, as a generic function for removing directory/regular/symlink inodes
from the global dirty list.
Here, rename the ino management related functions for readability, and also
to avoid a name conflict.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
09 Dec, 2015
1 commit
-
kmap() in page_follow_link_light() needed to go - allowing an arbitrary
number of kmaps to be held for a long time is a great way to deadlock
the system.
A new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlink inodes; done for all in-tree cases. page_follow_link_light()
is instrumented to yell about anything missed.
Signed-off-by: Al Viro
10 Oct, 2015
1 commit
-
As the comment says, we don't need to call f2fs_lock_op in write_inode to
prevent dirty node pages from being produced all the time.
That happens only when there are not enough free sections, and we can avoid
that by calling balance_fs prior to that.
Signed-off-by: Jaegeuk Kim
25 Aug, 2015
2 commits
-
In the following call stack, if we unfortunately lose every chance to truncate
the inode page in remove_inode_page, we will eventually add the previously
allocated nid to the free nid cache; this nid has NID_NEW status and
NEW_ADDR in its blkaddr pointer:
- f2fs_create
- f2fs_add_link
- __f2fs_add_link
- init_inode_metadata
- new_inode_page
- new_node_page
- set_node_addr(, NEW_ADDR)
- f2fs_init_acl failed
- remove_inode_page failed
- handle_failed_inode
- remove_inode_page failed
- iput
- f2fs_evict_inode
- remove_inode_page failed
- alloc_nid_failed cache a nid with valid blkaddr: NEW_ADDR
This may not only cause a resource leak of the previous inode, but may also
cause incorrect use of the previous blkaddr, which is located in the NO.nid
node entry, when this nid is reused by others.
This patch tries to add this inode to the orphan list if we fail to truncate
the inode, so that we get a second chance to release it in the orphan
recovery flow.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
According to commit 5f16f3225b06 ("ext4: atomically set inode->i_flags in
ext4_set_inode_flags()").
Signed-off-by: Zhang Zhen
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
05 Aug, 2015
5 commits
-
If we clear the inline data/dentry flag in handle_failed_inode, we will fail
to decrease the stat count of inline data/dentry in f2fs_evict_inode, because
the inode no longer has the flag. So remove the wrong clearing.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
In handle_failed_inode, there is a potential deadlock which can happen
in the call path below:
- f2fs_create
- f2fs_lock_op down_read(cp_rwsem)
- f2fs_add_link
- __f2fs_add_link
- init_inode_metadata
- f2fs_init_security failed
- truncate_blocks failed
- handle_failed_inode
- f2fs_truncate
- truncate_blocks(..,true)
- write_checkpoint
- block_operations
- f2fs_lock_all down_write(cp_rwsem)
- f2fs_lock_op down_read(cp_rwsem)
So in this path, we pass a parameter to f2fs_truncate to make sure
cp_rwsem in truncate_blocks will not be locked again.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
This patch adds a stat for the number of inline xattr inodes, to be
shown in debugfs.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
We don't need to handle the duplicate extent information.
The integrated rule is:
- update on-disk extent with largest one tracked by in-memory extent_cache
- destroy extent_tree for the truncation case
- drop per-inode extent_cache by shrinker
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
Before iput is called, the inode number used by a bad inode can be reassigned
to another new inode, resulting in abnormal behavior on the new inode.
This should not happen for the new inode.
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
02 Jun, 2015
1 commit
-
This patch applies the following ext4 patch:
ext4 crypto: use per-inode tfm structure
As suggested by Herbert Xu, we shouldn't allocate a new tfm each time
we read or write a page. Instead we can use a single tfm hanging off
the inode's crypt_info structure for all of our encryption needs for
that inode, since the tfm can be used by multiple crypto requests in
parallel.
Also use cmpxchg() to avoid races that could result in crypt_info
structure getting doubly allocated or doubly freed.
Signed-off-by: Theodore Ts'o
Signed-off-by: Jaegeuk Kim
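The race that the cmpxchg() in the message above guards against can be sketched in userspace with C11 atomics. This is only an illustration of the install-once pattern, not the kernel code: struct toy_inode, struct crypt_info as defined here, and get_crypt_info() are hypothetical names, and atomic_compare_exchange_strong() stands in for the kernel's cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-inode crypt_info; not the kernel struct. */
struct crypt_info {
    int tfm_id; /* placeholder for the shared tfm */
};

/* Hypothetical inode model holding one lazily installed crypt_info. */
struct toy_inode {
    _Atomic(struct crypt_info *) ci; /* NULL until first use */
};

/*
 * Return the inode's crypt_info, allocating it on first use.
 * If two callers race, only one compare-and-swap succeeds. The loser
 * frees its own copy and adopts the winner's pointer, so the structure
 * is never installed twice and never freed twice.
 */
struct crypt_info *get_crypt_info(struct toy_inode *inode)
{
    assert(inode != NULL);

    struct crypt_info *cur = atomic_load(&inode->ci);
    if (cur)
        return cur;

    struct crypt_info *fresh = calloc(1, sizeof(*fresh));
    if (!fresh)
        return NULL;

    struct crypt_info *expected = NULL;
    if (atomic_compare_exchange_strong(&inode->ci, &expected, fresh))
        return fresh; /* we installed our copy */

    free(fresh);      /* another caller won the race */
    return expected;  /* updated by the failed exchange to the winner's pointer */
}
```

Every caller ends up with the same pointer, which is what lets a single tfm be shared by all crypto requests for that inode.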
29 May, 2015
2 commits
-
This patch implements encryption support for symlink.
Signed-off-by: Uday Savagaonkar
Signed-off-by: Theodore Ts'o
Signed-off-by: Jaegeuk Kim
-
This patch activates the following APIs for encryption support.
The rules quoted by ext4 are:
- An unencrypted directory may contain encrypted or unencrypted files
or directories.
- All files or directories in a directory must be protected using the
same key as their containing directory.
- Encrypted inode for regular file should not have inline_data.
- Encrypted symlink and directory may have inline_data and inline_dentry.
This patch activates the following APIs.
1. f2fs_link : validate context
2. f2fs_lookup : ''
3. f2fs_rename : ''
4. f2fs_create/f2fs_mkdir : inherit its dir's context
5. f2fs_direct_IO : do buffered io for regular files
6. f2fs_open : check encryption info
7. f2fs_file_mmap : ''
8. f2fs_setattr : ''
9. f2fs_file_write_iter : '' (Called by sys_io_submit)
10. f2fs_fallocate : do not support fcollapse
11. f2fs_evict_inode : free_encryption_info
Signed-off-by: Michael Halcrow
Signed-off-by: Theodore Ts'o
Signed-off-by: Jaegeuk Kim
11 Apr, 2015
4 commits
-
This patch fixes the below warning.
sparse warnings: (new ones prefixed by >>)
>> fs/f2fs/inode.c:56:23: sparse: restricted __le32 degrades to integer
>> fs/f2fs/inode.c:56:52: sparse: restricted __le32 degrades to integer
Reported-by: kbuild test robot
Signed-off-by: Jaegeuk Kim
-
This patch tries to preserve the last extent info from the extent tree cache in
the on-disk inode, so that it can be reused next time for better
performance.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
With the normal extent info cache, we record the largest extent mapping between
logical block and physical block in the extent info, and we persist the extent
info in the on-disk inode.
When the extent tree cache is enabled, if the on-disk inode has extent info and
the extent is not a small fragmented mapping, we had better load the extent
info into the extent tree cache when the inode is loaded. This way we have a
better chance of hitting the extent tree cache rather than spending more time
reading the dnode page for the block address.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
This patch avoids some punch_hole overhead when releasing volatile data.
If the volatile data has not been written yet, we can just zero the first page.
Signed-off-by: Jaegeuk Kim
04 Mar, 2015
2 commits
-
This patch enables rb-tree based extent cache in f2fs.
When we mount with "-o extent_cache", f2fs will try to add recently accessed
page-block mappings into the rb-tree based extent cache as much as possible,
instead of the original single extent info cache.
In this way, f2fs can support a more effective cache between the dnode page
cache and the disk. It provides a high hit ratio with less memory when the
dnode page cache is reclaimed in a low-memory environment.
Storage: Sandisk sd card 64g
1.append write file (offset: 0, size: 128M);
2.override write file (offset: 2M, size: 1M);
3.override write file (offset: 4M, size: 1M);
...
4.override write file (offset: 48M, size: 1M);
...
5.override write file (offset: 112M, size: 1M);
6.sync
7.echo 3 > /proc/sys/vm/drop_caches
8.read file (size:128M, unit: 4k, count: 32768)
(time dd if=/mnt/f2fs/128m bs=4k count=32768)
Extent Hit Ratio:
             before          patched
Hit Ratio    121 / 1071      1071 / 1071
Performance:
             before          patched
real         0m37.051s       0m35.556s
user         0m0.040s        0m0.026s
sys          0m2.990s        0m2.251s
Memory Cost:
             before          patched
Tree Count:  0               1 (size: 24 bytes)
Node Count:  0               45 (size: 1440 bytes)
v3:
o retest and given more details of test result.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
-
Move ext_lock out of struct extent_info, then in the following patches we can
use variables with struct extent_info type as a parameter to pass pure data.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
10 Jan, 2015
2 commits
-
We use kzalloc to allocate memory in __recover_inline_status, and use this
all-zero memory to check the inline data content of the inode page by comparing
them. This is inefficient and not needed; let's check the inline data content
directly.
Signed-off-by: Chao Yu
[Jaegeuk Kim: make the code more neat]
Signed-off-by: Jaegeuk Kim
-
This patch adds two new ioctls to release in-memory pages grabbed by atomic
writes.
o f2fs_ioc_abort_volatile_write
- If the transaction failed, all the grabbed pages and data should be written.
o f2fs_ioc_release_volatile_write
- This is to enhance the performance of PERSIST mode in sqlite.
In order to avoid huge memory consumption which causes OOM, this patch changes
volatile writes to use normal dirty pages, and instead blocks flushing them to
the disk as long as the system does not suffer from memory pressure.
Signed-off-by: Jaegeuk Kim
09 Dec, 2014
1 commit
-
In do_read_inode, if __recover_inline_status fails, the inode has the inline
flag without its count having been increased.
Later, f2fs_evict_inode will decrease the count, which results in -1.
Signed-off-by: Jaegeuk Kim
05 Nov, 2014
1 commit
-
This patch simplifies the inline_data usage with the following rule.
1. inline_data is set during the file creation.
2. If new data is requested to be written beyond the range of inline_data,
f2fs converts that inode permanently.
3. There is no case which converts a non-inline_data inode to inline_data.
4. The inline_data flag should be changed under inode page lock.
Signed-off-by: Jaegeuk Kim
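Rules 1 to 3 above amount to a one-way state change, which can be sketched as a toy model. Everything here is hypothetical (struct toy_inode, toy_create(), toy_write(), and the TOY_MAX_INLINE capacity); rule 4, the inode page lock, is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inline capacity in bytes; the real limit is not given here. */
#define TOY_MAX_INLINE 64

/* Hypothetical inode model that tracks only the inline_data flag. */
struct toy_inode {
    bool inline_data;
};

/* Rule 1: the flag is set when the file is created. */
void toy_create(struct toy_inode *inode)
{
    assert(inode != NULL);
    inode->inline_data = true;
}

/*
 * Rule 2: a write that reaches past the inline area converts the inode.
 * Rule 3: no code path sets the flag again, so the conversion is permanent.
 * Returns true if this write triggered the conversion.
 */
bool toy_write(struct toy_inode *inode, size_t pos, size_t len)
{
    assert(inode != NULL);
    if (inode->inline_data && pos + len > TOY_MAX_INLINE) {
        inode->inline_data = false;
        return true;
    }
    return false;
}
```

A small write after the conversion leaves the flag cleared, which is the point of rule 3.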
04 Nov, 2014
3 commits
-
This patch fixes wrongly counting inline_data inode numbers.
Signed-off-by: Jaegeuk Kim
-
This patch adds status information for inline_dentry inodes.
Signed-off-by: Jaegeuk Kim
-
This patch fixes the code to use highmem for directory pages.
Signed-off-by: Jaegeuk Kim
08 Oct, 2014
1 commit
-
This patch adds support for volatile writes which keep data pages in memory
until f2fs_evict_inode is called by iput.
For instance, we can use this feature for the sqlite database as follows.
While supporting atomic writes for main database file, we can keep its journal
data temporarily in the page cache by the following sequence.
1. open
-> ioctl(F2FS_IOC_START_VOLATILE_WRITE);
2. writes
: keep all the data in the page cache.
3. flush to the database file with atomic writes
a. ioctl(F2FS_IOC_START_ATOMIC_WRITE);
b. writes
c. ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
4. close
-> drop the cached data
Signed-off-by: Jaegeuk Kim
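Step 1 of the sequence above might look like this from userspace. This is a sketch under assumptions: the magic number 0xf5 and command number 3 are taken to match the kernel's f2fs header of this era and should be checked against the headers of the kernel in use, since availability depends on the kernel version. open_volatile_journal() is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed to match the kernel's f2fs header; verify before relying on it. */
#ifndef F2FS_IOCTL_MAGIC
#define F2FS_IOCTL_MAGIC 0xf5
#endif
#ifndef F2FS_IOC_START_VOLATILE_WRITE
#define F2FS_IOC_START_VOLATILE_WRITE _IO(F2FS_IOCTL_MAGIC, 3)
#endif

/*
 * Open a journal file and mark it for volatile writes (step 1).
 * Returns the file descriptor, or -1 if open() or the ioctl fails,
 * for example when the file is not on f2fs.
 */
int open_volatile_journal(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    if (ioctl(fd, F2FS_IOC_START_VOLATILE_WRITE) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Writes to the returned descriptor then correspond to step 2, and close() to step 4.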
07 Oct, 2014
1 commit
-
This patch introduces a very limited functionality for atomic write support.
In order to support atomic write, this patch adds two ioctls:
o F2FS_IOC_START_ATOMIC_WRITE
o F2FS_IOC_COMMIT_ATOMIC_WRITE
The database engine should be aware of the following sequence.
1. open
-> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
2. writes
: all the written data will be treated as atomic pages.
3. commit
-> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
: this flushes all the data blocks to the disk, which will be shown all or
nothing by f2fs recovery procedure.
4. repeat from #2.
The IO patterns should be:
   ,- START_ATOMIC_WRITE             ,- COMMIT_ATOMIC_WRITE
CP | D D D D D D | FSYNC | D D D D | FSYNC ...
                   `- COMMIT_ATOMIC_WRITE
Signed-off-by: Jaegeuk Kim
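The start/write/commit sequence above could be driven from a database engine roughly as follows. This is a sketch under assumptions: the magic number 0xf5 and command numbers 1 and 2 are taken to match the kernel's f2fs header and should be verified against the kernel in use. atomic_update() is a hypothetical helper, and error cleanup after a failed write is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed to match the kernel's f2fs header; verify before relying on it. */
#ifndef F2FS_IOCTL_MAGIC
#define F2FS_IOCTL_MAGIC 0xf5
#endif
#ifndef F2FS_IOC_START_ATOMIC_WRITE
#define F2FS_IOC_START_ATOMIC_WRITE _IO(F2FS_IOCTL_MAGIC, 1)
#endif
#ifndef F2FS_IOC_COMMIT_ATOMIC_WRITE
#define F2FS_IOC_COMMIT_ATOMIC_WRITE _IO(F2FS_IOCTL_MAGIC, 2)
#endif

/*
 * Write len bytes from buf as one atomic unit on an already open f2fs file.
 * Returns 0 on success and -1 if any step fails or the write is short.
 */
int atomic_update(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);

    /* Step 1: everything written from here on is treated as atomic pages. */
    if (ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) < 0)
        return -1;

    /* Step 2: ordinary writes. */
    ssize_t n = write(fd, buf, len);
    if (n < 0 || (size_t)n != len)
        return -1;

    /* Step 3: commit, so that recovery shows all of the data or none. */
    if (ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) < 0)
        return -1;

    return 0;
}
```

On a file descriptor that is not valid or not on f2fs, the first ioctl fails and the helper returns -1 without writing anything.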
01 Oct, 2014
1 commit
-
This patch relocates f2fs_unlock_op in every directory operation so that it is
called after any error has been processed.
Otherwise, the checkpoint can be entered with valid node ids without their
dentries when -ENOSPC occurs.
Signed-off-by: Jaegeuk Kim
16 Sep, 2014
1 commit
-
Previously f2fs only counted dirty dentry pages, but there is no reason not to
expand the scope.
This patch changes the names used in the management of dirty pages and also
counts dirty pages in each inode info.
Signed-off-by: Jaegeuk Kim
10 Sep, 2014
1 commit
-
If any f2fs_bug_on is triggered, fsck.f2fs is needed.
Signed-off-by: Jaegeuk Kim
04 Sep, 2014
1 commit
-
This patch adds three inline functions to clean up dirty casting codes.
Signed-off-by: Jaegeuk Kim
05 Aug, 2014
1 commit
-
When an inode is evicted, all the page cache belonging to this inode should be
released, including the xattr node page. Previously we didn't do this; this
patch fixes the issue.
v2:
o reposition invalidate_mapping_pages() to the right place suggested by
Jaegeuk Kim.
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
29 Jul, 2014
1 commit
-
This patch introduces an inode number list which represents inodes having
appended data writes or updated data writes after the last checkpoint.
This will be used at fsync to determine whether the recovery information
should be written or not.
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim
25 Jul, 2014
1 commit
-
Andrey Tsyvarev reported:
"Using memory error detector reveals the following use-after-free error
in 3.15.0:
AddressSanitizer: heap-use-after-free in f2fs_evict_inode
Read of size 8 by thread T22279:
[] f2fs_evict_inode+0x102/0x2e0 [f2fs]
[] evict+0x15f/0x290
[< inlined >] iput+0x196/0x280 iput_final
[] iput+0x196/0x280
[] f2fs_put_super+0xd6/0x170 [f2fs]
[] generic_shutdown_super+0xc5/0x1b0
[] kill_block_super+0x4d/0xb0
[] deactivate_locked_super+0x66/0x80
[] deactivate_super+0x68/0x80
[] mntput_no_expire+0x198/0x250
[< inlined >] SyS_umount+0xe9/0x1a0 SYSC_umount
[] SyS_umount+0xe9/0x1a0
[] system_call_fastpath+0x16/0x1b
Freed by thread T3:
[] f2fs_i_callback+0x27/0x30 [f2fs]
[< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim
[< inlined >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch
[< inlined >] rcu_process_callbacks+0x2d6/0x930 invoke_rcu_callbacks
[< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_process_callbacks
[] rcu_process_callbacks+0x2d6/0x930
[] __do_softirq+0x142/0x380
[] run_ksoftirqd+0x30/0x50
[] smpboot_thread_fn+0x197/0x280
[] kthread+0x148/0x160
[] ret_from_fork+0x7c/0xb0
Allocated by thread T22276:
[] f2fs_alloc_inode+0x2d/0x170 [f2fs]
[] iget_locked+0x10a/0x230
[] f2fs_iget+0x35/0xa80 [f2fs]
[] f2fs_fill_super+0xb53/0xff0 [f2fs]
[] mount_bdev+0x1de/0x240
[] f2fs_mount+0x10/0x20 [f2fs]
[] mount_fs+0x55/0x220
[] vfs_kern_mount+0x66/0x200
[< inlined >] do_mount+0x2b4/0x1120 do_new_mount
[] do_mount+0x2b4/0x1120
[< inlined >] SyS_mount+0xb2/0x110 SYSC_mount
[] SyS_mount+0xb2/0x110
[] system_call_fastpath+0x16/0x1b
The buggy address ffff8800587866c8 is located 48 bytes inside
of 680-byte region [ffff880058786698, ffff880058786940)
Memory state around the buggy address:
 ffff880058786100: ffffffff ffffffff ffffffff ffffffff
 ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr
 ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff
 ffff880058786400: ffffffff ffffffff ffffffff ffffffff
 ffff880058786500: ffffffff ffffffff ffffffff fffffffr
>ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff
                                               ^
 ffff880058786700: ffffffff ffffffff ffffffff ffffffff
 ffff880058786800: ffffffff ffffffff ffffffff ffffffff
 ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr....
 ffff880058786a00: ........ ........ ........ ........
 ffff880058786b00: ........ ........ ........ ........
Legend:
f - 8 freed bytes
r - 8 redzone bytes
. - 8 allocated bytes
x=1..7 - x allocated bytes + (8-x) redzone bytes
Investigation shows, that f2fs_evict_inode, when called for
'meta_inode', uses invalidate_mapping_pages() for 'node_inode'.
But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via
iput().
It seems that in common usage scenario this use-after-free is benign,
because 'node_inode' remains partially valid data even after
kmem_cache_free().
But things may change if, while 'meta_inode' is evicted in one f2fs
filesystem, another (mounted) f2fs filesystem requests inode from cache,
and formerly 'node_inode' of the first filesystem is returned."
Nids for both meta_inode and node_inode are reserved, so it's not necessary
for us to invalidate pages which will never be allocated.
To fix this issue, let's skip needlessly invalidating pages for
{meta,node}_inode in f2fs_evict_inode.
Reported-by: Andrey Tsyvarev
Tested-by: Andrey Tsyvarev
Signed-off-by: Gu Zheng
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim