Eric Lee / smarc-fsl-linux-kernel

29 Jul, 2016

1 commit

6784725ab Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs updates from Al Viro:
"Assorted cleanups and fixes.

Probably the most interesting part long-term is ->d_init() - that will
have a bunch of followups in (at least) ceph and lustre, but we'll
need to sort the barrier-related rules before it can get used for
really non-trivial stuff.

Another fun thing is the merge of ->d_iput() callers (dentry_iput()
and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
except the one in __d_lookup_lru())"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
fs/dcache.c: avoid soft-lockup in dput()
vfs: new d_init method
vfs: Update lookup_dcache() comment
bdev: get rid of ->bd_inodes
Remove last traces of ->sync_page
new helper: d_same_name()
dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
vfs: clean up documentation
vfs: document ->d_real()
vfs: merge .d_select_inode() into .d_real()
unify dentry_iput() and dentry_unlink_inode()
binfmt_misc: ->s_root is not going anywhere
drop redundant ->owner initializations
ufs: get rid of redundant checks
orangefs: constify inode_operations
missed comment updates from ->direct_IO() prototype change
file_inode(f)->i_mapping is f->f_mapping
trim fsnotify hooks a bit
9p: new helper - v9fs_parent_fid()
debugfs: ->d_parent is never NULL or negative
...

Linus Torvalds
2016-07-29 03:59:05 +0800

28 Jul, 2016

1 commit

4fc29c1aa Merge tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"The major change in this version is mitigating CPU overhead on write
paths by replacing redundant inode page updates with mark_inode_dirty
calls. We also tried to reduce lock contention to improve filesystem
scalability. Another feature is setting mode=lfs automatically when
host-managed SMR is detected.

Enhancements:
- ioctl to move a range of data between files
- inject orphan inode errors
- avoid flush commands congestion
- support lazytime

Bug fixes:
- return proper results for some dentry operations
- fix deadlock in add_link failure
- disable extent_cache for fcollapse/finsert"

* tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
f2fs: clean up coding style and redundancy
f2fs: get victim segment again after new cp
f2fs: handle error case with f2fs_bug_on
f2fs: avoid data race when deciding checkpoin in f2fs_sync_file
f2fs: support an ioctl to move a range of data blocks
f2fs: fix to report error number of f2fs_find_entry
f2fs: avoid memory allocation failure due to a long length
f2fs: reset default idle interval value
f2fs: use blk_plug in all the possible paths
f2fs: fix to avoid data update racing between GC and DIO
f2fs: add maximum prefree segments
f2fs: disable extent_cache for fcollapse/finsert inodes
f2fs: refactor __exchange_data_block for speed up
f2fs: fix ERR_PTR returned by bio
f2fs: avoid mark_inode_dirty
f2fs: move i_size_write in f2fs_write_end
f2fs: fix to avoid redundant discard during fstrim
f2fs: avoid mismatching block range for discard
f2fs: fix incorrect f_bfree calculation in ->statfs
f2fs: use percpu_rw_semaphore
...

Linus Torvalds
2016-07-28 01:36:31 +0800

27 Jul, 2016

2 commits

0e06f5c0d Merge branch 'akpm' (patches from Andrew)

Merge updates from Andrew Morton:

- a few misc bits

- ocfs2

- most(?) of MM

* emailed patches from Andrew Morton : (125 commits)
thp: fix comments of __pmd_trans_huge_lock()
cgroup: remove unnecessary 0 check from css_from_id()
cgroup: fix idr leak for the first cgroup root
mm: memcontrol: fix documentation for compound parameter
mm: memcontrol: remove BUG_ON in uncharge_list
mm: fix build warnings in
mm, thp: convert from optimistic swapin collapsing to conservative
mm, thp: fix comment inconsistency for swapin readahead functions
thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
shmem: split huge pages beyond i_size under memory pressure
thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
khugepaged: add support of collapse for tmpfs/shmem pages
shmem: make shmem_inode_info::lock irq-safe
khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
thp: extract khugepaged from mm/huge_memory.c
shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
shmem: add huge pages support
shmem: get_unmapped_area align huge page
shmem: prepare huge= mount option and sysfs knob
mm, rmap: account shmem thp pages
...

Linus Torvalds
2016-07-27 10:55:54 +0800
8a5c743e3 mm, memcg: use consistent gfp flags during readahead

Vladimir has noticed that we might declare memcg oom even during
readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
restriction) while __do_page_cache_readahead uses
page_cache_alloc_readahead which adds __GFP_NORETRY to prevent
OOMs. This gfp mask discrepancy is really unfortunate and easily
fixable. Drop page_cache_alloc_readahead() which only has one user and
outsource the gfp_mask logic into readahead_gfp_mask and propagate this
mask from __do_page_cache_readahead down to read_pages.

This alone would have only very limited impact as most filesystems are
implementing ->readpages and the common implementation mpage_readpages
does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to
use readahead_gfp_mask instead as this function is called only during
readahead as well. The same applies to read_cache_pages.

ext4 has its own ext4_mpage_readpages but the path which has pages !=
NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are
doing a very similar pattern to mpage_readpages so the same can be
applied to them as well.
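A userspace sketch of the idea, assuming simplified flag bits: compute one readahead mask from the mapping's restriction plus a NORETRY bit, and hand that same mask down to every allocation on the readahead path. All `sk_`/`SK_` names and bit values are hypothetical stand-ins; the kernel's `readahead_gfp_mask()` is only mirrored in spirit, not reproduced.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration; real kernel values differ. */
typedef unsigned int gfp_t;
#define SK_GFP_IO      0x1u
#define SK_GFP_FS      0x2u
#define SK_GFP_KERNEL  (SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_NORETRY 0x4u /* give up early instead of declaring OOM */

/* Hypothetical stand-in for struct address_space. */
struct sk_mapping {
	gfp_t gfp_mask; /* allocation flags this mapping permits */
};

/* Old read_pages(): GFP_KERNEL limited by the mapping, but no NORETRY. */
gfp_t sk_old_read_pages_gfp(const struct sk_mapping *m)
{
	return SK_GFP_KERNEL & m->gfp_mask;
}

/* New scheme: one mask computed at the readahead entry point,
 * in the spirit of readahead_gfp_mask(). */
gfp_t sk_readahead_gfp_mask(const struct sk_mapping *m)
{
	return m->gfp_mask | SK_GFP_NORETRY;
}

/* read_pages() now just uses whatever mask it was handed. */
gfp_t sk_new_read_pages_gfp(gfp_t ra_gfp)
{
	return ra_gfp;
}

/* An allocation without NORETRY may escalate to (memcg) OOM. */
bool sk_may_declare_oom(gfp_t gfp)
{
	return (gfp & SK_GFP_NORETRY) == 0;
}
```

The point of propagating a single mask is that the mapping's restriction (e.g. no FS recursion) and the fail-fast bit travel together, so no callee can silently fall back to a plain GFP_KERNEL allocation.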

[akpm@linux-foundation.org: coding-style fixes]
[mhocko@suse.com: restrict gfp mask in mpage_alloc]
Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko
Cc: Vladimir Davydov
Cc: Chris Mason
Cc: Steve French
Cc: Theodore Ts'o
Cc: Jan Kara
Cc: Mike Marshall
Cc: Jaegeuk Kim
Cc: Changman Lee
Cc: Chao Yu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2016-07-27 07:19:19 +0800

26 Jul, 2016

1 commit

5302fb000 f2fs: clean up coding style and redundancy

This patch includes minor clean-ups.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-26 03:58:12 +0800

23 Jul, 2016

1 commit

fe94793e5 f2fs: get victim segment again after new cp

The previously selected segment may become free after write_checkpoint.
If we then garbage-collect this segment and new_curseg happens to reuse
it, f2fs_bug_on may be triggered, as below.

panic+0x154/0x29c
do_garbage_collect+0x15c/0xaf4
f2fs_gc+0x2dc/0x444
f2fs_balance_fs.part.22+0xcc/0x14c
f2fs_balance_fs+0x28/0x34
f2fs_map_blocks+0x5ec/0x790
f2fs_preallocate_blocks+0xe0/0x100
f2fs_file_write_iter+0x64/0x11c
new_sync_write+0xac/0x11c
vfs_write+0x144/0x1e4
SyS_write+0x60/0xc0

Here, we may be checking the SIT and SSA types during reset_curseg. So, we
check whether the segment is stale and select a new victim to avoid this.

Signed-off-by: Yunlei He
Signed-off-by: Jaegeuk Kim

Yunlei He
2016-07-23 02:55:31 +0800

21 Jul, 2016

5 commits

70246286e block: get rid of bio_rw and READA

These two are confusing leftovers of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces. For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense. Any check for READA is replaced with an
explicit check for REQ_RAHEAD. Also remove the READA alias for
REQ_RAHEAD.
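A simplified sketch of the two replacement checks described above: deciding write-ness from the operation alone, and testing readahead through an explicit flag instead of a combined READA value. The `sk_`/`SK_` enums, flag value and struct layout are hypothetical; only the names `op_is_write()` and `REQ_RAHEAD` come from the commit text.

```c
#include <stdbool.h>

/* Hypothetical, simplified encodings; the kernel's actual values and
 * bio field layout are not reproduced here. */
enum sk_req_op { SK_REQ_OP_READ = 0, SK_REQ_OP_WRITE = 1, SK_REQ_OP_DISCARD = 2 };
#define SK_REQ_RAHEAD 0x100u /* modifier flag: speculative readahead */

struct sk_bio {
	enum sk_req_op op;  /* what to do (REQ_OP_ namespace) */
	unsigned int flags; /* modifiers (REQ_ namespace) */
};

/* Anything that is not a read modifies the device. */
bool sk_op_is_write(enum sk_req_op op)
{
	return op != SK_REQ_OP_READ;
}

/* Readahead is tested explicitly via the flag, not via a READA alias. */
bool sk_bio_is_readahead(const struct sk_bio *bio)
{
	return bio->op == SK_REQ_OP_READ && (bio->flags & SK_REQ_RAHEAD);
}
```

Keeping the operation and its modifier flags in separate fields is what makes each check a single, unambiguous test.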

Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Reviewed-by: Mike Christie
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-07-21 07:37:01 +0800
6f3ec9952 f2fs: handle error case with f2fs_bug_on

For error cases, it's enough to show BUG or WARN via f2fs_bug_on.
Then, we don't need to leave the filesystem corrupted.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-21 05:53:22 +0800
dd11a5df5 f2fs: avoid data race when deciding checkpoin in f2fs_sync_file

When the filesystem is almost full, f2fs_sync_file should do a checkpoint if
there is not enough space for roll-forward later (i.e. space_for_roll_forward).
However, there is currently no lock for sbi->alloc_valid_block_count, resulting
in a race condition.

In rare cases, we can get -ENOSPC when doing roll-forward, which triggers

	if (is_valid_blkaddr(sbi, dest, META_POR)) {
		if (src == NULL_ADDR) {
			err = reserve_new_block(&dn);
			f2fs_bug_on(sbi, err);
			...
		}
		...
	}
in do_recover_data.

So, this patch avoids that situation in advance.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-21 05:53:21 +0800
4dd6f977f f2fs: support an ioctl to move a range of data blocks

This patch implements moving a range of data blocks from source file to
destination file.
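A toy model of what "moving" data blocks means, assuming a file is just an array of block addresses: only the mappings move, and the source range becomes holes. The `sk_` names, the hole marker and the rule of rejecting overlapping ranges in one file are assumptions for illustration, not the f2fs ioctl's interface or behaviour.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_NULL_ADDR 0u /* hypothetical "hole" marker */

/* Toy block map: index = file block number, value = disk block address. */
struct sk_file {
	uint32_t *blkaddr;
	size_t nblocks;
};

/* Move `len` block mappings from src[pos_in..] to dst[pos_out..].
 * No data is copied. This sketch ignores freeing whatever dst mapped
 * before. Returns 0 on success, -1 on an invalid range. */
int sk_move_range(struct sk_file *src, size_t pos_in,
		  struct sk_file *dst, size_t pos_out, size_t len)
{
	if (pos_in > src->nblocks || len > src->nblocks - pos_in ||
	    pos_out > dst->nblocks || len > dst->nblocks - pos_out)
		return -1;
	if (src == dst && pos_in < pos_out + len && pos_out < pos_in + len)
		return -1; /* overlapping ranges in one file: rejected here */
	for (size_t i = 0; i < len; i++) {
		dst->blkaddr[pos_out + i] = src->blkaddr[pos_in + i];
		src->blkaddr[pos_in + i] = SK_NULL_ADDR; /* source becomes a hole */
	}
	return 0;
}
```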

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-21 05:53:20 +0800
91246c21b f2fs: fix to report error number of f2fs_find_entry

This patch fixes to report the right error number of f2fs_find_entry to
its caller.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2016-07-21 05:53:19 +0800

19 Jul, 2016

1 commit

363cad7f7 f2fs: avoid memory allocation failure due to a long length

We need to avoid ENOMEM due to an unexpectedly long length.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-19 01:20:44 +0800

16 Jul, 2016

7 commits

dcf25fe8f f2fs: reset default idle interval value

The default idle interval is 2 minutes, but most of the time when the
screen is off there are still operations within that 2-minute window,
and the GC thread sleeps for about 30 to 60 seconds, so it has almost no
chance to do garbage collection.

Fix this by changing the default idle interval from 2 minutes to 5
seconds.
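A toy timing model of the argument above, with times in seconds: the background GC thread wakes every `gc_sleep` seconds and only collects if the device has been idle for at least `idle_interval`. The steady one-I/O-per-`io_period` workload and the `sk_` names are assumptions made for illustration.

```c
#include <stdbool.h>

/* GC may run only once no I/O has happened for `idle_interval` seconds. */
bool sk_is_idle(unsigned now, unsigned last_io, unsigned idle_interval)
{
	return now - last_io >= idle_interval;
}

/* Count GC rounds over `duration` seconds, assuming I/O at times
 * 0, io_period, 2*io_period, ... and a GC wakeup every gc_sleep
 * seconds (both periods must be non-zero). */
unsigned sk_gc_rounds(unsigned duration, unsigned io_period,
		      unsigned gc_sleep, unsigned idle_interval)
{
	unsigned rounds = 0;
	for (unsigned t = gc_sleep; t <= duration; t += gc_sleep) {
		unsigned last_io = (t / io_period) * io_period; /* latest I/O <= t */
		if (sk_is_idle(t, last_io, idle_interval))
			rounds++;
	}
	return rounds;
}
```

With one operation per minute and a 30-second GC sleep, a 120-second idle interval is never reached over ten minutes, while a 5-second interval lets every other wakeup collect.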

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2016-07-16 06:21:24 +0800
9dfa1baff f2fs: use blk_plug in all the possible paths

This patch reverts 19a5f5e2ef37 (f2fs: drop any block plugging),
and adds blk_plug in write paths additionally.

The main reason is that blk_start_plug can be used to wake up from low-power
mode before submitting further bios.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-16 06:21:23 +0800
82e0a5aa5 f2fs: fix to avoid data update racing between GC and DIO

Data in a file can be operated on by GC and DIO simultaneously, so we
can hit the races below:

For write case:
Thread A                                Thread B
- generic_file_direct_write
 - invalidate_inode_pages2_range
 - f2fs_direct_IO
  - do_blockdev_direct_IO
   - do_direct_IO
    - get_more_blocks
                                        - f2fs_gc
                                         - do_garbage_collect
                                          - gc_data_segment
                                           - move_data_page
                                            - do_write_data_page
                                            migrate data block to new block address
   - dio_bio_submit
   update user data to old block address

For read case:
Thread A                                Thread B
- generic_file_direct_write
 - invalidate_inode_pages2_range
 - f2fs_direct_IO
  - do_blockdev_direct_IO
   - do_direct_IO
    - get_more_blocks
                                        - f2fs_balance_fs
                                         - f2fs_gc
                                          - do_garbage_collect
                                           - gc_data_segment
                                            - move_data_page
                                             - do_write_data_page
                                             migrate data block to new block address
                                          - write_checkpoint
                                           - do_checkpoint
                                            - clear_prefree_segments
                                             - f2fs_issue_discard
                                             discard old block address
   - dio_bio_submit
   update user buffer from obsolete block address

To fix this, DIO and GC must exclude each other on the same file.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2016-07-16 06:21:22 +0800
44a83499d f2fs: add maximum prefree segments

On 1TB storage, we need to admit 22841 prefree segments, which consumes
too many segments.
This patch caps the prefree segments at 8GB in that case.
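The cap is easy to express in segments. Assuming the common f2fs geometry of 2MB segments (512 blocks of 4KB), 22841 prefree segments is roughly 44.6GB, while an 8GB ceiling is 4096 segments. The `sk_` names are hypothetical; this is a sketch of the clamp, not the f2fs code.

```c
#include <stdint.h>

/* Assumed f2fs geometry: 4KB blocks, 512 blocks per segment (2MB). */
#define SK_SEGMENT_BYTES     (2ull << 20)
#define SK_MAX_PREFREE_BYTES (8ull << 30) /* 8GB cap from the commit text */

/* Clamp the number of prefree segments we are willing to admit. */
uint64_t sk_max_prefree_segs(uint64_t wanted_segs)
{
	uint64_t cap = SK_MAX_PREFREE_BYTES / SK_SEGMENT_BYTES; /* 4096 */
	return wanted_segs < cap ? wanted_segs : cap;
}
```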

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-16 06:21:21 +0800
5f281fab9 f2fs: disable extent_cache for fcollapse/finsert inodes

This reduces the elapsed time to do xfstests/generic/017.

Before: 458 s
After: 390 s

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-16 06:21:20 +0800
0a2aa8fbb f2fs: refactor __exchange_data_block for speed up

This reduces the elapsed time to do xfstests/generic/017.

Before: 715 s
After: 458 s

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-16 06:21:19 +0800
1d353eb7e f2fs: fix ERR_PTR returned by bio

This fixes the wrong error pointer handling flow reported by Dan.

Reported-by: Dan Carpenter
Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-16 06:21:19 +0800

09 Jul, 2016

15 commits

b56ab837a f2fs: avoid mark_inode_dirty

Let's check the inode's dirtiness before calling mark_inode_dirty.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-09 01:34:09 +0800
a2ee0a300 f2fs: move i_size_write in f2fs_write_end

We don't need to do i_size_write under page lock.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-09 01:33:35 +0800
c24a0fd65 f2fs: fix to avoid redundant discard during fstrim

With the test steps below, f2fs issues redundant discards when doing fstrim.
The reason is that we issue discards both for prefree segments and for the
consecutive freed region the user wants to trim, and parts of the regions
they cover overlap. Here, we change it to not issue any discards for
prefree segments within the trimmed range.

1. mount -t f2fs -o discard /dev/zram0 /mnt/f2fs
2. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/
3. dd if=/dev/zero of=/mnt/f2fs/a bs=2M count=1
4. dd if=/dev/zero of=/mnt/f2fs/b bs=1M count=1
5. sync
6. rm /mnt/f2fs/a /mnt/f2fs/b
7. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/

Before:
-5428 [001] ...1 9511.052125: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x200
-5428 [001] ...1 9511.052787: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300

After:
-6764 [000] ...1 9720.382504: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300
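Using the block ranges from the traces above, the rule can be sketched as an interval-containment check: a prefree discard that lies entirely inside the range being trimmed is redundant. The `sk_` names are hypothetical, and this simplified version leaves partially overlapping ranges alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open block range [start, start + len). */
struct sk_range {
	uint64_t start;
	uint64_t len;
};

/* True if `inner` lies completely inside `outer`. */
bool sk_range_within(struct sk_range inner, struct sk_range outer)
{
	return inner.start >= outer.start &&
	       inner.start + inner.len <= outer.start + outer.len;
}

/* During fstrim, skip a prefree-segment discard that the trimmed range
 * already covers; partial overlaps are not handled in this sketch. */
bool sk_should_discard_prefree(struct sk_range prefree, struct sk_range trimmed)
{
	return !sk_range_within(prefree, trimmed);
}
```

With the "Before" numbers, the 0x200-block discard at 0x2200 falls inside the 0x300-block discard at 0x2200, so only the latter is issued.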

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2016-07-09 01:33:34 +0800
c7b41e161 f2fs: avoid mismatching block range for discard

This patch skips discarding block ranges that are smaller than trim_minlen
and cannot be merged with a neighbour.
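One way to picture the rule: merge extents that touch, then drop whatever is still shorter than the minimum length. This is a sketch assuming a start-sorted extent array; the `sk_` names are hypothetical and the merge policy is a simplification of what f2fs does.

```c
#include <stddef.h>
#include <stdint.h>

struct sk_extent {
	uint64_t start; /* first block */
	uint64_t len;   /* number of blocks */
};

/* Merge touching extents (input sorted by start) in place, then keep
 * only extents of at least `minlen` blocks. Returns the number kept. */
size_t sk_filter_discards(struct sk_extent *ext, size_t n, uint64_t minlen)
{
	size_t merged = 0;
	for (size_t i = 0; i < n; i++) {
		if (merged > 0 &&
		    ext[merged - 1].start + ext[merged - 1].len == ext[i].start)
			ext[merged - 1].len += ext[i].len; /* extend neighbour */
		else
			ext[merged++] = ext[i];
	}
	size_t kept = 0;
	for (size_t i = 0; i < merged; i++)
		if (ext[i].len >= minlen)
			ext[kept++] = ext[i];
	return kept;
}
```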

Signed-off-by: Yunlei He
Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Yunlei He
2016-07-09 01:33:33 +0800
3e6d0b4d9 f2fs: fix incorrect f_bfree calculation in ->statfs

As described in the manual, f_bfree indicates the total free blocks in the
fs. In f2fs, it includes two parts: visible free blocks and over-provision
blocks. This patch corrects the calculation.

fsblkcnt_t f_bfree; /* free blocks in fs */
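A sketch of the corrected arithmetic, assuming simplified counters: free blocks visible to users plus over-provisioned blocks make up `f_bfree`. Setting `f_bavail` to the visible part only is an assumption here, based on statvfs describing it as free blocks for unprivileged users; the struct and field names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical subset of filesystem counters; names are illustrative. */
struct sk_counts {
	uint64_t user_blocks;  /* blocks visible to users */
	uint64_t valid_blocks; /* blocks currently in use */
	uint64_t ovp_blocks;   /* over-provisioned blocks hidden from users */
};

struct sk_statfs {
	uint64_t f_bfree;  /* free blocks in fs */
	uint64_t f_bavail; /* free blocks for unprivileged users */
};

void sk_fill_statfs(const struct sk_counts *c, struct sk_statfs *buf)
{
	uint64_t visible_free = c->user_blocks - c->valid_blocks;
	buf->f_bavail = visible_free;                /* assumption: visible part only */
	buf->f_bfree = visible_free + c->ovp_blocks; /* both parts count as free */
}
```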

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2016-07-09 01:33:32 +0800
ec795418c f2fs: use percpu_rw_semaphore

This patch replaces rw_semaphore with percpu_rw_semaphore for:
sbi->cp_rwsem
nm_i->nat_tree_lock

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-09 01:33:31 +0800
3bdad3c7e f2fs: skip to check the block address of node page

If the node page is up-to-date, it should be alive.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-09 01:33:31 +0800
2555a2d55 f2fs: shrink critical region in spin_lock

This patch shrinks the critical region in spin_lock.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-09 01:33:30 +0800
237c0790e f2fs: call SetPageUptodate if needed

SetPageUptodate() issues a memory barrier, resulting in performance
degradation. Let's avoid calling it when the page is already up-to-date.
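The check-before-set pattern, sketched in userspace: a C11 fence stands in for the barrier inside SetPageUptodate(), and a counter makes the saved work visible. `struct sk_page` and the helper names are hypothetical, not the kernel's page API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy page: a flag plus a count of barriers issued. Not struct page. */
struct sk_page {
	bool uptodate;
	unsigned barriers;
};

/* Stand-in for SetPageUptodate(): publish the flag behind a fence. */
void sk_set_page_uptodate(struct sk_page *p)
{
	atomic_thread_fence(memory_order_seq_cst); /* models the barrier cost */
	p->barriers++;
	p->uptodate = true;
}

/* Only pay for the barrier when the flag actually changes. */
void sk_mark_uptodate_if_needed(struct sk_page *p)
{
	if (!p->uptodate)
		sk_set_page_uptodate(p);
}
```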

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-09 01:33:29 +0800
fe76b796f f2fs: introduce f2fs_set_page_dirty_nobuffer

This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer.
When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the
overall performance.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-09 01:33:28 +0800
a0995af69 f2fs: remove unnecessary goto statement

When base_addr is NULL, there is no need to call kzfree; it should
return -ENOMEM directly. Additionally, it is better to initialize the
variable 'error' with 0.

Signed-off-by: Tiezhu Yang
Signed-off-by: Jaegeuk Kim

Tiezhu Yang
2016-07-09 01:33:27 +0800
64058be9c f2fs: add nodiscard mount option

This patch adds 'nodiscard' mount option.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2016-07-09 01:33:26 +0800
72e1c797b f2fs: fix to redirty page if fail to gc data page

If we fail to move a data page during foreground GC, we should give writeback
another chance for that page, which was previously set dirty by the writer.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2016-07-09 01:33:26 +0800
1563ac75e f2fs: fix to detect truncation prior rather than EIO during read

During a synchronous read, after sending out the read request, the reader
tries to lock the page to wait for the device to finish the read and
unlock the page. Meanwhile, a truncater can race with the reader, so after
the reader gets the page lock, it should check the page's mapping to detect
whether someone has truncated the page in the meantime. If truncation was
done, the reader can retry; otherwise the read can fail because of the
previous condition check.
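The ordering of the two checks can be sketched as follows, assuming a toy page whose `mapping` pointer is cleared on truncation: once the lock is re-acquired, a detached page means "look it up again", and only a still-attached, non-uptodate page is a genuine I/O error. All `sk_` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct sk_mapping { int id; };

/* Toy page: `mapping` becomes NULL when the page is truncated. */
struct sk_page {
	struct sk_mapping *mapping;
	bool uptodate;
};

enum sk_read_result { SK_READ_OK, SK_READ_RETRY, SK_READ_EIO };

/* Called after the reader re-acquires the page lock following I/O. */
enum sk_read_result sk_check_after_lock(const struct sk_page *page,
					const struct sk_mapping *mapping)
{
	if (page->mapping != mapping)
		return SK_READ_RETRY; /* truncated meanwhile: retry the lookup */
	if (!page->uptodate)
		return SK_READ_EIO;   /* still attached, so a real read error */
	return SK_READ_OK;
}
```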

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2016-07-09 01:33:25 +0800
78682f794 f2fs: fix to avoid reading out encrypted data in page cache

For an encrypted inode, if the user overwrites data of the inode, f2fs reads
the encrypted data into the page cache and then decrypts it.

However, a reader can race with the overwriter and see encrypted data that
the overwriter has not decrypted yet. Fix it by moving the decryption work
to the background and keeping the page non-uptodate until the data is
decrypted.

Thread A                                Thread B
- f2fs_file_write_iter
 - __generic_file_write_iter
  - generic_perform_write
   - f2fs_write_begin
    - f2fs_submit_page_bio
                                        - generic_file_read_iter
                                         - do_generic_file_read
                                          - lock_page_killable
                                          - unlock_page
                                          - copy_page_to_iter
                                          hit the encrypted data in updated page
    - lock_page
    - fscrypt_decrypt_page
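A sketch of the ordering guarantee only, not of the background worker: decryption completes before the page is marked uptodate, and readers copy data out only from uptodate pages. The XOR "cipher" is a placeholder, not fscrypt, and every `sk_` name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define SK_PAGE_SIZE 8

/* Toy page cache entry. */
struct sk_page {
	unsigned char data[SK_PAGE_SIZE];
	bool uptodate;
};

/* Placeholder "cipher": XOR with a one-byte key (encrypts and decrypts). */
static void sk_xor(unsigned char *buf, size_t n, unsigned char key)
{
	for (size_t i = 0; i < n; i++)
		buf[i] ^= key;
}

/* Read completion: decrypt first, only then mark the page uptodate, so
 * a reader can never observe ciphertext in an uptodate page. */
void sk_end_read(struct sk_page *p, unsigned char key)
{
	sk_xor(p->data, SK_PAGE_SIZE, key);
	p->uptodate = true;
}

/* Reader: copies out only from uptodate pages; false means "wait". */
bool sk_copy_to_user(const struct sk_page *p, unsigned char *out)
{
	if (!p->uptodate)
		return false;
	memcpy(out, p->data, SK_PAGE_SIZE);
	return true;
}
```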

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2016-07-09 01:33:24 +0800

07 Jul, 2016

6 commits

ac6f19998 f2fs: avoid latency-critical readahead of node pages

f2fs_map_blocks is closely tied to performance, so let's avoid any
latency from reading ahead node pages.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-07 01:44:10 +0800
2c237ebaa f2fs: avoid writing node/metapages during writes

Let's keep more node/meta pages around at run time.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-07 01:44:09 +0800
ad4edb831 f2fs: produce more nids and reduce readahead nats

Readahead NAT pages are likely to be reclaimed quickly, so it is better
to gather more free nids in advance.

Also, let's keep as many free nids as possible.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-07 01:44:08 +0800
52763a4b7 f2fs: detect host-managed SMR by feature flag

If mkfs.f2fs gives a feature flag for host-managed SMR, we can set mode=lfs
by default.
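A minimal sketch of choosing the default allocation mode from a superblock feature bit. The bit value and the `sk_` names are hypothetical, not f2fs's on-disk definitions; "adaptive" is used as the non-SMR default since f2fs's `mode=` option accepts "adaptive" and "lfs".

```c
#include <stdint.h>

/* Hypothetical feature bit; not the f2fs on-disk value. */
#define SK_FEATURE_HMSMR 0x1u

enum sk_alloc_mode { SK_MODE_ADAPTIVE, SK_MODE_LFS };

/* Host-managed SMR zones must be written sequentially, so a filesystem
 * formatted for it defaults to log-structured (lfs) allocation. */
enum sk_alloc_mode sk_default_mode(uint32_t sb_features)
{
	return (sb_features & SK_FEATURE_HMSMR) ? SK_MODE_LFS : SK_MODE_ADAPTIVE;
}
```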

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-07 01:44:07 +0800
67c3758d2 f2fs: call update_inode_page for orphan inodes

Let's store orphan inode pages right away.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-07 01:44:07 +0800
3e19886ed f2fs: report error for f2fs_parent_dir

If there is no dentry, we can report its error correctly.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2016-07-07 01:44:06 +0800