Eric Lee / smarc-fsl-linux-kernel

16 Jun, 2020

1 commit

3be20b6fc Merge tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull more ext4 updates from Ted Ts'o:
"This is the second round of ext4 commits for 5.8 merge window [1].

It includes the per-inode DAX support, which was dependant on the DAX
infrastructure which came in via the XFS tree, and a number of
regression and bug fixes; most notably the "BUG: using
smp_processor_id() in preemptible code in ext4_mb_new_blocks" reported
by syzkaller"

[1] The pull request actually came in 15 minutes after I had tagged the
rc1 release. Tssk, tssk, late.. - Linus

* tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers
ext4: support xattr gnu.* namespace for the Hurd
ext4: mballoc: Use this_cpu_read instead of this_cpu_ptr
ext4: avoid utf8_strncasecmp() with unstable name
ext4: stop overwrite the errcode in ext4_setup_super
ext4: fix partial cluster initialization when splitting extent
ext4: avoid race conditions when remounting with options that change dax
Documentation/dax: Update DAX enablement for ext4
fs/ext4: Introduce DAX inode flag
fs/ext4: Remove jflag variable
fs/ext4: Make DAX mount option a tri-state
fs/ext4: Only change S_DAX on inode load
fs/ext4: Update ext4_should_use_dax()
fs/ext4: Change EXT4_MOUNT_DAX to EXT4_MOUNT_DAX_ALWAYS
fs/ext4: Disallow verity if inode is DAX
fs/ext4: Narrow scope of DAX check in setflags

Linus Torvalds
2020-06-16 00:32:10 +0800

13 Jun, 2020

1 commit

7b97d868b ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers ... Browse Code »

In the ext4 filesystem with errors=panic, if one process is recording
errno in the superblock when invoking jbd2_journal_abort() due to some
error cases, it could be raced by another __ext4_abort() which is
setting the SB_RDONLY flag but missing panic because errno has not been
recorded.

jbd2_journal_commit_transaction()
jbd2_journal_abort()
journal->j_flags |= JBD2_ABORT;
jbd2_journal_update_sb_errno()
| ext4_journal_check_start()
| __ext4_abort()
| sb->s_flags |= SB_RDONLY;
| if (!JBD2_REC_ERR)
| return;
journal->j_flags |= JBD2_REC_ERR;

Finally, it will no longer trigger panic because the filesystem has
already been set read-only. Fix this by introduce j_abort_mutex to make
sure journal abort is completed before panic, and remove JBD2_REC_ERR
flag.

Fixes: 4327ba52afd03 ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200609073540.3810702-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o

zhangyi (F)
2020-06-13 02:51:41 +0800

06 Jun, 2020

1 commit

0b166a57e Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 updates from Ted Ts'o:
"A lot of bug fixes and cleanups for ext4, including:

- Fix performance problems found in dioread_nolock now that it is the
default, caused by transaction leaks.

- Clean up fiemap handling in ext4

- Clean up and refactor multiple block allocator (mballoc) code

- Fix a problem with mballoc with a smaller file systems running out
of blocks because they couldn't properly use blocks that had been
reserved by inode preallocation.

- Fixed a race in ext4_sync_parent() versus rename()

- Simplify the error handling in the extent manipulation code

- Make sure all metadata I/O errors are felected to
ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers.

- Avoid passing an error pointer to brelse in ext4_xattr_set()

- Fix race which could result to freeing an inode on the dirty last
in data=journal mode.

- Fix refcount handling if ext4_iget() fails

- Fix a crash in generic/019 caused by a corrupted extent node"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
ext4: avoid unnecessary transaction starts during writeback
ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
fs: remove the access_ok() check in ioctl_fiemap
fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
fs: move fiemap range validation into the file systems instances
iomap: fix the iomap_fiemap prototype
fs: move the fiemap definitions out of fs.h
fs: mark __generic_block_fiemap static
ext4: remove the call to fiemap_check_flags in ext4_fiemap
ext4: split _ext4_fiemap
ext4: fix fiemap size checks for bitmap files
ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
add comment for ext4_dir_entry_2 file_type member
jbd2: avoid leaking transaction credits when unreserving handle
ext4: drop ext4_journal_free_reserved()
ext4: mballoc: use lock for checking free blocks while retrying
ext4: mballoc: refactor ext4_mb_good_group()
ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
ext4: mballoc: refactor ext4_mb_discard_preallocations()
...

Linus Torvalds
2020-06-06 07:19:28 +0800

04 Jun, 2020

1 commit

14ff62863 jbd2: avoid leaking transaction credits when unreserving handle ... Browse Code »

When reserved transaction handle is unused, we subtract its reserved
credits in __jbd2_journal_unreserve_handle() called from
jbd2_journal_stop(). However this function forgets to remove reserved
credits from transaction->t_outstanding_credits and thus the transaction
space that was reserved remains effectively leaked. The leaked
transaction space can be quite significant in some cases and leads to
unnecessarily small transactions and thus reducing throughput of the
journalling machinery. E.g. fsmark workload creating lots of 4k files
was observed to have about 20% lower throughput due to this when ext4 is
mounted with dioread_nolock mount option.

Subtract reserved credits from t_outstanding_credits as well.

CC: stable@vger.kernel.org
Fixes: 8f7d89f36829 ("jbd2: transaction reservation support")
Reviewed-by: Andreas Dilger
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20200520133119.1383-3-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2020-06-04 11:16:53 +0800

22 May, 2020

1 commit

9398554fb block: remove the error_sector argument to blkdev_issue_flush ... Browse Code »

The argument isn't used by any caller, and drivers don't fill out
bi_sector for flush requests either.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-05-22 22:45:46 +0800

06 Mar, 2020

1 commit

780f66e59 jbd2: improve comments about freeing data buffers whose page mapping is NULL ... Browse Code »

Improve comments in jbd2_journal_commit_transaction() to describe why
we don't need to clear the buffer_mapped bit for freeing file mapping
buffers whose page mapping is NULL.

Link: https://lore.kernel.org/r/20200217112706.20085-1-yi.zhang@huawei.com
Fixes: c96dceeabf76 ("jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer")
Suggested-by: Jan Kara
Reviewed-by: Jan Kara
Signed-off-by: zhangyi (F)
Signed-off-by: Theodore Ts'o

zhangyi (F)
2020-03-06 09:25:05 +0800

01 Mar, 2020

1 commit

6c5d91124 jbd2: fix data races at struct journal_head ... Browse Code »

journal_head::b_transaction and journal_head::b_next_transaction could
be accessed concurrently as noticed by KCSAN,

LTP: starting fsync04
/dev/zero: Can't open blockdev
EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
==================================================================
BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]

write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
__jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
__jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
(inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
kjournald2+0x13b/0x450 [jbd2]
kthread+0x1cd/0x1f0
ret_from_fork+0x27/0x50

read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
jbd2_write_access_granted+0x1b2/0x250 [jbd2]
jbd2_write_access_granted at fs/jbd2/transaction.c:1155
jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
__ext4_journal_get_write_access+0x50/0x90 [ext4]
ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
ext4_mb_new_blocks+0x54f/0xca0 [ext4]
ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
ext4_map_blocks+0x3b4/0x950 [ext4]
_ext4_get_block+0xfc/0x270 [ext4]
ext4_get_block+0x3b/0x50 [ext4]
__block_write_begin_int+0x22e/0xae0
__block_write_begin+0x39/0x50
ext4_write_begin+0x388/0xb50 [ext4]
generic_perform_write+0x15d/0x290
ext4_buffered_write_iter+0x11f/0x210 [ext4]
ext4_file_write_iter+0xce/0x9e0 [ext4]
new_sync_write+0x29c/0x3b0
__vfs_write+0x92/0xa0
vfs_write+0x103/0x260
ksys_write+0x9d/0x130
__x64_sys_write+0x4c/0x60
do_syscall_64+0x91/0xb05
entry_SYSCALL_64_after_hwframe+0x49/0xbe

5 locks held by fsync04/25724:
#0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
#1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
#2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
#3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
#4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
irq event stamp: 1407125
hardirqs last enabled at (1407125): [] __find_get_block+0x107/0x790
hardirqs last disabled at (1407124): [] __find_get_block+0x49/0x790
softirqs last enabled at (1405528): [] __do_softirq+0x34c/0x57c
softirqs last disabled at (1405521): [] irq_exit+0xa2/0xc0

Reported by Kernel Concurrency Sanitizer on:
CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

The plain reads are outside of jh->b_state_lock critical section which result
in data races. Fix them by adding pairs of READ|WRITE_ONCE().

Reviewed-by: Jan Kara
Signed-off-by: Qian Cai
Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw
Signed-off-by: Theodore Ts'o

Qian Cai
2020-03-01 02:40:02 +0800

22 Feb, 2020

1 commit

8eedabfd6 jbd2: fix ocfs2 corrupt when clearing block group bits ... Browse Code »

I found a NULL pointer dereference in ocfs2_block_group_clear_bits().
The running environment:
kernel version: 4.19
A cluster with two nodes, 5 luns mounted on two nodes, and do some
file operations like dd/fallocate/truncate/rm on every lun with storage
network disconnection.

The fallocate operation on dm-23-45 caused an null pointer dereference.

The information of NULL pointer dereference as follows:
[577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45.
[577992.878290] Aborting journal on device dm-23-45.
...
[577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46.
[577992.890908] __journal_remove_journal_head: freeing b_committed_data
[577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30
[577992.890918] __journal_remove_journal_head: freeing b_committed_data
[577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30
[577992.890922] __journal_remove_journal_head: freeing b_committed_data
[577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30
[577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30
[577992.890928] __journal_remove_journal_head: freeing b_committed_data
[577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30
[577992.890933] __journal_remove_journal_head: freeing b_committed_data
[577992.890939] __journal_remove_journal_head: freeing b_committed_data
[577992.890949] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[577992.890950] Mem abort info:
[577992.890951] ESR = 0x96000004
[577992.890952] Exception class = DABT (current EL), IL = 32 bits
[577992.890952] SET = 0, FnV = 0
[577992.890953] EA = 0, S1PTW = 0
[577992.890954] Data abort info:
[577992.890955] ISV = 0, ISS = 0x00000004
[577992.890956] CM = 0, WnR = 0
[577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f8da07a9
[577992.890960] [0000000000000020] pgd=0000000000000000
[577992.890964] Internal error: Oops: 96000004 [#1] SMP
[577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd)
[577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G W OE 4.19.36 #1
[577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
[577992.890971] pstate: 60400009 (nZCv daif +PAN -UAO)
[577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
[577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2]
[577992.891084] sp : ffff0000c8e2b810
[577992.891085] x29: ffff0000c8e2b820 x28: 0000000000000000
[577992.891087] x27: 00000000000006f3 x26: ffffa07957b02e70
[577992.891089] x25: ffff807c59d50000 x24: 00000000000006f2
[577992.891091] x23: 0000000000000001 x22: ffff807bd39abc30
[577992.891093] x21: ffff0000811d9000 x20: ffffa07535d6a000
[577992.891097] x19: ffff000001681638 x18: ffffffffffffffff
[577992.891098] x17: 0000000000000000 x16: ffff000080a03df0
[577992.891100] x15: ffff0000811d9708 x14: 203d207375746174
[577992.891101] x13: 73203a524f525245 x12: 20373439343a6565
[577992.891103] x11: 0000000000000038 x10: 0101010101010101
[577992.891106] x9 : ffffa07c68a85d70 x8 : 7f7f7f7f7f7f7f7f
[577992.891109] x7 : 0000000000000000 x6 : 0000000000000080
[577992.891110] x5 : 0000000000000000 x4 : 0000000000000002
[577992.891112] x3 : ffff000001713390 x2 : 2ff90f88b1c22f00
[577992.891114] x1 : ffff807bd39abc30 x0 : 0000000000000000
[577992.891116] Call trace:
[577992.891139] _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
[577992.891162] _ocfs2_free_clusters+0x100/0x290 [ocfs2]
[577992.891185] ocfs2_free_clusters+0x50/0x68 [ocfs2]
[577992.891206] ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2]
[577992.891227] ocfs2_add_inode_data+0x94/0xc8 [ocfs2]
[577992.891248] ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2]
[577992.891269] ocfs2_allocate_extents+0x14c/0x338 [ocfs2]
[577992.891290] __ocfs2_change_file_space+0x3f8/0x610 [ocfs2]
[577992.891309] ocfs2_fallocate+0xe4/0x128 [ocfs2]
[577992.891316] vfs_fallocate+0x11c/0x250
[577992.891317] ksys_fallocate+0x54/0x88
[577992.891319] __arm64_sys_fallocate+0x28/0x38
[577992.891323] el0_svc_common+0x78/0x130
[577992.891325] el0_svc_handler+0x38/0x78
[577992.891327] el0_svc+0x8/0xc

My analysis process as follows:
ocfs2_fallocate
__ocfs2_change_file_space
ocfs2_allocate_extents
ocfs2_extend_allocation
ocfs2_add_inode_data
ocfs2_add_clusters_in_btree
ocfs2_insert_extent
ocfs2_do_insert_extent
ocfs2_rotate_tree_right
ocfs2_extend_rotate_transaction
ocfs2_extend_trans
jbd2_journal_restart
jbd2__journal_restart
/* handle->h_transaction is NULL,
* is_handle_aborted(handle) is true
*/
handle->h_transaction = NULL;
start_this_handle
return -EROFS;
ocfs2_free_clusters
_ocfs2_free_clusters
_ocfs2_free_suballoc_bits
ocfs2_block_group_clear_bits
ocfs2_journal_access_gd
__ocfs2_journal_access
jbd2_journal_get_undo_access
/* I think jbd2_write_access_granted() will
* return true, because do_get_write_access()
* will return -EROFS.
*/
if (jbd2_write_access_granted(...)) return 0;
do_get_write_access
/* handle->h_transaction is NULL, it will
* return -EROFS here, so do_get_write_access()
* was not called.
*/
if (is_handle_aborted(handle)) return -EROFS;
/* bh2jh(group_bh) is NULL, caused NULL
pointer dereference */
undo_bg = (struct ocfs2_group_desc *)
bh2jh(group_bh)->b_committed_data;

If handle->h_transaction == NULL, then jbd2_write_access_granted()
does not really guarantee that journal_head will stay around,
not even speaking of its b_committed_data. The bh2jh(group_bh)
can be removed after ocfs2_journal_access_gd() and before call
"bh2jh(group_bh)->b_committed_data". So, we should move
is_handle_aborted() check from do_get_write_access() into
jbd2_journal_get_undo_access() and jbd2_journal_get_write_access()
before the call to jbd2_write_access_granted().

Link: https://lore.kernel.org/r/f72a623f-b3f1-381a-d91d-d22a1c83a336@huawei.com
Signed-off-by: Yan Wang
Signed-off-by: Theodore Ts'o
Reviewed-by: Jun Piao
Reviewed-by: Jan Kara
Cc: stable@kernel.org

wangyan
2020-02-22 08:32:07 +0800

17 Feb, 2020

1 commit

8a8b80967 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 fixes from Ted Ts'o:
"Miscellaneous ext4 bug fixes (all stable fodder)"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: improve explanation of a mount failure caused by a misconfigured kernel
jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()
ext4: add cond_resched() to ext4_protect_reserved_inode
ext4: fix checksum errors with indexed dirs
ext4: fix support for inode sizes > 1024 bytes
ext4: simplify checking quota limits in ext4_statfs()
ext4: don't assume that mmp_nodename/bdevname have NUL

Linus Torvalds
2020-02-17 03:12:06 +0800

14 Feb, 2020

2 commits

c96dceeab jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer ... Browse Code »

Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
an older transaction") set the BH_Freed flag when forgetting a metadata
buffer which belongs to the committing transaction, it indicate the
committing process clear dirty bits when it is done with the buffer. But
it also clear the BH_Mapped flag at the same time, which may trigger
below NULL pointer oops when block_size < PAGE_SIZE.

rmdir 1 kjournald2 mkdir 2
jbd2_journal_commit_transaction
commit transaction N
jbd2_journal_forget
set_buffer_freed(bh1)
jbd2_journal_commit_transaction
commit transaction N+1
...
clear_buffer_mapped(bh1)
ext4_getblk(bh2 ummapped)
...
grow_dev_page
init_page_buffers
bh1->b_private=NULL
bh2->b_private=NULL
jbd2_journal_put_journal_head(jh1)
__journal_remove_journal_head(hb1)
jh1 is NULL and trigger oops

*) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
already been unmapped.

For the metadata buffer we forgetting, we should always keep the mapped
flag and clear the dirty flags is enough, so this patch pick out the
these buffers and keep their BH_Mapped flag.

Link: https://lore.kernel.org/r/20200213063821.30455-3-yi.zhang@huawei.com
Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
Reviewed-by: Jan Kara
Signed-off-by: zhangyi (F)
Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org

zhangyi (F)
2020-02-14 01:06:28 +0800
6a66a7ded jbd2: move the clearing of b_modified flag to the journal_unmap_buffer() ... Browse Code »

There is no need to delay the clearing of b_modified flag to the
transaction committing time when unmapping the journalled buffer, so
just move it to the journal_unmap_buffer().

Link: https://lore.kernel.org/r/20200213063821.30455-2-yi.zhang@huawei.com
Reviewed-by: Jan Kara
Signed-off-by: zhangyi (F)
Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org

zhangyi (F)
2020-02-14 01:06:17 +0800

09 Feb, 2020

1 commit

236f45329 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc vfs updates from Al Viro:

- bmap series from cmaiolino

- getting rid of convolutions in copy_mount_options() (use a couple of
copy_from_user() instead of the __get_user() crap)

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
saner copy_mount_options()
fibmap: Reject negative block numbers
fibmap: Use bmap instead of ->bmap method in ioctl_fibmap
ecryptfs: drop direct calls to ->bmap
cachefiles: drop direct usage of ->bmap method.
fs: Enable bmap() function to properly return errors

Linus Torvalds
2020-02-09 05:04:49 +0800

04 Feb, 2020

1 commit

97a32539b proc: convert everything to "struct proc_ops" ... Browse Code »

The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in
seq_file.h.

Conversion rule is:

llseek => proc_lseek
unlocked_ioctl => proc_ioctl

xxx => proc_xxx

delete ".owner = THIS_MODULE" line

[akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c]
[sfr@canb.auug.org.au: fix kernel/sched/psi.c]
Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au
Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2
Signed-off-by: Alexey Dobriyan
Signed-off-by: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2020-02-04 11:05:26 +0800

03 Feb, 2020

1 commit

30460e1ea fs: Enable bmap() function to properly return errors ... Browse Code »

By now, bmap() will either return the physical block number related to
the requested file offset or 0 in case of error or the requested offset
maps into a hole.
This patch makes the needed changes to enable bmap() to proper return
errors, using the return value as an error return, and now, a pointer
must be passed to bmap() to be filled with the mapped physical block.

It will change the behavior of bmap() on return:

- negative value in case of error
- zero on success or map fell into a hole

In case of a hole, the *block will be zero too

Since this is a prep patch, by now, the only error return is -EINVAL if
->bmap doesn't exist.

Reviewed-by: Christoph Hellwig
Signed-off-by: Carlos Maiolino
Signed-off-by: Al Viro

Carlos Maiolino
2020-02-03 21:05:37 +0800

25 Jan, 2020

8 commits

7f6225e44 jbd2: clean __jbd2_journal_abort_hard() and __journal_abort_soft() ... Browse Code »

__jbd2_journal_abort_hard() is no longer used, so now we can merge
__jbd2_journal_abort_hard() and __journal_abort_soft() these two
functions into jbd2_journal_abort() and remove them.

Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20191204124614.45424-5-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o

zhangyi (F)
2020-01-25 16:01:56 +0800
0e98c084a jbd2: make sure ESHUTDOWN to be recorded in the journal superblock ... Browse Code »

Commit fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer") want
to allow jbd2 layer to distinguish shutdown journal abort from other
error cases. So the ESHUTDOWN should be taken precedence over any other
errno which has already been recoded after EXT4_FLAGS_SHUTDOWN is set,
but it only update errno in the journal suoerblock now if the old errno
is 0.

Fixes: fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer")
Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20191204124614.45424-4-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o

zhangyi (F)
2020-01-25 16:00:20 +0800
51f57b01e ext4, jbd2: ensure panic when aborting with zero errno ... Browse Code »

JBD2_REC_ERR flag used to indicate the errno has been updated when jbd2
aborted, and then __ext4_abort() and ext4_handle_error() can invoke
panic if ERRORS_PANIC is specified. But if the journal has been aborted
with zero errno, jbd2_journal_abort() didn't set this flag so we can
no longer panic. Fix this by always record the proper errno in the
journal superblock.

Fixes: 4327ba52afd03 ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20191204124614.45424-3-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o

zhangyi (F)
2020-01-25 15:59:25 +0800
d0a186e0d jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record ... Browse Code »

We invoke jbd2_journal_abort() to abort the journal and record errno
in the jbd2 superblock when committing journal transaction besides the
failure on submitting the commit record. But there is no need for the
case and we can also invoke jbd2_journal_abort() instead of
__jbd2_journal_abort_hard().

Fixes: 818d276ceb83a ("ext4: Add the journal checksum feature")
Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20191204124614.45424-2-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o

zhangyi (F)
2020-01-25 15:58:46 +0800
1a8e9cf40 jbd2_seq_info_next should increase position index ... Browse Code »

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

Script below generates endless output
$ q=;while read -r r;do echo "$((++q)) $r";done
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/d13805e5-695e-8ac3-b678-26ca2313629f@virtuozzo.com
Signed-off-by: Theodore Ts'o

Vasily Averin
2020-01-25 15:30:46 +0800
17c51d836 jbd2: remove pointless assertion in __journal_remove_journal_head ... Browse Code »

Only when jh->b_jcount = 0 in jbd2_journal_put_journal_head, we are allowed
to call __journal_remove_journal_head. This assertion is meaningless,
just remove it.

Signed-off-by: Shijie Luo
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20200123070054.50585-1-luoshijie1@huawei.com
Signed-off-by: Theodore Ts'o

Shijie Luo
2020-01-25 15:25:56 +0800
8d6ce1367 ext4,jbd2: fix comment and code style ... Browse Code »

Fix comment and remove unneccessary blank.

Signed-off-by: Shijie Luo
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20200123064325.36358-1-luoshijie1@huawei.com
Signed-off-by: Theodore Ts'o

Shijie Luo
2020-01-25 15:24:53 +0800
0c1cba6cc jbd2: delete the duplicated words in the comments ... Browse Code »

Delete the duplicated words "is" in the comments

Signed-off-by: Yan Wang
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/12087f77-ab4d-c7ba-53b4-893dbf0026f0@huawei.com
Signed-off-by: Theodore Ts'o

wangyan
2020-01-25 15:23:29 +0800

18 Jan, 2020

1 commit

a09decff5 jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal ... Browse Code »

If the journal is dirty when the filesystem is mounted, jbd2 will replay
the journal but the journal superblock will not be updated by
journal_reset() because JBD2_ABORT flag is still set (it was set in
journal_init_common()). This is problematic because when a new transaction
is then committed, it will be recorded in block 1 (journal->j_tail was set
to 1 in journal_reset()). If unclean shutdown happens again before the
journal superblock is updated, the new recorded transaction will not be
replayed during the next mount (because of stale sb->s_start and
sb->s_sequence values) which can lead to filesystem corruption.

Fixes: 85e0c4e89c1b ("jbd2: if the journal is aborted then don't allow update of the log tail")
Signed-off-by: Kai Li
Link: https://lore.kernel.org/r/20200111022542.5008-1-li.kai4@h3c.com
Signed-off-by: Theodore Ts'o

Kai Li
2020-01-18 05:25:47 +0800

01 Dec, 2019

1 commit

50b8b3f85 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 updates from Ted Ts'o:
"This merge window saw the the following new featuers added to ext4:

- Direct I/O via iomap (required the iomap-for-next branch from
Darrick as a prereq).

- Support for using dioread-nolock where the block size < page size.

- Support for encryption for file systems where the block size < page
size.

- Rework of journal credits handling so a revoke-heavy workload will
not cause the journal to run out of space.

- Replace bit-spinlocks with spinlocks in jbd2

Also included were some bug fixes and cleanups, mostly to clean up
corner cases from fuzzed file systems and error path handling"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (59 commits)
ext4: work around deleting a file with i_nlink == 0 safely
ext4: add more paranoia checking in ext4_expand_extra_isize handling
jbd2: make jbd2_handle_buffer_credits() handle reserved handles
ext4: fix a bug in ext4_wait_for_tail_page_commit
ext4: bio_alloc with __GFP_DIRECT_RECLAIM never fails
ext4: code cleanup for get_next_id
ext4: fix leak of quota reservations
ext4: remove unused variable warning in parse_options()
ext4: Enable encryption for subpage-sized blocks
fs/buffer.c: support fscrypt in block_read_full_page()
ext4: Add error handling for io_end_vec struct allocation
jbd2: Fine tune estimate of necessary descriptor blocks
jbd2: Provide trace event for handle restarts
ext4: Reserve revoke credits for freed blocks
jbd2: Make credit checking more strict
jbd2: Rename h_buffer_credits to h_total_credits
jbd2: Reserve space for revoke descriptor blocks
jbd2: Drop jbd2_space_needed()
jbd2: Account descriptor blocks into t_outstanding_credits
jbd2: Factor out common parts of stopping and restarting a handle
...

Linus Torvalds
2019-12-01 02:53:02 +0800

06 Nov, 2019

15 commits

a6d404084 Merge branch 'jk/jbd2-revoke-overflow' Browse Code »

Theodore Ts'o
2019-11-06 05:02:20 +0800
19014d697 jbd2: Fine tune estimate of necessary descriptor blocks ... Browse Code »

Currently we reserve j_max_transaction_buffers / 32 for transaction
descriptor blocks. Now that revoke descriptors are accounted for
separately this estimate is unnecessarily high and we can actually
compute much tighter estimate. In the common case of 32k journal blocks
and 4k blocksize this actually reduces the amount of reserved descriptor
blocks from 256 to ~25 which allows us to fit more real data into a
transaction.

Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-25-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:49 +0800
0094f981b jbd2: Provide trace event for handle restarts ... Browse Code »

Provide trace event for handle restarts to ease debugging.

Reviewed-by: Theodore Ts'o
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-24-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:49 +0800
d090707ed jbd2: Make credit checking more strict ... Browse Code »

Make checking of available credits in jbd2_journal_dirty_metadata() more
strict. There should be always enough credits in the handle to write all
potential revoke descriptors. Also we warn in case there are not enough
credits since this is a bug in the filesystem.

Reviewed-by: Theodore Ts'o
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-22-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:49 +0800
933f1c1e0 jbd2: Rename h_buffer_credits to h_total_credits ... Browse Code »

The credit counter now contains both buffer and revoke descriptor block
credits. Rename to counter to h_total_credits to reflect that. No
functional change.

Reviewed-by: Theodore Ts'o
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-21-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
fdc3ef882 jbd2: Reserve space for revoke descriptor blocks ... Browse Code »

Extend functions for starting, extending, and restarting transaction
handles to take number of revoke records handle must be able to
accommodate. These functions then make sure transaction has enough
credits to be able to store resulting revoke descriptor blocks. Also
revoke code tracks number of revoke records created by a handle to catch
situation where some place didn't reserve enough space for revoke
records. Similarly to standard transaction credits, space for unused
reserved revoke records is released when the handle is stopped.

On the ext4 side we currently take a simplistic approach of reserving
space for 1024 revoke records for any transaction. This grows amount of
credits reserved for each handle only by a few and is enough for any
normal workload so that we don't hit warnings in jbd2. We will refine
the logic in following commits.

Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
77444ac4f jbd2: Drop jbd2_space_needed() ... Browse Code »

The function is now just a trivial wrapper returning
journal->j_max_transaction_buffers. Drop it.

Reviewed-by: Theodore Ts'o
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-19-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
9f356e5a4 jbd2: Account descriptor blocks into t_outstanding_credits ... Browse Code »

Currently, journal descriptor blocks were not accounted in
transaction->t_outstanding_credits and we were just leaving some slack
space in the journal for them (in jbd2_log_space_left() and
jbd2_space_needed()). This is making proper accounting (and reservation
we want to add) of descriptor blocks difficult so switch to accounting
descriptor blocks in transaction->t_outstanding_credits and just reserve
the same amount of credits in t_outstanding credits for journal
descriptor blocks when creating transaction.

Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-18-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
ec8b6f600 jbd2: Factor out common parts of stopping and restarting a handle ... Browse Code »

jbd2__journal_restart() has quite some code that is common with
jbd2_journal_stop(). Factor this functionality into stop_this_handle()
helper and use it from both functions. Note that this also drops
t_handle_lock protection from jbd2__journal_restart() as
jbd2_journal_stop() does the same thing without it.

Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-17-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
5559b2d81 jbd2: Drop pointless wakeup from jbd2_journal_stop() ... Browse Code »

When we drop last handle from a transaction and journal->j_barrier_count
> 0, jbd2_journal_stop() wakes up journal->j_wait_transaction_locked
wait queue. This looks pointless - wait for outstanding handles always
happens on journal->j_wait_updates waitqueue.
journal->j_wait_transaction_locked is used to wait for transaction state
changes and by start_this_handle() for waiting until
journal->j_barrier_count drops to 0. The first case is clearly
irrelevant here since only jbd2 thread changes transaction state. The
second case looks related but jbd2_journal_unlock_updates() is
responsible for the wakeup in this case. So just drop the wakeup.

Reviewed-by: Theodore Ts'o
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-16-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
150549ed2 jbd2: Drop pointless check from jbd2_journal_stop() ... Browse Code »

If a transaction is larger than journal->j_max_transaction_buffers, that
is a bug and not a trigger for transaction commit. Also the very next
attempt to start new handle will start transaction commit anyway. So
just remove the pointless check. Arguably, we could start transaction
commit whenever the transaction size is *close* to
journal->j_max_transaction_buffers. This has a potential to reduce
latency of the next jbd2_journal_start() at the cost of somewhat smaller
transactions. However for this to have any effect, it would mean that
there isn't someone already waiting in jbd2_journal_start() which means
metadata load for the fs is pretty light anyway so probably this
optimization is not worth it.

Reviewed-by: Theodore Ts'o
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-15-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
dfaf5ffda jbd2: Reorganize jbd2_journal_stop() ... Browse Code »

Move code in jbd2_journal_stop() around a bit. It removes some
unnecessary code duplication and will make factoring out parts common
with jbd2__journal_restart() easier.

Reviewed-by: Theodore Ts'o
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-14-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
015c60330 jbd2: Fix statistics for the number of logged blocks ... Browse Code »

jbd2 statistics counting number of blocks logged in a transaction was
wrong. It didn't count the commit block and more importantly it didn't
count revoke descriptor blocks. Make sure these get properly counted.

Reviewed-by: Theodore Ts'o
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-13-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
b90bfdf58 jbd2: Completely fill journal descriptor blocks ... Browse Code »

With 32-bit block numbers, we don't allocate the array for journal
buffer heads large enough for corresponding descriptor tags to fill the
descriptor block. Thus we end up writing out half-full descriptor blocks
to the journal unnecessarily growing the transaction. Fix the logic to
allocate the array large enough.

Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-3-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 01:13:25 +0800
0db458894 jbd2: Fixup stale comment in commit code ... Browse Code »

jbd2_journal_next_log_block() does not look at
transaction->t_outstanding_credits. Remove the misleading comment.

Reviewed-by: Theodore Ts'o
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-2-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 01:13:25 +0800

21 Oct, 2019

1 commit

7855a57d0 jbd2: Free journal head outside of locked region ... Browse Code »

On PREEMPT_RT bit-spinlocks have the same semantics as on PREEMPT_RT=n,
i.e. they disable preemption. That means functions which are not safe to be
called in preempt disabled context on RT trigger a might_sleep() assert.

The journal head bit spinlock is mostly held for short code sequences with
trivial RT safe functionality, except for one place:

jbd2_journal_put_journal_head() invokes __journal_remove_journal_head()
with the journal head bit spinlock held. __journal_remove_journal_head()
invokes kmem_cache_free() which must not be called with preemption disabled
on RT.

Jan suggested to rework the removal function so the actual free happens
outside the bit-spinlocked region.

Split it into two parts:

- Do the sanity checks and the buffer head detach under the lock

- Do the actual free after dropping the lock

There is error case handling in the free part which needs to dereference
the b_size field of the now detached buffer head. Due to paranoia (caused
by ignorance) the size is retrieved in the detach function and handed into
the free function. Might be over-engineered, but better safe than sorry.

This makes the journal head bit-spinlock usage RT compliant and also avoids
nested locking which is not covered by lockdep.

Suggested-by: Jan Kara
Signed-off-by: Thomas Gleixner
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o"
Cc: Jan Kara
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20190809124233.13277-8-jack@suse.cz
Signed-off-by: Theodore Ts'o

Thomas Gleixner
2019-10-21 21:16:46 +0800