Eric Lee / smarc-fsl-linux-kernel

03 Sep, 2020

2 commits

3a53d012b jbd2: abort journal if free a async write error metadata buffer ... Browse Code »

[ Upstream commit c044f3d8360d2ecf831ba2cc9f08cf9fb2c699fb ]

If we free a metadata buffer which has been failed to async write out
in the background, the jbd2 checkpoint procedure will not detect this
failure in jbd2_log_do_checkpoint(), so it may lead to filesystem
inconsistency after cleanup journal tail. This patch abort the journal
if free a buffer has write_io_error flag to prevent potential further
inconsistency.

Signed-off-by: zhangyi (F)
Link: https://lore.kernel.org/r/20200620025427.1756360-5-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o
Signed-off-by: Sasha Levin

zhangyi (F)
2020-09-03 17:26:56 +0800
40827caf9 jbd2: make sure jh have b_transaction set in refile/unfile_buffer ... Browse Code »

[ Upstream commit 24dc9864914eb5813173cfa53313fcd02e4aea7d ]

Callers of __jbd2_journal_unfile_buffer() and
__jbd2_journal_refile_buffer() assume that the b_transaction is set. In
fact if it's not, we can end up with journal_head refcounting errors
leading to crash much later that might be very hard to track down. Add
asserts to make sure that is the case.

We also make sure that b_next_transaction is NULL in
__jbd2_journal_unfile_buffer() since the callers expect that as well and
we should not get into that stage in this state anyway, leading to
problems later on if we do.

Tested with fstests.

Signed-off-by: Lukas Czerner
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20200617092549.6712-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o
Signed-off-by: Sasha Levin

Lukas Czerner
2020-09-03 17:26:56 +0800

26 Aug, 2020

1 commit

b4062a49a jbd2: add the missing unlock_buffer() in the error path of jbd2_write_superblock() ... Browse Code »

commit ef3f5830b859604eda8723c26d90ab23edc027a4 upstream.

jbd2_write_superblock() is under the buffer lock of journal superblock
before ending that superblock write, so add a missing unlock_buffer() in
in the error path before submitting buffer.

Fixes: 742b06b5628f ("jbd2: check superblock mapped prior to committing")
Signed-off-by: zhangyi (F)
Reviewed-by: Ritesh Harjani
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20200620061948.2049579-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o
Signed-off-by: Greg Kroah-Hartman

zhangyi (F)
2020-08-26 16:40:52 +0800

24 Jun, 2020

1 commit

0804b23d2 jbd2: clean __jbd2_journal_abort_hard() and __journal_abort_soft() ... Browse Code »

[ Upstream commit 7f6225e446cc8dfa4c3c7959a4de3dd03ec277bf ]

__jbd2_journal_abort_hard() is no longer used, so now we can merge
__jbd2_journal_abort_hard() and __journal_abort_soft() these two
functions into jbd2_journal_abort() and remove them.

Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20191204124614.45424-5-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o
Signed-off-by: Sasha Levin

zhangyi (F)
2020-06-24 23:50:48 +0800

21 Apr, 2020

1 commit

5815a5d70 jbd2: improve comments about freeing data buffers whose page mapping is NULL ... Browse Code »

commit 780f66e59231fcf882f36c63f287252ee47cc75a upstream.

Improve comments in jbd2_journal_commit_transaction() to describe why
we don't need to clear the buffer_mapped bit for freeing file mapping
buffers whose page mapping is NULL.

Link: https://lore.kernel.org/r/20200217112706.20085-1-yi.zhang@huawei.com
Fixes: c96dceeabf76 ("jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer")
Suggested-by: Jan Kara
Reviewed-by: Jan Kara
Signed-off-by: zhangyi (F)
Signed-off-by: Theodore Ts'o
Signed-off-by: Greg Kroah-Hartman

zhangyi (F)
2020-04-21 15:04:52 +0800

21 Mar, 2020

1 commit

dbce82922 jbd2: fix data races at struct journal_head ... Browse Code »

[ Upstream commit 6c5d911249290f41f7b50b43344a7520605b1acb ]

journal_head::b_transaction and journal_head::b_next_transaction could
be accessed concurrently as noticed by KCSAN,

LTP: starting fsync04
/dev/zero: Can't open blockdev
EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
==================================================================
BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]

write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
__jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
__jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
(inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
kjournald2+0x13b/0x450 [jbd2]
kthread+0x1cd/0x1f0
ret_from_fork+0x27/0x50

read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
jbd2_write_access_granted+0x1b2/0x250 [jbd2]
jbd2_write_access_granted at fs/jbd2/transaction.c:1155
jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
__ext4_journal_get_write_access+0x50/0x90 [ext4]
ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
ext4_mb_new_blocks+0x54f/0xca0 [ext4]
ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
ext4_map_blocks+0x3b4/0x950 [ext4]
_ext4_get_block+0xfc/0x270 [ext4]
ext4_get_block+0x3b/0x50 [ext4]
__block_write_begin_int+0x22e/0xae0
__block_write_begin+0x39/0x50
ext4_write_begin+0x388/0xb50 [ext4]
generic_perform_write+0x15d/0x290
ext4_buffered_write_iter+0x11f/0x210 [ext4]
ext4_file_write_iter+0xce/0x9e0 [ext4]
new_sync_write+0x29c/0x3b0
__vfs_write+0x92/0xa0
vfs_write+0x103/0x260
ksys_write+0x9d/0x130
__x64_sys_write+0x4c/0x60
do_syscall_64+0x91/0xb05
entry_SYSCALL_64_after_hwframe+0x49/0xbe

5 locks held by fsync04/25724:
#0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
#1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
#2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
#3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
#4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
irq event stamp: 1407125
hardirqs last enabled at (1407125): [] __find_get_block+0x107/0x790
hardirqs last disabled at (1407124): [] __find_get_block+0x49/0x790
softirqs last enabled at (1405528): [] __do_softirq+0x34c/0x57c
softirqs last disabled at (1405521): [] irq_exit+0xa2/0xc0

Reported by Kernel Concurrency Sanitizer on:
CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

The plain reads are outside of jh->b_state_lock critical section which result
in data races. Fix them by adding pairs of READ|WRITE_ONCE().

Reviewed-by: Jan Kara
Signed-off-by: Qian Cai
Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw
Signed-off-by: Theodore Ts'o
Signed-off-by: Sasha Levin

Qian Cai
2020-03-21 15:11:57 +0800

29 Feb, 2020

1 commit

453692eb5 jbd2: fix ocfs2 corrupt when clearing block group bits ... Browse Code »

commit 8eedabfd66b68a4623beec0789eac54b8c9d0fb6 upstream.

I found a NULL pointer dereference in ocfs2_block_group_clear_bits().
The running environment:
kernel version: 4.19
A cluster with two nodes, 5 luns mounted on two nodes, and do some
file operations like dd/fallocate/truncate/rm on every lun with storage
network disconnection.

The fallocate operation on dm-23-45 caused an null pointer dereference.

The information of NULL pointer dereference as follows:
[577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45.
[577992.878290] Aborting journal on device dm-23-45.
...
[577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46.
[577992.890908] __journal_remove_journal_head: freeing b_committed_data
[577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30
[577992.890918] __journal_remove_journal_head: freeing b_committed_data
[577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30
[577992.890922] __journal_remove_journal_head: freeing b_committed_data
[577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30
[577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30
[577992.890928] __journal_remove_journal_head: freeing b_committed_data
[577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30
[577992.890933] __journal_remove_journal_head: freeing b_committed_data
[577992.890939] __journal_remove_journal_head: freeing b_committed_data
[577992.890949] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[577992.890950] Mem abort info:
[577992.890951] ESR = 0x96000004
[577992.890952] Exception class = DABT (current EL), IL = 32 bits
[577992.890952] SET = 0, FnV = 0
[577992.890953] EA = 0, S1PTW = 0
[577992.890954] Data abort info:
[577992.890955] ISV = 0, ISS = 0x00000004
[577992.890956] CM = 0, WnR = 0
[577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f8da07a9
[577992.890960] [0000000000000020] pgd=0000000000000000
[577992.890964] Internal error: Oops: 96000004 [#1] SMP
[577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd)
[577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G W OE 4.19.36 #1
[577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
[577992.890971] pstate: 60400009 (nZCv daif +PAN -UAO)
[577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
[577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2]
[577992.891084] sp : ffff0000c8e2b810
[577992.891085] x29: ffff0000c8e2b820 x28: 0000000000000000
[577992.891087] x27: 00000000000006f3 x26: ffffa07957b02e70
[577992.891089] x25: ffff807c59d50000 x24: 00000000000006f2
[577992.891091] x23: 0000000000000001 x22: ffff807bd39abc30
[577992.891093] x21: ffff0000811d9000 x20: ffffa07535d6a000
[577992.891097] x19: ffff000001681638 x18: ffffffffffffffff
[577992.891098] x17: 0000000000000000 x16: ffff000080a03df0
[577992.891100] x15: ffff0000811d9708 x14: 203d207375746174
[577992.891101] x13: 73203a524f525245 x12: 20373439343a6565
[577992.891103] x11: 0000000000000038 x10: 0101010101010101
[577992.891106] x9 : ffffa07c68a85d70 x8 : 7f7f7f7f7f7f7f7f
[577992.891109] x7 : 0000000000000000 x6 : 0000000000000080
[577992.891110] x5 : 0000000000000000 x4 : 0000000000000002
[577992.891112] x3 : ffff000001713390 x2 : 2ff90f88b1c22f00
[577992.891114] x1 : ffff807bd39abc30 x0 : 0000000000000000
[577992.891116] Call trace:
[577992.891139] _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
[577992.891162] _ocfs2_free_clusters+0x100/0x290 [ocfs2]
[577992.891185] ocfs2_free_clusters+0x50/0x68 [ocfs2]
[577992.891206] ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2]
[577992.891227] ocfs2_add_inode_data+0x94/0xc8 [ocfs2]
[577992.891248] ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2]
[577992.891269] ocfs2_allocate_extents+0x14c/0x338 [ocfs2]
[577992.891290] __ocfs2_change_file_space+0x3f8/0x610 [ocfs2]
[577992.891309] ocfs2_fallocate+0xe4/0x128 [ocfs2]
[577992.891316] vfs_fallocate+0x11c/0x250
[577992.891317] ksys_fallocate+0x54/0x88
[577992.891319] __arm64_sys_fallocate+0x28/0x38
[577992.891323] el0_svc_common+0x78/0x130
[577992.891325] el0_svc_handler+0x38/0x78
[577992.891327] el0_svc+0x8/0xc

My analysis process as follows:
ocfs2_fallocate
__ocfs2_change_file_space
ocfs2_allocate_extents
ocfs2_extend_allocation
ocfs2_add_inode_data
ocfs2_add_clusters_in_btree
ocfs2_insert_extent
ocfs2_do_insert_extent
ocfs2_rotate_tree_right
ocfs2_extend_rotate_transaction
ocfs2_extend_trans
jbd2_journal_restart
jbd2__journal_restart
/* handle->h_transaction is NULL,
* is_handle_aborted(handle) is true
*/
handle->h_transaction = NULL;
start_this_handle
return -EROFS;
ocfs2_free_clusters
_ocfs2_free_clusters
_ocfs2_free_suballoc_bits
ocfs2_block_group_clear_bits
ocfs2_journal_access_gd
__ocfs2_journal_access
jbd2_journal_get_undo_access
/* I think jbd2_write_access_granted() will
* return true, because do_get_write_access()
* will return -EROFS.
*/
if (jbd2_write_access_granted(...)) return 0;
do_get_write_access
/* handle->h_transaction is NULL, it will
* return -EROFS here, so do_get_write_access()
* was not called.
*/
if (is_handle_aborted(handle)) return -EROFS;
/* bh2jh(group_bh) is NULL, caused NULL
pointer dereference */
undo_bg = (struct ocfs2_group_desc *)
bh2jh(group_bh)->b_committed_data;

If handle->h_transaction == NULL, then jbd2_write_access_granted()
does not really guarantee that journal_head will stay around,
not even speaking of its b_committed_data. The bh2jh(group_bh)
can be removed after ocfs2_journal_access_gd() and before call
"bh2jh(group_bh)->b_committed_data". So, we should move
is_handle_aborted() check from do_get_write_access() into
jbd2_journal_get_undo_access() and jbd2_journal_get_write_access()
before the call to jbd2_write_access_granted().

Link: https://lore.kernel.org/r/f72a623f-b3f1-381a-d91d-d22a1c83a336@huawei.com
Signed-off-by: Yan Wang
Signed-off-by: Theodore Ts'o
Reviewed-by: Jun Piao
Reviewed-by: Jan Kara
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman

wangyan
2020-02-29 00:22:18 +0800

24 Feb, 2020

4 commits

411327180 jbd2: make sure ESHUTDOWN to be recorded in the journal superblock ... Browse Code »

[ Upstream commit 0e98c084a21177ef136149c6a293b3d1eb33ff92 ]

Commit fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer") want
to allow jbd2 layer to distinguish shutdown journal abort from other
error cases. So the ESHUTDOWN should be taken precedence over any other
errno which has already been recoded after EXT4_FLAGS_SHUTDOWN is set,
but it only update errno in the journal suoerblock now if the old errno
is 0.

Fixes: fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer")
Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20191204124614.45424-4-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o
Signed-off-by: Sasha Levin

zhangyi (F)
2020-02-24 15:36:57 +0800
314e25f4b jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record ... Browse Code »

[ Upstream commit d0a186e0d3e7ac05cc77da7c157dae5aa59f95d9 ]

We invoke jbd2_journal_abort() to abort the journal and record errno
in the jbd2 superblock when committing journal transaction besides the
failure on submitting the commit record. But there is no need for the
case and we can also invoke jbd2_journal_abort() instead of
__jbd2_journal_abort_hard().

Fixes: 818d276ceb83a ("ext4: Add the journal checksum feature")
Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20191204124614.45424-2-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o
Signed-off-by: Sasha Levin

zhangyi (F)
2020-02-24 15:36:57 +0800
494c30b80 ext4, jbd2: ensure panic when aborting with zero errno ... Browse Code »

[ Upstream commit 51f57b01e4a3c7d7bdceffd84de35144e8c538e7 ]

JBD2_REC_ERR flag used to indicate the errno has been updated when jbd2
aborted, and then __ext4_abort() and ext4_handle_error() can invoke
panic if ERRORS_PANIC is specified. But if the journal has been aborted
with zero errno, jbd2_journal_abort() didn't set this flag so we can
no longer panic. Fix this by always record the proper errno in the
journal superblock.

Fixes: 4327ba52afd03 ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20191204124614.45424-3-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o
Signed-off-by: Sasha Levin

zhangyi (F)
2020-02-24 15:36:30 +0800
c98232007 jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal ... Browse Code »

[ Upstream commit a09decff5c32060639a685581c380f51b14e1fc2 ]

If the journal is dirty when the filesystem is mounted, jbd2 will replay
the journal but the journal superblock will not be updated by
journal_reset() because JBD2_ABORT flag is still set (it was set in
journal_init_common()). This is problematic because when a new transaction
is then committed, it will be recorded in block 1 (journal->j_tail was set
to 1 in journal_reset()). If unclean shutdown happens again before the
journal superblock is updated, the new recorded transaction will not be
replayed during the next mount (because of stale sb->s_start and
sb->s_sequence values) which can lead to filesystem corruption.

Fixes: 85e0c4e89c1b ("jbd2: if the journal is aborted then don't allow update of the log tail")
Signed-off-by: Kai Li
Link: https://lore.kernel.org/r/20200111022542.5008-1-li.kai4@h3c.com
Signed-off-by: Theodore Ts'o
Signed-off-by: Sasha Levin

Kai Li
2020-02-24 15:36:28 +0800

20 Feb, 2020

2 commits

9275ae515 jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer ... Browse Code »

[ Upstream commit c96dceeabf765d0b1b1f29c3bf50a5c01315b820 ]

Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
an older transaction") set the BH_Freed flag when forgetting a metadata
buffer which belongs to the committing transaction, it indicate the
committing process clear dirty bits when it is done with the buffer. But
it also clear the BH_Mapped flag at the same time, which may trigger
below NULL pointer oops when block_size < PAGE_SIZE.

rmdir 1 kjournald2 mkdir 2
jbd2_journal_commit_transaction
commit transaction N
jbd2_journal_forget
set_buffer_freed(bh1)
jbd2_journal_commit_transaction
commit transaction N+1
...
clear_buffer_mapped(bh1)
ext4_getblk(bh2 ummapped)
...
grow_dev_page
init_page_buffers
bh1->b_private=NULL
bh2->b_private=NULL
jbd2_journal_put_journal_head(jh1)
__journal_remove_journal_head(hb1)
jh1 is NULL and trigger oops

*) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
already been unmapped.

For the metadata buffer we forgetting, we should always keep the mapped
flag and clear the dirty flags is enough, so this patch pick out the
these buffers and keep their BH_Mapped flag.

Link: https://lore.kernel.org/r/20200213063821.30455-3-yi.zhang@huawei.com
Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
Reviewed-by: Jan Kara
Signed-off-by: zhangyi (F)
Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org
Signed-off-by: Sasha Levin

zhangyi (F)
2020-02-20 02:53:09 +0800
f09998f7a jbd2: move the clearing of b_modified flag to the journal_unmap_buffer() ... Browse Code »

[ Upstream commit 6a66a7ded12baa6ebbb2e3e82f8cb91382814839 ]

There is no need to delay the clearing of b_modified flag to the
transaction committing time when unmapping the journalled buffer, so
just move it to the journal_unmap_buffer().

Link: https://lore.kernel.org/r/20200213063821.30455-2-yi.zhang@huawei.com
Reviewed-by: Jan Kara
Signed-off-by: zhangyi (F)
Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org
Signed-off-by: Sasha Levin

zhangyi (F)
2020-02-20 02:53:09 +0800

11 Feb, 2020

1 commit

b19f13026 jbd2_seq_info_next should increase position index ... Browse Code »

commit 1a8e9cf40c9a6a2e40b1e924b13ed303aeea4418 upstream.

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

Script below generates endless output
$ q=;while read -r r;do echo "$((++q)) $r";done
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/d13805e5-695e-8ac3-b678-26ca2313629f@virtuozzo.com
Signed-off-by: Theodore Ts'o
Signed-off-by: Greg Kroah-Hartman

Vasily Averin
2020-02-11 20:35:32 +0800

05 Jan, 2020

1 commit

9e2946bdd jbd2: Fix statistics for the number of logged blocks ... Browse Code »

[ Upstream commit 015c6033068208d6227612c878877919f3fcf6b6 ]

jbd2 statistics counting number of blocks logged in a transaction was
wrong. It didn't count the commit block and more importantly it didn't
count revoke descriptor blocks. Make sure these get properly counted.

Reviewed-by: Theodore Ts'o
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-13-jack@suse.cz
Signed-off-by: Theodore Ts'o
Signed-off-by: Sasha Levin

Jan Kara
2020-01-05 02:17:14 +0800

25 Sep, 2019

1 commit

963abb9ae jbd2: remove jbd2_journal_inode_add_[write|wait] ... Browse Code »

Since ext4/ocfs2 are using jbd2_inode dirty range scoping APIs now,
jbd2_journal_inode_add_[write|wait] are not used any more, remove them.

Link: http://lkml.kernel.org/r/1562977611-8412-2-git-send-email-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi
Reviewed-by: Ross Zwisler
Acked-by: Changwei Ge
Cc: Gang He
Cc: Joel Becker
Cc: Joseph Qi
Cc: Jun Piao
Cc: Junxiao Bi
Cc: Mark Fasheh
Cc: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2019-09-25 06:54:07 +0800

25 Aug, 2019

1 commit

4c273352b jbd2: add missing tracepoint for reserved handle ... Browse Code »

This issue was found when I use ebpf to trace every jbd2
handle's running info in dioread_nolock case.

Signed-off-by: Xiaoguang Wang
Signed-off-by: Theodore Ts'o

Xiaoguang Wang
2019-08-25 11:10:17 +0800

12 Aug, 2019

1 commit

547b9ad69 jbd2: flush_descriptor(): Do not decrease buffer head's ref count ... Browse Code »

When executing generic/388 on a ppc64le machine, we notice the following
call trace,

VFS: brelse: Trying to free free buffer
WARNING: CPU: 0 PID: 6637 at /root/repos/linux/fs/buffer.c:1195 __brelse+0x84/0xc0

Call Trace:
__brelse+0x80/0xc0 (unreliable)
invalidate_bh_lru+0x78/0xc0
on_each_cpu_mask+0xa8/0x130
on_each_cpu_cond_mask+0x130/0x170
invalidate_bh_lrus+0x44/0x60
invalidate_bdev+0x38/0x70
ext4_put_super+0x294/0x560
generic_shutdown_super+0xb0/0x170
kill_block_super+0x38/0xb0
deactivate_locked_super+0xa4/0xf0
cleanup_mnt+0x164/0x1d0
task_work_run+0x110/0x160
do_notify_resume+0x414/0x460
ret_from_except_lite+0x70/0x74

The warning happens because flush_descriptor() drops bh reference it
does not own. The bh reference acquired by
jbd2_journal_get_descriptor_buffer() is owned by the log_bufs list and
gets released when this list is processed. The reference for doing IO is
only acquired in write_dirty_buffer() later in flush_descriptor().

Reported-by: Harish Sriram
Reviewed-by: Jan Kara
Signed-off-by: Chandan Rajendra
Signed-off-by: Theodore Ts'o

Chandan Rajendra
2019-08-12 04:29:41 +0800

21 Jun, 2019

2 commits

9382cde8c jbd2: drop declaration of journal_sync_buffer() ... Browse Code »

The journal_sync_buffer() function was never carried over from jbd to
jbd2. So get rid of the vestigal declaration of this (non-existent)
function.

Signed-off-by: Theodore Ts'o
Reviewed-by: Darrick J. Wong

Theodore Ts'o
2019-06-21 05:32:21 +0800
6ba0e7dc6 jbd2: introduce jbd2_inode dirty range scoping ... Browse Code »

Currently both journal_submit_inode_data_buffers() and
journal_finish_inode_data_buffers() operate on the entire address space
of each of the inodes associated with a given journal entry. The
consequence of this is that if we have an inode where we are constantly
appending dirty pages we can end up waiting for an indefinite amount of
time in journal_finish_inode_data_buffers() while we wait for all the
pages under writeback to be written out.

The easiest way to cause this type of workload is do just dd from
/dev/zero to a file until it fills the entire filesystem. This can
cause journal_finish_inode_data_buffers() to wait for the duration of
the entire dd operation.

We can improve this situation by scoping each of the inode dirty ranges
associated with a given transaction. We do this via the jbd2_inode
structure so that the scoping is contained within jbd2 and so that it
follows the lifetime and locking rules for that structure.

This allows us to limit the writeback & wait in
journal_submit_inode_data_buffers() and
journal_finish_inode_data_buffers() respectively to the dirty range for
a given struct jdb2_inode, keeping us from waiting forever if the inode
in question is still being appended to.

Signed-off-by: Ross Zwisler
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara
Cc: stable@vger.kernel.org

Ross Zwisler
2019-06-21 05:24:56 +0800

31 May, 2019

2 commits

a49773064 jbd2: fix typo in comment of journal_submit_inode_data_buffers ... Browse Code »

delayed/dealyed

Signed-off-by: Liu Song
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara

Liu Song
2019-05-31 03:15:57 +0800
7821ce417 jbd2: fix some print format mistakes ... Browse Code »

There are some print format mistakes in debug messages. Fix them.

Signed-off-by: Gaowei Pu
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara

Gaowei Pu
2019-05-31 03:08:34 +0800

21 May, 2019

1 commit

ec8f24b7f treewide: Add SPDX license identifier - Makefile/Kconfig ... Browse Code »

Add SPDX license identifiers to all Make/Kconfig files which:

- Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only

Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-21 16:50:46 +0800

11 May, 2019

1 commit

0d52154bb jbd2: fix potential double free ... Browse Code »

When failing from creating cache jbd2_inode_cache, we will destroy the
previously created cache jbd2_handle_cache twice. This patch fixes
this by moving each cache initialization/destruction to its own
separate, individual function.

Signed-off-by: Chengguang Xu
Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org

Chengguang Xu
2019-05-11 09:15:47 +0800

07 Apr, 2019

2 commits

742b06b56 jbd2: check superblock mapped prior to committing ... Browse Code »

We hit a BUG at fs/buffer.c:3057 if we detached the nbd device
before unmounting ext4 filesystem.

The typical chain of events leading to the BUG:
jbd2_write_superblock
submit_bh
submit_bh_wbc
BUG_ON(!buffer_mapped(bh));

The block device is removed and all the pages are invalidated. JBD2
was trying to write journal superblock to the block device which is
no longer present.

Fix this by checking the journal superblock's buffer head prior to
submitting.

Reported-by: Eric Ren
Signed-off-by: Jiufei Xue
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara
Cc: stable@kernel.org

Jiufei Xue
2019-04-07 06:57:40 +0800
fb2037510 jbd2: remove repeated assignments in __jbd2_log_wait_for_space() ... Browse Code »

At the beginning, nblocks has been assigned. There is no need
to repeat the assignment in the while loop, and remove it.

Signed-off-by: Liu Song
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara

Liu Song
2019-04-07 06:14:17 +0800

01 Mar, 2019

2 commits

0df6f4699 jbd2: jbd2_get_transaction does not need to return a value ... Browse Code »

In jbd2_get_transaction, a new transaction is initialized,
and set to the j_running_transaction. No need for a return
value, so remove it.

Also, adjust some comments to match the actual operation
of this function.

Signed-off-by: Liu Song
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara

Liu Song
2019-03-01 13:36:57 +0800
6e876c3dd jbd2: fix invalid descriptor block checksum ... Browse Code »

In jbd2_journal_commit_transaction(), if we are in abort mode,
we may flush the buffer without setting descriptor block checksum
by goto start_journal_io. Then fs is mounted,
jbd2_descriptor_block_csum_verify() failed.

[ 271.379811] EXT4-fs (vdd): shut down requested (2)
[ 271.381827] Aborting journal on device vdd-8.
[ 271.597136] JBD2: Invalid checksum recovering block 22199 in log
[ 271.598023] JBD2: recovery failed
[ 271.598484] EXT4-fs (vdd): error loading journal

Fix this problem by keep setting descriptor block checksum if the
descriptor buffer is not NULL.

This checksum problem can be reproduced by xfstests generic/388.

Signed-off-by: luojiajun
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara

luojiajun
2019-03-01 13:30:00 +0800

22 Feb, 2019

1 commit

01215d3ed jbd2: fix compile warning when using JBUFFER_TRACE ... Browse Code »

The jh pointer may be used uninitialized in the two cases below and the
compiler complain about it when enabling JBUFFER_TRACE macro, fix them.

In file included from fs/jbd2/transaction.c:19:0:
fs/jbd2/transaction.c: In function ‘jbd2_journal_get_undo_access’:
./include/linux/jbd2.h:1637:38: warning: ‘jh’ is used uninitialized in this function [-Wuninitialized]
#define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
^
fs/jbd2/transaction.c:1219:23: note: ‘jh’ was declared here
struct journal_head *jh;
^
In file included from fs/jbd2/transaction.c:19:0:
fs/jbd2/transaction.c: In function ‘jbd2_journal_dirty_metadata’:
./include/linux/jbd2.h:1637:38: warning: ‘jh’ may be used uninitialized in this function [-Wmaybe-uninitialized]
#define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
^
fs/jbd2/transaction.c:1332:23: note: ‘jh’ was declared here
struct journal_head *jh;
^

Signed-off-by: zhangyi (F)
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
Reviewed-by: Jan Kara

zhangyi (F)
2019-02-22 00:24:09 +0800

15 Feb, 2019

2 commits

a58ca9926 jbd2: fold jbd2_superblock_csum_{verify,set} into their callers ... Browse Code »

The functions jbd2_superblock_csum_verify() and
jbd2_superblock_csum_set() only get called from one location, so to
simplify things, fold them into their callers.

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2019-02-15 05:28:14 +0800
538bcaa62 jbd2: fix race when writing superblock ... Browse Code »

The jbd2 superblock is lockless now, so there is probably a race
condition between writing it so disk and modifing contents of it, which
may lead to checksum error. The following race is the one case that we
have captured.

jbd2 fsstress
jbd2_journal_commit_transaction
jbd2_journal_update_sb_log_tail
jbd2_write_superblock
jbd2_superblock_csum_set jbd2_journal_revoke
jbd2_journal_set_features(revork)
modify superblock
submit_bh(checksum incorrect)

Fix this by locking the buffer head before modifing it. We always
write the jbd2 superblock after we modify it, so this just means
calling the lock_buffer() a little earlier.

This checksum corruption problem can be reproduced by xfstests
generic/475.

Reported-by: zhangyi (F)
Suggested-by: Jan Kara
Signed-off-by: Theodore Ts'o

Theodore Ts'o
2019-02-15 05:27:14 +0800

11 Feb, 2019

2 commits

597599268 jbd2: discard dirty data when forgetting an un-journalled buffer ... Browse Code »

We do not unmap and clear dirty flag when forgetting a buffer without
journal or does not belongs to any transaction, so the invalid dirty
data may still be written to the disk later. It's fine if the
corresponding block is never used before the next mount, and it's also
fine that we invoke clean_bdev_aliases() related functions to unmap
the block device mapping when re-allocating such freed block as data
block. But this logic is somewhat fragile and risky that may lead to
data corruption if we forget to clean bdev aliases. So, It's better to
discard dirty data during forget time.

We have been already handled all the cases of forgetting journalled
buffer, this patch deal with the remaining two cases.

- buffer is not journalled yet,
- buffer is journalled but doesn't belongs to any transaction.

We invoke __bforget() instead of __brelese() when forgetting an
un-journalled buffer in jbd2_journal_forget(). After this patch we can
remove all clean_bdev_aliases() related calls in ext4.

Suggested-by: Jan Kara
Signed-off-by: zhangyi (F)
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara

zhangyi (F)
2019-02-11 12:26:06 +0800
904cdbd41 jbd2: clear dirty flag when revoking a buffer from an older transaction ... Browse Code »

Now, we capture a data corruption problem on ext4 while we're truncating
an extent index block. Imaging that if we are revoking a buffer which
has been journaled by the committing transaction, the buffer's jbddirty
flag will not be cleared in jbd2_journal_forget(), so the commit code
will set the buffer dirty flag again after refile the buffer.

fsx kjournald2
jbd2_journal_commit_transaction
jbd2_journal_revoke commit phase 1~5...
jbd2_journal_forget
belongs to older transaction commit phase 6
jbddirty not clear __jbd2_journal_refile_buffer
__jbd2_journal_unfile_buffer
test_clear_buffer_jbddirty
mark_buffer_dirty

Finally, if the freed extent index block was allocated again as data
block by some other files, it may corrupt the file data after writing
cached pages later, such as during unmount time. (In general,
clean_bdev_aliases() related helpers should be invoked after
re-allocation to prevent the above corruption, but unfortunately we
missed it when zeroout the head of extra extent blocks in
ext4_ext_handle_unwritten_extents()).

This patch mark buffer as freed and set j_next_transaction to the new
transaction when it already belongs to the committing transaction in
jbd2_journal_forget(), so that commit code knows it should clear dirty
bits when it is done with the buffer.

This problem can be reproduced by xfstests generic/455 easily with
seeds (3246 3247 3248 3249).

Signed-off-by: zhangyi (F)
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara
Cc: stable@vger.kernel.org

zhangyi (F)
2019-02-11 12:23:04 +0800

01 Feb, 2019

1 commit

53cf97845 jbd2: fix deadlock while checkpoint thread waits commit thread to finish ... Browse Code »

This issue was found when I tried to put checkpoint work in a separate thread,
the deadlock below happened:
Thread1 | Thread2
__jbd2_log_wait_for_space |
jbd2_log_do_checkpoint (hold j_checkpoint_mutex)|
if (jh->b_transaction != NULL) |
... |
jbd2_log_start_commit(journal, tid); |jbd2_update_log_tail
| will lock j_checkpoint_mutex,
| but will be blocked here.
|
jbd2_log_wait_commit(journal, tid); |
wait_event(journal->j_wait_done_commit, |
!tid_gt(tid, journal->j_commit_sequence)); |
... |wake_up(j_wait_done_commit)
} |

then deadlock occurs, Thread1 will never be waken up.

To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint()
when we are going to wait for transaction commit.

Reviewed-by: Jan Kara
Signed-off-by: Xiaoguang Wang
Signed-off-by: Theodore Ts'o

Xiaoguang Wang
2019-02-01 12:42:11 +0800

04 Dec, 2018

2 commits

561405f03 jbd2: clean up indentation issue, replace spaces with tab ... Browse Code »

There is a statement that is indented with spaces, replace it with
a tab.

Reviewed-by: Jan Kara
Signed-off-by: Colin Ian King
Signed-off-by: Theodore Ts'o

Colin Ian King
2018-12-04 13:20:10 +0800
96f1e0974 jbd2: avoid long hold times of j_state_lock while committing a transaction ... Browse Code »

We can hold j_state_lock for writing at the beginning of
jbd2_journal_commit_transaction() for a rather long time (reportedly for
30 ms) due cleaning revoke bits of all revoked buffers under it. The
handling of revoke tables as well as cleaning of t_reserved_list, and
checkpoint lists does not need j_state_lock for anything. It is only
needed to prevent new handles from joining the transaction. Generally
T_LOCKED transaction state prevents new handles from joining the
transaction - except for reserved handles which have to allowed to join
while we wait for other handles to complete.

To prevent reserved handles from joining the transaction while cleaning
up lists, add new transaction state T_SWITCH and watch for it when
starting reserved handles. With this we can just drop the lock for
operations that don't need it.

Reported-and-tested-by: Adrian Hunter
Suggested-by: "Theodore Y. Ts'o"
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2018-12-04 12:16:07 +0800

06 Oct, 2018

1 commit

ccd3c4373 jbd2: fix use after free in jbd2_log_do_checkpoint() ... Browse Code »

The code cleaning transaction's lists of checkpoint buffers has a bug
where it increases bh refcount only after releasing
journal->j_list_lock. Thus the following race is possible:

CPU0 CPU1
jbd2_log_do_checkpoint()
jbd2_journal_try_to_free_buffers()
__journal_try_to_free_buffer(bh)
...
while (transaction->t_checkpoint_io_list)
...
if (buffer_locked(bh)) {

spin_unlock(&journal->j_list_lock);
spin_lock(&journal->j_list_lock);
__jbd2_journal_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
try_to_free_buffers(page);
get_bh(bh) j_list_lock.

Fixes: dc6e8d669cf5 ("jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint()")
Fixes: be1158cc615f ("jbd2: fold __process_buffer() into jbd2_log_do_checkpoint()")
Reported-by: syzbot+7f4a27091759e2fe7453@syzkaller.appspotmail.com
CC: stable@vger.kernel.org
Reviewed-by: Lukas Czerner
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2018-10-06 06:44:40 +0800

30 Jul, 2018

1 commit

b42d1d6b5 jbd2: replace current_kernel_time64 with ktime equivalent ... Browse Code »

jbd2 is one of the few callers of current_kernel_time64(), which
is a wrapper around ktime_get_coarse_real_ts64(). This calls the
latter directly for consistency with the rest of the kernel that
is moving to the ktime_get_ family of time accessors.

Reviewed-by: Andreas Dilger
Reviewed-by: Jan Kara
Signed-off-by: Arnd Bergmann
Signed-off-by: Theodore Ts'o

Arnd Bergmann
2018-07-30 03:51:47 +0800

09 Jul, 2018

1 commit

70a2dc6ab Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 bugfixes from Ted Ts'o:
"Bug fixes for ext4; most of which relate to vulnerabilities where a
maliciously crafted file system image can result in a kernel OOPS or
hang.

At least one fix addresses an inline data bug could be triggered by
userspace without the need of a crafted file system (although it does
require that the inline data feature be enabled)"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: check superblock mapped prior to committing
ext4: add more mount time checks of the superblock
ext4: add more inode number paranoia checks
ext4: avoid running out of journal credits when appending to an inline file
jbd2: don't mark block as modified if the handle is out of credits
ext4: never move the system.data xattr out of the inode body
ext4: clear i_data in ext4_inode_info when removing inline data
ext4: include the illegal physical block in the bad map ext4_error msg
ext4: verify the depth of extent tree in ext4_find_extent()
ext4: only look at the bg_flags field if it is valid
ext4: make sure bitmaps and the inode table don't overlap with bg descriptors
ext4: always check block group bounds in ext4_init_block_bitmap()
ext4: always verify the magic number in xattr blocks
ext4: add corruption check in ext4_xattr_set_entry()
ext4: add warn_on_error mount option

Linus Torvalds
2018-07-09 02:10:30 +0800

17 Jun, 2018

1 commit

e09463f22 jbd2: don't mark block as modified if the handle is out of credits ... Browse Code »

Do not set the b_modified flag in block's journal head should not
until after we're sure that jbd2_journal_dirty_metadat() will not
abort with an error due to there not being enough space reserved in
the jbd2 handle.

Otherwise, future attempts to modify the buffer may lead a large
number of spurious errors and warnings.

This addresses CVE-2018-10883.

https://bugzilla.kernel.org/show_bug.cgi?id=200071

Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org

Theodore Ts'o
2018-06-17 08:21:45 +0800