03 Sep, 2020

2 commits

  • [ Upstream commit c044f3d8360d2ecf831ba2cc9f08cf9fb2c699fb ]

    If we free a metadata buffer whose asynchronous background write-out
    has failed, the jbd2 checkpoint procedure will not detect the failure
    in jbd2_log_do_checkpoint(), which may lead to filesystem
    inconsistency after the journal tail is cleaned up. This patch aborts
    the journal when a freed buffer has the write_io_error flag set, to
    prevent further inconsistency.

    Signed-off-by: zhangyi (F)
    Link: https://lore.kernel.org/r/20200620025427.1756360-5-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Sasha Levin

    zhangyi (F)
     
  • [ Upstream commit 24dc9864914eb5813173cfa53313fcd02e4aea7d ]

    Callers of __jbd2_journal_unfile_buffer() and
    __jbd2_journal_refile_buffer() assume that b_transaction is set. In
    fact, if it's not, we can end up with journal_head refcounting errors
    leading to a crash much later that might be very hard to track down.
    Add asserts to make sure that is the case.

    We also make sure that b_next_transaction is NULL in
    __jbd2_journal_unfile_buffer() since the callers expect that as well and
    we should not get into that stage in this state anyway, leading to
    problems later on if we do.

    Tested with fstests.

    Signed-off-by: Lukas Czerner
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20200617092549.6712-1-lczerner@redhat.com
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Sasha Levin

    Lukas Czerner
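
    The invariants described in this commit can be sketched in userspace
    C. This is only a model: struct transaction, struct journal_head,
    MODEL_ASSERT_JH() and model_unfile_buffer() are hypothetical
    stand-ins (the kernel uses its own assert macros and much larger
    structures), and assert() takes the place of the kernel assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the kernel structures carry many more fields. */
struct transaction {
	int t_tid;
};

struct journal_head {
	struct transaction *b_transaction;      /* transaction the buffer is filed on */
	struct transaction *b_next_transaction; /* transaction it would be refiled to */
};

/* Stand-in for a kernel assertion macro; here it is plain assert(). */
#define MODEL_ASSERT_JH(jh, cond) assert(cond)

/*
 * Model of the unfile step: callers must hand in a buffer that is filed
 * on a transaction and is not queued for a following one.
 */
static void model_unfile_buffer(struct journal_head *jh)
{
	MODEL_ASSERT_JH(jh, jh->b_transaction != NULL);
	MODEL_ASSERT_JH(jh, jh->b_next_transaction == NULL);

	jh->b_transaction = NULL; /* buffer no longer belongs to a transaction */
}
```

    Calling model_unfile_buffer() on a journal_head whose b_transaction
    is already NULL trips the first assertion immediately, instead of
    letting a refcounting error surface much later.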
     

26 Aug, 2020

1 commit

  • commit ef3f5830b859604eda8723c26d90ab23edc027a4 upstream.

    jbd2_write_superblock() is under the buffer lock of journal superblock
    before ending that superblock write, so add a missing unlock_buffer()
    in the error path before submitting buffer.

    Fixes: 742b06b5628f ("jbd2: check superblock mapped prior to committing")
    Signed-off-by: zhangyi (F)
    Reviewed-by: Ritesh Harjani
    Cc: stable@kernel.org
    Link: https://lore.kernel.org/r/20200620061948.2049579-1-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    zhangyi (F)
     

24 Jun, 2020

1 commit

  • [ Upstream commit 7f6225e446cc8dfa4c3c7959a4de3dd03ec277bf ]

    __jbd2_journal_abort_hard() is no longer used, so we can now merge
    the two functions __jbd2_journal_abort_hard() and
    __journal_abort_soft() into jbd2_journal_abort() and remove them.

    Signed-off-by: zhangyi (F)
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20191204124614.45424-5-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Sasha Levin

    zhangyi (F)
     

21 Apr, 2020

1 commit

  • commit 780f66e59231fcf882f36c63f287252ee47cc75a upstream.

    Improve comments in jbd2_journal_commit_transaction() to describe why
    we don't need to clear the buffer_mapped bit for freeing file mapping
    buffers whose page mapping is NULL.

    Link: https://lore.kernel.org/r/20200217112706.20085-1-yi.zhang@huawei.com
    Fixes: c96dceeabf76 ("jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer")
    Suggested-by: Jan Kara
    Reviewed-by: Jan Kara
    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    zhangyi (F)
     

21 Mar, 2020

1 commit

  • [ Upstream commit 6c5d911249290f41f7b50b43344a7520605b1acb ]

    journal_head::b_transaction and journal_head::b_next_transaction could
    be accessed concurrently as noticed by KCSAN,

    LTP: starting fsync04
    /dev/zero: Can't open blockdev
    EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
    EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
    ==================================================================
    BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]

    write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
    __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
    __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
    jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
    (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
    kjournald2+0x13b/0x450 [jbd2]
    kthread+0x1cd/0x1f0
    ret_from_fork+0x27/0x50

    read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
    jbd2_write_access_granted+0x1b2/0x250 [jbd2]
    jbd2_write_access_granted at fs/jbd2/transaction.c:1155
    jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
    __ext4_journal_get_write_access+0x50/0x90 [ext4]
    ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
    ext4_mb_new_blocks+0x54f/0xca0 [ext4]
    ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
    ext4_map_blocks+0x3b4/0x950 [ext4]
    _ext4_get_block+0xfc/0x270 [ext4]
    ext4_get_block+0x3b/0x50 [ext4]
    __block_write_begin_int+0x22e/0xae0
    __block_write_begin+0x39/0x50
    ext4_write_begin+0x388/0xb50 [ext4]
    generic_perform_write+0x15d/0x290
    ext4_buffered_write_iter+0x11f/0x210 [ext4]
    ext4_file_write_iter+0xce/0x9e0 [ext4]
    new_sync_write+0x29c/0x3b0
    __vfs_write+0x92/0xa0
    vfs_write+0x103/0x260
    ksys_write+0x9d/0x130
    __x64_sys_write+0x4c/0x60
    do_syscall_64+0x91/0xb05
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    5 locks held by fsync04/25724:
    #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
    #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
    #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
    #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
    #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
    irq event stamp: 1407125
    hardirqs last enabled at (1407125): [] __find_get_block+0x107/0x790
    hardirqs last disabled at (1407124): [] __find_get_block+0x49/0x790
    softirqs last enabled at (1405528): [] __do_softirq+0x34c/0x57c
    softirqs last disabled at (1405521): [] irq_exit+0xa2/0xc0

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7
    Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

    The plain reads are outside of the jh->b_state_lock critical section,
    which results in data races. Fix them by adding pairs of
    READ|WRITE_ONCE().

    Reviewed-by: Jan Kara
    Signed-off-by: Qian Cai
    Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Sasha Levin

    Qian Cai
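
    A minimal userspace sketch of the READ_ONCE()/WRITE_ONCE() pairing
    described above. The two macros here are simplified stand-ins built
    on volatile accesses and GNU C __typeof__, not the kernel
    definitions, and model_refile() / model_write_access_granted() are
    hypothetical reductions of the writer and lockless-reader sides.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (assumes GNU C __typeof__). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct transaction {
	int t_tid;
};

struct journal_head {
	struct transaction *b_transaction;
	struct transaction *b_next_transaction;
};

/* Writer side: marked stores, so a lockless reader never sees a torn update. */
static void model_refile(struct journal_head *jh)
{
	WRITE_ONCE(jh->b_transaction, jh->b_next_transaction);
	WRITE_ONCE(jh->b_next_transaction, NULL);
}

/* Lockless reader side: each pointer is loaded exactly once. */
static int model_write_access_granted(struct journal_head *jh,
				      struct transaction *mine)
{
	return READ_ONCE(jh->b_transaction) == mine ||
	       READ_ONCE(jh->b_next_transaction) == mine;
}
```

    The marked accesses do not add ordering or mutual exclusion; they
    only stop the compiler from tearing, fusing or re-reading the
    accesses, which is what the data race report is about.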
     

29 Feb, 2020

1 commit

  • commit 8eedabfd66b68a4623beec0789eac54b8c9d0fb6 upstream.

    I found a NULL pointer dereference in ocfs2_block_group_clear_bits().
    The running environment:
    kernel version: 4.19
    A cluster with two nodes, 5 luns mounted on two nodes, and do some
    file operations like dd/fallocate/truncate/rm on every lun with storage
    network disconnection.

    The fallocate operation on dm-23-45 caused a NULL pointer dereference.

    The NULL pointer dereference information is as follows:
    [577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45.
    [577992.878290] Aborting journal on device dm-23-45.
    ...
    [577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46.
    [577992.890908] __journal_remove_journal_head: freeing b_committed_data
    [577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30
    [577992.890918] __journal_remove_journal_head: freeing b_committed_data
    [577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30
    [577992.890922] __journal_remove_journal_head: freeing b_committed_data
    [577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30
    [577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30
    [577992.890928] __journal_remove_journal_head: freeing b_committed_data
    [577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30
    [577992.890933] __journal_remove_journal_head: freeing b_committed_data
    [577992.890939] __journal_remove_journal_head: freeing b_committed_data
    [577992.890949] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
    [577992.890950] Mem abort info:
    [577992.890951] ESR = 0x96000004
    [577992.890952] Exception class = DABT (current EL), IL = 32 bits
    [577992.890952] SET = 0, FnV = 0
    [577992.890953] EA = 0, S1PTW = 0
    [577992.890954] Data abort info:
    [577992.890955] ISV = 0, ISS = 0x00000004
    [577992.890956] CM = 0, WnR = 0
    [577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f8da07a9
    [577992.890960] [0000000000000020] pgd=0000000000000000
    [577992.890964] Internal error: Oops: 96000004 [#1] SMP
    [577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd)
    [577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G W OE 4.19.36 #1
    [577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
    [577992.890971] pstate: 60400009 (nZCv daif +PAN -UAO)
    [577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
    [577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2]
    [577992.891084] sp : ffff0000c8e2b810
    [577992.891085] x29: ffff0000c8e2b820 x28: 0000000000000000
    [577992.891087] x27: 00000000000006f3 x26: ffffa07957b02e70
    [577992.891089] x25: ffff807c59d50000 x24: 00000000000006f2
    [577992.891091] x23: 0000000000000001 x22: ffff807bd39abc30
    [577992.891093] x21: ffff0000811d9000 x20: ffffa07535d6a000
    [577992.891097] x19: ffff000001681638 x18: ffffffffffffffff
    [577992.891098] x17: 0000000000000000 x16: ffff000080a03df0
    [577992.891100] x15: ffff0000811d9708 x14: 203d207375746174
    [577992.891101] x13: 73203a524f525245 x12: 20373439343a6565
    [577992.891103] x11: 0000000000000038 x10: 0101010101010101
    [577992.891106] x9 : ffffa07c68a85d70 x8 : 7f7f7f7f7f7f7f7f
    [577992.891109] x7 : 0000000000000000 x6 : 0000000000000080
    [577992.891110] x5 : 0000000000000000 x4 : 0000000000000002
    [577992.891112] x3 : ffff000001713390 x2 : 2ff90f88b1c22f00
    [577992.891114] x1 : ffff807bd39abc30 x0 : 0000000000000000
    [577992.891116] Call trace:
    [577992.891139] _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
    [577992.891162] _ocfs2_free_clusters+0x100/0x290 [ocfs2]
    [577992.891185] ocfs2_free_clusters+0x50/0x68 [ocfs2]
    [577992.891206] ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2]
    [577992.891227] ocfs2_add_inode_data+0x94/0xc8 [ocfs2]
    [577992.891248] ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2]
    [577992.891269] ocfs2_allocate_extents+0x14c/0x338 [ocfs2]
    [577992.891290] __ocfs2_change_file_space+0x3f8/0x610 [ocfs2]
    [577992.891309] ocfs2_fallocate+0xe4/0x128 [ocfs2]
    [577992.891316] vfs_fallocate+0x11c/0x250
    [577992.891317] ksys_fallocate+0x54/0x88
    [577992.891319] __arm64_sys_fallocate+0x28/0x38
    [577992.891323] el0_svc_common+0x78/0x130
    [577992.891325] el0_svc_handler+0x38/0x78
    [577992.891327] el0_svc+0x8/0xc

    My analysis is as follows:
    ocfs2_fallocate
      __ocfs2_change_file_space
        ocfs2_allocate_extents
          ocfs2_extend_allocation
            ocfs2_add_inode_data
              ocfs2_add_clusters_in_btree
                ocfs2_insert_extent
                  ocfs2_do_insert_extent
                    ocfs2_rotate_tree_right
                      ocfs2_extend_rotate_transaction
                        ocfs2_extend_trans
                          jbd2_journal_restart
                            jbd2__journal_restart
                              /* handle->h_transaction is NULL,
                               * is_handle_aborted(handle) is true
                               */
                              handle->h_transaction = NULL;
                              start_this_handle
                                return -EROFS;
                ocfs2_free_clusters
                  _ocfs2_free_clusters
                    _ocfs2_free_suballoc_bits
                      ocfs2_block_group_clear_bits
                        ocfs2_journal_access_gd
                          __ocfs2_journal_access
                            jbd2_journal_get_undo_access
                              /* I think jbd2_write_access_granted() will
                               * return true, because do_get_write_access()
                               * will return -EROFS.
                               */
                              if (jbd2_write_access_granted(...)) return 0;
                              do_get_write_access
                                /* handle->h_transaction is NULL, it will
                                 * return -EROFS here, so do_get_write_access()
                                 * was not called.
                                 */
                                if (is_handle_aborted(handle)) return -EROFS;
                        /* bh2jh(group_bh) is NULL, caused NULL
                           pointer dereference */
                        undo_bg = (struct ocfs2_group_desc *)
                                bh2jh(group_bh)->b_committed_data;

    If handle->h_transaction == NULL, then jbd2_write_access_granted()
    does not really guarantee that the journal_head will stay around,
    let alone its b_committed_data. The bh2jh(group_bh) can be removed
    after ocfs2_journal_access_gd() and before the access to
    "bh2jh(group_bh)->b_committed_data". So, we should move the
    is_handle_aborted() check from do_get_write_access() into
    jbd2_journal_get_undo_access() and jbd2_journal_get_write_access(),
    before the call to jbd2_write_access_granted().

    Link: https://lore.kernel.org/r/f72a623f-b3f1-381a-d91d-d22a1c83a336@huawei.com
    Signed-off-by: Yan Wang
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jun Piao
    Reviewed-by: Jan Kara
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    wangyan
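
    The reordering proposed in this commit can be sketched as a small
    userspace model. All names prefixed with model_ are hypothetical;
    the fast path is hard-wired to "granted" to mimic the situation in
    the analysis, and the slow path is reduced to a stub.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reduction of a journal handle. */
struct model_handle {
	void *h_transaction; /* NULL once the handle has lost its transaction */
	int   h_aborted;
};

static int model_is_handle_aborted(const struct model_handle *h)
{
	return h->h_aborted || h->h_transaction == NULL;
}

/* Fast path stub: pretends write access was already granted. */
static int model_write_access_granted(const struct model_handle *h)
{
	(void)h;
	return 1;
}

/* Slow path stub standing in for the full write-access procedure. */
static int model_slow_path(const struct model_handle *h)
{
	(void)h;
	return 0;
}

/* The abort check runs first, so the fast path cannot hide an aborted handle. */
static int model_get_undo_access(const struct model_handle *h)
{
	if (model_is_handle_aborted(h))
		return -EROFS;
	if (model_write_access_granted(h))
		return 0;
	return model_slow_path(h);
}
```

    With the check placed after the fast path instead, the model would
    return 0 for a handle whose h_transaction is NULL, which is the case
    that let the caller dereference a journal_head that was gone.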
     

24 Feb, 2020

4 commits

  • [ Upstream commit 0e98c084a21177ef136149c6a293b3d1eb33ff92 ]

    Commit fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer") wants
    to allow the jbd2 layer to distinguish a shutdown journal abort from
    other error cases. So ESHUTDOWN should take precedence over any other
    errno which has already been recorded after EXT4_FLAGS_SHUTDOWN is
    set, but currently the errno in the journal superblock is only updated
    if the old errno is 0.

    Fixes: fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer")
    Signed-off-by: zhangyi (F)
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20191204124614.45424-4-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Sasha Levin

    zhangyi (F)
     
  • [ Upstream commit d0a186e0d3e7ac05cc77da7c157dae5aa59f95d9 ]

    When committing a journal transaction, we invoke jbd2_journal_abort()
    to abort the journal and record the errno in the jbd2 superblock for
    every failure except a failure to submit the commit record. There is
    no need for that exception, so we can also invoke jbd2_journal_abort()
    instead of __jbd2_journal_abort_hard() in that case.

    Fixes: 818d276ceb83a ("ext4: Add the journal checksum feature")
    Signed-off-by: zhangyi (F)
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20191204124614.45424-2-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Sasha Levin

    zhangyi (F)
     
  • [ Upstream commit 51f57b01e4a3c7d7bdceffd84de35144e8c538e7 ]

    The JBD2_REC_ERR flag is used to indicate that the errno has been
    updated when jbd2 aborted, and then __ext4_abort() and
    ext4_handle_error() can invoke panic if ERRORS_PANIC is specified. But
    if the journal has been aborted with zero errno, jbd2_journal_abort()
    didn't set this flag, so we can no longer panic. Fix this by always
    recording the proper errno in the journal superblock.

    Fixes: 4327ba52afd03 ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
    Signed-off-by: zhangyi (F)
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20191204124614.45424-3-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Sasha Levin

    zhangyi (F)
     
  • [ Upstream commit a09decff5c32060639a685581c380f51b14e1fc2 ]

    If the journal is dirty when the filesystem is mounted, jbd2 will replay
    the journal but the journal superblock will not be updated by
    journal_reset() because JBD2_ABORT flag is still set (it was set in
    journal_init_common()). This is problematic because when a new transaction
    is then committed, it will be recorded in block 1 (journal->j_tail was set
    to 1 in journal_reset()). If unclean shutdown happens again before the
    journal superblock is updated, the new recorded transaction will not be
    replayed during the next mount (because of stale sb->s_start and
    sb->s_sequence values) which can lead to filesystem corruption.

    Fixes: 85e0c4e89c1b ("jbd2: if the journal is aborted then don't allow update of the log tail")
    Signed-off-by: Kai Li
    Link: https://lore.kernel.org/r/20200111022542.5008-1-li.kai4@h3c.com
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Sasha Levin

    Kai Li
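
    For commit 0e98c084a211 earlier in this group (ESHUTDOWN
    precedence), the intended rule can be sketched as a pure function.
    model_pick_abort_errno() is a hypothetical helper, not a jbd2
    function; it only models which errno survives a repeated abort,
    assuming that otherwise the first recorded error is kept.

```c
#include <assert.h>
#include <errno.h>

/*
 * Decide which errno to keep when the journal is aborted again.
 * old_errno is what is already recorded (0 if nothing), new_errno is
 * the error passed to the new abort request.
 */
static int model_pick_abort_errno(int old_errno, int new_errno)
{
	/* A shutdown abort overrides any previously recorded error. */
	if (old_errno != -ESHUTDOWN && new_errno == -ESHUTDOWN)
		return new_errno;

	/* Nothing recorded yet: take the new error. */
	if (old_errno == 0)
		return new_errno;

	/* Otherwise the first recorded error stays. */
	return old_errno;
}
```

    Under this model an -EIO recorded first is replaced by a later
    -ESHUTDOWN, while an -ESHUTDOWN recorded first is never replaced.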
     

20 Feb, 2020

2 commits

  • [ Upstream commit c96dceeabf765d0b1b1f29c3bf50a5c01315b820 ]

    Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
    an older transaction") sets the BH_Freed flag when forgetting a
    metadata buffer which belongs to the committing transaction. The flag
    tells the committing process to clear the dirty bits when it is done
    with the buffer. But the commit also clears the BH_Mapped flag at the
    same time, which may trigger the NULL pointer oops below when
    block_size < PAGE_SIZE.

    rmdir 1               kjournald2                          mkdir 2
                          jbd2_journal_commit_transaction
                            commit transaction N
    jbd2_journal_forget
    set_buffer_freed(bh1)
                          jbd2_journal_commit_transaction
                            commit transaction N+1
                            ...
                            clear_buffer_mapped(bh1)
                                                              ext4_getblk(bh2 unmapped)
                                                              ...
                                                              grow_dev_page
                                                                init_page_buffers
                                                                  bh1->b_private=NULL
                                                                  bh2->b_private=NULL
                          jbd2_journal_put_journal_head(jh1)
                            __journal_remove_journal_head(bh1)
                              jh1 is NULL and trigger oops

    *) Dir entry blocks bh1 and bh2 belong to one page, and bh2 has
    already been unmapped.

    For the metadata buffer we are forgetting, we should always keep the
    mapped flag; clearing the dirty flags is enough. So this patch picks
    out these buffers and keeps their BH_Mapped flag.

    Link: https://lore.kernel.org/r/20200213063821.30455-3-yi.zhang@huawei.com
    Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
    Reviewed-by: Jan Kara
    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Sasha Levin

    zhangyi (F)
     
  • [ Upstream commit 6a66a7ded12baa6ebbb2e3e82f8cb91382814839 ]

    There is no need to delay the clearing of b_modified flag to the
    transaction committing time when unmapping the journalled buffer, so
    just move it to the journal_unmap_buffer().

    Link: https://lore.kernel.org/r/20200213063821.30455-2-yi.zhang@huawei.com
    Reviewed-by: Jan Kara
    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Sasha Levin

    zhangyi (F)
     

11 Feb, 2020

1 commit

  • commit 1a8e9cf40c9a6a2e40b1e924b13ed303aeea4418 upstream.

    If the seq_file .next function does not change the position index,
    a read after some lseek can generate unexpected output.

    Script below generates endless output
    $ q=;while read -r r;do echo "$((++q)) $r";done
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/d13805e5-695e-8ac3-b678-26ca2313629f@virtuozzo.com
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Vasily Averin
     

05 Jan, 2020

1 commit

  • [ Upstream commit 015c6033068208d6227612c878877919f3fcf6b6 ]

    jbd2 statistics counting number of blocks logged in a transaction was
    wrong. It didn't count the commit block and more importantly it didn't
    count revoke descriptor blocks. Make sure these get properly counted.

    Reviewed-by: Theodore Ts'o
    Signed-off-by: Jan Kara
    Link: https://lore.kernel.org/r/20191105164437.32602-13-jack@suse.cz
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Sasha Levin

    Jan Kara
     

25 Sep, 2019

1 commit

  • Since ext4/ocfs2 are using jbd2_inode dirty range scoping APIs now,
    jbd2_journal_inode_add_[write|wait] are not used any more, remove them.

    Link: http://lkml.kernel.org/r/1562977611-8412-2-git-send-email-joseph.qi@linux.alibaba.com
    Signed-off-by: Joseph Qi
    Reviewed-by: Ross Zwisler
    Acked-by: Changwei Ge
    Cc: Gang He
    Cc: Joel Becker
    Cc: Joseph Qi
    Cc: Jun Piao
    Cc: Junxiao Bi
    Cc: Mark Fasheh
    Cc: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     

25 Aug, 2019

1 commit


12 Aug, 2019

1 commit

  • When executing generic/388 on a ppc64le machine, we notice the following
    call trace,

    VFS: brelse: Trying to free free buffer
    WARNING: CPU: 0 PID: 6637 at /root/repos/linux/fs/buffer.c:1195 __brelse+0x84/0xc0

    Call Trace:
    __brelse+0x80/0xc0 (unreliable)
    invalidate_bh_lru+0x78/0xc0
    on_each_cpu_mask+0xa8/0x130
    on_each_cpu_cond_mask+0x130/0x170
    invalidate_bh_lrus+0x44/0x60
    invalidate_bdev+0x38/0x70
    ext4_put_super+0x294/0x560
    generic_shutdown_super+0xb0/0x170
    kill_block_super+0x38/0xb0
    deactivate_locked_super+0xa4/0xf0
    cleanup_mnt+0x164/0x1d0
    task_work_run+0x110/0x160
    do_notify_resume+0x414/0x460
    ret_from_except_lite+0x70/0x74

    The warning happens because flush_descriptor() drops bh reference it
    does not own. The bh reference acquired by
    jbd2_journal_get_descriptor_buffer() is owned by the log_bufs list and
    gets released when this list is processed. The reference for doing IO is
    only acquired in write_dirty_buffer() later in flush_descriptor().

    Reported-by: Harish Sriram
    Reviewed-by: Jan Kara
    Signed-off-by: Chandan Rajendra
    Signed-off-by: Theodore Ts'o

    Chandan Rajendra
     

21 Jun, 2019

2 commits

    The journal_sync_buffer() function was never carried over from jbd to
    jbd2. So get rid of the vestigial declaration of this (non-existent)
    function.

    Signed-off-by: Theodore Ts'o
    Reviewed-by: Darrick J. Wong

    Theodore Ts'o
     
  • Currently both journal_submit_inode_data_buffers() and
    journal_finish_inode_data_buffers() operate on the entire address space
    of each of the inodes associated with a given journal entry. The
    consequence of this is that if we have an inode where we are constantly
    appending dirty pages we can end up waiting for an indefinite amount of
    time in journal_finish_inode_data_buffers() while we wait for all the
    pages under writeback to be written out.

    The easiest way to cause this type of workload is to just dd from
    /dev/zero to a file until it fills the entire filesystem. This can
    cause journal_finish_inode_data_buffers() to wait for the duration of
    the entire dd operation.

    We can improve this situation by scoping each of the inode dirty ranges
    associated with a given transaction. We do this via the jbd2_inode
    structure so that the scoping is contained within jbd2 and so that it
    follows the lifetime and locking rules for that structure.

    This allows us to limit the writeback & wait in
    journal_submit_inode_data_buffers() and
    journal_finish_inode_data_buffers() respectively to the dirty range for
    a given struct jbd2_inode, keeping us from waiting forever if the inode
    in question is still being appended to.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@vger.kernel.org

    Ross Zwisler
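
    The dirty range scoping described in this commit can be sketched as
    follows. This is a userspace model under stated assumptions: struct
    model_jinode and model_add_dirty_range() are hypothetical, as are
    the field names i_dirty_start and i_dirty_end and the convention
    that an end of 0 means no range has been recorded yet.

```c
#include <assert.h>

/* Hypothetical reduction of the per-inode journalling state. */
struct model_jinode {
	long long i_dirty_start; /* first dirty byte for this transaction */
	long long i_dirty_end;   /* last dirty byte; 0 means "no range yet" */
};

/*
 * Widen the recorded range so that it covers [start, end]. Writeback
 * and waiting can then be limited to this range instead of the whole
 * address space of the inode.
 */
static void model_add_dirty_range(struct model_jinode *ji,
				  long long start, long long end)
{
	if (ji->i_dirty_end) {
		if (start < ji->i_dirty_start)
			ji->i_dirty_start = start;
		if (end > ji->i_dirty_end)
			ji->i_dirty_end = end;
	} else {
		ji->i_dirty_start = start;
		ji->i_dirty_end = end;
	}
}
```

    Pages appended after the range was recorded fall outside
    [i_dirty_start, i_dirty_end], so a commit that waits only on the
    recorded range is not held up by an ongoing append.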
     

31 May, 2019

2 commits


21 May, 2019

1 commit


11 May, 2019

1 commit

    When creating the jbd2_inode_cache cache fails, we destroy the
    previously created jbd2_handle_cache cache twice. This patch fixes
    this by moving each cache initialization/destruction to its own
    separate, individual function.

    Signed-off-by: Chengguang Xu
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org

    Chengguang Xu
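
    The pattern of one init and one destroy function per cache can be
    sketched in userspace C. Everything here is a hypothetical model:
    malloc()/free() stand in for the slab cache calls, and
    fail_inode_cache is a test knob that forces the second
    initialization to fail.

```c
#include <assert.h>
#include <stdlib.h>

static void *handle_cache;
static void *inode_cache;
static int fail_inode_cache; /* test knob: make the inode cache init fail */

static int model_handle_cache_init(void)
{
	handle_cache = malloc(16);
	return handle_cache ? 0 : -1;
}

/* Resetting the pointer makes a second destroy call harmless. */
static void model_handle_cache_destroy(void)
{
	free(handle_cache);
	handle_cache = NULL;
}

static int model_inode_cache_init(void)
{
	if (fail_inode_cache)
		return -1;
	inode_cache = malloc(16);
	return inode_cache ? 0 : -1;
}

static void model_inode_cache_destroy(void)
{
	free(inode_cache);
	inode_cache = NULL;
}

static void model_destroy_caches(void)
{
	model_handle_cache_destroy();
	model_inode_cache_destroy();
}

/* Initialize the caches in order; on any failure tear everything down once. */
static int model_init_caches(void)
{
	int ret = model_handle_cache_init();

	if (ret == 0)
		ret = model_inode_cache_init();
	if (ret != 0)
		model_destroy_caches();
	return ret;
}
```

    Because each destroy function frees its own cache and clears the
    pointer, the error path and a later cleanup call cannot free the
    same cache twice.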
     

07 Apr, 2019

2 commits

  • We hit a BUG at fs/buffer.c:3057 if we detached the nbd device
    before unmounting ext4 filesystem.

    The typical chain of events leading to the BUG:
    jbd2_write_superblock
      submit_bh
        submit_bh_wbc
          BUG_ON(!buffer_mapped(bh));

    The block device is removed and all the pages are invalidated. JBD2
    was trying to write journal superblock to the block device which is
    no longer present.

    Fix this by checking the journal superblock's buffer head prior to
    submitting.

    Reported-by: Eric Ren
    Signed-off-by: Jiufei Xue
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@kernel.org

    Jiufei Xue
     
    nblocks has already been assigned at the beginning, so there is no
    need to repeat the assignment in the while loop; remove it.

    Signed-off-by: Liu Song
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Liu Song
     

01 Mar, 2019

2 commits

  • In jbd2_get_transaction, a new transaction is initialized,
    and set to the j_running_transaction. No need for a return
    value, so remove it.

    Also, adjust some comments to match the actual operation
    of this function.

    Signed-off-by: Liu Song
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Liu Song
     
    In jbd2_journal_commit_transaction(), if we are in abort mode,
    we may flush the buffer without setting the descriptor block checksum
    because of the goto start_journal_io. Then, when the fs is mounted,
    jbd2_descriptor_block_csum_verify() fails.

    [ 271.379811] EXT4-fs (vdd): shut down requested (2)
    [ 271.381827] Aborting journal on device vdd-8.
    [ 271.597136] JBD2: Invalid checksum recovering block 22199 in log
    [ 271.598023] JBD2: recovery failed
    [ 271.598484] EXT4-fs (vdd): error loading journal

    Fix this problem by still setting the descriptor block checksum if
    the descriptor buffer is not NULL.

    This checksum problem can be reproduced by xfstests generic/388.

    Signed-off-by: luojiajun
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    luojiajun
     

22 Feb, 2019

1 commit

    The jh pointer may be used uninitialized in the two cases below, and
    the compiler complains about it when the JBUFFER_TRACE macro is
    enabled. Fix them.

    In file included from fs/jbd2/transaction.c:19:0:
    fs/jbd2/transaction.c: In function ‘jbd2_journal_get_undo_access’:
    ./include/linux/jbd2.h:1637:38: warning: ‘jh’ is used uninitialized in this function [-Wuninitialized]
    #define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
    ^
    fs/jbd2/transaction.c:1219:23: note: ‘jh’ was declared here
    struct journal_head *jh;
    ^
    In file included from fs/jbd2/transaction.c:19:0:
    fs/jbd2/transaction.c: In function ‘jbd2_journal_dirty_metadata’:
    ./include/linux/jbd2.h:1637:38: warning: ‘jh’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    #define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
    ^
    fs/jbd2/transaction.c:1332:23: note: ‘jh’ was declared here
    struct journal_head *jh;
    ^

    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Reviewed-by: Jan Kara

    zhangyi (F)
     

15 Feb, 2019

2 commits

  • The functions jbd2_superblock_csum_verify() and
    jbd2_superblock_csum_set() only get called from one location, so to
    simplify things, fold them into their callers.

    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     
    The jbd2 superblock is lockless now, so there is probably a race
    condition between writing it to disk and modifying its contents, which
    may lead to a checksum error. The following race is one case that we
    have captured.

    jbd2                                fsstress
    jbd2_journal_commit_transaction
      jbd2_journal_update_sb_log_tail
        jbd2_write_superblock
          jbd2_superblock_csum_set      jbd2_journal_revoke
                                          jbd2_journal_set_features(revoke)
                                          modify superblock
          submit_bh(checksum incorrect)

    Fix this by locking the buffer head before modifying it. We always
    write the jbd2 superblock after we modify it, so this just means
    calling lock_buffer() a little earlier.

    This checksum corruption problem can be reproduced by xfstests
    generic/475.

    Reported-by: zhangyi (F)
    Suggested-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

11 Feb, 2019

2 commits

    We do not unmap a buffer or clear its dirty flag when forgetting a
    buffer that is not journalled or does not belong to any transaction,
    so the invalid dirty data may still be written to the disk later.
    That is fine if the corresponding block is never used before the next
    mount, and it is also fine if we invoke the clean_bdev_aliases()
    related functions to unmap the block device mapping when re-allocating
    such a freed block as a data block. But this logic is fragile and may
    lead to data corruption if we forget to clean the bdev aliases. So it
    is better to discard dirty data at forget time.

    We have already handled all the cases of forgetting a journalled
    buffer; this patch deals with the remaining two cases:

    - buffer is not journalled yet,
    - buffer is journalled but doesn't belong to any transaction.

    We invoke __bforget() instead of __brelse() when forgetting an
    un-journalled buffer in jbd2_journal_forget(). After this patch we can
    remove all clean_bdev_aliases() related calls in ext4.

    Suggested-by: Jan Kara
    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    zhangyi (F)
     
    We captured a data corruption problem on ext4 while truncating an
    extent index block. Imagine that we are revoking a buffer which has
    been journaled by the committing transaction: the buffer's jbddirty
    flag will not be cleared in jbd2_journal_forget(), so the commit code
    will set the buffer dirty flag again after refiling the buffer.

    fsx                                 kjournald2
                                        jbd2_journal_commit_transaction
    jbd2_journal_revoke                   commit phase 1~5...
      jbd2_journal_forget
        belongs to older transaction      commit phase 6
        jbddirty not clear                  __jbd2_journal_refile_buffer
                                              __jbd2_journal_unfile_buffer
                                                test_clear_buffer_jbddirty
                                                mark_buffer_dirty

    Finally, if the freed extent index block is allocated again as a data
    block by some other file, it may corrupt the file data when cached
    pages are written later, such as at unmount time. (In general,
    clean_bdev_aliases() related helpers should be invoked after
    re-allocation to prevent the above corruption, but unfortunately we
    missed it when zeroing out the head of extra extent blocks in
    ext4_ext_handle_unwritten_extents()).

    This patch marks the buffer as freed and sets b_next_transaction to
    the new transaction when the buffer already belongs to the committing
    transaction in jbd2_journal_forget(), so that the commit code knows it
    should clear the dirty bits when it is done with the buffer.

    This problem can be reproduced by xfstests generic/455 easily with
    seeds (3246 3247 3248 3249).

    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@vger.kernel.org

    zhangyi (F)
     

01 Feb, 2019

1 commit

  • This issue was found when I tried to put checkpoint work in a separate thread,
    the deadlock below happened:
    Thread1                                            | Thread2
    __jbd2_log_wait_for_space                          |
    jbd2_log_do_checkpoint (hold j_checkpoint_mutex)   |
      if (jh->b_transaction != NULL)                   |
        ...                                            |
        jbd2_log_start_commit(journal, tid);           | jbd2_update_log_tail
                                                       |   will lock j_checkpoint_mutex,
                                                       |   but will be blocked here.
                                                       |
        jbd2_log_wait_commit(journal, tid);            |
        wait_event(journal->j_wait_done_commit,        |
          !tid_gt(tid, journal->j_commit_sequence));   |
        ...                                            | wake_up(j_wait_done_commit)
      }                                                |

    Then a deadlock occurs: Thread1 will never be woken up.

    To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint()
    when we are going to wait for transaction commit.

    Reviewed-by: Jan Kara
    Signed-off-by: Xiaoguang Wang
    Signed-off-by: Theodore Ts'o

    Xiaoguang Wang
     

04 Dec, 2018

2 commits

  • There is a statement that is indented with spaces; replace it with
    a tab.

    Reviewed-by: Jan Kara
    Signed-off-by: Colin Ian King
    Signed-off-by: Theodore Ts'o

    Colin Ian King
     
  • We can hold j_state_lock for writing at the beginning of
    jbd2_journal_commit_transaction() for a rather long time (reportedly for
    30 ms) due to cleaning revoke bits of all revoked buffers under it. The
    handling of revoke tables as well as cleaning of t_reserved_list and
    checkpoint lists does not need j_state_lock for anything. It is only
    needed to prevent new handles from joining the transaction. Generally
    T_LOCKED transaction state prevents new handles from joining the
    transaction - except for reserved handles which have to be allowed to
    join while we wait for other handles to complete.

    To prevent reserved handles from joining the transaction while cleaning
    up lists, add new transaction state T_SWITCH and watch for it when
    starting reserved handles. With this we can just drop the lock for
    operations that don't need it.
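
    The join rule described above can be sketched as a small decision
    function. This is a simplified model under stated assumptions, not the
    jbd2 code: the enum values and handle_may_join_sketch() are
    hypothetical names, and waiting, credits and locking are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the transaction states relevant to joining. */
enum txn_state_sketch {
    TXN_RUNNING,   /* accepting new handles */
    TXN_LOCKED,    /* commit started; only reserved handles may join */
    TXN_SWITCH,    /* lists being cleaned up without j_state_lock */
    TXN_FLUSH      /* commit proceeding; nobody joins */
};

/*
 * Returns true if a handle may join the transaction right now.
 * A false result means the caller would have to wait and retry.
 */
static inline bool handle_may_join_sketch(enum txn_state_sketch state,
                                          bool reserved)
{
    switch (state) {
    case TXN_RUNNING:
        return true;
    case TXN_LOCKED:
        /* Reserved handles are still allowed in while others drain. */
        return reserved;
    case TXN_SWITCH:
        /* The new state: even reserved handles must stay out. */
        return false;
    default:
        return false;
    }
}
```

    The point of the extra state is visible in the TXN_SWITCH case: since
    it rejects reserved handles too, the lock that previously kept them
    out can be dropped while the lists are cleaned.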

    Reported-and-tested-by: Adrian Hunter
    Suggested-by: "Theodore Y. Ts'o"
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

06 Oct, 2018

1 commit

  • The code cleaning transaction's lists of checkpoint buffers has a bug
    where it increases bh refcount only after releasing
    journal->j_list_lock. Thus the following race is possible:

    CPU0                                    CPU1
    jbd2_log_do_checkpoint()
                                            jbd2_journal_try_to_free_buffers()
                                              __journal_try_to_free_buffer(bh)
      ...
      while (transaction->t_checkpoint_io_list)
      ...
        if (buffer_locked(bh)) {

    <-- IO completes now, buffer gets unlocked -->

          spin_unlock(&journal->j_list_lock);
                                                spin_lock(&journal->j_list_lock);
                                                __jbd2_journal_remove_checkpoint(jh);
                                                spin_unlock(&journal->j_list_lock);
                                              try_to_free_buffers(page);
          get_bh(bh) <-- accesses freed bh

    Fix the problem by grabbing bh reference before unlocking
    journal->j_list_lock.
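
    The ordering problem in the race above can be illustrated with a
    single-threaded model that runs the other CPU's free attempt at the
    point where the list lock is released. This is only a sketch under
    stated assumptions: struct bh_sketch, try_to_free_sketch() and
    checkpoint_wait_sketch() are hypothetical names, and real locking and
    I/O are replaced by comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer head with a reference count. */
struct bh_sketch {
    int refcount;
    bool freed;
};

/* Models the other CPU: the buffer is freed only if nobody holds it. */
static inline bool try_to_free_sketch(struct bh_sketch *bh)
{
    if (bh->refcount > 0)
        return false;
    bh->freed = true;
    return true;
}

/*
 * Models one pass over a locked checkpoint buffer.
 * Returns true if the buffer was still valid when it was used,
 * false if the model detected a use of a freed buffer.
 */
static inline bool checkpoint_wait_sketch(struct bh_sketch *bh,
                                          bool ref_before_unlock)
{
    /* The list lock is held on entry. */
    if (ref_before_unlock)
        bh->refcount++;          /* take the reference under the lock */

    /* The list lock is released here; the other CPU runs now. */
    try_to_free_sketch(bh);

    if (!ref_before_unlock) {
        if (bh->freed)
            return false;        /* reference taken too late */
        bh->refcount++;
    }

    /* Wait for I/O on the buffer, then drop the reference. */
    bh->refcount--;
    return true;
}
```

    Taking the reference before the unlock makes the concurrent free
    attempt fail, so the buffer stays valid for the wait that follows.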

    Fixes: dc6e8d669cf5 ("jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint()")
    Fixes: be1158cc615f ("jbd2: fold __process_buffer() into jbd2_log_do_checkpoint()")
    Reported-by: syzbot+7f4a27091759e2fe7453@syzkaller.appspotmail.com
    CC: stable@vger.kernel.org
    Reviewed-by: Lukas Czerner
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

30 Jul, 2018

1 commit

  • jbd2 is one of the few callers of current_kernel_time64(), which
    is a wrapper around ktime_get_coarse_real_ts64(). This calls the
    latter directly for consistency with the rest of the kernel that
    is moving to the ktime_get_ family of time accessors.

    Reviewed-by: Andreas Dilger
    Reviewed-by: Jan Kara
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Theodore Ts'o

    Arnd Bergmann
     

09 Jul, 2018

1 commit

  • Pull ext4 bugfixes from Ted Ts'o:
    "Bug fixes for ext4; most of which relate to vulnerabilities where a
    maliciously crafted file system image can result in a kernel OOPS or
    hang.

    At least one fix addresses an inline data bug that could be triggered
    by userspace without the need for a crafted file system (although it
    does require that the inline data feature be enabled)"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
      ext4: check superblock mapped prior to committing
      ext4: add more mount time checks of the superblock
      ext4: add more inode number paranoia checks
      ext4: avoid running out of journal credits when appending to an inline file
      jbd2: don't mark block as modified if the handle is out of credits
      ext4: never move the system.data xattr out of the inode body
      ext4: clear i_data in ext4_inode_info when removing inline data
      ext4: include the illegal physical block in the bad map ext4_error msg
      ext4: verify the depth of extent tree in ext4_find_extent()
      ext4: only look at the bg_flags field if it is valid
      ext4: make sure bitmaps and the inode table don't overlap with bg descriptors
      ext4: always check block group bounds in ext4_init_block_bitmap()
      ext4: always verify the magic number in xattr blocks
      ext4: add corruption check in ext4_xattr_set_entry()
      ext4: add warn_on_error mount option

    Linus Torvalds
     

17 Jun, 2018

1 commit

  • Do not set the b_modified flag in the block's journal head until
    after we're sure that jbd2_journal_dirty_metadata() will not
    abort with an error due to there not being enough space reserved in
    the jbd2 handle.

    Otherwise, future attempts to modify the buffer may lead to a large
    number of spurious errors and warnings.
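
    The ordering described above (check the handle's credits first, mark
    the buffer modified only afterwards) can be sketched as follows. This
    is a simplified user-space model, not the jbd2 implementation; struct
    handle_sketch, struct jh_sketch and dirty_metadata_sketch() are
    hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a journal handle and a journal head. */
struct handle_sketch {
    int buffer_credits;   /* blocks this handle may still dirty */
};

struct jh_sketch {
    bool modified;        /* already counted against this transaction */
};

/*
 * Returns 0 on success or -ENOSPC if the handle is out of credits.
 * On failure the modified flag is left untouched, so a later attempt
 * sees a consistent state.
 */
static inline int dirty_metadata_sketch(struct handle_sketch *handle,
                                        struct jh_sketch *jh)
{
    if (jh->modified)
        return 0;                 /* already accounted for */

    if (handle->buffer_credits <= 0)
        return -ENOSPC;           /* fail before touching the flag */

    jh->modified = true;
    handle->buffer_credits--;
    return 0;
}
```

    With a handle holding one credit, the first buffer is marked modified
    and consumes the credit; a second buffer gets -ENOSPC and stays
    unmodified.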

    This addresses CVE-2018-10883.

    https://bugzilla.kernel.org/show_bug.cgi?id=200071

    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org

    Theodore Ts'o