20 Nov, 2020

1 commit

  • Kernel-doc markup should use this format:
    identifier - description

    They should not have any type before that, as otherwise
    the parser won't do the right thing.

    Also, some identifiers have different names between their
    prototypes and the kernel-doc markup.

    Reviewed-by: Jan Kara
    Signed-off-by: Mauro Carvalho Chehab
    Link: https://lore.kernel.org/r/72f5c6628f5f278d67625f60893ffbc2ca28d46e.1605521731.git.mchehab+huawei@kernel.org
    Signed-off-by: Theodore Ts'o

    Mauro Carvalho Chehab
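
The "identifier - description" rule from the entry above can be illustrated with a small, self-contained function. This is only a sketch: clamp_credits() and its parameters are hypothetical names made up for the example and are not part of jbd2.

```c
#include <assert.h>

/**
 * clamp_credits - limit a credit request to what one handle may hold
 * @requested: number of credits the caller asked for
 * @max_credits: upper bound allowed for a single handle
 *
 * The first line is "identifier - description": no return type or other
 * keyword precedes the identifier, and the identifier is spelled exactly
 * as in the prototype below.
 *
 * Return: the number of credits actually granted.
 */
int clamp_credits(int requested, int max_credits)
{
	assert(max_credits >= 0);
	if (requested < 0)
		return 0;
	return requested < max_credits ? requested : max_credits;
}
```

A comment that started with "int clamp_credits - ..." or that named a different identifier than the prototype would be the kind of markup the commit corrects.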
     

07 Nov, 2020

9 commits

  • Add missing __acquires() and __releases() annotations. Also, in a
    "this should never happen" WARN_ON check, if it *does* actually
    happen, we need to release j_state_lock since this function is always
    supposed to release that lock. Otherwise, things will quickly grind
    to a halt after the WARN_ON trips.

    Fixes: 96f1e0974575 ("jbd2: avoid long hold times of j_state_lock...")
    Cc: stable@kernel.org
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
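
The locking rule in the entry above (a function that is entered with a lock held must release it on every path, including the "should never happen" one) can be sketched in user space. The toy_lock type and all function names here are hypothetical stand-ins, not the kernel's j_state_lock or its locking API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a lock; 'held' lets callers inspect the state. */
struct toy_lock {
	bool held;
};

void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/*
 * Contract: called with 'l' held, always returns with 'l' released.
 * Returns 0 on success, -1 if the "should never happen" check trips.
 */
int finish_with_lock_held(struct toy_lock *l, int state_ok)
{
	if (!state_ok) {
		fprintf(stderr, "warning: unexpected state\n");
		/* The early return must honor the contract too. */
		toy_lock_release(l);
		return -1;
	}
	/* ... normal work would happen here ... */
	toy_lock_release(l);
	return 0;
}
```

Without the release in the warning branch, the next toy_lock_acquire() would trip its assertion, which is the user-space analogue of everything grinding to a halt.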
     
  • Fast commit should not be started if the journal is aborted.

    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201106035911.1942128-22-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     
  • Take journal state lock before reading journal->j_commit_sequence.

    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201106035911.1942128-13-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     
  • Fast commit buffers should be filled in before touching their
    state. Remove code that sets buffer state as dirty before the buffer
    is passed to the file system.

    Suggested-by: Jan Kara
    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201106035911.1942128-12-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     
  • Fast commit performance can be optimized if commit thread doesn't wait
    for ongoing fast commits to complete until the transaction enters
    T_FLUSH state. Document this optimization.

    Suggested-by: Jan Kara
    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201106035911.1942128-11-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     
  • In jbd2_fc_end_commit_fallback(), we know which tid to commit. There's
    no need for caller to pass it.

    Suggested-by: Jan Kara
    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201106035911.1942128-10-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     
  • Variables journal->j_fc_off, journal->j_fc_wbuf are accessed during
    commit path. Since today we allow only one process to perform a fast
    commit, there is no need to take the state lock before accessing these
    variables. This patch removes these locks and adds comments to
    describe this.

    Suggested-by: Jan Kara
    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201106035911.1942128-9-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     
  • This patch removes jbd2_fc_init() API and its related functions to
    simplify enabling fast commits. With this change, the number of fast
    commit blocks to use is solely determined by the JBD2 layer. So, we
    move the default value for minimum number of fast commit blocks from
    ext4/fast_commit.h to include/linux/jbd2.h. However, whether or not to
    use fast commits is determined by the file system. The file system
    just sets the fast commit feature using
    jbd2_journal_set_features(). JBD2 layer then determines how many
    blocks to use for fast commits (based on the value found in the JBD2
    superblock).

    Note that the JBD2 feature flag of fast commits is just an indication
    that there are fast commit blocks present on disk. It doesn't tell the
    JBD2 layer whether the file system intends to use fast commits or
    not. That's why we blindly clear the fast commit flag in
    journal_reset() after the recovery is done.

    Suggested-by: Jan Kara
    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201106035911.1942128-7-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     
  • The on-disk superblock field sb->s_maxlen represents the total size of
    the journal including the fast commit area and is no longer the max
    number of blocks available for a transaction. The maximum number of
    blocks available to a transaction is reduced by the number of fast
    commit blocks. So, this patch renames j_maxlen to j_total_len to
    better represent its intent. Also, it adds a function to calculate max
    number of bufs available for a transaction.

    Suggested-by: Jan Kara
    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201106035911.1942128-6-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
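
The arithmetic described in the entry above can be sketched as follows. The struct is a toy subset of journal state, fc_blocks is a hypothetical field name, and the divide-by-four transaction limit is an assumption for illustration rather than something stated in the commit message.

```c
#include <assert.h>

/* Toy subset of journal state; not the kernel's journal_t. */
struct toy_journal {
	unsigned int total_len;  /* all journal blocks, fast commit area included */
	unsigned int fc_blocks;  /* blocks set aside for fast commits (hypothetical name) */
};

/* Blocks left for regular transactions once the fast commit area is excluded. */
unsigned int toy_usable_len(const struct toy_journal *j)
{
	assert(j->fc_blocks <= j->total_len);
	return j->total_len - j->fc_blocks;
}

/* Assumed policy: one transaction may use at most a quarter of the usable area. */
unsigned int toy_max_txn_bufs(const struct toy_journal *j)
{
	return toy_usable_len(j) / 4;
}
```

With total_len = 1024 and fc_blocks = 256, the usable area is 768 blocks, so the name "total length" describes the first number and no longer the space a transaction can actually use.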
     

22 Oct, 2020

3 commits

  • This patch adds fast commit recovery support in JBD2.

    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201015203802.3597742-7-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     
  • This patch adds the APIs needed in the JBD2 layer for fast
    commits.

    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201015203802.3597742-5-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     
  • This patch adds fast commit area trackers in the journal_t
    structure. These are initialized via the jbd2_fc_init() routine that
    this patch adds. This patch also adds ext4/fast_commit.c and
    ext4/fast_commit.h files for fast commit code that will be added in
    subsequent patches in this series.

    Reported-by: kernel test robot
    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201015203802.3597742-4-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     

18 Oct, 2020

3 commits

  • When ext4 is formatted with lazy_journal_init=1 and transactions from
    the previous filesystem are still on disk, it is possible that they are
    considered during a recovery after a crash. Because the checksum seed
    has changed, the CRC check will fail, and the journal recovery fails
    with checksum error although the journal is otherwise perfectly valid.
    Fix the problem by checking commit block time stamps to determine
    whether the data in the journal block is just stale or whether it is
    indeed corrupt.

    Reported-by: kernel test robot
    Reviewed-by: Andreas Dilger
    Signed-off-by: Fengnan Chang
    Signed-off-by: Jan Kara
    Link: https://lore.kernel.org/r/20201012164900.20197-1-jack@suse.cz
    Signed-off-by: Theodore Ts'o

    changfengnan
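
A minimal sketch of the stale-versus-corrupt decision described in the entry above. The types and names are hypothetical, and the specific rule used here (a block that fails its checksum and carries a commit time older than the last valid commit is treated as a leftover) is an assumption about how the time stamps are compared, not a copy of the recovery code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_verdict { TOY_VALID, TOY_STALE, TOY_CORRUPT };

/* Hypothetical summary of what recovery learned about one commit block. */
struct toy_commit_block {
	bool checksum_ok;      /* did the CRC match under the current seed? */
	uint64_t commit_time;  /* time stamp stored in the commit block */
};

enum toy_verdict toy_classify(const struct toy_commit_block *blk,
			      uint64_t last_commit_time)
{
	assert(blk != NULL);
	if (blk->checksum_ok)
		return TOY_VALID;
	/* Bad checksum but older than the last good commit: leftover data. */
	if (blk->commit_time < last_commit_time)
		return TOY_STALE;
	/* Bad checksum on a block that is not older: treat as real corruption. */
	return TOY_CORRUPT;
}
```

A stale verdict would simply mark the end of the valid journal, while a corrupt verdict would still fail recovery.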
     
  • Introduce journal callbacks to allow different behaviors
    for an inode in journal_submit|finish_inode_data_buffers().

    The existing users of the current behavior (ext4, ocfs2)
    are adapted to use the previously exported functions
    that implement the current behavior.

    Users are callers of jbd2_journal_inode_ranged_write|wait(),
    which adds the inode to the transaction's inode list with
    the JI_WRITE|WAIT_DATA flags. Only ext4 and ocfs2 in-tree.

    Both CONFIG_EXT4_FS and CONFIG_OCFS2_FS select CONFIG_JBD2,
    which builds fs/jbd2/commit.c and journal.c that define and
    export the functions, so we can call directly in ext4/ocfs2.

    Signed-off-by: Mauricio Faria de Oliveira
    Suggested-by: Jan Kara
    Reviewed-by: Jan Kara
    Reviewed-by: Andreas Dilger
    Link: https://lore.kernel.org/r/20201006004841.600488-3-mfo@canonical.com
    Signed-off-by: Theodore Ts'o

    Mauricio Faria de Oliveira
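
The callback arrangement described in the entry above can be sketched with plain function pointers. Everything here is a toy model under stated assumptions: the struct layouts, the toy_ names, and the rule that a missing callback is skipped are all hypothetical and chosen only to show the shape of the design.

```c
#include <assert.h>
#include <stddef.h>

/* Toy inode that just counts how often each phase ran. */
struct toy_inode {
	int submitted;
	int finished;
};

/* Toy journal with per-journal hooks for the two inode data phases. */
struct toy_journal {
	int (*submit_inode_data_buffers)(struct toy_inode *inode);
	int (*finish_inode_data_buffers)(struct toy_inode *inode);
};

/* "Exported" defaults that keep the existing behavior. */
int toy_default_submit(struct toy_inode *inode)
{
	inode->submitted++;
	return 0;
}

int toy_default_finish(struct toy_inode *inode)
{
	inode->finished++;
	return 0;
}

/* Commit path: call whatever the file system installed, if anything. */
int toy_commit_inode(struct toy_journal *j, struct toy_inode *inode)
{
	int err = 0;

	assert(j != NULL && inode != NULL);
	if (j->submit_inode_data_buffers)
		err = j->submit_inode_data_buffers(inode);
	if (!err && j->finish_inode_data_buffers)
		err = j->finish_inode_data_buffers(inode);
	return err;
}
```

A file system that wants the old behavior installs the two defaults; one that needs something different installs its own functions without touching the commit path.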
     
  • Export functions that implement the current behavior done
    for an inode in journal_submit|finish_inode_data_buffers().

    No functional change.

    Signed-off-by: Mauricio Faria de Oliveira
    Suggested-by: Jan Kara
    Reviewed-by: Jan Kara
    Reviewed-by: Andreas Dilger
    Link: https://lore.kernel.org/r/20201006004841.600488-2-mfo@canonical.com
    Signed-off-by: Theodore Ts'o

    Mauricio Faria de Oliveira
     

20 Aug, 2020

1 commit

  • Remove the unnecessary chksum_err and checksum_seen variables as well as
    some redundant code to make the function easier to understand.

    [ With changes suggested by jack@ and tytso@ ]

    Signed-off-by: Shijie Luo
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20200819122955.33526-1-luoshijie1@huawei.com
    Signed-off-by: Theodore Ts'o

    Shijie Luo
     

08 Aug, 2020

3 commits

  • Remove unnecessary blank.

    Signed-off-by: Xianting Tian
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/1595077057-8048-1-git-send-email-xianting_tian@126.com
    Signed-off-by: Theodore Ts'o

    Xianting Tian
     
  • Parameter gfp_mask in jbd2_journal_try_to_free_buffers() is no longer
    used after commit ("jbd2: clean up
    jbd2_journal_try_to_free_buffers()"), so just remove it.

    Signed-off-by: zhangyi (F)
    Link: https://lore.kernel.org/r/20200620025427.1756360-6-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o

    zhangyi (F)
     
  • If we free a metadata buffer whose asynchronous background write-out
    has failed, the jbd2 checkpoint procedure will not detect the failure
    in jbd2_log_do_checkpoint(), which may lead to filesystem
    inconsistency after the journal tail is cleaned up. This patch aborts
    the journal when a buffer being freed has the write_io_error flag set,
    to prevent further inconsistency.

    Signed-off-by: zhangyi (F)
    Link: https://lore.kernel.org/r/20200620025427.1756360-5-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o

    zhangyi (F)
     

06 Aug, 2020

2 commits

  • jbd2_write_superblock() holds the buffer lock of the journal superblock
    until the superblock write ends, so add the missing unlock_buffer() in
    the error path taken before the buffer is submitted.

    Fixes: 742b06b5628f ("jbd2: check superblock mapped prior to committing")
    Signed-off-by: zhangyi (F)
    Reviewed-by: Ritesh Harjani
    Cc: stable@kernel.org
    Link: https://lore.kernel.org/r/20200620061948.2049579-1-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o

    zhangyi (F)
     
  • Callers of __jbd2_journal_unfile_buffer() and
    __jbd2_journal_refile_buffer() assume that the b_transaction is set. In
    fact if it's not, we can end up with journal_head refcounting errors
    leading to crash much later that might be very hard to track down. Add
    asserts to make sure that is the case.

    We also make sure that b_next_transaction is NULL in
    __jbd2_journal_unfile_buffer() since the callers expect that as well and
    we should not get into that stage in this state anyway, leading to
    problems later on if we do.

    Tested with fstests.

    Signed-off-by: Lukas Czerner
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20200617092549.6712-1-lczerner@redhat.com
    Signed-off-by: Theodore Ts'o

    Lukas Czerner
     

16 Jun, 2020

1 commit

  • Pull more ext4 updates from Ted Ts'o:
    "This is the second round of ext4 commits for 5.8 merge window [1].

    It includes the per-inode DAX support, which was dependent on the DAX
    infrastructure which came in via the XFS tree, and a number of
    regression and bug fixes; most notably the "BUG: using
    smp_processor_id() in preemptible code in ext4_mb_new_blocks" reported
    by syzkaller"

    [1] The pull request actually came in 15 minutes after I had tagged the
    rc1 release. Tssk, tssk, late.. - Linus

    * tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers
    ext4: support xattr gnu.* namespace for the Hurd
    ext4: mballoc: Use this_cpu_read instead of this_cpu_ptr
    ext4: avoid utf8_strncasecmp() with unstable name
    ext4: stop overwrite the errcode in ext4_setup_super
    ext4: fix partial cluster initialization when splitting extent
    ext4: avoid race conditions when remounting with options that change dax
    Documentation/dax: Update DAX enablement for ext4
    fs/ext4: Introduce DAX inode flag
    fs/ext4: Remove jflag variable
    fs/ext4: Make DAX mount option a tri-state
    fs/ext4: Only change S_DAX on inode load
    fs/ext4: Update ext4_should_use_dax()
    fs/ext4: Change EXT4_MOUNT_DAX to EXT4_MOUNT_DAX_ALWAYS
    fs/ext4: Disallow verity if inode is DAX
    fs/ext4: Narrow scope of DAX check in setflags

    Linus Torvalds
     

13 Jun, 2020

1 commit

  • In an ext4 filesystem mounted with errors=panic, if one process is
    recording the errno in the superblock while invoking
    jbd2_journal_abort() after some error, it can race with another
    process in __ext4_abort(), which sets the SB_RDONLY flag but skips the
    panic because the errno has not been recorded yet.

    jbd2_journal_commit_transaction()
     jbd2_journal_abort()
      journal->j_flags |= JBD2_ABORT;
      jbd2_journal_update_sb_errno()
                                        | ext4_journal_check_start()
                                        |  __ext4_abort()
                                        |   sb->s_flags |= SB_RDONLY;
                                        |   if (!JBD2_REC_ERR)
                                        |        return;
      journal->j_flags |= JBD2_REC_ERR;

    Finally, it will no longer trigger panic because the filesystem has
    already been set read-only. Fix this by introducing j_abort_mutex to
    make sure journal abort is completed before panic, and remove the
    JBD2_REC_ERR flag.

    Fixes: 4327ba52afd03 ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
    Signed-off-by: zhangyi (F)
    Reviewed-by: Jan Kara
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20200609073540.3810702-1-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o

    zhangyi (F)
     

06 Jun, 2020

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "A lot of bug fixes and cleanups for ext4, including:

    - Fix performance problems found in dioread_nolock now that it is the
    default, caused by transaction leaks.

    - Clean up fiemap handling in ext4

    - Clean up and refactor multiple block allocator (mballoc) code

    - Fix a problem with mballoc with smaller file systems running out
    of blocks because they couldn't properly use blocks that had been
    reserved by inode preallocation.

    - Fixed a race in ext4_sync_parent() versus rename()

    - Simplify the error handling in the extent manipulation code

    - Make sure all metadata I/O errors are reflected to
    ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers.

    - Avoid passing an error pointer to brelse in ext4_xattr_set()

    - Fix race which could result in freeing an inode on the dirty list
    in data=journal mode.

    - Fix refcount handling if ext4_iget() fails

    - Fix a crash in generic/019 caused by a corrupted extent node"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
    ext4: avoid unnecessary transaction starts during writeback
    ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
    ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
    fs: remove the access_ok() check in ioctl_fiemap
    fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
    fs: move fiemap range validation into the file systems instances
    iomap: fix the iomap_fiemap prototype
    fs: move the fiemap definitions out of fs.h
    fs: mark __generic_block_fiemap static
    ext4: remove the call to fiemap_check_flags in ext4_fiemap
    ext4: split _ext4_fiemap
    ext4: fix fiemap size checks for bitmap files
    ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
    add comment for ext4_dir_entry_2 file_type member
    jbd2: avoid leaking transaction credits when unreserving handle
    ext4: drop ext4_journal_free_reserved()
    ext4: mballoc: use lock for checking free blocks while retrying
    ext4: mballoc: refactor ext4_mb_good_group()
    ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
    ext4: mballoc: refactor ext4_mb_discard_preallocations()
    ...

    Linus Torvalds
     

04 Jun, 2020

1 commit

  • When reserved transaction handle is unused, we subtract its reserved
    credits in __jbd2_journal_unreserve_handle() called from
    jbd2_journal_stop(). However this function forgets to remove reserved
    credits from transaction->t_outstanding_credits and thus the transaction
    space that was reserved remains effectively leaked. The leaked
    transaction space can be quite significant in some cases and leads to
    unnecessarily small transactions and thus reducing throughput of the
    journalling machinery. E.g. fsmark workload creating lots of 4k files
    was observed to have about 20% lower throughput due to this when ext4 is
    mounted with dioread_nolock mount option.

    Subtract reserved credits from t_outstanding_credits as well.

    CC: stable@vger.kernel.org
    Fixes: 8f7d89f36829 ("jbd2: transaction reservation support")
    Reviewed-by: Andreas Dilger
    Signed-off-by: Jan Kara
    Link: https://lore.kernel.org/r/20200520133119.1383-3-jack@suse.cz
    Signed-off-by: Theodore Ts'o

    Jan Kara
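
The accounting bug described in the entry above can be modeled with two counters. This is a deliberately simplified sketch: the toy_ types and functions are hypothetical, and the real code tracks credits in more places than the single field shown here.

```c
#include <assert.h>

/* Toy transaction: only the counter that the commit message talks about. */
struct toy_transaction {
	int outstanding_credits;
};

/* Toy handle holding a reservation against one transaction. */
struct toy_handle {
	int reserved_credits;
};

void toy_reserve(struct toy_transaction *t, struct toy_handle *h, int credits)
{
	assert(credits >= 0);
	h->reserved_credits = credits;
	t->outstanding_credits += credits;
}

/* Give back an unused reservation. */
void toy_unreserve(struct toy_transaction *t, struct toy_handle *h)
{
	assert(t->outstanding_credits >= h->reserved_credits);
	/* The step the fix adds: return the space to the transaction too. */
	t->outstanding_credits -= h->reserved_credits;
	h->reserved_credits = 0;
}
```

If toy_unreserve() only cleared the handle, outstanding_credits would stay inflated and the transaction would look fuller than it is, which is the "leak" that shrinks transactions.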
     

22 May, 2020

1 commit


06 Mar, 2020

1 commit

  • Improve comments in jbd2_journal_commit_transaction() to describe why
    we don't need to clear the buffer_mapped bit for freeing file mapping
    buffers whose page mapping is NULL.

    Link: https://lore.kernel.org/r/20200217112706.20085-1-yi.zhang@huawei.com
    Fixes: c96dceeabf76 ("jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer")
    Suggested-by: Jan Kara
    Reviewed-by: Jan Kara
    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o

    zhangyi (F)
     

01 Mar, 2020

1 commit

  • journal_head::b_transaction and journal_head::b_next_transaction could
    be accessed concurrently as noticed by KCSAN,

    LTP: starting fsync04
    /dev/zero: Can't open blockdev
    EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
    EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
    ==================================================================
    BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]

    write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
    __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
    __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
    jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
    (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
    kjournald2+0x13b/0x450 [jbd2]
    kthread+0x1cd/0x1f0
    ret_from_fork+0x27/0x50

    read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
    jbd2_write_access_granted+0x1b2/0x250 [jbd2]
    jbd2_write_access_granted at fs/jbd2/transaction.c:1155
    jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
    __ext4_journal_get_write_access+0x50/0x90 [ext4]
    ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
    ext4_mb_new_blocks+0x54f/0xca0 [ext4]
    ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
    ext4_map_blocks+0x3b4/0x950 [ext4]
    _ext4_get_block+0xfc/0x270 [ext4]
    ext4_get_block+0x3b/0x50 [ext4]
    __block_write_begin_int+0x22e/0xae0
    __block_write_begin+0x39/0x50
    ext4_write_begin+0x388/0xb50 [ext4]
    generic_perform_write+0x15d/0x290
    ext4_buffered_write_iter+0x11f/0x210 [ext4]
    ext4_file_write_iter+0xce/0x9e0 [ext4]
    new_sync_write+0x29c/0x3b0
    __vfs_write+0x92/0xa0
    vfs_write+0x103/0x260
    ksys_write+0x9d/0x130
    __x64_sys_write+0x4c/0x60
    do_syscall_64+0x91/0xb05
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    5 locks held by fsync04/25724:
    #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
    #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
    #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
    #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
    #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
    irq event stamp: 1407125
    hardirqs last enabled at (1407125): [] __find_get_block+0x107/0x790
    hardirqs last disabled at (1407124): [] __find_get_block+0x49/0x790
    softirqs last enabled at (1405528): [] __do_softirq+0x34c/0x57c
    softirqs last disabled at (1405521): [] irq_exit+0xa2/0xc0

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7
    Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

    The plain reads are outside of the jh->b_state_lock critical section, which
    results in data races. Fix them by adding pairs of READ|WRITE_ONCE().

    Reviewed-by: Jan Kara
    Signed-off-by: Qian Cai
    Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw
    Signed-off-by: Theodore Ts'o

    Qian Cai
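
The READ_ONCE()/WRITE_ONCE() pairing mentioned in the entry above can be approximated in user space. The macro definitions below are simplified stand-ins that assume the GCC/Clang __typeof__ extension; they are not the kernel's definitions, and toy_journal_head is a hypothetical one-field struct.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space approximations; __typeof__ is a GCC/Clang extension. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Toy journal head with only the field that is read without the lock. */
struct toy_journal_head {
	void *b_transaction;
};

/* Writer side: publish the new owner with a single marked store. */
void toy_set_transaction(struct toy_journal_head *jh, void *t)
{
	assert(jh != NULL);
	WRITE_ONCE(jh->b_transaction, t);
}

/* Lockless reader side: one marked load, then compare. */
int toy_belongs_to(struct toy_journal_head *jh, const void *t)
{
	return READ_ONCE(jh->b_transaction) == t;
}
```

The marked accesses tell the compiler not to tear, fuse, or repeat the load and store, and they document that the lockless access is intentional.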
     

22 Feb, 2020

1 commit

  • I found a NULL pointer dereference in ocfs2_block_group_clear_bits().
    The running environment:
    kernel version: 4.19
    A cluster with two nodes, 5 LUNs mounted on both nodes, running file
    operations such as dd/fallocate/truncate/rm on every LUN while the
    storage network was disconnected.

    The fallocate operation on dm-23-45 caused a NULL pointer dereference.

    The details of the NULL pointer dereference are as follows:
    [577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45.
    [577992.878290] Aborting journal on device dm-23-45.
    ...
    [577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46.
    [577992.890908] __journal_remove_journal_head: freeing b_committed_data
    [577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30
    [577992.890918] __journal_remove_journal_head: freeing b_committed_data
    [577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30
    [577992.890922] __journal_remove_journal_head: freeing b_committed_data
    [577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30
    [577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30
    [577992.890928] __journal_remove_journal_head: freeing b_committed_data
    [577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30
    [577992.890933] __journal_remove_journal_head: freeing b_committed_data
    [577992.890939] __journal_remove_journal_head: freeing b_committed_data
    [577992.890949] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
    [577992.890950] Mem abort info:
    [577992.890951] ESR = 0x96000004
    [577992.890952] Exception class = DABT (current EL), IL = 32 bits
    [577992.890952] SET = 0, FnV = 0
    [577992.890953] EA = 0, S1PTW = 0
    [577992.890954] Data abort info:
    [577992.890955] ISV = 0, ISS = 0x00000004
    [577992.890956] CM = 0, WnR = 0
    [577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f8da07a9
    [577992.890960] [0000000000000020] pgd=0000000000000000
    [577992.890964] Internal error: Oops: 96000004 [#1] SMP
    [577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd)
    [577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G W OE 4.19.36 #1
    [577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
    [577992.890971] pstate: 60400009 (nZCv daif +PAN -UAO)
    [577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
    [577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2]
    [577992.891084] sp : ffff0000c8e2b810
    [577992.891085] x29: ffff0000c8e2b820 x28: 0000000000000000
    [577992.891087] x27: 00000000000006f3 x26: ffffa07957b02e70
    [577992.891089] x25: ffff807c59d50000 x24: 00000000000006f2
    [577992.891091] x23: 0000000000000001 x22: ffff807bd39abc30
    [577992.891093] x21: ffff0000811d9000 x20: ffffa07535d6a000
    [577992.891097] x19: ffff000001681638 x18: ffffffffffffffff
    [577992.891098] x17: 0000000000000000 x16: ffff000080a03df0
    [577992.891100] x15: ffff0000811d9708 x14: 203d207375746174
    [577992.891101] x13: 73203a524f525245 x12: 20373439343a6565
    [577992.891103] x11: 0000000000000038 x10: 0101010101010101
    [577992.891106] x9 : ffffa07c68a85d70 x8 : 7f7f7f7f7f7f7f7f
    [577992.891109] x7 : 0000000000000000 x6 : 0000000000000080
    [577992.891110] x5 : 0000000000000000 x4 : 0000000000000002
    [577992.891112] x3 : ffff000001713390 x2 : 2ff90f88b1c22f00
    [577992.891114] x1 : ffff807bd39abc30 x0 : 0000000000000000
    [577992.891116] Call trace:
    [577992.891139] _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
    [577992.891162] _ocfs2_free_clusters+0x100/0x290 [ocfs2]
    [577992.891185] ocfs2_free_clusters+0x50/0x68 [ocfs2]
    [577992.891206] ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2]
    [577992.891227] ocfs2_add_inode_data+0x94/0xc8 [ocfs2]
    [577992.891248] ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2]
    [577992.891269] ocfs2_allocate_extents+0x14c/0x338 [ocfs2]
    [577992.891290] __ocfs2_change_file_space+0x3f8/0x610 [ocfs2]
    [577992.891309] ocfs2_fallocate+0xe4/0x128 [ocfs2]
    [577992.891316] vfs_fallocate+0x11c/0x250
    [577992.891317] ksys_fallocate+0x54/0x88
    [577992.891319] __arm64_sys_fallocate+0x28/0x38
    [577992.891323] el0_svc_common+0x78/0x130
    [577992.891325] el0_svc_handler+0x38/0x78
    [577992.891327] el0_svc+0x8/0xc

    My analysis is as follows:
    ocfs2_fallocate
      __ocfs2_change_file_space
        ocfs2_allocate_extents
          ocfs2_extend_allocation
            ocfs2_add_inode_data
              ocfs2_add_clusters_in_btree
                ocfs2_insert_extent
                  ocfs2_do_insert_extent
                    ocfs2_rotate_tree_right
                      ocfs2_extend_rotate_transaction
                        ocfs2_extend_trans
                          jbd2_journal_restart
                            jbd2__journal_restart
                              /* handle->h_transaction is NULL,
                               * is_handle_aborted(handle) is true
                               */
                              handle->h_transaction = NULL;
                              start_this_handle
                                return -EROFS;
                ocfs2_free_clusters
                  _ocfs2_free_clusters
                    _ocfs2_free_suballoc_bits
                      ocfs2_block_group_clear_bits
                        ocfs2_journal_access_gd
                          __ocfs2_journal_access
                            jbd2_journal_get_undo_access
                              /* I think jbd2_write_access_granted() will
                               * return true, because do_get_write_access()
                               * will return -EROFS.
                               */
                              if (jbd2_write_access_granted(...)) return 0;
                              do_get_write_access
                                /* handle->h_transaction is NULL, it will
                                 * return -EROFS here, so do_get_write_access()
                                 * was not called.
                                 */
                                if (is_handle_aborted(handle)) return -EROFS;
                        /* bh2jh(group_bh) is NULL, caused NULL
                           pointer dereference */
                        undo_bg = (struct ocfs2_group_desc *)
                                    bh2jh(group_bh)->b_committed_data;

    If handle->h_transaction == NULL, then jbd2_write_access_granted()
    does not really guarantee that the journal_head will stay around,
    let alone its b_committed_data. The bh2jh(group_bh) can be removed
    after ocfs2_journal_access_gd() returns and before
    "bh2jh(group_bh)->b_committed_data" is dereferenced. So, we should
    move the is_handle_aborted() check from do_get_write_access() into
    jbd2_journal_get_undo_access() and jbd2_journal_get_write_access(),
    before the call to jbd2_write_access_granted().

    Link: https://lore.kernel.org/r/f72a623f-b3f1-381a-d91d-d22a1c83a336@huawei.com
    Signed-off-by: Yan Wang
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jun Piao
    Reviewed-by: Jan Kara
    Cc: stable@kernel.org

    wangyan
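
The reordering proposed in the entry above (check for an aborted handle before taking the "access already granted" fast path) can be sketched with a toy model. The structs and the toy_get_write_access() function are hypothetical; only the order of the two checks mirrors the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy handle: only remembers whether its transaction was aborted. */
struct toy_handle {
	bool aborted;
};

/* Toy buffer: only remembers whether write access was granted earlier. */
struct toy_buffer {
	bool access_granted;
};

int toy_get_write_access(struct toy_handle *h, struct toy_buffer *bh)
{
	assert(h != NULL && bh != NULL);
	/* Checked first, so the fast path below cannot hide an abort. */
	if (h->aborted)
		return -EROFS;
	/* Fast path: access was granted earlier in this transaction. */
	if (bh->access_granted)
		return 0;
	/* Slow path: grant access now. */
	bh->access_granted = true;
	return 0;
}
```

With the checks in the opposite order, an aborted handle paired with an already-granted buffer would return 0, and the caller would go on to dereference state that may no longer exist.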
     

17 Feb, 2020

1 commit

  • Pull ext4 fixes from Ted Ts'o:
    "Miscellaneous ext4 bug fixes (all stable fodder)"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: improve explanation of a mount failure caused by a misconfigured kernel
    jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
    jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()
    ext4: add cond_resched() to ext4_protect_reserved_inode
    ext4: fix checksum errors with indexed dirs
    ext4: fix support for inode sizes > 1024 bytes
    ext4: simplify checking quota limits in ext4_statfs()
    ext4: don't assume that mmp_nodename/bdevname have NUL

    Linus Torvalds
     

14 Feb, 2020

2 commits

  • Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
    an older transaction") sets the BH_Freed flag when forgetting a metadata
    buffer which belongs to the committing transaction, to indicate that the
    committing process should clear the dirty bits when it is done with the
    buffer. But it also clears the BH_Mapped flag at the same time, which may
    trigger the NULL pointer oops below when block_size < PAGE_SIZE.

    rmdir 1             kjournald2                        mkdir 2
                        jbd2_journal_commit_transaction
                         commit transaction N
    jbd2_journal_forget
     set_buffer_freed(bh1)
                        jbd2_journal_commit_transaction
                         commit transaction N+1
                         ...
                         clear_buffer_mapped(bh1)
                                                          ext4_getblk(bh2 unmapped)
                                                          ...
                                                          grow_dev_page
                                                           init_page_buffers
                                                            bh1->b_private=NULL
                                                            bh2->b_private=NULL
                         jbd2_journal_put_journal_head(jh1)
                          __journal_remove_journal_head(bh1)
                           jh1 is NULL and triggers oops

    *) Dir entry blocks bh1 and bh2 belong to one page, and bh2 has
    already been unmapped.

    For a metadata buffer we are forgetting, we should always keep the mapped
    flag; clearing the dirty flags is enough. So this patch picks out
    these buffers and keeps their BH_Mapped flag.

    Link: https://lore.kernel.org/r/20200213063821.30455-3-yi.zhang@huawei.com
    Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
    Reviewed-by: Jan Kara
    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org

    zhangyi (F)
     
  • There is no need to delay the clearing of b_modified flag to the
    transaction committing time when unmapping the journalled buffer, so
    just move it to the journal_unmap_buffer().

    Link: https://lore.kernel.org/r/20200213063821.30455-2-yi.zhang@huawei.com
    Reviewed-by: Jan Kara
    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org

    zhangyi (F)
     

09 Feb, 2020

1 commit

  • Pull misc vfs updates from Al Viro:

    - bmap series from cmaiolino

    - getting rid of convolutions in copy_mount_options() (use a couple of
    copy_from_user() instead of the __get_user() crap)

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    saner copy_mount_options()
    fibmap: Reject negative block numbers
    fibmap: Use bmap instead of ->bmap method in ioctl_fibmap
    ecryptfs: drop direct calls to ->bmap
    cachefiles: drop direct usage of ->bmap method.
    fs: Enable bmap() function to properly return errors

    Linus Torvalds
     

04 Feb, 2020

1 commit

  • The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in
    seq_file.h.

    Conversion rule is:

    llseek => proc_lseek
    unlocked_ioctl => proc_ioctl

    xxx => proc_xxx

    delete ".owner = THIS_MODULE" line

    [akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c]
    [sfr@canb.auug.org.au: fix kernel/sched/psi.c]
    Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

03 Feb, 2020

1 commit

  • Currently, bmap() returns either the physical block number related to
    the requested file offset, or 0 if an error occurs or the requested
    offset maps into a hole.
    This patch makes the changes needed for bmap() to properly return
    errors: the return value becomes an error code, and a pointer
    must now be passed to bmap() to be filled with the mapped physical block.

    It will change the behavior of bmap() on return:

    - negative value in case of error
    - zero on success or map fell into a hole

    In case of a hole, the *block will be zero too

    Since this is a prep patch, for now the only error return is -EINVAL if
    ->bmap doesn't exist.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Carlos Maiolino
    Signed-off-by: Al Viro

    Carlos Maiolino
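
The return convention described in the entry above (negative errno on error, zero on success, mapped block passed back through a pointer, zero block for a hole) can be sketched with a toy model. The toy_ types, the function-pointer layout, and the example mapping are all hypothetical; only the calling convention follows the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_sector_t;

/* Toy inode: 'bmap' is NULL when the file system has no block mapping op. */
struct toy_inode {
	toy_sector_t (*bmap)(toy_sector_t logical);
};

/*
 * On entry *block holds the logical block; on success it is overwritten
 * with the physical block (0 for a hole) and 0 is returned.
 */
int toy_bmap(const struct toy_inode *inode, toy_sector_t *block)
{
	assert(inode != NULL && block != NULL);
	if (!inode->bmap)
		return -EINVAL;
	*block = inode->bmap(*block);
	return 0;
}

/* Example mapping: logical block 7 is a hole, others are offset by 1000. */
toy_sector_t toy_example_map(toy_sector_t logical)
{
	return logical == 7 ? 0 : logical + 1000;
}
```

Callers can now tell "no mapping operation" (-EINVAL) apart from "hole" (return 0 with *block == 0), which the old single return value could not express.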
     

25 Jan, 2020

4 commits

  • __jbd2_journal_abort_hard() is no longer used, so we can now merge
    __jbd2_journal_abort_hard() and __journal_abort_soft() into
    jbd2_journal_abort() and remove both functions.

    Signed-off-by: zhangyi (F)
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20191204124614.45424-5-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o

    zhangyi (F)
     
  • Commit fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer") wants
    to allow the jbd2 layer to distinguish a shutdown journal abort from other
    error cases. So ESHUTDOWN should take precedence over any other
    errno which has already been recorded once EXT4_FLAGS_SHUTDOWN is set,
    but currently the errno in the journal superblock is only updated if
    the old errno is 0.

    Fixes: fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer")
    Signed-off-by: zhangyi (F)
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20191204124614.45424-4-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o

    zhangyi (F)
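
The precedence rule described in the entry above can be sketched as a small update function. The toy_journal struct and toy_record_abort_errno() are hypothetical names; the condition is an illustration of "first error wins, except that -ESHUTDOWN always wins", under the assumption that errors are passed as zero or negative errno values.

```c
#include <assert.h>
#include <errno.h>

/* Toy journal: only the recorded error code. */
struct toy_journal {
	int j_errno;
};

/* Record an abort reason; 'err' is 0 or a negative errno value. */
void toy_record_abort_errno(struct toy_journal *j, int err)
{
	assert(err <= 0);
	/* Keep the first error, but let a shutdown override anything. */
	if (j->j_errno == 0 || err == -ESHUTDOWN)
		j->j_errno = err;
}
```

Before the fix, only the "old errno is 0" half of the condition existed, so a shutdown arriving after an earlier I/O error was never recorded.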
     
  • The JBD2_REC_ERR flag is used to indicate that the errno has been updated
    when jbd2 aborted, so that __ext4_abort() and ext4_handle_error() can
    invoke panic if ERRORS_PANIC is specified. But if the journal has been
    aborted with zero errno, jbd2_journal_abort() doesn't set this flag, so
    we can no longer panic. Fix this by always recording the proper errno
    in the journal superblock.

    Fixes: 4327ba52afd03 ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
    Signed-off-by: zhangyi (F)
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20191204124614.45424-3-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o

    zhangyi (F)
     
  • When committing a journal transaction, we invoke jbd2_journal_abort()
    to abort the journal and record the errno in the jbd2 superblock in
    every failure case except a failure to submit the commit record. There
    is no need for that exception, so invoke jbd2_journal_abort() there as
    well, instead of __jbd2_journal_abort_hard().

    Fixes: 818d276ceb83a ("ext4: Add the journal checksum feature")
    Signed-off-by: zhangyi (F)
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20191204124614.45424-2-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o

    zhangyi (F)