06 Apr, 2019

1 commit

  • commit 5e86bdda41534e17621d5a071b294943cae4376e upstream.

    Currently, we release each indirect buffer at the point where we are
    done with it in ext4_ind_remove_space(), so brelse() and BUFFER_TRACE()
    calls are scattered everywhere. This is fragile and hard to read, and
    we may well forget to release a buffer some day. This patch cleans up
    the code by moving the code that releases the buffers to the end of
    the function.
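
The cleanup style described above can be illustrated with a minimal userspace sketch. The names `struct buf`, `buf_get()`, `buf_put()` and `process_chain()` are hypothetical stand-ins, not the kernel's buffer_head API and not the actual patch; the sketch only shows every acquired reference being dropped at a single exit label.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted buffer (not struct buffer_head). */
struct buf {
    int refcount;
};

static void buf_get(struct buf *b)
{
    b->refcount++;
}

/* Similar in spirit to brelse(): tolerates NULL and drops one reference. */
static void buf_put(struct buf *b)
{
    if (b)
        b->refcount--;
}

/*
 * Take a reference on each of the n buffers, optionally failing part-way
 * (fail_at < n simulates an error after fail_at buffers were acquired).
 * All releases happen at the single "cleanup" label, so no exit path
 * can forget one.
 */
static int process_chain(struct buf *bufs, size_t n, size_t fail_at)
{
    size_t held = 0;
    int err = 0;

    for (size_t i = 0; i < n; i++) {
        if (i == fail_at) {
            err = -1;
            goto cleanup;
        }
        buf_get(&bufs[i]);
        held++;
    }

cleanup:
    while (held > 0)
        buf_put(&bufs[--held]);
    return err;
}
```

Both the success path and the simulated failure path leave every reference count where it started, which is the property the scattered-release style makes easy to break.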

    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: Jari Ruusu
    Signed-off-by: Greg Kroah-Hartman

    zhangyi (F)
     

27 Mar, 2019

3 commits

  • commit 674a2b27234d1b7afcb0a9162e81b2e53aeef217 upstream.

    All indirect buffers obtained by ext4_find_shared() should be released
    no matter whether the branch should be freed or not. But currently we
    forget to release the lower-depth indirect buffers when removing space
    from the same higher-depth indirect block. This leads to a buffer leak
    and, furthermore, may lead to quota information corruption when using
    old quota. Consider the following case:

    - Create and mount an empty ext4 filesystem without the extent and
    quota features.
    - Run quotacheck and enable the user & group quota.
    - Create some files, write some data to them, and then punch holes
    in some of them; this may trigger the buffer leak problem mentioned
    above.
    - Disable quota and run quotacheck again. It will create two new
    aquota files and write the checked quota information to them, which
    may reuse the freed indirect block (whose buffer and page cache
    were not freed) as a data block.
    - Enable quota again. It will invoke
    vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
    buffers and pagecache. Unfortunately, because the buffer of the quota
    data block is still referenced, the quota code cannot read the
    up-to-date quota info from the device, which leads to quota
    information corruption.

    This problem can be reproduced by xfstests generic/231 on an ext3 file
    system or an ext4 file system without the extent and quota features.

    This patch fixes this problem by releasing the missing indirect buffers
    in ext4_ind_remove_space().

    Reported-by: Hulk Robot
    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    zhangyi (F)
     
  • commit 372a03e01853f860560eade508794dd274e9b390 upstream.

    Ext4 needs to serialize unaligned direct AIO because the zeroing of
    partial blocks of two competing unaligned AIOs can result in data
    corruption.

    However it decides not to serialize if the potentially unaligned aio is
    past i_size with the rationale that no pending writes are possible past
    i_size. Unfortunately if the i_size is not block aligned and the second
    unaligned write lands past i_size, but still into the same block, it has
    the potential of corrupting the previous unaligned write to the same
    block.

    This is a (very simplified) reproducer from Frank:

    // 41472 = (10 * 4096) + 512
    // 37376 = 41472 - 4096

    ftruncate(fd, 41472);
    io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
    io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);

    io_submit(io_ctx, 1, &iocbs[0]);
    io_submit(io_ctx, 1, &iocbs[1]);

    io_getevents(io_ctx, 2, 2, events, NULL);

    Without this patch the 512B range from 40960 up to the start of the
    second unaligned write (41472) is going to be zeroed overwriting the data
    written by the first write. This is a data corruption.

    00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    *
    00009200 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
    *
    0000a000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    *
    0000a200 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31

    With this patch the data corruption is avoided because we will recognize
    the unaligned_aio and wait for the unwritten extent conversion.

    00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    *
    00009200 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
    *
    0000a200 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
    *
    0000b200
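
The offsets in the reproducer can be checked with a small helper. This is a sketch under stated assumptions, not the kernel's actual check: `write_shares_eof_block()` is a hypothetical name, and it only tests whether a write that starts at or past i_size still begins in the block containing the end of file.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when a write starting at `offset` begins at or past
 * `i_size` yet still falls in the block that contains the current end
 * of file, i.e. the case in which partial-block zeroing can clobber
 * data from an earlier in-flight write. `blocksize` must be non-zero.
 */
static bool write_shares_eof_block(uint64_t offset, uint64_t i_size,
                                   uint64_t blocksize)
{
    if (offset < i_size)
        return false;           /* not a write past i_size */
    if (i_size % blocksize == 0)
        return false;           /* EOF is block aligned: no shared block */
    return offset / blocksize == i_size / blocksize;
}
```

With the values from the reproducer (i_size 41472, second write at offset 41472, 4096-byte blocks), both positions fall in block 10, so the second write cannot be treated as independent of the first.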

    Reported-by: Frank Sorenson
    Signed-off-by: Lukas Czerner
    Signed-off-by: Theodore Ts'o
    Fixes: e9e3bcecf44c ("ext4: serialize unaligned asynchronous DIO")
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Lukas Czerner
     
  • commit fa30dde38aa8628c73a6dded7cb0bba38c27b576 upstream.

    We see the following NULL pointer dereference while running xfstests
    generic/475:
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
    Oops: 0000 [#1] SMP PTI
    CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
    RIP: 0010:ext4_do_update_inode+0x4ec/0x760
    ...
    Call Trace:
    ? jbd2_journal_get_write_access+0x42/0x50
    ? __ext4_journal_get_write_access+0x2c/0x70
    ? ext4_truncate+0x186/0x3f0
    ext4_mark_iloc_dirty+0x61/0x80
    ext4_mark_inode_dirty+0x62/0x1b0
    ext4_truncate+0x186/0x3f0
    ? unmap_mapping_pages+0x56/0x100
    ext4_setattr+0x817/0x8b0
    notify_change+0x1df/0x430
    do_truncate+0x5e/0x90
    ? generic_permission+0x12b/0x1a0

    This is triggered because the NULL pointer handle->h_transaction was
    dereferenced in function ext4_update_inode_fsync_trans().
    I found that h_transaction was set to NULL in jbd2__journal_restart()
    but the handle failed to attach to a new transaction while the journal
    was aborted.

    Fix this by checking the handle before updating the inode.
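
A minimal sketch of the guard described above, using simplified mock types. `struct mock_txn`, `struct mock_handle`, `struct mock_inode_info` and `update_fsync_tid()` are hypothetical and do not reproduce the jbd2 structures or the actual patch; the sketch only shows the handle being validated before its transaction pointer is dereferenced.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the jbd2 transaction and handle. */
struct mock_txn {
    unsigned int tid;
};

struct mock_handle {
    struct mock_txn *h_transaction;  /* may be NULL after a failed restart */
};

struct mock_inode_info {
    unsigned int sync_tid;
};

/*
 * Record the transaction ID on the inode only when the handle is still
 * attached to a transaction. Returns 1 if the inode was updated and 0
 * if the update was skipped.
 */
static int update_fsync_tid(const struct mock_handle *h,
                            struct mock_inode_info *ei)
{
    if (h == NULL || h->h_transaction == NULL)
        return 0;
    ei->sync_tid = h->h_transaction->tid;
    return 1;
}
```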

    Fixes: b436b9bef84d ("ext4: Wait for proper transaction commit on fsync")
    Signed-off-by: Jiufei Xue
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Joseph Qi
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Jiufei Xue
     

24 Mar, 2019

5 commits

  • commit f96c3ac8dfc24b4e38fc4c2eba5fea2107b929d1 upstream.

    When computing maximum size of filesystem possible with given number of
    group descriptor blocks, we forget to include s_first_data_block into
    the number of blocks. Thus for filesystems with non-zero
    s_first_data_block it can happen that computed maximum filesystem size
    is actually lower than current filesystem size which confuses the code
    and eventually leads to a BUG_ON in ext4_alloc_group_tables() hitting on
    flex_gd->count == 0. The problem can be reproduced like:

    truncate -s 100g /tmp/image
    mkfs.ext4 -b 1024 -E resize=262144 /tmp/image 32768
    mount -t ext4 -o loop /tmp/image /mnt
    resize2fs /dev/loop0 262145
    resize2fs /dev/loop0 300000

    Fix the problem by properly including s_first_data_block into the
    computed number of filesystem blocks.
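
The arithmetic can be sketched as follows. The function and parameter names are illustrative, not the kernel's, and the numbers in the usage note (8192 blocks per group, first data block 1, as is typical of a filesystem with 1 KiB blocks, and 32 groups) are assumptions chosen for the example.

```c
#include <stdint.h>

/*
 * Highest block count addressable by n_groups block groups. Group 0
 * starts at first_data_block, so that offset has to be added to the
 * product. Parameter names are illustrative, not the kernel's.
 */
static uint64_t max_blocks_for_groups(uint64_t n_groups,
                                      uint64_t blocks_per_group,
                                      uint64_t first_data_block)
{
    return n_groups * blocks_per_group + first_data_block;
}
```

For 32 groups of 8192 blocks, omitting the offset gives 262144 blocks, while including a first data block of 1 gives 262145. A filesystem that already holds 262145 blocks would therefore appear larger than the computed maximum when the offset is left out.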

    Fixes: 1c6bd7173d66 "ext4: convert file system to meta_bg if needed..."
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit abdc644e8cbac2e9b19763680e5a7cf9bab2bee7 upstream.

    While swapping two inodes, we swap the flags too. Some flags, such as
    EXT4_JOURNAL_DATA_FL, can really confuse things since we're not
    resetting the address operations structure. The simplest way to keep
    things sane is to restrict the flags that can be swapped.
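
Restricting a swap to a set of permitted bits can be sketched with a mask. The `DEMO_FL_*` values below are hypothetical and are not the real EXT4_*_FL constants, and `swap_masked_flags()` is an illustrative helper rather than the function added by the patch.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the real EXT4_*_FL values. */
#define DEMO_FL_SYNC         0x1u
#define DEMO_FL_NOATIME      0x2u
#define DEMO_FL_JOURNAL_DATA 0x4u

/* Only flags in this mask may be exchanged between the two inodes. */
#define DEMO_FL_SWAPPABLE (DEMO_FL_SYNC | DEMO_FL_NOATIME)

/* Exchange the bits selected by mask; all other bits stay where they are. */
static void swap_masked_flags(uint32_t *a, uint32_t *b, uint32_t mask)
{
    uint32_t diff = (*a ^ *b) & mask;

    *a ^= diff;
    *b ^= diff;
}
```

Bits outside the mask, such as the journal-data flag in this example, stay with their original inode.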

    Signed-off-by: yangerkun
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    yangerkun
     
  • commit aa507b5faf38784defe49f5e64605ac3c4425e26 upstream.

    When swapping two inodes, i_data is swapped without updating the
    quota information. Also, swap_inode_boot_loader can sometimes
    "revert", so update the quota only after all operations have finished.

    Signed-off-by: yangerkun
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    yangerkun
     
  • commit a46c68a318b08f819047843abf349aeee5d10ac2 upstream.

    While doing the swap, we should make sure there are no new dirty pages,
    since we swap i_data between the two inodes:
    1. We should take i_mmap_sem for write to avoid new pagecache from mmap
    read/write;
    2. Change filemap_flush to filemap_write_and_wait and move the calls
    into the region protected by the inode lock to avoid new pagecache
    from buffered read/write.

    Signed-off-by: yangerkun
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    yangerkun
     
  • commit 67a11611e1a5211f6569044fbf8150875764d1d0 upstream.

    Before actually swapping the inode and the boot inode, several
    conditions need to be checked to avoid an invalid or unpermitted
    operation, such as whether the inode has inline data. These checks
    should be done under the inode lock so that the conditions cannot
    change while swapping. Some of the other conditions cannot change
    during the swap, but there is no harm in checking them under the inode
    lock as well.

    Signed-off-by: yangerkun
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    yangerkun
     

15 Feb, 2019

1 commit

  • commit 8fdd60f2ae3682caf2a7258626abc21eb4711892 upstream.

    This reverts commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a.

    As Jan Kara pointed out, this change was unsafe since it means we lose
    the call to sync_mapping_buffers() in the nojournal case. The
    original point of the commit was to avoid taking the inode mutex (since
    it causes a lockdep warning in generic/113); but we need the mutex in
    order to call sync_mapping_buffers().

    The real fix to this problem was discussed here:

    https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org

    The proposed patch was to fix a syzbot complaint, but the problem can
    also be demonstrated via "kvm-xfstests -c nojournal generic/113".
    Multiple solutions were discussed in the e-mail thread, but none have
    landed in the kernel as of this writing. Anyway, commit
    ad211f3e94b314 is absolutely the wrong way to suppress the lockdep
    warning, so revert it.

    Fixes: ad211f3e94b314a910d4af03178a0b52a7d1ee0a ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
    Signed-off-by: Theodore Ts'o
    Reported: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     

17 Jan, 2019

6 commits

  • commit 191ce17876c9367819c4b0a25b503c0f6d9054d8 upstream.

    The check for special (reserved) inode numbers in __ext4_iget()
    was broken by commit 8a363970d1dc ("ext4: avoid declaring fs
    inconsistent due to invalid file handles"). This was caused by a
    botched reversal of the sense of the flag now known as
    EXT4_IGET_SPECIAL (when it was previously named EXT4_IGET_NORMAL).
    Fix the logic appropriately.

    Fixes: 8a363970d1dc ("ext4: avoid declaring fs inconsistent...")
    Signed-off-by: Theodore Ts'o
    Reported-by: Dan Carpenter
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 95cb67138746451cc84cf8e516e14989746e93b0 upstream.

    We are already using mapping_set_error() in fs/ext4/page_io.c, so all we
    need to do is to use file_check_and_advance_wb_err() when handling
    fsync() requests in ext4_sync_file().

    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a upstream.

    In no-journal mode, we previously used __generic_file_fsync(). This
    triggers a lockdep warning, and in addition, it's not safe to depend
    on the inode writeback mechanism in the case of ext4. We can solve
    both problems by calling ext4_write_inode() directly.

    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit e86807862e6880809f191c4cea7f88a489f0ed34 upstream.

    The xfstests generic/475 test switches the underlying device with
    dm-error while running a stress test. This results in a large number
    of file system errors, and since we can't lock the buffer head when
    marking the superblock dirty in the ext4_grp_locked_error() case, it's
    possible for the superblock to be !buffer_uptodate() without
    buffer_write_io_error() being true.

    We need to set buffer_uptodate() before we call mark_buffer_dirty() or
    this will trigger a WARN_ON. It's safe to do this since the
    superblock must have been properly read into memory or the mount would
    not have been successful. So if buffer_uptodate() is not set, we can
    safely assume that this happened due to a failed attempt to write the
    superblock.

    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 2b08b1f12cd664dc7d5c84ead9ff25ae97ad5491 upstream.

    The ext4_inline_data_fiemap() function calls fiemap_fill_next_extent()
    while still holding the xattr semaphore. This is not necessary and it
    triggers a circular lockdep warning. This is because
    fiemap_fill_next_extent() could trigger a page fault when it writes
    into a page. If that page is mmapped from the inline file in
    question, this could very well result in a deadlock.

    This problem can be reproduced using generic/519 with a file system
    configuration which has the inline_data feature enabled.

    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 812c0cab2c0dfad977605dbadf9148490ca5d93f upstream.

    There are enough credits reserved for most dioread_nolock writes;
    however, if the extent tree is sufficiently deep, and/or quota is
    enabled, the code was not allowing for all eventualities when
    reserving journal credits for the unwritten extent conversion.

    This problem can be seen using xfstests ext4/034:

    WARNING: CPU: 1 PID: 257 at fs/ext4/ext4_jbd2.c:271 __ext4_handle_dirty_metadata+0x10c/0x180
    Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work
    RIP: 0010:__ext4_handle_dirty_metadata+0x10c/0x180
    ...
    EXT4-fs: ext4_free_blocks:4938: aborting transaction: error 28 in __ext4_handle_dirty_metadata
    EXT4: jbd2_journal_dirty_metadata failed: handle type 11 started at line 4921, credits 4/0, errcode -28
    EXT4-fs error (device dm-1) in ext4_free_blocks:4950: error 28

    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     

10 Jan, 2019

8 commits

  • commit 18f2c4fcebf2582f96cbd5f2238f4f354a0e4847 upstream.

    If the file system has been shut down or is read-only, then
    ext4_write_inode() needs to bail out early.

    Also use jbd2_complete_transaction() instead of ext4_force_commit() so
    we only force a commit if it is needed.

    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit fde872682e175743e0c3ef939c89e3c6008a1529 upstream.

    Some time back, nfsd switched from calling vfs_fsync() to using a new
    commit_metadata() hook in export_operations(). If the file system did
    not provide a commit_metadata() hook, it fell back to using
    sync_inode_metadata(). Unfortunately, that doesn't work on all file
    systems. In particular, it doesn't work on ext4 due to how the inode
    gets journalled --- the VFS writeback code will not always call
    ext4_write_inode().

    So we need to provide our own ext4_nfs_commit_metadata() method which
    calls ext4_write_inode() directly.

    Google-Bug-Id: 121195940
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 8a363970d1dc38c4ec4ad575c862f776f468d057 upstream.

    If we receive a file handle, either from NFS or open_by_handle_at(2),
    and it points at an inode which has not been initialized, and the file
    system has metadata checksums enabled, we shouldn't try to get the
    inode, discover the checksum is invalid, and then declare the file
    system as being inconsistent.

    This can be reproduced by creating a test file system via "mke2fs -t
    ext4 -O metadata_csum /tmp/foo.img 8M", mounting it, cd'ing into that
    directory, and then running the following program.

    #define _GNU_SOURCE
    #include <fcntl.h>

    struct handle {
            struct file_handle fh;
            unsigned char fid[MAX_HANDLE_SZ];
    };

    int main(int argc, char **argv)
    {
            struct handle h = {{8, 1 }, { 12, }};

            open_by_handle_at(AT_FDCWD, &h.fh, O_RDONLY);
            return 0;
    }

    Google-Bug-Id: 120690101
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit a805622a757b6d7f65def4141d29317d8e37b8a1 upstream.

    In ext4_expand_extra_isize_ea(), we calculate the total size of the
    xattr header, plus the xattr entries so we know how much of the
    beginning part of the xattrs to move when expanding the inode extra
    size. We need to include the terminating u32 at the end of the xattr
    entries, or else if there are uninitialized, non-zero bytes after the
    xattr entries and before the xattr values, the list of xattr entries
    won't be properly terminated.
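
The size calculation can be sketched as below. `xattr_prefix_size()` and its parameters are hypothetical and do not mirror the kernel's xattr structures; the sketch only shows the 4-byte terminator being counted along with the header and the entries.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Number of bytes to move: the header, then the entries, then the
 * 4-byte zero terminator that marks the end of the entry list.
 */
static size_t xattr_prefix_size(size_t header_size,
                                const size_t *entry_sizes, size_t n_entries)
{
    size_t total = header_size;

    for (size_t i = 0; i < n_entries; i++)
        total += entry_sizes[i];
    return total + sizeof(uint32_t);
}
```

Leaving out the final `sizeof(uint32_t)` would move the entries but not their terminator, so whatever bytes happen to follow at the destination would be read as the start of another entry.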

    Reported-by: Steve Graham
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit e647e29196b7f802f8242c39ecb7cc937f5ef217 upstream.

    Commit e2b911c53584 ("ext4: clean up feature test macros with
    predicate functions") broke the EXT4_IOC_GROUP_ADD ioctl. This was
    not noticed since only very old versions of resize2fs (before
    e2fsprogs 1.42) use this ioctl. However, using a new kernel with an
    enterprise Linux userspace will cause attempts to use online resize to
    fail with "No reserved GDT blocks".

    Fixes: e2b911c53584 ("ext4: clean up feature test macros with predicate...")
    Cc: stable@kernel.org # v4.4
    Signed-off-by: Theodore Ts'o
    Signed-off-by: ruippan (潘睿)
    Signed-off-by: Greg Kroah-Hartman

    ruippan (潘睿)
     
  • commit 132d00becb31e88469334e1e62751c81345280e0 upstream.

    In case of error, ext4_try_to_write_inline_data() should unlock
    and release the page it holds.

    Fixes: f19d5870cbf7 ("ext4: add normal write support for inline data")
    Cc: stable@kernel.org # 3.8
    Signed-off-by: Maurizio Lombardi
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Maurizio Lombardi
     
  • commit 61157b24e60fb3cd1f85f2c76a7b1d628f970144 upstream.

    The function frees qf_inode via iput() but then passes qf_inode to
    lockdep_set_quota_inode() on the failure path. This may result in a
    use-after-free bug. The patch frees qf_inode only after its last use.

    Fixes: daf647d2dd5 ("ext4: add lockdep annotations for i_data_sem")
    Cc: stable@kernel.org # 4.6
    Reviewed-by: Jan Kara
    Signed-off-by: Pan Bian
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Pan Bian
     
  • commit fb265c9cb49e2074ddcdd4de99728aefdd3b3592 upstream.

    Today, when sb_bread() returns NULL, this can either be because of an
    I/O error or because the system failed to allocate the buffer. Since
    it's an old interface, changing it would require changing many call
    sites.

    So instead we create our own ext4_sb_bread(), which also allows us to
    set the REQ_META flag.

    Also fixed a problem in the xattr code where a NULL return in a
    function could also mean that the xattr was not found, which could
    lead to the wrong error getting returned to userspace.
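
The distinction between an I/O error and an allocation failure can be carried in the returned pointer itself. The following is a userspace re-creation of the kernel's error-pointer idiom, offered as a sketch: `demo_read_block()` and `enum read_outcome` are hypothetical, and the code does not reproduce ext4_sb_bread().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creation of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

enum read_outcome { READ_OK, READ_NOMEM, READ_IOERR };

/*
 * Hypothetical reader: instead of collapsing every failure into NULL,
 * it encodes the reason in the returned pointer.
 */
static void *demo_read_block(void *block, enum read_outcome outcome)
{
    switch (outcome) {
    case READ_NOMEM:
        return ERR_PTR(-ENOMEM);
    case READ_IOERR:
        return ERR_PTR(-EIO);
    default:
        return block;
    }
}
```

A caller can then test `IS_ERR()` and pass `PTR_ERR()` up the stack, so userspace sees -ENOMEM or -EIO rather than one generic error.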

    Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3")
    Cc: stable@kernel.org # 2.6.19
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     

21 Nov, 2018

16 commits