22 Oct, 2020

1 commit

  • This patch adds fast commit recovery path support for Ext4 file
    system. We add several helper functions that are similar in spirit to
    e2fsprogs journal recovery path handlers. Example of such functions
    include - a simple block allocator, idempotent block bitmap update
    function etc. Using these routines and the fast commit log in the fast
    commit area, the recovery path (ext4_fc_replay()) performs fast commit
    log recovery.

    Reported-by: kernel test robot
    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201015203802.3597742-8-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     

08 Aug, 2020

1 commit

  • There is a risk of filesystem inconsistency if we failed to async write
    back metadata buffer in the background. Because of current buffer's end
    io procedure is handled by end_buffer_async_write() in the block layer,
    and it only clear the buffer's uptodate flag and mark the write_io_error
    flag, so ext4 cannot detect such failure immediately. In most cases of
    getting metadata buffer (e.g. ext4_read_inode_bitmap()), although the
    buffer's data is actually uptodate, it may still read data from disk
    because the buffer's uptodate flag has been cleared. Finally, it may
    lead to on-disk filesystem inconsistency if reading old data from the
    disk successfully and write them out again.

    This patch detect bdev mapping->wb_err when getting journal's write
    access and mark the filesystem error if bdev's mapping->wb_err was
    increased, this could prevent further writing and potential
    inconsistency.

    Signed-off-by: zhangyi (F)
    Suggested-by: Jan Kara
    Link: https://lore.kernel.org/r/20200620025427.1756360-2-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o

    zhangyi (F)
     

16 Apr, 2020

1 commit

  • Fix the following gcc warning:

    fs/ext4/ext4_jbd2.c:341:30: warning: variable 'es' set but not used [-Wunused-but-set-variable]
    struct ext4_super_block *es;
    ^~

    Fixes: 2ea2fc775321 ("ext4: save all error info in save_error_info() and drop ext4_set_errno()")
    Reported-by: Hulk Robot
    Signed-off-by: Jason Yan
    Link: https://lore.kernel.org/r/20200402034759.29957-1-yanaijie@huawei.com
    Signed-off-by: Theodore Ts'o

    Jason Yan
     

02 Apr, 2020

1 commit

  • Using a separate function, ext4_set_errno() to set the errno is
    problematic because it doesn't do the right thing once
    s_last_error_errorcode is non-zero. It's also less racy to set all of
    the error information all at once. (Also, as a bonus, it shrinks code
    size slightly.)

    Link: https://lore.kernel.org/r/20200329020404.686965-1-tytso@mit.edu
    Fixes: 878520ac45f9 ("ext4: save the error code which triggered...")
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

06 Mar, 2020

1 commit


18 Jan, 2020

1 commit

  • Determining an inode's journaling mode has gotten more complicated over
    time. Move ext4_inode_journal_mode() from an inline function into
    ext4_jbd2.c to reduce the compiled code size.

    Signed-off-by: Eric Biggers
    Link: https://lore.kernel.org/r/20191209233602.117778-1-ebiggers@kernel.org
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Eric Biggers
     

27 Dec, 2019

1 commit

  • This allows the cause of an ext4_error() report to be categorized
    based on whether it was triggered due to an I/O error, or an memory
    allocation error, or other possible causes. Most errors are caused by
    a detected file system inconsistency, so the default code stored in
    the superblock will be EXT4_ERR_EFSCORRUPTED.

    Link: https://lore.kernel.org/r/20191204032335.7683-1-tytso@mit.edu
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

06 Nov, 2019

4 commits

  • So far we have reserved only relatively high fixed amount of revoke
    credits for each transaction. We over-reserved by large amount for most
    cases but when freeing large directories or files with data journalling,
    the fixed amount is not enough. In fact the worst case estimate is
    inconveniently large (maximum extent size) for freeing of one extent.

    We fix this by doing proper estimate of the amount of blocks that need
    to be revoked when removing blocks from the inode due to truncate or
    hole punching and otherwise reserve just a small amount of revoke
    credits for each transaction to accommodate freeing of xattrs block or
    so.

    Signed-off-by: Jan Kara
    Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.cz
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Extend functions for starting, extending, and restarting transaction
    handles to take number of revoke records handle must be able to
    accommodate. These functions then make sure transaction has enough
    credits to be able to store resulting revoke descriptor blocks. Also
    revoke code tracks number of revoke records created by a handle to catch
    situation where some place didn't reserve enough space for revoke
    records. Similarly to standard transaction credits, space for unused
    reserved revoke records is released when the handle is stopped.

    On the ext4 side we currently take a simplistic approach of reserving
    space for 1024 revoke records for any transaction. This grows amount of
    credits reserved for each handle only by a few and is enough for any
    normal workload so that we don't hit warnings in jbd2. We will refine
    the logic in following commits.

    Signed-off-by: Jan Kara
    Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Provide accessor function to get number of credits available in a handle
    and use it from ext4. Later, computation of available credits won't be
    so straightforward.

    Reviewed-by: Theodore Ts'o
    Signed-off-by: Jan Kara
    Link: https://lore.kernel.org/r/20191105164437.32602-11-jack@suse.cz
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Provide ext4_journal_ensure_credits_fn() function to ensure transaction
    has given amount of credits and call helper function to prepare for
    restarting a transaction. This allows to remove some boilerplate code
    from various places, add proper error handling for the case where
    transaction extension or restart fails, and reduces following changes
    needed for proper revoke record reservation tracking.

    Signed-off-by: Jan Kara
    Link: https://lore.kernel.org/r/20191105164437.32602-10-jack@suse.cz
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

19 Feb, 2018

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

17 Jul, 2017

1 commit

  • Firstly by applying the following with coccinelle's spatch:

    @@ expression SB; @@
    -SB->s_flags & MS_RDONLY
    +sb_rdonly(SB)

    to effect the conversion to sb_rdonly(sb), then by applying:

    @@ expression A, SB; @@
    (
    -(!sb_rdonly(SB)) && A
    +!sb_rdonly(SB) && A
    |
    -A != (sb_rdonly(SB))
    +A != sb_rdonly(SB)
    |
    -A == (sb_rdonly(SB))
    +A == sb_rdonly(SB)
    |
    -!(sb_rdonly(SB))
    +!sb_rdonly(SB)
    |
    -A && (sb_rdonly(SB))
    +A && sb_rdonly(SB)
    |
    -A || (sb_rdonly(SB))
    +A || sb_rdonly(SB)
    |
    -(sb_rdonly(SB)) != A
    +sb_rdonly(SB) != A
    |
    -(sb_rdonly(SB)) == A
    +sb_rdonly(SB) == A
    |
    -(sb_rdonly(SB)) && A
    +sb_rdonly(SB) && A
    |
    -(sb_rdonly(SB)) || A
    +sb_rdonly(SB) || A
    )

    @@ expression A, B, SB; @@
    (
    -(sb_rdonly(SB)) ? 1 : 0
    +sb_rdonly(SB)
    |
    -(sb_rdonly(SB)) ? A : B
    +sb_rdonly(SB) ? A : B
    )

    to remove left over excess bracketage and finally by applying:

    @@ expression A, SB; @@
    (
    -(A & MS_RDONLY) != sb_rdonly(SB)
    +(bool)(A & MS_RDONLY) != sb_rdonly(SB)
    |
    -(A & MS_RDONLY) == sb_rdonly(SB)
    +(bool)(A & MS_RDONLY) == sb_rdonly(SB)
    )

    to make comparisons against the result of sb_rdonly() (which is a bool)
    work correctly.

    Signed-off-by: David Howells

    David Howells
     

05 Feb, 2017

1 commit


18 Oct, 2015

1 commit

  • There is a use-after-free possibility in __ext4_journal_stop() in the
    case that we free the handle in the first jbd2_journal_stop() because
    we're referencing handle->h_err afterwards. This was introduced in
    9705acd63b125dee8b15c705216d7186daea4625 and it is wrong. Fix it by
    storing the handle->h_err value beforehand and avoid referencing
    potentially freed handle.

    Fixes: 9705acd63b125dee8b15c705216d7186daea4625
    Signed-off-by: Lukas Czerner
    Reviewed-by: Andreas Dilger
    Cc: stable@vger.kernel.org

    Lukas Czerner
     

15 May, 2015

1 commit

  • Currently when journal restart fails, we'll have the h_transaction of
    the handle set to NULL to indicate that the handle has been effectively
    aborted. We handle this situation quietly in the jbd2_journal_stop() and just
    free the handle and exit because everything else has been done before we
    attempted (and failed) to restart the journal.

    Unfortunately there are a number of problems with that approach
    introduced with commit

    41a5b913197c "jbd2: invalidate handle if jbd2_journal_restart()
    fails"

    First of all in ext4 jbd2_journal_stop() will be called through
    __ext4_journal_stop() where we would try to get a hold of the superblock
    by dereferencing h_transaction which in this case would lead to NULL
    pointer dereference and crash.

    In addition we're going to free the handle regardless of the refcount
    which is bad as well, because others up the call chain will still
    reference the handle so we might potentially reference already freed
    memory.

    Moreover it's expected that we'll get aborted handle as well as detached
    handle in some of the journalling function as the error propagates up
    the stack, so it's unnecessary to call WARN_ON every time we get
    detached handle.

    And finally we might leak some memory by forgetting to free reserved
    handle in jbd2_journal_stop() in the case where handle was detached from
    the transaction (h_transaction is NULL).

    Fix the NULL pointer dereference in __ext4_journal_stop() by just
    calling jbd2_journal_stop() quietly as suggested by Jan Kara. Also fix
    the potential memory leak in jbd2_journal_stop() and use proper
    handle refcounting before we attempt to free it to avoid use-after-free
    issues.

    And finally remove all WARN_ON(!transaction) from the code so that we do
    not get random traces when something goes wrong because when journal
    restart fails we will get to some of those functions.

    Cc: stable@vger.kernel.org
    Signed-off-by: Lukas Czerner
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Lukas Czerner
     

02 Oct, 2014

1 commit


12 May, 2014

1 commit


13 Mar, 2014

1 commit


02 Dec, 2013

1 commit

  • While it's true that errors can only happen if there is a bug in
    jbd2_journal_dirty_metadata(), if a bug does happen, we need to halt
    the kernel or remount the file system read-only in order to avoid
    further data loss. The ext4_journal_abort_handle() function doesn't
    do any of this, and while it's likely that this call (since it doesn't
    adjust refcounts) will likely result in the file system eventually
    deadlocking since the current transaction will never be able to close,
    it's much cleaner to call let ext4's error handling system deal with
    this situation.

    There's a separate bug here which is that if certain jbd2 errors
    errors occur and file system is mounted errors=continue, the file
    system will probably eventually end grind to a halt as described
    above. But things have been this way in a long time, and usually when
    we have these sorts of errors it's pretty much a disaster --- and
    that's why the jbd2 layer aggressively retries memory allocations,
    which is the most likely cause of these jbd2 errors.

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Jan Kara
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     

12 Aug, 2013

1 commit

  • When jbd2_journal_dirty_metadata() returns error,
    __ext4_handle_dirty_metadata() stops the handle. However callers of this
    function do not count with that fact and still happily used now freed
    handle. This use after free can result in various issues but very likely
    we oops soon.

    The motivation of adding __ext4_journal_stop() into
    __ext4_handle_dirty_metadata() in commit 9ea7a0df seems to be only to
    improve error reporting. So replace __ext4_journal_stop() with
    ext4_journal_abort_handle() which was there before that commit and add
    WARN_ON_ONCE() to dump stack to provide useful information.

    Reported-by: Sage Weil
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org # 3.2+

    Jan Kara
     

05 Jun, 2013

2 commits

  • Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • In some cases we cannot start a transaction because of locking
    constraints and passing started transaction into those places is not
    handy either because we could block transaction commit for too long.
    Transaction reservation is designed to solve these issues. It
    reserves a handle with given number of credits in the journal and the
    handle can be later attached to the running transaction without
    blocking on commit or checkpointing. Reserved handles do not block
    transaction commit in any way, they only reduce maximum size of the
    running transaction (because we have to always be prepared to
    accomodate request for attaching reserved handle).

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

22 Apr, 2013

1 commit


04 Apr, 2013

1 commit


09 Feb, 2013

2 commits

  • So we can better understand what bits of ext4 are responsible for
    long-running jbd2 handles, use jbd2__journal_start() so we can pass
    context information for logging purposes.

    The recommended way for finding the longer-running handles is:

    T=/sys/kernel/debug/tracing
    EVENT=$T/events/jbd2/jbd2_handle_stats
    echo "interval > 5" > $EVENT/filter
    echo 1 > $EVENT/enable

    ./run-my-fs-benchmark

    cat $T/trace > /tmp/problem-handles

    This will list handles that were active for longer than 20ms. Having
    longer-running handles is bad, because a commit started at the wrong
    time could stall for those 20+ milliseconds, which could delay an
    fsync() or an O_SYNC operation. Here is an example line from the
    trace file describing a handle which lived on for 311 jiffies, or over
    1.2 seconds:

    postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
    tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
    dirtied_blocks 0

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • Move the jbd2 wrapper functions which start and stop handles out of
    super.c, where they don't really logically belong, and into
    ext4_jbd2.c.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

10 Oct, 2012

1 commit

  • The function ext4_handle_dirty_super() was calculating the superblock
    on the wrong block data. As a result, when the superblock is modified
    while it is mounted (most commonly, when inodes are added or removed
    from the orphan list), the superblock checksum would be wrong. We
    didn't notice because the superblock *was* being correctly calculated
    in ext4_commit_super(), and this would get called when the file system
    was unmounted. So the problem only became obvious if the system
    crashed while the file system was mounted.

    Fix this by removing the poorly designed function signature for
    ext4_superblock_csum_set(); if it only took a single argument, the
    pointer to a struct superblock, the ambiguity which caused this
    mistake would have been impossible.

    Reported-by: George Spelvin
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     

23 Jul, 2012

2 commits

  • The '__ext4_handle_dirty_metadata()' does not need the 'now' argument
    anymore and we can kill it.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Jan Kara

    Artem Bityutskiy
     
  • This patch changes the 'ext4_handle_dirty_super()' function which
    submits the superblock for I/O in the following cases:

    1. When creating the first large file on a file system without
    EXT4_FEATURE_RO_COMPAT_LARGE_FILE feature.
    2. When re-sizing the file-system.
    3. When creating an xattr on a file-system without the
    EXT4_FEATURE_COMPAT_EXT_ATTR feature.

    If the file-system has journal enabled, the superblock is written via
    the journal. We do not modify this path.

    If the file-system has no journal, this function, falls back to just
    marking the superblock as dirty using the 's_dirt' superblock
    flag. This means that it delays the actual superblock I/O submission
    by 5 seconds (default setting). Namely, the 'sync_supers()' kernel
    thread will call 'ext4_write_super()' later and will actually submit
    the superblock for I/O.

    And this is the behavior this patch modifies: we stop using 's_dirt'
    and just mark the superblock buffer as dirty right away. Indeed, all 3
    cases above are extremely rare and it does not add any value to delay
    the I/O submission for them.

    Note: 'ext4_handle_dirty_super()' executes
    '__ext4_handle_dirty_super()' with 'now = 0'. This patch basically
    makes the 'now' argument unneeded and it will be deleted in one of the
    next patches.

    This patch also removes 's_dirt' condition on the unmount path because
    we never set it anymore, so we should not test it.

    Tested using xfstests for both journalled and non-journalled ext4.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Jan Kara

    Artem Bityutskiy
     

30 Apr, 2012

1 commit

  • Calculate and verify the superblock checksum. Since the UUID and
    block group number are embedded in each copy of the superblock, we
    need only checksum the entire block. Refactor some of the code to
    eliminate open-coding of the checksum update call.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: "Theodore Ts'o"

    Darrick J. Wong
     

04 Sep, 2011

1 commit

  • Add debugging information in case jbd2_journal_dirty_metadata() is
    called with a buffer_head which didn't have
    jbd2_journal_get_write_access() called on it, or if the journal_head
    has the wrong transaction in it. In addition, return an error code.
    This won't change anything for ocfs2, which will BUG_ON() the non-zero
    exit code.

    For ext4, the caller of this function is ext4_handle_dirty_metadata(),
    and on seeing a non-zero return code, will call __ext4_journal_stop(),
    which will print the function and line number of the (buggy) calling
    function and abort the journal. This will allow us to recover instead
    of bug halting, which is better from a robustness and reliability
    point of view.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

09 May, 2011

1 commit

  • The block allocation code used to use jbd2_journal_get_undo_access as
    a way to make changes that wouldn't show up until the commit took
    place. The new multi-block allocation code has a its own way of
    preventing newly freed blocks from getting reused until the commit
    takes place (it avoids updating the buddy bitmaps until the commit is
    done), so we don't need to use jbd2_journal_get_undo_access(), which
    has extra overhead compared to jbd2_journal_get_write_access().

    There was one last vestigal use of ext4_journal_get_undo_access() in
    ext4_add_groupblocks(); change it to use ext4_journal_get_write_access()
    and then remove the ext4_journal_get_undo_access() support.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

27 Jul, 2010

2 commits


30 Jun, 2010

1 commit


29 Jun, 2010

1 commit


12 Jun, 2010

1 commit

  • We don't need to set s_dirt in most of the ext4 code when journaling
    is enabled. In ext3/4 some of the summary statistics for # of free
    inodes, blocks, and directories are calculated from the per-block
    group statistics when the file system is mounted or unmounted. As a
    result the superblock doesn't have to be updated, either via the
    journal or by setting s_dirt. There are a few exceptions, most
    notably when resizing the file system, where the superblock needs to
    be modified --- and in that case it should be done as a journalled
    operation if possible, and s_dirt set only in no-journal mode.

    This patch will optimize out some unneeded disk writes when using ext4
    with a journal.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

17 Feb, 2010

1 commit

  • Calls to ext4_handle_dirty_metadata should only pass in an inode
    pointer for inode-specific metadata, and not for shared metadata
    blocks such as inode table blocks, block group descriptors, the
    superblock, etc.

    The BUG_ON can get tripped when updating a special device (such as a
    block device) that is opened (so that i_mapping is set in
    fs/block_dev.c) and the file system is mounted in no journal mode.

    Addresses-Google-Bug: #2404870

    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"

    Curt Wohlgemuth