13 Dec, 2019

1 commit

  • commit add3efdd78b8a0478ce423bb9d4df6bd95e8b335 upstream.

    When number of free space in the journal is very low, the arithmetic in
    jbd2_log_space_left() could underflow resulting in very high number of
    free blocks and thus triggering assertion failure in transaction commit
    code complaining there's not enough space in the journal:

    J_ASSERT(journal->j_free > 1);

    Properly check for the low number of free blocks.

    CC: stable@vger.kernel.org
    Reviewed-by: Theodore Ts'o
    Signed-off-by: Jan Kara
    Link: https://lore.kernel.org/r/20191105164437.32602-1-jack@suse.cz
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     

25 Sep, 2019

1 commit

  • Since ext4/ocfs2 are using jbd2_inode dirty range scoping APIs now,
    jbd2_journal_inode_add_[write|wait] are not used any more, remove them.

    Link: http://lkml.kernel.org/r/1562977611-8412-2-git-send-email-joseph.qi@linux.alibaba.com
    Signed-off-by: Joseph Qi
    Reviewed-by: Ross Zwisler
    Acked-by: Changwei Ge
    Cc: Gang He
    Cc: Joel Becker
    Cc: Joseph Qi
    Cc: Jun Piao
    Cc: Junxiao Bi
    Cc: Mark Fasheh
    Cc: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     

21 Jun, 2019

2 commits

  • The journal_sync_buffer() function was never carried over from jbd to
    jbd2. So get rid of the vestigal declaration of this (non-existent)
    function.

    Signed-off-by: Theodore Ts'o
    Reviewed-by: Darrick J. Wong

    Theodore Ts'o
     
  • Currently both journal_submit_inode_data_buffers() and
    journal_finish_inode_data_buffers() operate on the entire address space
    of each of the inodes associated with a given journal entry. The
    consequence of this is that if we have an inode where we are constantly
    appending dirty pages we can end up waiting for an indefinite amount of
    time in journal_finish_inode_data_buffers() while we wait for all the
    pages under writeback to be written out.

    The easiest way to cause this type of workload is do just dd from
    /dev/zero to a file until it fills the entire filesystem. This can
    cause journal_finish_inode_data_buffers() to wait for the duration of
    the entire dd operation.

    We can improve this situation by scoping each of the inode dirty ranges
    associated with a given transaction. We do this via the jbd2_inode
    structure so that the scoping is contained within jbd2 and so that it
    follows the lifetime and locking rules for that structure.

    This allows us to limit the writeback & wait in
    journal_submit_inode_data_buffers() and
    journal_finish_inode_data_buffers() respectively to the dirty range for
    a given struct jdb2_inode, keeping us from waiting forever if the inode
    in question is still being appended to.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@vger.kernel.org

    Ross Zwisler
     

24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this file is part of the linux kernel and is made available under
    the terms of the gnu general public license version 2 or at your
    option any later version incorporated herein by reference

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 18 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Richard Fontana
    Reviewed-by: Allison Randal
    Reviewed-by: Armijn Hemel
    Reviewed-by: Kate Stewart
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520075211.321157221@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

20 May, 2019

1 commit

  • Pull ext4 fixes from Ted Ts'o:
    "Some bug fixes, and an update to the URL's for the final version of
    Unicode 12.1.0"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: avoid panic during forced reboot due to aborted journal
    ext4: fix block validity checks for journal inodes using indirect blocks
    unicode: update to Unicode 12.1.0 final
    unicode: add missing check for an error return from utf8lookup()
    ext4: fix miscellaneous sparse warnings
    ext4: unsigned int compared against zero
    ext4: fix use-after-free in dx_release()
    ext4: fix data corruption caused by overlapping unaligned and aligned IO
    jbd2: fix potential double free
    ext4: zero out the unused memory region in the extent tree block

    Linus Torvalds
     

11 May, 2019

1 commit

  • When failing from creating cache jbd2_inode_cache, we will destroy the
    previously created cache jbd2_handle_cache twice. This patch fixes
    this by moving each cache initialization/destruction to its own
    separate, individual function.

    Signed-off-by: Chengguang Xu
    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org

    Chengguang Xu
     

25 Apr, 2019

1 commit

  • The flags field in 'struct shash_desc' never actually does anything.
    The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
    However, no shash algorithm ever sleeps, making this flag a no-op.

    With this being the case, inevitably some users who can't sleep wrongly
    pass MAY_SLEEP. These would all need to be fixed if any shash algorithm
    actually started sleeping. For example, the shash_ahash_*() functions,
    which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP
    from the ahash API to the shash API. However, the shash functions are
    called under kmap_atomic(), so actually they're assumed to never sleep.

    Even if it turns out that some users do need preemption points while
    hashing large buffers, we could easily provide a helper function
    crypto_shash_update_large() which divides the data into smaller chunks
    and calls crypto_shash_update() and cond_resched() for each chunk. It's
    not necessary to have a flag in 'struct shash_desc', nor is it necessary
    to make individual shash algorithms aware of this at all.

    Therefore, remove shash_desc::flags, and document that the
    crypto_shash_*() functions can be called from any context.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     

04 Dec, 2018

2 commits

  • The following members of struct transaction_s aka transaction_t
    were turned into lock-free variables in the past:
    - t_updates
    - t_outstanding_credits
    - t_handle_count
    However, the documentation has not been updated yet.
    This commit replaced the annotated lock by [none].

    Found by LockDoc (Alexander Lochmann, Horst Schirmeier and Olaf Spinczyk)

    Signed-off-by: Alexander Lochmann
    Signed-off-by: Horst Schirmeier
    Signed-off-by: Theodore Ts'o

    Alexander Lochmann
     
  • We can hold j_state_lock for writing at the beginning of
    jbd2_journal_commit_transaction() for a rather long time (reportedly for
    30 ms) due cleaning revoke bits of all revoked buffers under it. The
    handling of revoke tables as well as cleaning of t_reserved_list, and
    checkpoint lists does not need j_state_lock for anything. It is only
    needed to prevent new handles from joining the transaction. Generally
    T_LOCKED transaction state prevents new handles from joining the
    transaction - except for reserved handles which have to allowed to join
    while we wait for other handles to complete.

    To prevent reserved handles from joining the transaction while cleaning
    up lists, add new transaction state T_SWITCH and watch for it when
    starting reserved handles. With this we can just drop the lock for
    operations that don't need it.

    Reported-and-tested-by: Adrian Hunter
    Suggested-by: "Theodore Y. Ts'o"
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

10 Jan, 2018

1 commit

  • Sphinx emits various (26) warnings when building make target 'htmldocs'.
    Currently struct definitions contain duplicate documentation, some as
    kernel-docs and some as standard c89 comments. We can reduce
    duplication while cleaning up the kernel docs.

    Move all kernel-docs to right above each struct member. Use the set of
    all existing comments (kernel-doc and c89). Add documentation for
    missing struct members and function arguments.

    Signed-off-by: Tobin C. Harding
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Tobin C. Harding
     

03 Nov, 2017

1 commit

  • We return IOMAP_F_DIRTY flag from ext4_iomap_begin() when asked to
    prepare blocks for writing and the inode has some uncommitted metadata
    changes. In the fault handler ext4_dax_fault() we then detect this case
    (through VM_FAULT_NEEDDSYNC return value) and call helper
    dax_finish_sync_fault() to flush metadata changes and insert page table
    entry. Note that this will also dirty corresponding radix tree entry
    which is what we want - fsync(2) will still provide data integrity
    guarantees for applications not using userspace flushing. And
    applications using userspace flushing can avoid calling fsync(2) and
    thus avoid the performance overhead.

    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     

04 May, 2017

1 commit

  • now that we have memalloc_nofs_{save,restore} api we can mark the whole
    transaction context as implicitly GFP_NOFS. All allocations will
    automatically inherit GFP_NOFS this way. This means that we do not have
    to mark any of those requests with GFP_NOFS and moreover all the
    ext4_kv[mz]alloc(GFP_NOFS) are also safe now because even the hardcoded
    GFP_KERNEL allocations deep inside the vmalloc will be NOFS now.

    [akpm@linux-foundation.org: tweak comments]
    Link: http://lkml.kernel.org/r/20170306131408.9828-7-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Brian Foster
    Cc: Darrick J. Wong
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

30 Jun, 2016

2 commits

  • So far we were tracking only dependency on transaction commit due to
    starting a new handle (which may require commit to start a new
    transaction). Now add tracking also for other cases where we wait for
    transaction commit. This way lockdep can catch deadlocks e. g. because we
    call jbd2_journal_stop() for a synchronous handle with some locks held
    which rank below transaction start.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Currently lockdep map is tracked in each journal handle. To be able to
    expand lockdep support to cover also other cases where we depend on
    transaction commit and where handle is not available, move lockdep map
    into struct journal_s. Since this makes the lockdep map shared for all
    handles, we have to use rwsem_acquire_read() for acquisitions now.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

06 May, 2016

1 commit


24 Apr, 2016

1 commit

  • Currently when filesystem needs to make sure data is on permanent
    storage before committing a transaction it adds inode to transaction's
    inode list. During transaction commit, jbd2 writes back all dirty
    buffers that have allocated underlying blocks and waits for the IO to
    finish. However when doing writeback for delayed allocated data, we
    allocate blocks and immediately submit the data. Thus asking jbd2 to
    write dirty pages just unnecessarily adds more work to jbd2 possibly
    writing back other redirtied blocks.

    Add support to jbd2 to allow filesystem to ask jbd2 to only wait for
    outstanding data writes before committing a transaction and thus avoid
    unnecessary writes.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

23 Feb, 2016

3 commits


19 Oct, 2015

1 commit

  • If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the
    journaling will be aborted first and the error number will be recorded
    into JBD2 superblock and, finally, the system will enter into the
    panic state in "errors=panic" option. But, in the rare case, this
    sequence is little twisted like the below figure and it will happen
    that the system enters into panic state, which means the system reset
    in mobile environment, before completion of recording an error in the
    journal superblock. In this case, e2fsck cannot recognize that the
    filesystem failure occurred in the previous run and the corruption
    wouldn't be fixed.

    Task A Task B
    ext4_handle_error()
    -> jbd2_journal_abort()
    -> __journal_abort_soft()
    -> __jbd2_journal_abort_hard()
    | -> journal->j_flags |= JBD2_ABORT;
    |
    | __ext4_abort()
    | -> jbd2_journal_abort()
    | | -> __journal_abort_soft()
    | | -> if (journal->j_flags & JBD2_ABORT)
    | | return;
    | -> panic()
    |
    -> jbd2_journal_update_sb_errno()

    Tested-by: Hobin Woo
    Signed-off-by: Daeho Jeong
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Daeho Jeong
     

18 Oct, 2015

2 commits


15 Oct, 2015

1 commit

  • Change the journal's checksum functions to gate on whether or not the
    crc32c driver is loaded, and gate the loading on the superblock bits.
    This prevents a journal crash if someone loads a journal in no-csum
    mode and then randomizes the superblock, thus flipping on the feature
    bits.

    Tested-By: Nikolay Borisov
    Reported-by: Nikolay Borisov
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Theodore Ts'o

    Darrick J. Wong
     

04 Sep, 2015

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Pretty much all bug fixes and clean ups for 4.3, after a lot of
    features and other churn going into 4.2"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    Revert "ext4: remove block_device_ejected"
    ext4: ratelimit the file system mounted message
    ext4: silence a format string false positive
    ext4: simplify some code in read_mmp_block()
    ext4: don't manipulate recovery flag when freezing no-journal fs
    jbd2: limit number of reserved credits
    ext4 crypto: remove duplicate header file
    ext4: update c/mtime on truncate up
    jbd2: avoid infinite loop when destroying aborted journal
    ext4, jbd2: add REQ_FUA flag when recording an error in the superblock
    ext4 crypto: fix spelling typo in comment
    ext4 crypto: exit cleanly if ext4_derive_key_aes() fails
    ext4: reject journal options for ext2 mounts
    ext4: implement cgroup writeback support
    ext4: replace ext4_io_submit->io_op with ->io_wbc
    ext4 crypto: check for too-short encrypted file names
    ext4 crypto: use a jbd2 transaction when adding a crypto policy
    jbd2: speedup jbd2_journal_dirty_metadata()

    Linus Torvalds
     

29 Jul, 2015

1 commit

  • Commit 6f6a6fda2945 "jbd2: fix ocfs2 corrupt when updating journal
    superblock fails" changed jbd2_cleanup_journal_tail() to return EIO
    when the journal is aborted. That makes logic in
    jbd2_log_do_checkpoint() bail out which is fine, except that
    jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make
    a progress in cleaning the journal. Without it jbd2_journal_destroy()
    just loops in an infinite loop.

    Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of
    jbd2_log_do_checkpoint() fails with error.

    Reported-by: Eryu Guan
    Tested-by: Eryu Guan
    Fixes: 6f6a6fda294506dfe0e3e0a253bb2d2923f28f0a
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

24 Jul, 2015

1 commit

  • The functionality of ext3 is fully supported by ext4 driver. Major
    distributions (SUSE, RedHat) already use ext4 driver to handle ext3
    filesystems for quite some time. There is some ugliness in mm resulting
    from jbd cleaning buffers in a dirty page without cleaning page dirty
    bit and also support for buffer bouncing in the block layer when stable
    pages are required is there only because of jbd. So let's remove the
    ext3 driver. This saves us some 28k lines of duplicated code.

    Acked-by: Theodore Ts'o
    Signed-off-by: Jan Kara

    Jan Kara
     

16 Jun, 2015

1 commit

  • If updating journal superblock fails after journal data has been
    flushed, the error is omitted and this will mislead the caller as a
    normal case. In ocfs2, the checkpoint will be treated successfully
    and the other node can get the lock to update. Since the sb_start is
    still pointing to the old log block, it will rewrite the journal data
    during journal recovery by the other node. Thus the new updates will
    be overwritten and ocfs2 corrupts. So in above case we have to return
    the error, and ocfs2_commit_cache will take care of the error and
    prevent the other node to do update first. And only after recovering
    journal it can do the new updates.

    The issue discussion mail can be found at:
    https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.html
    http://comments.gmane.org/gmane.comp.file-systems.ext4/48841

    [ Fixed bug in patch which allowed a non-negative error return from
    jbd2_cleanup_journal_tail() to leak out of jbd2_fjournal_flush(); this
    was causing xfstests ext4/306 to fail. -- Ted ]

    Reported-by: Yiwen Jiang
    Signed-off-by: Joseph Qi
    Signed-off-by: Theodore Ts'o
    Tested-by: Yiwen Jiang
    Cc: Junxiao Bi
    Cc: stable@vger.kernel.org

    Joseph Qi
     

15 Jan, 2015

1 commit


18 Sep, 2014

1 commit


29 Aug, 2014

1 commit

  • It turns out that there are some serious problems with the on-disk
    format of journal checksum v2. The foremost is that the function to
    calculate descriptor tag size returns sizes that are too big. This
    causes alignment issues on some architectures and is compounded by the
    fact that some parts of jbd2 use the structure size (incorrectly) to
    determine the presence of a 64bit journal instead of checking the
    feature flags.

    Therefore, introduce journal checksum v3, which enlarges the
    descriptor block tag format to allow for full 32-bit checksums of
    journal blocks, fix the journal tag function to return the correct
    sizes, and fix the jbd2 recovery code to use feature flags to
    determine 64bitness.

    Add a few function helpers so we don't have to open-code quite so
    many pieces.

    Switching to a 16-byte block size was found to increase journal size
    overhead by a maximum of 0.1%, to convert a 32-bit journal with no
    checksumming to a 32-bit journal with checksum v3 enabled.

    Signed-off-by: Darrick J. Wong
    Reported-by: TR Reardon
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Darrick J. Wong
     

01 Jul, 2013

1 commit

  • If jbd2_journal_restart() fails the handle will have been disconnected
    from the current transaction. In this situation, the handle must not
    be used for for any jbd2 function other than jbd2_journal_stop().
    Enforce this with by treating a handle which has a NULL transaction
    pointer as an aborted handle, and issue a kernel warning if
    jbd2_journal_extent(), jbd2_journal_get_write_access(),
    jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.

    This commit also fixes a bug where jbd2_journal_stop() would trip over
    a kernel jbd2 assertion check when trying to free an invalid handle.

    Also move the responsibility of setting current->journal_info to
    start_this_handle(), simplifying the three users of this function.

    Signed-off-by: "Theodore Ts'o"
    Reported-by: Younger Liu
    Cc: Jan Kara

    Theodore Ts'o
     

13 Jun, 2013

4 commits

  • Since the jbd_debug() is implemented with two separate printk()
    calls, it can lead to corrupted and misleading debug output like
    the following (see lines marked with "*"):

    [ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
    [ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
    [ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
    [* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
    [* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
    [* 290.339382] JBD2: starting commit of transaction 42104
    [ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
    [ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079

    i.e. the debug output from log_wait_commit and journal_commit_transaction
    have become interleaved. The output should have been:

    (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
    (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104

    It is expected that this is not easy to replicate -- I was only able
    to cause it on preempt-rt kernels, and even then only under heavy
    I/O load.

    Reported-by: Paul Gortmaker
    Suggested-by: "Theodore Ts'o"
    Signed-off-by: Paul Gortmaker
    Signed-off-by: "Theodore Ts'o"

    Paul Gortmaker
     
  • The bit_spinlock functions are only used for the jbd_lock_bh_state
    functions (and friends) in jbd_common.h and are not directly used
    by either of jbd.h or jbd2.h content.

    The jbd_common file is new as of commit 446066724c36 ("jdb/jbd2: factor
    out common functions from the jbd[2] header files") but common
    (and isolated) headers were not considered for factoring at that time.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: "Theodore Ts'o"

    Paul Gortmaker
     
  • Inode's data or non journaled quota may be written w/o jounral so we
    _must_ send a barrier at the end of ext4_sync_fs. But it can be
    skipped if journal commit will do it for us.

    Also fix data integrity for nojournal mode.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     
  • Current implementation of jbd2_journal_force_commit() is suboptimal because
    result in empty and useless commits. But callers just want to force and wait
    any unfinished commits. We already have jbd2_journal_force_commit_nested()
    which does exactly what we want, except we are guaranteed that we do not hold
    journal transaction open.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
     

05 Jun, 2013

4 commits

  • In some cases we cannot start a transaction because of locking
    constraints and passing started transaction into those places is not
    handy either because we could block transaction commit for too long.
    Transaction reservation is designed to solve these issues. It
    reserves a handle with given number of credits in the journal and the
    handle can be later attached to the running transaction without
    blocking on commit or checkpointing. Reserved handles do not block
    transaction commit in any way, they only reduce maximum size of the
    running transaction (because we have to always be prepared to
    accomodate request for attaching reserved handle).

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • j_wait_logspace and j_wait_checkpoint are unused. Remove them.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • __jbd2_log_space_left() and jbd_space_needed() were kind of odd.
    jbd_space_needed() accounted also credits needed for currently
    committing transaction while it didn't account for credits needed for
    control blocks. __jbd2_log_space_left() then accounted for control
    blocks as a fraction of free space. Since results of these two
    functions are always only compared against each other, this works
    correct but is somewhat strange. Move the estimates so that
    jbd_space_needed() returns number of blocks needed for a transaction
    including control blocks and __jbd2_log_space_left() returns free
    space in the journal (with the committing transaction already
    subtracted). Rename functions to jbd2_log_space_left() and
    jbd2_space_needed() while we are changing them.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • Currently when we add a buffer to a transaction, we wait until the
    buffer is removed from BJ_Shadow list (so that we prevent any changes
    to the buffer that is just written to the journal). This can take
    unnecessarily long as a lot happens between the time the buffer is
    submitted to the journal and the time when we remove the buffer from
    BJ_Shadow list. (e.g. We wait for all data buffers in the
    transaction, we issue a cache flush, etc.) Also this creates a
    dependency of do_get_write_access() on transaction commit (namely
    waiting for data IO to complete) which we want to avoid when
    implementing transaction reservation.

    So we modify commit code to set new BH_Shadow flag when temporary
    shadowing buffer is created and we clear that flag once IO on that
    buffer is complete. This allows do_get_write_access() to wait only
    for BH_Shadow bit and thus removes the dependency on data IO
    completion.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara