30 Jun, 2016
2 commits
-
So far we were tracking only dependency on transaction commit due to
starting a new handle (which may require commit to start a new
transaction). Now add tracking also for other cases where we wait for
transaction commit. This way lockdep can catch deadlocks e. g. because we
call jbd2_journal_stop() for a synchronous handle with some locks held
which rank below transaction start.Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o -
Currently lockdep map is tracked in each journal handle. To be able to
expand lockdep support to cover also other cases where we depend on
transaction commit and where handle is not available, move lockdep map
into struct journal_s. Since this makes the lockdep map shared for all
handles, we have to use rwsem_acquire_read() for acquisitions now.Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o
06 May, 2016
1 commit
-
Commit bf6993276f74 ("jbd2: Use tracepoints for history file")
removed the members j_history, j_history_max and j_history_cur from struct
handle_s but the descriptions stayed lingering. Removing them.Signed-off-by: Luis de Bethencourt
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara
24 Apr, 2016
1 commit
-
Currently when filesystem needs to make sure data is on permanent
storage before committing a transaction it adds inode to transaction's
inode list. During transaction commit, jbd2 writes back all dirty
buffers that have allocated underlying blocks and waits for the IO to
finish. However when doing writeback for delayed allocated data, we
allocate blocks and immediately submit the data. Thus asking jbd2 to
write dirty pages just unnecessarily adds more work to jbd2 possibly
writing back other redirtied blocks.Add support to jbd2 to allow filesystem to ask jbd2 to only wait for
outstanding data writes before committing a transaction and thus avoid
unnecessary writes.Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o
23 Feb, 2016
3 commits
-
Revoke and tag descriptor blocks are just different kinds of descriptor
blocks and thus have checksum in the same place. Unify computation and
checking of checksums for these.Reviewed-by: Darrick J. Wong
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o -
Descriptor block header is initialized in several places. Factor out the
common code into jbd2_journal_get_descriptor_buffer().Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o -
jbd2_journal_write_revoke_records() takes journal pointer and write_op,
although journal can be obtained from the passed transaction and
write_op is always WRITE_SYNC. Remove these superfluous arguments.Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o
19 Oct, 2015
1 commit
-
If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the
journaling will be aborted first and the error number will be recorded
into JBD2 superblock and, finally, the system will enter into the
panic state in "errors=panic" option. But, in the rare case, this
sequence is little twisted like the below figure and it will happen
that the system enters into panic state, which means the system reset
in mobile environment, before completion of recording an error in the
journal superblock. In this case, e2fsck cannot recognize that the
filesystem failure occurred in the previous run and the corruption
wouldn't be fixed.Task A Task B
ext4_handle_error()
-> jbd2_journal_abort()
-> __journal_abort_soft()
-> __jbd2_journal_abort_hard()
| -> journal->j_flags |= JBD2_ABORT;
|
| __ext4_abort()
| -> jbd2_journal_abort()
| | -> __journal_abort_soft()
| | -> if (journal->j_flags & JBD2_ABORT)
| | return;
| -> panic()
|
-> jbd2_journal_update_sb_errno()Tested-by: Hobin Woo
Signed-off-by: Daeho Jeong
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
18 Oct, 2015
2 commits
-
Create separate predicate functions to test/set/clear feature flags,
thereby replacing the wordy old macros. Furthermore, clean out the
places where we open-coded feature tests.Signed-off-by: Darrick J. Wong
Signed-off-by: Theodore Ts'o -
Instead of overloading EIO for CRC errors and corrupt structures,
return the same error codes that XFS returns for the same issues.Signed-off-by: Darrick J. Wong
Signed-off-by: Theodore Ts'o
15 Oct, 2015
1 commit
-
Change the journal's checksum functions to gate on whether or not the
crc32c driver is loaded, and gate the loading on the superblock bits.
This prevents a journal crash if someone loads a journal in no-csum
mode and then randomizes the superblock, thus flipping on the feature
bits.Tested-By: Nikolay Borisov
Reported-by: Nikolay Borisov
Signed-off-by: Darrick J. Wong
Signed-off-by: Theodore Ts'o
04 Sep, 2015
1 commit
-
Pull ext4 updates from Ted Ts'o:
"Pretty much all bug fixes and clean ups for 4.3, after a lot of
features and other churn going into 4.2"* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
Revert "ext4: remove block_device_ejected"
ext4: ratelimit the file system mounted message
ext4: silence a format string false positive
ext4: simplify some code in read_mmp_block()
ext4: don't manipulate recovery flag when freezing no-journal fs
jbd2: limit number of reserved credits
ext4 crypto: remove duplicate header file
ext4: update c/mtime on truncate up
jbd2: avoid infinite loop when destroying aborted journal
ext4, jbd2: add REQ_FUA flag when recording an error in the superblock
ext4 crypto: fix spelling typo in comment
ext4 crypto: exit cleanly if ext4_derive_key_aes() fails
ext4: reject journal options for ext2 mounts
ext4: implement cgroup writeback support
ext4: replace ext4_io_submit->io_op with ->io_wbc
ext4 crypto: check for too-short encrypted file names
ext4 crypto: use a jbd2 transaction when adding a crypto policy
jbd2: speedup jbd2_journal_dirty_metadata()
29 Jul, 2015
1 commit
-
Commit 6f6a6fda2945 "jbd2: fix ocfs2 corrupt when updating journal
superblock fails" changed jbd2_cleanup_journal_tail() to return EIO
when the journal is aborted. That makes logic in
jbd2_log_do_checkpoint() bail out which is fine, except that
jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make
a progress in cleaning the journal. Without it jbd2_journal_destroy()
just loops in an infinite loop.Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of
jbd2_log_do_checkpoint() fails with error.Reported-by: Eryu Guan
Tested-by: Eryu Guan
Fixes: 6f6a6fda294506dfe0e3e0a253bb2d2923f28f0a
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o
24 Jul, 2015
1 commit
-
The functionality of ext3 is fully supported by ext4 driver. Major
distributions (SUSE, RedHat) already use ext4 driver to handle ext3
filesystems for quite some time. There is some ugliness in mm resulting
from jbd cleaning buffers in a dirty page without cleaning page dirty
bit and also support for buffer bouncing in the block layer when stable
pages are required is there only because of jbd. So let's remove the
ext3 driver. This saves us some 28k lines of duplicated code.Acked-by: Theodore Ts'o
Signed-off-by: Jan Kara
16 Jun, 2015
1 commit
-
If updating journal superblock fails after journal data has been
flushed, the error is omitted and this will mislead the caller as a
normal case. In ocfs2, the checkpoint will be treated successfully
and the other node can get the lock to update. Since the sb_start is
still pointing to the old log block, it will rewrite the journal data
during journal recovery by the other node. Thus the new updates will
be overwritten and ocfs2 corrupts. So in above case we have to return
the error, and ocfs2_commit_cache will take care of the error and
prevent the other node to do update first. And only after recovering
journal it can do the new updates.The issue discussion mail can be found at:
https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.html
http://comments.gmane.org/gmane.comp.file-systems.ext4/48841[ Fixed bug in patch which allowed a non-negative error return from
jbd2_cleanup_journal_tail() to leak out of jbd2_fjournal_flush(); this
was causing xfstests ext4/306 to fail. -- Ted ]Reported-by: Yiwen Jiang
Signed-off-by: Joseph Qi
Signed-off-by: Theodore Ts'o
Tested-by: Yiwen Jiang
Cc: Junxiao Bi
Cc: stable@vger.kernel.org
15 Jan, 2015
1 commit
-
A quick search shows that there are no users, drop the
macro for both jbd and jbd2.Signed-off-by: Davidlohr Bueso
Cc: Jan Kara
Signed-off-by: Jan Kara
18 Sep, 2014
1 commit
-
__jbd2_journal_clean_checkpoint_list() returns number of buffers it
freed but noone was using the value so just stop doing that. This
also allows for simplifying the calling convention for
journal_clean_once_cp_list().Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o
29 Aug, 2014
1 commit
-
It turns out that there are some serious problems with the on-disk
format of journal checksum v2. The foremost is that the function to
calculate descriptor tag size returns sizes that are too big. This
causes alignment issues on some architectures and is compounded by the
fact that some parts of jbd2 use the structure size (incorrectly) to
determine the presence of a 64bit journal instead of checking the
feature flags.Therefore, introduce journal checksum v3, which enlarges the
descriptor block tag format to allow for full 32-bit checksums of
journal blocks, fix the journal tag function to return the correct
sizes, and fix the jbd2 recovery code to use feature flags to
determine 64bitness.Add a few function helpers so we don't have to open-code quite so
many pieces.Switching to a 16-byte block size was found to increase journal size
overhead by a maximum of 0.1%, to convert a 32-bit journal with no
checksumming to a 32-bit journal with checksum v3 enabled.Signed-off-by: Darrick J. Wong
Reported-by: TR Reardon
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
01 Jul, 2013
1 commit
-
If jbd2_journal_restart() fails the handle will have been disconnected
from the current transaction. In this situation, the handle must not
be used for for any jbd2 function other than jbd2_journal_stop().
Enforce this with by treating a handle which has a NULL transaction
pointer as an aborted handle, and issue a kernel warning if
jbd2_journal_extent(), jbd2_journal_get_write_access(),
jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.This commit also fixes a bug where jbd2_journal_stop() would trip over
a kernel jbd2 assertion check when trying to free an invalid handle.Also move the responsibility of setting current->journal_info to
start_this_handle(), simplifying the three users of this function.Signed-off-by: "Theodore Ts'o"
Reported-by: Younger Liu
Cc: Jan Kara
13 Jun, 2013
4 commits
-
Since the jbd_debug() is implemented with two separate printk()
calls, it can lead to corrupted and misleading debug output like
the following (see lines marked with "*"):[ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
[ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
[* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
[* 290.339382] JBD2: starting commit of transaction 42104
[ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
[ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079i.e. the debug output from log_wait_commit and journal_commit_transaction
have become interleaved. The output should have been:(fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
(fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104It is expected that this is not easy to replicate -- I was only able
to cause it on preempt-rt kernels, and even then only under heavy
I/O load.Reported-by: Paul Gortmaker
Suggested-by: "Theodore Ts'o"
Signed-off-by: Paul Gortmaker
Signed-off-by: "Theodore Ts'o" -
The bit_spinlock functions are only used for the jbd_lock_bh_state
functions (and friends) in jbd_common.h and are not directly used
by either of jbd.h or jbd2.h content.The jbd_common file is new as of commit 446066724c36 ("jdb/jbd2: factor
out common functions from the jbd[2] header files") but common
(and isolated) headers were not considered for factoring at that time.Signed-off-by: Paul Gortmaker
Signed-off-by: "Theodore Ts'o" -
Inode's data or non journaled quota may be written w/o jounral so we
_must_ send a barrier at the end of ext4_sync_fs. But it can be
skipped if journal commit will do it for us.Also fix data integrity for nojournal mode.
Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o" -
Current implementation of jbd2_journal_force_commit() is suboptimal because
result in empty and useless commits. But callers just want to force and wait
any unfinished commits. We already have jbd2_journal_force_commit_nested()
which does exactly what we want, except we are guaranteed that we do not hold
journal transaction open.Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"
05 Jun, 2013
6 commits
-
In some cases we cannot start a transaction because of locking
constraints and passing started transaction into those places is not
handy either because we could block transaction commit for too long.
Transaction reservation is designed to solve these issues. It
reserves a handle with given number of credits in the journal and the
handle can be later attached to the running transaction without
blocking on commit or checkpointing. Reserved handles do not block
transaction commit in any way, they only reduce maximum size of the
running transaction (because we have to always be prepared to
accomodate request for attaching reserved handle).Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o" -
j_wait_logspace and j_wait_checkpoint are unused. Remove them.
Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o" -
__jbd2_log_space_left() and jbd_space_needed() were kind of odd.
jbd_space_needed() accounted also credits needed for currently
committing transaction while it didn't account for credits needed for
control blocks. __jbd2_log_space_left() then accounted for control
blocks as a fraction of free space. Since results of these two
functions are always only compared against each other, this works
correct but is somewhat strange. Move the estimates so that
jbd_space_needed() returns number of blocks needed for a transaction
including control blocks and __jbd2_log_space_left() returns free
space in the journal (with the committing transaction already
subtracted). Rename functions to jbd2_log_space_left() and
jbd2_space_needed() while we are changing them.Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o" -
Currently when we add a buffer to a transaction, we wait until the
buffer is removed from BJ_Shadow list (so that we prevent any changes
to the buffer that is just written to the journal). This can take
unnecessarily long as a lot happens between the time the buffer is
submitted to the journal and the time when we remove the buffer from
BJ_Shadow list. (e.g. We wait for all data buffers in the
transaction, we issue a cache flush, etc.) Also this creates a
dependency of do_get_write_access() on transaction commit (namely
waiting for data IO to complete) which we want to avoid when
implementing transaction reservation.So we modify commit code to set new BH_Shadow flag when temporary
shadowing buffer is created and we clear that flag once IO on that
buffer is complete. This allows do_get_write_access() to wait only
for BH_Shadow bit and thus removes the dependency on data IO
completion.Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o" -
Similarly as for metadata buffers, also log descriptor buffers don't
really need the journal head. So strip it and remove BJ_LogCtl list.Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o" -
When writing metadata to the journal, we create temporary buffer heads
for that task. We also attach journal heads to these buffer heads but
the only purpose of the journal heads is to keep buffers linked in
transaction's BJ_IO list. We remove the need for journal heads by
reusing buffer_head's b_assoc_buffers list for that purpose. Also
since BJ_IO list is just a temporary list for transaction commit, we
use a private list in jbd2_journal_commit_transaction() for that thus
removing BJ_IO list from transaction completely.Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
22 May, 2013
1 commit
-
invalidatepage now accepts range to invalidate and there are two file
system using jbd2 also implementing punch hole feature which can benefit
from this. We need to implement the same thing for jbd2 layer in order to
allow those file system take benefit of this functionality.This commit adds length argument to the jbd2_journal_invalidatepage()
and updates all instances in ext4 and ocfs2.Signed-off-by: Lukas Czerner
Reviewed-by: Jan Kara
20 Apr, 2013
1 commit
-
The jbd2_alloc_handle() function is only called by new_handle(). So
this commit uses kmem_cache_zalloc() instead of
kmem_cache_alloc()/memset().Signed-off-by: Zheng Liu
Signed-off-by: "Theodore Ts'o"
04 Apr, 2013
2 commits
-
The following race is possible:
[kjournald2] other_task
jbd2_journal_commit_transaction()
j_state = T_FINISHED;
spin_unlock(&journal->j_list_lock);
->jbd2_journal_remove_checkpoint()
->jbd2_journal_free_transaction();
->kmem_cache_free(transaction)
->j_commit_callback(journal, transaction);
-> USE_AFTER_FREEWARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107
Call Trace:
[] warn_slowpath_common+0xad/0xf0
[] warn_slowpath_fmt+0x46/0x50
[] ? ext4_journal_commit_callback+0x99/0xc0
[] __list_del_entry+0x1c0/0x250
[] ext4_journal_commit_callback+0x6f/0xc0
[] jbd2_journal_commit_transaction+0x23a6/0x2570
[] ? try_to_del_timer_sync+0x82/0xa0
[] ? del_timer_sync+0x91/0x1e0
[] kjournald2+0x19f/0x6a0
[] ? wake_up_bit+0x40/0x40
[] ? bit_spin_lock+0x80/0x80
[] kthread+0x10e/0x120
[] ? __init_kthread_worker+0x70/0x70
[] ret_from_fork+0x7c/0xb0
[] ? __init_kthread_worker+0x70/0x70In order to demonstrace this issue one should mount ext4 with mount -o
discard option on SSD disk. This makes callback longer and race
window becomes wider.In order to fix this we should mark transaction as finished only after
callbacks have completedSigned-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"
Cc: stable@vger.kernel.org -
In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time. To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started. This
should have a small (but probably not measurable) improvement for
ext4's scalability.Signed-off-by: "Theodore Ts'o"
Reported-by: Ben Hutchings
Reported-by: George Barnett
Cc: stable@vger.kernel.org
10 Feb, 2013
1 commit
-
There are multiple reasons to move away from debugfs. First of all,
we are only using it for a single parameter, and it is much more
complicated to set up (some 30 lines of code compared to 3), and one
more thing that might fail while loading the jbd2 module.Secondly, as a module paramter it can be specified as a boot option if
jbd2 is built into the kernel, or as a parameter when the module is
loaded, and it can also be manipulated dynamically under
/sys/module/jbd2/parameters/jbd2_debug. So it is more flexible.Ultimately we want to move away from using jbd_debug() towards
tracepoints, but for now this is still a useful simplification of the
code base.Signed-off-by: "Theodore Ts'o"
09 Feb, 2013
1 commit
-
Handles which stay open a long time are problematic when it comes time
to close down a transaction so it can be committed. These tracepoints
will help us determine which ones are the problematic ones, and to
validate whether changes makes things better or worse.Signed-off-by: "Theodore Ts'o"
07 Feb, 2013
2 commits
-
This reverts commit 93737456d68ddcb86232f669b83da673dd12e351.
The cow-snapshots effort is no longer active, so remove these extra
fields to shrink down the handle structure.Signed-off-by: "Theodore Ts'o"
Reviewed-by: Jan Kara -
Track the delay between when we first request that the commit begin
and when it actually begins, so we can see how much of a gap exists.
In theory, this should just be the remaining scheduling quantuum of
the thread which requested the commit (assuming it was not a
synchronous operation which triggered the commit request) plus
scheduling overhead; however, it's possible that real time processes
might get in the way of letting the kjournald thread from executing.Signed-off-by: "Theodore Ts'o"
26 Dec, 2012
1 commit
-
We cannot wait for transaction commit in journal_unmap_buffer()
because we hold page lock which ranks below transaction start. We
solve the issue by bailing out of journal_unmap_buffer() and
jbd2_journal_invalidatepage() with -EBUSY. Caller is then responsible
for waiting for transaction commit to finish and try invalidation
again. Since the issue can happen only for page stradding i_size, it
is simple enough to manually call jbd2_journal_invalidatepage() for
such page from ext4_setattr(), check the return value and wait if
necessary.Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
09 Nov, 2012
2 commits
-
The use of variable length arrays in structs (VLAIS) in the Linux Kernel code
precludes the use of compilers which don't implement VLAIS (for instance the
Clang compiler). Since ctx is always a 32-bit CRC, hard coding a size of 4
bytes accomplishes the same thing without the use of VLAIS. This is the same
technique already employed in fs/ext4/ext4.hSigned-off-by: Mark Charlebois
Signed-off-by: Behan Webster
Signed-off-by: "Theodore Ts'o" -
ext4_handle_release_buffer() was intended to remove journal
write access from a buffer, but it doesn't actually do anything
at all other than add a BUFFER_TRACE point, but it's not reliably
used for that either. Remove all the associated dead code.Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Carlos Maiolino