Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

13 Dec, 2014

1 commit

9bfccec24 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
"Lots of bug fixes, including Zheng and Jan's extent status shrinker
fixes, which should improve CPU utilization and avoid potential soft lockups
under heavy memory pressure, and Eric Whitney's bigalloc fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits)
ext4: ext4_da_convert_inline_data_to_extent drop locked page after error
ext4: fix suboptimal seek_{data,hole} extents traversal
ext4: ext4_inline_data_fiemap should respect callers argument
ext4: prevent fsreentrance deadlock for inline_data
ext4: forbid journal_async_commit in data=ordered mode
jbd2: remove unnecessary NULL check before iput()
ext4: Remove an unnecessary check for NULL before iput()
ext4: remove unneeded code in ext4_unlink
ext4: don't count external journal blocks as overhead
ext4: remove never taken branch from ext4_ext_shift_path_extents()
ext4: create nojournal_checksum mount option
ext4: update comments regarding ext4_delete_inode()
ext4: cleanup GFP flags inside resize path
ext4: introduce aging to extent status tree
ext4: cleanup flag definitions for extent status tree
ext4: limit number of scanned extents in status tree shrinker
ext4: move handling of list of shrinkable inodes into extent status code
ext4: change LRU to round-robin in extent status tree shrinker
ext4: cache extent hole in extent status tree for ext4_da_map_blocks()
ext4: fix block reservation for bigalloc filesystems
...

Linus Torvalds
2014-12-13 01:28:03 +0800

02 Dec, 2014

1 commit

32f386918 jbd2: fix regression where we fail to initialize checksum seed when loading

When we're enabling journal features, we cannot use the predicate
jbd2_journal_has_csum_v2or3() because we haven't yet set the sb
feature flag fields! Moreover, we just finished loading the shash
driver, so the test is unnecessary; calculate the seed always.

Without this patch, we fail to initialize the checksum seed the first
time we turn on journal_checksum, which means that all journal blocks
written during that first mount are corrupt. Transactions written
after the second mount will be fine, since the feature flag will be
set in the journal superblock. xfstests generic/{034,321,322} are the
regression tests.

(This is important for 3.18.)

Signed-off-by: Darrick J. Wong
Reported-by: Eric Whitney
Signed-off-by: Theodore Ts'o

Darrick J. Wong
2014-12-02 10:57:06 +0800

26 Nov, 2014

1 commit

d9f39d1e4 jbd2: remove unnecessary NULL check before iput()

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2014-11-26 09:02:37 +0800

30 Oct, 2014

1 commit

d48458d4a jbd2: use a better hash function for the revoke table

The old hash function didn't work well for 64-bit block numbers, and
used undefined (negative) shift right behavior. Use the generic
64-bit hash function instead.
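
As a rough illustration of the idea, here is a minimal userspace sketch of a multiplicative 64-bit hash that keeps every operation unsigned, so no shift is ever undefined. The function name, the constant, and the bucket-count convention are assumptions made for this example; this is not the kernel's generic hash helper.

```c
#include <assert.h>
#include <stdint.h>

/* 2^64 divided by the golden ratio; an assumed constant for this sketch. */
#define SKETCH_GOLDEN_RATIO_64 0x9E3779B97F4A7C15ULL

/*
 * Map a 64-bit block number to a bucket index in a table of 2^bits buckets.
 * The multiplication wraps modulo 2^64 (well defined for unsigned types) and
 * the shift count stays between 33 and 63, so nothing here is undefined.
 */
static unsigned int sketch_hash_blocknr(uint64_t blocknr, unsigned int bits)
{
	assert(bits >= 1 && bits <= 31);
	return (unsigned int)((blocknr * SKETCH_GOLDEN_RATIO_64) >> (64 - bits));
}
```

Taking the top bits of the product means the high half of a 64-bit block number influences the bucket, which a hash that only mixes the low 32 bits would not guarantee.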

Signed-off-by: Theodore Ts'o
Reported-by: Andrey Ryabinin

Theodore Ts'o
2014-10-30 22:53:17 +0800

18 Sep, 2014

2 commits

50849db32 jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list

__jbd2_journal_clean_checkpoint_list() returns the number of buffers it
freed, but no one was using the value, so just stop doing that. This
also allows for simplifying the calling convention for
journal_clean_one_cp_list().

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2014-09-18 12:58:12 +0800

cc97f1a7c jbd2: avoid pointless scanning of checkpoint lists

Yuanhan has reported that when running an fsync(2)-heavy workload that
creates new files on a ramdisk, a significant amount of time is spent in
__jbd2_journal_clean_checkpoint_list() trying to clean old transactions
(but they cannot be cleaned up because the flusher hasn't yet checkpointed
those buffers). The workload can be generated by:
fs_mark -d /fs/ram0/1 -D 2 -N 2560 -n 1000000 -L 1 -S 1 -s 4096

Reduce the amount of scanning by stopping the scan of the transaction list
once we find a transaction that cannot be checkpointed. Note that this
way of cleaning is still enough to keep freeing space in the journal
after fully checkpointed transactions.
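
The early-exit scan described above can be sketched in userspace as follows. The struct and function names are hypothetical stand-ins, not the jbd2 data structures, and the array is assumed to be ordered oldest transaction first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a committed transaction awaiting checkpoint. */
struct sketch_txn {
	int unwritten_buffers; /* buffers the flusher has not written yet */
	bool released;         /* set once the transaction has been dropped */
};

/*
 * Walk the list oldest-first, release every fully checkpointed transaction,
 * and stop at the first one that still has unwritten buffers.
 * Returns how many transactions were released.
 */
static size_t sketch_clean_checkpoint_list(struct sketch_txn *list, size_t n)
{
	size_t freed = 0;

	assert(list != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (list[i].unwritten_buffers > 0)
			break; /* later entries are not scanned at all */
		list[i].released = true;
		freed++;
	}
	return freed;
}
```

Because journal space is reclaimed from the oldest end, stopping at the first blocked transaction gives up nothing that could have freed space immediately.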

Reported-and-tested-by: Yuanhan Liu
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2014-09-18 12:42:16 +0800

17 Sep, 2014

2 commits

1245799f7 jbd2: jbd2_log_wait_for_space improve error detection

If EIO happens after we have dropped j_state_lock, we won't notice
that the journal has been aborted. So it is reasonable to move this
check after we have grabbed the j_checkpoint_mutex and re-grabbed the
j_state_lock. This patch helps to prevent false-positive complaints
after EIO.

#DMESG:
__jbd2_log_wait_for_space: needed 8448 blocks and only had 8386 space available
__jbd2_log_wait_for_space: no way to get more journal space in ram1-8
------------[ cut here ]------------
WARNING: CPU: 15 PID: 6739 at fs/jbd2/checkpoint.c:168 __jbd2_log_wait_for_space+0x188/0x200()
Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
CPU: 15 PID: 6739 Comm: fsstress Tainted: G W 3.17.0-rc2-00429-g684de57 #139
Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
00000000000000a8 ffff88077aaab878 ffffffff815c1a8c 00000000000000a8
0000000000000000 ffff88077aaab8b8 ffffffff8106ce8c ffff88077aaab898
ffff8807c57e6000 ffff8807c57e6028 0000000000002100 ffff8807c57e62f0
Call Trace:
[] dump_stack+0x51/0x6d
[] warn_slowpath_common+0x8c/0xc0
[] warn_slowpath_null+0x1a/0x20
[] __jbd2_log_wait_for_space+0x188/0x200
[] start_this_handle+0x4da/0x7b0
[] ? local_clock+0x25/0x30
[] ? lockdep_init_map+0xe7/0x180
[] jbd2__journal_start+0xdc/0x1d0
[] ? __ext4_new_inode+0x7f4/0x1330
[] __ext4_journal_start_sb+0xf8/0x110
[] __ext4_new_inode+0x7f4/0x1330
[] ? lock_release_holdtime+0x29/0x190
[] ext4_create+0x8b/0x150
[] vfs_create+0x7b/0xb0
[] do_last+0x7db/0xcf0
[] ? inode_permission+0x4d/0x50
[] path_openat+0x242/0x590
[] ? __alloc_fd+0x36/0x140
[] do_filp_open+0x4a/0xb0
[] ? __alloc_fd+0x121/0x140
[] do_sys_open+0x170/0x220
[] SyS_open+0x1e/0x20
[] SyS_creat+0x16/0x20
[] system_call_fastpath+0x16/0x1b
---[ end trace cd71c831f82059db ]---

Signed-off-by: Dmitry Monakhov
Signed-off-by: Theodore Ts'o

Dmitry Monakhov
2014-09-17 02:50:50 +0800

064d83892 jbd2: free bh when descriptor block checksum fails

Free the buffer head if the journal descriptor block fails checksum
verification.

This is the jbd2 port of the e2fsprogs patch "e2fsck: free bh on csum
verify error in do_one_pass".

Signed-off-by: Darrick J. Wong
Signed-off-by: Theodore Ts'o
Reviewed-by: Eric Sandeen
Cc: stable@vger.kernel.org

Darrick J. Wong
2014-09-17 02:43:09 +0800

11 Sep, 2014

1 commit

feb8c6d3d jbd2: fix journal checksum feature flag handling

Clear all three journal checksum feature flags before turning on
whichever journal checksum options we want. Rearrange the error
checking so that newer flags get complained about first.
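
A minimal sketch of the clear-then-set pattern follows. The flag values and names are hypothetical, and the sketch keeps all three flags in one word for brevity, whereas jbd2 stores its v1 checksum flag in the compat feature field and the v2/v3 flags in the incompat field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch; not the on-disk jbd2 constants. */
#define SKETCH_CSUM_V1  0x1u
#define SKETCH_CSUM_V2  0x2u
#define SKETCH_CSUM_V3  0x4u
#define SKETCH_CSUM_ALL (SKETCH_CSUM_V1 | SKETCH_CSUM_V2 | SKETCH_CSUM_V3)

/*
 * Return the new feature word: clear every checksum flag first, then set
 * only the requested one, so a stale flag from an earlier mount cannot
 * remain set next to the new choice.
 */
static uint32_t sketch_set_csum_feature(uint32_t features, uint32_t wanted)
{
	/* wanted must be zero or exactly one of the checksum flags */
	assert((wanted & ~SKETCH_CSUM_ALL) == 0);
	assert((wanted & (wanted - 1)) == 0);

	features &= ~SKETCH_CSUM_ALL;
	return features | wanted;
}
```

Unrelated feature bits pass through untouched, and passing 0 for the wanted flag simply turns checksumming off.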

Reported-by: TR Reardon
Signed-off-by: Darrick J. Wong
Signed-off-by: Theodore Ts'o

Darrick J. Wong
2014-09-11 23:38:21 +0800

05 Sep, 2014

3 commits

a49058fab jbd/jbd2: use non-movable memory for the jbd superblock

Since the jbd/jbd2 superblock is not released until the file system is
unmounted, allocate the buffer cache from the non-movable area to
allow page migration and CMA allocations to more easily succeed.

Signed-off-by: Gioh Kim
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara

Gioh Kim
2014-09-05 10:36:35 +0800

0e5ecf0a7 jbd2: optimize jbd2_log_do_checkpoint() a bit

When we discover a written-out buffer in a transaction's checkpoint list we
don't have to recheck the validity of the transaction. Either this is the
last buffer in a transaction - and then we are done - or this isn't
and then we can just take another buffer from the checkpoint list
without dropping j_list_lock.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2014-09-05 06:09:29 +0800

dc6e8d669 jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint()

The __jbd2_journal_remove_checkpoint() doesn't require an elevated
b_count; indeed, until the jh structure gets released by the call to
jbd2_journal_put_journal_head(), the bh's b_count is elevated by
virtue of the existence of the jh structure.

Suggested-by: Jan Kara
Reviewed-by: Jan Kara
Signed-off-by: Theodore Ts'o

Theodore Ts'o
2014-09-05 06:09:22 +0800

02 Sep, 2014

2 commits

88fe1acb5 jbd2: fold __wait_cp_io into jbd2_log_do_checkpoint()

__wait_cp_io() is only called by jbd2_log_do_checkpoint(). Fold it in
to make it a bit easier to understand.

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2014-09-02 09:26:09 +0800

be1158cc6 jbd2: fold __process_buffer() into jbd2_log_do_checkpoint()

__process_buffer() is only called by jbd2_log_do_checkpoint(), and it
had a very complex locking protocol where it would be called with the
j_list_lock, and sometimes exit with the lock held (if the return code
was 0), or release the lock.

This was confusing both to humans and to smatch (which erroneously
complained that the lock was taken twice).

Folding __process_buffer() into the caller allows us to simplify the
control flow, making the resulting function easier to read and reason
about, and dropping the compiled size of fs/jbd2/checkpoint.c by 150
bytes (over 4% of the text size).

Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara

Theodore Ts'o
2014-09-02 09:19:01 +0800

29 Aug, 2014

2 commits

db9ee2203 jbd2: fix descriptor block size handling errors with journal_csum

It turns out that there are some serious problems with the on-disk
format of journal checksum v2. The foremost is that the function to
calculate descriptor tag size returns sizes that are too big. This
causes alignment issues on some architectures and is compounded by the
fact that some parts of jbd2 use the structure size (incorrectly) to
determine the presence of a 64bit journal instead of checking the
feature flags.

Therefore, introduce journal checksum v3, which enlarges the
descriptor block tag format to allow for full 32-bit checksums of
journal blocks, fix the journal tag function to return the correct
sizes, and fix the jbd2 recovery code to use feature flags to
determine 64bitness.

Add a few function helpers so we don't have to open-code quite so
many pieces.

Switching to a 16-byte block size was found to increase journal size
overhead by a maximum of 0.1%, to convert a 32-bit journal with no
checksumming to a 32-bit journal with checksum v3 enabled.

Signed-off-by: Darrick J. Wong
Reported-by: TR Reardon
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org

Darrick J. Wong
2014-08-29 10:22:29 +0800

022eaa751 jbd2: fix infinite loop when recovering corrupt journal blocks

When recovering the journal, don't fall into an infinite loop if we
encounter a corrupt journal block. Instead, just skip the block and
return an error, which fails the mount and thus forces the user to run
a full filesystem fsck.
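
The control-flow fix can be illustrated with a small userspace sketch: the loop always advances past a bad block and remembers the failure instead of retrying it. The types, the function name, and the choice of -EIO as the reported error are assumptions made for this example rather than details taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one journalled block found during recovery. */
struct sketch_block {
	bool corrupt;  /* checksum verification failed */
	bool replayed; /* set when the block was written back */
};

/*
 * Replay every intact block. A corrupt block is skipped, the loop index
 * still moves forward, and the error is reported once the pass is done.
 */
static int sketch_replay(struct sketch_block *blocks, size_t n)
{
	int err = 0;
	size_t i = 0;

	assert(blocks != NULL || n == 0);
	while (i < n) {
		if (blocks[i].corrupt) {
			err = -EIO; /* remember the failure */
			i++;        /* always advance past the bad block */
			continue;
		}
		blocks[i].replayed = true;
		i++;
	}
	return err;
}
```

A caller that sees a non-zero return would refuse the mount, which matches the behaviour the message describes.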

Signed-off-by: Darrick J. Wong
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org

Darrick J. Wong
2014-08-29 10:22:28 +0800

16 Jul, 2014

1 commit

743162013 sched: Remove proliferation of wait_on_bit() action functions

The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
Rename wait_on_bit and wait_on_bit_lock to
wait_on_bit_action and wait_on_bit_lock_action
to make it explicit that they need an action function.

Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
which are *not* given an action function but implicitly use
a standard one.
The decision to error-out if a signal is pending is now made
based on the 'mode' argument rather than being encoded in the action
function.

All instances of the old wait_on_bit and wait_on_bit_lock which
can use the new version have been changed accordingly and their
action functions have been discarded.
wait_on_bit{_lock} does not return any specific error code in the
event of a signal so the caller must check for non-zero and
interpolate their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible".

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps' (and in /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack. So
the distinction will still be visible, only with different
function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since the first version of this patch (against 3.15) two new action
functions have appeared, one in NFS and one in CIFS. CIFS also now
uses an action function that makes the same freezer-aware
schedule call as NFS.

Signed-off-by: NeilBrown
Acked-by: David Howells (fscache, keys)
Acked-by: Steven Whitehouse (gfs2)
Acked-by: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Steve French
Cc: Linus Torvalds
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar

NeilBrown
2014-07-16 21:10:39 +0800

06 Jul, 2014

1 commit

5dd214248 ext4: disable synchronous transaction batching if max_batch_time==0

The mount manpage says of the max_batch_time option,

This optimization can be turned off entirely
by setting max_batch_time to 0.

But the code doesn't do that. So fix the code to do
that.

Signed-off-by: Eric Sandeen
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org

Eric Sandeen
2014-07-06 07:18:22 +0800

18 Apr, 2014

1 commit

4e857c58e arch: Mass conversion of smp_mb__*()

Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra
Acked-by: Paul E. McKenney
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar

Peter Zijlstra
2014-04-18 20:20:48 +0800

13 Mar, 2014

1 commit

66a4cb187 jbd2: improve error messages for inconsistent journal heads

Fix up error messages printed when the transaction pointers in a
journal head are inconsistent. This improves the error messages which
are printed when running xfstests generic/068 in data=journal mode.
See the bug report at: https://bugzilla.kernel.org/show_bug.cgi?id=60786

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2014-03-13 04:38:03 +0800

09 Mar, 2014

7 commits

0bfea8118 jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()

It's not needed until we start trying to modify fields in the
journal_head which are protected by j_list_lock.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2014-03-09 13:56:58 +0800

6e4862a5b jbd2: minimize region locked by j_list_lock in journal_get_create_access()

It's not needed until we start trying to modify fields in the
journal_head which are protected by j_list_lock.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2014-03-09 13:46:23 +0800

d2eb0b998 jbd2: check jh->b_transaction without taking j_list_lock

jh->b_transaction is adequately protected for reading by the
jbd_lock_bh_state(bh), so we don't need to take j_list_lock in
__journal_try_to_free_buffer().

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2014-03-09 13:07:19 +0800

d4e839d4a jbd2: add transaction to checkpoint list earlier

We don't otherwise need j_list_lock during the rest of commit phase
#7, so add the transaction to the checkpoint list at the very end of
commit phase #6. This allows us to drop j_list_lock earlier, which is
a good thing since it is a super hot lock.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2014-03-09 11:34:10 +0800

42cf3452d jbd2: calculate statistics without holding j_state_lock and j_list_lock

The two hottest locks, and thus the biggest scalability bottlenecks,
in the jbd2 layer, are the j_list_lock and j_state_lock. This has
inspired some people to do some truly unnatural things[1].

[1] https://www.usenix.org/system/files/conference/fast14/fast14-paper_kang.pdf

We don't need to be holding both j_state_lock and j_list_lock while
calculating the journal statistics, so move those calculations to the
very end of jbd2_journal_commit_transaction.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2014-03-09 08:51:16 +0800

3469a32a1 jbd2: don't hold j_state_lock while calling wake_up()

The j_state_lock is one of the hottest locks in the jbd2 layer and
thus one of its scalability bottlenecks.

We don't need to be holding the j_state_lock while we are calling
wake_up(&journal->j_wait_commit), so release the lock a little bit
earlier.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2014-03-09 08:11:36 +0800

df3c1e9a0 jbd2: don't unplug after writing revoke records

During the commit process, keep the block device plugged after we are done
writing the revoke records, until we are finished writing the rest of
the commit records in the journal. This will allow most of the
journal blocks to be written in a single I/O operation, instead of
separating the revoke blocks from the rest of the journal blocks.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2014-03-09 07:13:52 +0800

18 Feb, 2014

2 commits

7747e6d02 jbd2: mark file-local functions as static

Mark functions as static in jbd2/journal.c because they are not used
outside this file.

This eliminates the following warnings in jbd2/journal.c:
fs/jbd2/journal.c:125:5: warning: no previous prototype for ‘jbd2_verify_csum_type’ [-Wmissing-prototypes]
fs/jbd2/journal.c:146:5: warning: no previous prototype for ‘jbd2_superblock_csum_verify’ [-Wmissing-prototypes]
fs/jbd2/journal.c:154:6: warning: no previous prototype for ‘jbd2_superblock_csum_set’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Josh Triplett
Reviewed-by: Darrick J. Wong

Rashika Kheria
2014-02-18 09:49:04 +0800

92e3b4053 jbd2: fix use after free in jbd2_journal_start_reserved()

If start_this_handle() fails then it leads to a use after free of
"handle".

Signed-off-by: Dan Carpenter
Signed-off-by: "Theodore Ts'o"
Cc: stable@vger.kernel.org

Dan Carpenter
2014-02-18 09:33:01 +0800

09 Dec, 2013

3 commits

a67c848a8 jbd2: rename obsoleted msg JBD->JBD2

Rename performed via: perl -pi -e 's/JBD:/JBD2:/g' fs/jbd2/*.c

Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Carlos Maiolino

Dmitry Monakhov
2013-12-09 10:14:59 +0800

75685071c jbd2: revise KERN_EMERG error messages

Some of the KERN_EMERG printk messages do not really deserve this log
level and the one in log_wait_commit() is even rather useless (the
journal has been previously aborted and *that* is where we should have
been complaining). So make some messages just KERN_ERR and remove the
useless message.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-12-09 10:13:59 +0800

f6c07cad0 jbd2: don't BUG but return ENOSPC if a handle runs out of space

If a handle runs out of space, we currently stop the kernel with a BUG
in jbd2_journal_dirty_metadata(). This makes it hard to figure out
what might be going on. So return an error of ENOSPC, so that the
file system layer can figure out what is going on, making it more
likely that we get useful debugging information. This should make it
easier to debug problems such as the one which was reported by:

https://bugzilla.kernel.org/show_bug.cgi?id=44731

The only two callers of this function are ext4_handle_dirty_metadata()
and ocfs2_journal_dirty(). The ocfs2 function will trigger a
BUG_ON(), which means there will be no change in behavior. The ext4
function will call ext4_error_inode() which will print the useful
debugging information and then handle the situation using ext4's error
handling mechanisms (i.e., which might mean halting the kernel or
remounting the file system read-only).

Also, since both file systems already call WARN_ON(), drop the WARN_ON
from jbd2_journal_dirty_metadata() to avoid two stack traces from
being displayed.

Signed-off-by: "Theodore Ts'o"
Cc: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker

Theodore Ts'o
2013-12-09 10:12:59 +0800

29 Aug, 2013

1 commit

18a6ea1e5 jbd2: Fix endian mixing problems in the checksumming code

In the jbd2 checksumming code, explicitly declare separate variables with
endianness information so that we don't get confused and screw things up again.
Also fixes sparse warnings.

Signed-off-by: Darrick J. Wong
Signed-off-by: "Theodore Ts'o"

Darrick J. Wong
2013-08-29 02:59:58 +0800

01 Jul, 2013

3 commits

41a5b9131 jbd2: invalidate handle if jbd2_journal_restart() fails

If jbd2_journal_restart() fails the handle will have been disconnected
from the current transaction. In this situation, the handle must not
be used for any jbd2 function other than jbd2_journal_stop().
Enforce this by treating a handle which has a NULL transaction
pointer as an aborted handle, and issue a kernel warning if
jbd2_journal_extend(), jbd2_journal_get_write_access(),
jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.
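
A minimal sketch of the "NULL transaction means aborted" rule follows. All names are hypothetical stand-ins for the jbd2 structures, and returning -EROFS for an aborted handle is an assumption of this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the transaction and handle structures. */
struct sketch_transaction {
	int tid;
};

struct sketch_handle {
	struct sketch_transaction *transaction; /* NULL after a failed restart */
	bool aborted;
};

/* A handle is unusable if it was aborted or has lost its transaction. */
static bool sketch_handle_is_aborted(const struct sketch_handle *h)
{
	assert(h != NULL);
	return h->aborted || h->transaction == NULL;
}

/* Every entry point checks the handle before touching the transaction. */
static int sketch_get_write_access(struct sketch_handle *h)
{
	if (sketch_handle_is_aborted(h))
		return -EROFS;
	return 0;
}
```

Routing every entry point through one predicate means a disconnected handle is rejected before any code dereferences its transaction pointer.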

This commit also fixes a bug where jbd2_journal_stop() would trip over
a kernel jbd2 assertion check when trying to free an invalid handle.

Also move the responsibility of setting current->journal_info to
start_this_handle(), simplifying the three users of this function.

Signed-off-by: "Theodore Ts'o"
Reported-by: Younger Liu
Cc: Jan Kara

Theodore Ts'o
2013-07-01 20:12:41 +0800

39c04153f jbd2: fix theoretical race in jbd2__journal_restart

Once we decrement transaction->t_updates, if this is the last handle
holding the transaction from closing, and once we release the
t_handle_lock spinlock, it's possible for the transaction to commit
and be released. In practice with normal kernels, this probably won't
happen, since the commit happens in a separate kernel thread and it's
unlikely this could all happen within the space of a few CPU cycles.

On the other hand, with a real-time kernel, this could potentially
happen, so save the tid found in transaction->t_tid before we release
t_handle_lock. It would require an insane configuration, such as one
where the jbd2 thread was set to a very high real-time priority,
perhaps because a high priority real-time thread is trying to read or
write to a file system. But some people who use real-time kernels
have been known to do insane things, including controlling
laser-wielding industrial robots. :-)

Signed-off-by: "Theodore Ts'o"
Cc: stable@vger.kernel.org

Theodore Ts'o
2013-07-01 20:12:40 +0800

fe52d17cd jbd2: move superblock checksum calculation to jbd2_write_superblock()

Some of the functions which modify the jbd2 superblock were not
updating the checksum before calling jbd2_write_superblock(). Move
the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so
that the checksum is calculated consistently.
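
The refactoring idea, computing the checksum in the one function every writer must pass through, can be sketched as follows. The superblock layout, the toy checksum, and the in-memory "disk" are all assumptions made for illustration; jbd2 itself uses a real checksum over the on-disk superblock.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, much-reduced superblock for this sketch. */
struct sketch_sb {
	uint32_t sequence;
	uint32_t start;
	uint32_t checksum;
};

/* Toy checksum over every field except the checksum itself. */
static uint32_t sketch_sb_csum(const struct sketch_sb *sb)
{
	return (sb->sequence * 31u) ^ sb->start ^ 0xA5A5A5A5u;
}

/*
 * The single write path: refresh the checksum here, so callers that
 * modify fields cannot forget to do it before writing.
 */
static void sketch_write_superblock(struct sketch_sb *sb, struct sketch_sb *disk)
{
	assert(sb != NULL && disk != NULL);
	sb->checksum = sketch_sb_csum(sb);
	memcpy(disk, sb, sizeof(*disk));
}
```

Callers only change fields and call the write function; the stored checksum then always matches the stored contents.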

Signed-off-by: "Theodore Ts'o"
Cc: Darrick J. Wong
Cc: stable@vger.kernel.org

Theodore Ts'o
2013-07-01 20:12:38 +0800

13 Jun, 2013

4 commits

75497d060 jbd2: remove debug dependency on debug_fs and update Kconfig help text

Commit b6e96d0067d8 ("jbd2: use module parameters instead of debugfs
for jbd_debug") removed any need for a dependency on DEBUG_FS. It
also moved the /sys variables out from underneath the typical debugfs
mount point. Delete the dependency and update the /sys path to where
the debug settings are currently.

Signed-off-by: Paul Gortmaker
Signed-off-by: "Theodore Ts'o"

Paul Gortmaker
2013-06-13 11:07:51 +0800

169f1a2a8 jbd2: use a single printk for jbd_debug()

Since the jbd_debug() is implemented with two separate printk()
calls, it can lead to corrupted and misleading debug output like
the following (see lines marked with "*"):

[ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
[ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
[* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
[* 290.339382] JBD2: starting commit of transaction 42104
[ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
[ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079

i.e. the debug output from log_wait_commit and journal_commit_transaction
have become interleaved. The output should have been:

(fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
(fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104

It is expected that this is not easy to replicate -- I was only able
to cause it on preempt-rt kernels, and even then only under heavy
I/O load.
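
A userspace sketch of the single-emission idea is shown below: the location prefix and the caller's message are formatted into one buffer, which can then be output with one call so that another context cannot slip a line in between. The function name and buffer handling are assumptions of this example, not the kernel macro.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Build "(file, line): func: message" in a single buffer.
 * Returns the number of characters written, the length that would have
 * been needed if the prefix alone did not fit, or a negative value on a
 * formatting error.
 */
static int sketch_format_debug(char *buf, size_t size, const char *file,
			       int line, const char *func, const char *fmt, ...)
{
	va_list args;
	int n, m;

	assert(buf != NULL && size > 0);
	n = snprintf(buf, size, "(%s, %d): %s: ", file, line, func);
	if (n < 0 || (size_t)n >= size)
		return n;

	/* Append the caller's message right after the prefix. */
	va_start(args, fmt);
	m = vsnprintf(buf + n, size - (size_t)n, fmt, args);
	va_end(args);
	return m < 0 ? m : n + m;
}
```

The caller would pass the finished buffer to one output call, for example a single fputs(), instead of printing the prefix and the message separately.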

Reported-by: Paul Gortmaker
Suggested-by: "Theodore Ts'o"
Signed-off-by: Paul Gortmaker
Signed-off-by: "Theodore Ts'o"

Paul Gortmaker
2013-06-13 11:04:04 +0800

cfc7bc896 jbd2: fix duplicate debug label for phase 2

Currently we see this output:

$ git grep phase fs/jbd2
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 1\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 3\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 4\n");
[...]

There is clearly a duplicate label for phase 2, and they are
both active (i.e. not in #if ... #else block). Rename them to
be "2a" and "2b" so the debug output is unambiguous.

Signed-off-by: Paul Gortmaker
Signed-off-by: "Theodore Ts'o"

Paul Gortmaker
2013-06-13 10:56:35 +0800

0ef54180e jbd2: drop checkpoint mutex when waiting in __jbd2_log_wait_for_space()

While trying to debug an issue under extreme I/O loading
on preempt-rt kernels, the following backtrace was observed
via SysRQ output:

rm D ffff8802203afbc0 4600 4878 4748 0x00000000
ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
Call Trace:
[] schedule+0x24/0x70
[] jbd2_log_wait_commit+0xbd/0x140
[] ? __init_waitqueue_head+0x50/0x50
[] jbd2_log_do_checkpoint+0xf5/0x520
[] __jbd2_log_wait_for_space+0xa9/0x1f0
[] start_this_handle.isra.10+0x2e0/0x530
[] ? __init_waitqueue_head+0x50/0x50
[] jbd2__journal_start+0xc3/0x110
[] ? ext4_rmdir+0x6e/0x230
[] jbd2_journal_start+0xe/0x10
[] ext4_journal_start_sb+0x5b/0x160
[] ext4_rmdir+0x6e/0x230
[] vfs_rmdir+0xd5/0x140
[] do_rmdir+0xdf/0x120
[] ? task_work_run+0x44/0x80
[] ? do_notify_resume+0x89/0x100
[] ? int_signal+0x12/0x17
[] sys_unlinkat+0x25/0x40
[] system_call_fastpath+0x16/0x1b

What is interesting here is that we call log_wait_commit from
within wait_for_space, but we are still holding the checkpoint_mutex
as it surrounds mostly the whole of wait_for_space. And then, as we
are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
bit is set, then we will also try to take the same checkpoint_mutex.

It seems that we need to drop the checkpoint_mutex while sitting in
jbd2_log_wait_commit, if we want to guarantee that progress can be made
by jbd2_journal_commit_transaction(). There does not seem to be
anything preempt-rt specific about this, other than perhaps increasing
the odds of it happening.

Signed-off-by: Paul Gortmaker
Signed-off-by: "Theodore Ts'o"

Paul Gortmaker
2013-06-13 10:47:35 +0800