Eric Lee / smarc-fsl-linux-kernel

01 Jul, 2013

3 commits

41a5b9131 jbd2: invalidate handle if jbd2_journal_restart() fails

If jbd2_journal_restart() fails the handle will have been disconnected
from the current transaction. In this situation, the handle must not
be used for any jbd2 function other than jbd2_journal_stop().
Enforce this by treating a handle which has a NULL transaction
pointer as an aborted handle, and issue a kernel warning if
jbd2_journal_extend(), jbd2_journal_get_write_access(),
jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.
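
The check described above can be sketched in user-space C. This is only an illustration under stated assumptions: struct handle, struct transaction, handle_is_aborted() and get_write_access() are hypothetical stand-ins, not the kernel's definitions, and -1 stands in for the kernel's error code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the jbd2 types. */
struct transaction {
    int t_tid;
};

struct handle {
    struct transaction *h_transaction; /* NULL once a restart has failed */
    bool h_aborted;
};

/* A handle with no transaction is treated exactly like an aborted one. */
static bool handle_is_aborted(const struct handle *h)
{
    return h->h_aborted || h->h_transaction == NULL;
}

/* Returns 0 on success, -1 (standing in for an errno) on an invalid handle. */
static int get_write_access(const struct handle *h)
{
    if (handle_is_aborted(h))
        return -1;
    return 0;
}
```

With this shape, every entry point can share one test instead of each one checking the transaction pointer separately.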

This commit also fixes a bug where jbd2_journal_stop() would trip over
a kernel jbd2 assertion check when trying to free an invalid handle.

Also move the responsibility of setting current->journal_info to
start_this_handle(), simplifying the three users of this function.

Signed-off-by: "Theodore Ts'o"
Reported-by: Younger Liu
Cc: Jan Kara

Theodore Ts'o
2013-07-01 20:12:41 +0800
39c04153f jbd2: fix theoretical race in jbd2__journal_restart

Once we decrement transaction->t_updates, if this is the last handle
holding the transaction from closing, and once we release the
t_handle_lock spinlock, it's possible for the transaction to commit
and be released. In practice with normal kernels, this probably won't
happen, since the commit happens in a separate kernel thread and it's
unlikely this could all happen within the space of a few CPU cycles.

On the other hand, with a real-time kernel, this could potentially
happen, so save the tid found in transaction->t_tid before we release
t_handle_lock. It would require an insane configuration, such as one
where the jbd2 thread was set to a very high real-time priority,
perhaps because a high priority real-time thread is trying to read or
write to a file system. But some people who use real-time kernels
have been known to do insane things, including controlling
laser-wielding industrial robots. :-)
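
As a rough user-space sketch of the idea (not the kernel code): copy the tid while the lock is still held, and touch nothing but the copy afterwards. struct txn, the toy spinlock and drop_update_and_save_tid() are hypothetical names introduced for illustration.

```c
#include <stdatomic.h>

/* Hypothetical, simplified stand-in for transaction_t. */
struct txn {
    unsigned int t_tid;
    int t_updates;
};

/* Toy spinlock standing in for t_handle_lock. */
static atomic_flag handle_lock = ATOMIC_FLAG_INIT;

static void handle_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&handle_lock))
        ; /* spin */
}

static void handle_lock_release(void)
{
    atomic_flag_clear(&handle_lock);
}

/* Drop one update reference; return the tid that was copied under the lock. */
static unsigned int drop_update_and_save_tid(struct txn *t)
{
    unsigned int tid;

    handle_lock_acquire();
    t->t_updates--;
    tid = t->t_tid; /* copy while t is still known to be alive */
    handle_lock_release();

    /* From here on t may already be committed and freed: use only tid. */
    return tid;
}
```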

Signed-off-by: "Theodore Ts'o"
Cc: stable@vger.kernel.org

Theodore Ts'o
2013-07-01 20:12:40 +0800
fe52d17cd jbd2: move superblock checksum calculation to jbd2_write_superblock()

Some of the functions which modify the jbd2 superblock were not
updating the checksum before calling jbd2_write_superblock(). Move
the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so
that the checksum is calculated consistently.
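
A minimal sketch of the pattern, assuming a toy checksum and a hypothetical three-field superblock (neither matches the real on-disk format): the checksum is refreshed inside the single write path, so no caller can forget it.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified superblock. */
struct superblock {
    uint32_t s_sequence;
    uint32_t s_start;
    uint32_t s_checksum;
};

/* Toy checksum over every field except s_checksum; not the real algorithm. */
static uint32_t sb_csum(const struct superblock *sb)
{
    return (sb->s_sequence * 31u) ^ sb->s_start;
}

/* The only write path: the checksum is always recomputed here. */
static void write_superblock(struct superblock *sb, struct superblock *disk)
{
    sb->s_checksum = sb_csum(sb);
    *disk = *sb; /* stands in for submitting the buffer to disk */
}
```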

Signed-off-by: "Theodore Ts'o"
Cc: Darrick J. Wong
Cc: stable@vger.kernel.org

Theodore Ts'o
2013-07-01 20:12:38 +0800

13 Jun, 2013

6 commits

75497d060 jbd2: remove debug dependency on debug_fs and update Kconfig help text

Commit b6e96d0067d8 ("jbd2: use module parameters instead of debugfs
for jbd_debug") removed any need for a dependency on DEBUG_FS. It
also moved the /sys variables out from underneath the typical debugfs
mount point. Delete the dependency and update the /sys path to where
the debug settings are currently.

Signed-off-by: Paul Gortmaker
Signed-off-by: "Theodore Ts'o"

Paul Gortmaker
2013-06-13 11:07:51 +0800
169f1a2a8 jbd2: use a single printk for jbd_debug()

Since the jbd_debug() is implemented with two separate printk()
calls, it can lead to corrupted and misleading debug output like
the following (see lines marked with "*"):

[ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
[ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
[* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
[* 290.339382] JBD2: starting commit of transaction 42104
[ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
[ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079

i.e. the debug output from log_wait_commit and journal_commit_transaction
has become interleaved. The output should have been:

(fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
(fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104

It is expected that this is not easy to replicate -- I was only able
to cause it on preempt-rt kernels, and even then only under heavy
I/O load.
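
The general technique can be sketched in user space as follows. This is an illustration only, not the kernel implementation: format_debug_line() is a hypothetical helper that builds the prefix and the message into one buffer, so the caller can emit the whole line with a single output call.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Build "(file, line): func: message" in one buffer so that the whole
 * line can be emitted with a single output call and cannot interleave
 * with output from another thread.
 */
static int format_debug_line(char *out, size_t outlen,
                             const char *file, int line, const char *func,
                             const char *fmt, ...)
{
    char msg[256];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(msg, sizeof(msg), fmt, ap);
    va_end(ap);

    return snprintf(out, outlen, "(%s, %d): %s: %s", file, line, func, msg);
}
```

A caller would then pass the finished buffer to one fputs() or similar call, rather than printing the prefix and the message separately.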

Reported-by: Paul Gortmaker
Suggested-by: "Theodore Ts'o"
Signed-off-by: Paul Gortmaker
Signed-off-by: "Theodore Ts'o"

Paul Gortmaker
2013-06-13 11:04:04 +0800
cfc7bc896 jbd2: fix duplicate debug label for phase 2

Currently we see this output:

$ git grep phase fs/jbd2
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 1\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 3\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 4\n");
[...]

There is clearly a duplicate label for phase 2, and they are
both active (i.e. not in #if ... #else block). Rename them to
be "2a" and "2b" so the debug output is unambiguous.

Signed-off-by: Paul Gortmaker
Signed-off-by: "Theodore Ts'o"

Paul Gortmaker
2013-06-13 10:56:35 +0800
0ef54180e jbd2: drop checkpoint mutex when waiting in __jbd2_log_wait_for_space()

While trying to debug an issue under extreme I/O loading
on preempt-rt kernels, the following backtrace was observed
via SysRQ output:

rm D ffff8802203afbc0 4600 4878 4748 0x00000000
ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
Call Trace:
[] schedule+0x24/0x70
[] jbd2_log_wait_commit+0xbd/0x140
[] ? __init_waitqueue_head+0x50/0x50
[] jbd2_log_do_checkpoint+0xf5/0x520
[] __jbd2_log_wait_for_space+0xa9/0x1f0
[] start_this_handle.isra.10+0x2e0/0x530
[] ? __init_waitqueue_head+0x50/0x50
[] jbd2__journal_start+0xc3/0x110
[] ? ext4_rmdir+0x6e/0x230
[] jbd2_journal_start+0xe/0x10
[] ext4_journal_start_sb+0x5b/0x160
[] ext4_rmdir+0x6e/0x230
[] vfs_rmdir+0xd5/0x140
[] do_rmdir+0xdf/0x120
[] ? task_work_run+0x44/0x80
[] ? do_notify_resume+0x89/0x100
[] ? int_signal+0x12/0x17
[] sys_unlinkat+0x25/0x40
[] system_call_fastpath+0x16/0x1b

What is interesting here, is that we call log_wait_commit, from
within wait_for_space, but we are still holding the checkpoint_mutex
as it surrounds mostly the whole of wait_for_space. And then, as we
are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
bit is set, then we will also try to take the same checkpoint_mutex.

It seems that we need to drop the checkpoint_mutex while sitting in
jbd2_log_wait_commit, if we want to guarantee that progress can be made
by jbd2_journal_commit_transaction(). There does not seem to be
anything preempt-rt specific about this, other than perhaps increasing
the odds of it happening.

Signed-off-by: Paul Gortmaker
Signed-off-by: "Theodore Ts'o"

Paul Gortmaker
2013-06-13 10:47:35 +0800
3ca841c10 jbd2: relocate assert after state lock in journal_commit_transaction()

The state lock is taken after we are doing an assert on the state
value, not before. So we might in fact be doing an assert on a
transient value. Ensure the state check is within the scope of
the state lock being taken.

Signed-off-by: Paul Gortmaker
Signed-off-by: "Theodore Ts'o"

Paul Gortmaker
2013-06-13 10:46:35 +0800
9ff864462 jbd2: optimize jbd2_journal_force_commit

The current implementation of jbd2_journal_force_commit() is suboptimal because
it results in empty and useless commits. But callers just want to force and wait
for any unfinished commits. We already have jbd2_journal_force_commit_nested(),
which does exactly what we want, except we are guaranteed that we do not hold
a journal transaction open.

Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"

Dmitry Monakhov
2013-06-13 10:24:07 +0800

05 Jun, 2013

8 commits

8f7d89f36 jbd2: transaction reservation support

In some cases we cannot start a transaction because of locking
constraints and passing started transaction into those places is not
handy either because we could block transaction commit for too long.
Transaction reservation is designed to solve these issues. It
reserves a handle with given number of credits in the journal and the
handle can be later attached to the running transaction without
blocking on commit or checkpointing. Reserved handles do not block
transaction commit in any way, they only reduce maximum size of the
running transaction (because we have to always be prepared to
accommodate a request for attaching a reserved handle).

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 00:35:11 +0800
f29fad721 jbd2: remove unused waitqueues

j_wait_logspace and j_wait_checkpoint are unused. Remove them.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 00:24:11 +0800
fe1e8db59 jbd2: fix race in t_outstanding_credits update in jbd2_journal_extend()

jbd2_journal_extend() first checked whether transaction can accept
extending handle with more credits and then added credits to
t_outstanding_credits. This can race with start_this_handle() adding
another handle to a transaction and thus overbooking a transaction.
Make jbd2_journal_extend() use atomic_add_return() to close the race.
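
The add-then-check pattern can be sketched with C11 atomics. This is a user-space illustration, not the kernel code: struct txn and extend_credits() are hypothetical, and atomic_fetch_add() plus the added amount plays the role of atomic_add_return().

```c
#include <stdatomic.h>

/* Hypothetical, simplified transaction: only the credit accounting. */
struct txn {
    atomic_int outstanding_credits;
    int max_credits;
};

/*
 * Reserve nblocks more credits. The addition and the read of the new
 * total happen in one atomic step, so a concurrent reservation cannot
 * slip in between a separate "check" and "add".
 * Returns 0 on success, 1 if the transaction would be overbooked.
 */
static int extend_credits(struct txn *t, int nblocks)
{
    int wanted = atomic_fetch_add(&t->outstanding_credits, nblocks) + nblocks;

    if (wanted > t->max_credits) {
        /* Back out our own reservation and report failure. */
        atomic_fetch_sub(&t->outstanding_credits, nblocks);
        return 1;
    }
    return 0;
}
```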

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 00:22:15 +0800
76c399045 jbd2: cleanup needed free block estimates when starting a transaction

__jbd2_log_space_left() and jbd_space_needed() were kind of odd.
jbd_space_needed() accounted also credits needed for currently
committing transaction while it didn't account for credits needed for
control blocks. __jbd2_log_space_left() then accounted for control
blocks as a fraction of free space. Since results of these two
functions are always only compared against each other, this works
correctly but is somewhat strange. Move the estimates so that
jbd_space_needed() returns number of blocks needed for a transaction
including control blocks and __jbd2_log_space_left() returns free
space in the journal (with the committing transaction already
subtracted). Rename functions to jbd2_log_space_left() and
jbd2_space_needed() while we are changing them.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 00:12:57 +0800
2f387f849 jbd2: remove outdated comment

The comment about credit estimates isn't true anymore. We do what the
comment describes now.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 00:10:11 +0800
b34090e5e jbd2: refine waiting for shadow buffers

Currently when we add a buffer to a transaction, we wait until the
buffer is removed from BJ_Shadow list (so that we prevent any changes
to the buffer that is just written to the journal). This can take
unnecessarily long as a lot happens between the time the buffer is
submitted to the journal and the time when we remove the buffer from
BJ_Shadow list. (e.g. We wait for all data buffers in the
transaction, we issue a cache flush, etc.) Also this creates a
dependency of do_get_write_access() on transaction commit (namely
waiting for data IO to complete) which we want to avoid when
implementing transaction reservation.

So we modify commit code to set new BH_Shadow flag when temporary
shadowing buffer is created and we clear that flag once IO on that
buffer is complete. This allows do_get_write_access() to wait only
for BH_Shadow bit and thus removes the dependency on data IO
completion.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 00:08:56 +0800
e5a120aeb jbd2: remove journal_head from descriptor buffers

As with metadata buffers, log descriptor buffers don't
really need the journal head either. So strip it and remove BJ_LogCtl list.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 00:06:01 +0800
f5113effc jbd2: don't create journal_head for temporary journal buffers

When writing metadata to the journal, we create temporary buffer heads
for that task. We also attach journal heads to these buffer heads but
the only purpose of the journal heads is to keep buffers linked in
transaction's BJ_IO list. We remove the need for journal heads by
reusing buffer_head's b_assoc_buffers list for that purpose. Also
since BJ_IO list is just a temporary list for transaction commit, we
use a private list in jbd2_journal_commit_transaction() for that thus
removing BJ_IO list from transaction completely.

Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2013-06-05 00:01:45 +0800

28 May, 2013

2 commits

eee06c567 jbd2: fix block tag checksum verification brokenness

Al Viro complained of a ton of bogosity with regards to the jbd2 block
tag header checksum. This one checksum is 16 bits, so cut off the
upper 16 bits and treat it as a 16-bit value and don't mess around
with be32* conversions. Fortunately metadata checksumming is still
"experimental" and not in a shipping e2fsprogs, so there should be few
users affected by this.
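
A hedged sketch of the truncation described above, in user-space C. The helper names are hypothetical, and the two-byte big-endian layout of the stored field is an assumption made for illustration.

```c
#include <stdint.h>

/* Keep only the low 16 bits of a 32-bit checksum and store them as two
 * bytes, most significant first (assumed on-disk order). */
static void store_tag_csum(uint8_t out[2], uint32_t csum32)
{
    uint16_t csum16 = (uint16_t)(csum32 & 0xFFFFu);

    out[0] = (uint8_t)(csum16 >> 8);
    out[1] = (uint8_t)(csum16 & 0xFFu);
}

/* Verification compares 16-bit values only; the upper half is ignored. */
static int tag_csum_matches(const uint8_t stored[2], uint32_t csum32)
{
    uint16_t on_disk = (uint16_t)((stored[0] << 8) | stored[1]);

    return on_disk == (uint16_t)(csum32 & 0xFFFFu);
}
```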

Reported-by: Al Viro
Signed-off-by: Darrick J. Wong

Darrick J. Wong
2013-05-28 19:31:59 +0800
5d9cf9c62 jbd2: use kmem_cache_zalloc for allocating journal head

This commit tries to use kmem_cache_zalloc instead of kmem_cache_alloc/
memset when a new journal head is allocated.

Signed-off-by: Zheng Liu
Cc: "Theodore Ts'o"

Zheng Liu
2013-05-28 19:27:11 +0800

22 May, 2013

1 commit

259709b07 jbd2: change jbd2_journal_invalidatepage to accept length

invalidatepage now accepts a range to invalidate, and there are two file
systems using jbd2 that also implement the punch hole feature and can benefit
from this. We need to implement the same thing for the jbd2 layer in order to
allow those file systems to benefit from this functionality.

This commit adds length argument to the jbd2_journal_invalidatepage()
and updates all instances in ext4 and ocfs2.

Signed-off-by: Lukas Czerner
Reviewed-by: Jan Kara

Lukas Czerner
2013-05-22 11:20:03 +0800

02 May, 2013

1 commit

20b4fb485 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...

Linus Torvalds
2013-05-02 08:51:54 +0800

01 May, 2013

1 commit

149b30608 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
"Mostly performance and bug fixes, plus some cleanups. The one new
feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which
allows installation of a hidden inode designed for boot loaders."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
ext4: fix type-widening bug in inode table readahead code
ext4: add check for inodes_count overflow in new resize ioctl
ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG
ext4: fix online resizing for ext3-compat file systems
jbd2: trace when lock_buffer in do_get_write_access takes a long time
ext4: mark metadata blocks using bh flags
buffer: add BH_Prio and BH_Meta flags
ext4: mark all metadata I/O with REQ_META
ext4: fix readdir error in case inline_data+^dir_index.
ext4: fix readdir error in the case of inline_data+dir_index
jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
ext4: mext_insert_extents should update extent block checksum
ext4: move quota initialization out of inode allocation transaction
ext4: reserve xattr index for Rich ACL support
jbd2: reduce journal_head size
ext4: clear buffer_uninit flag when submitting IO
ext4: use io_end for multiple bios
ext4: make ext4_bio_write_page() use BH_Async_Write flags
ext4: Use kstrtoul() instead of parse_strtoul()
ext4: defragmentation code cleanup
...

Linus Torvalds
2013-05-01 23:04:12 +0800

30 Apr, 2013

1 commit

e76004093 fs/buffer.c: remove unnecessary init operation after allocating buffer_head.

bh allocation uses kmem_cache_zalloc() so we needn't call
'init_buffer(bh, NULL, NULL)' and perform other set-zero-operations.

Signed-off-by: Jianpeng Ma
Cc: Jan Kara
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

majianpeng
2013-04-30 06:54:39 +0800

22 Apr, 2013

1 commit

f783f091e jbd2: trace when lock_buffer in do_get_write_access takes a long time

While investigating interactivity problems it was clear that processes
sometimes stall for long periods of time if an attempt is made to
lock a buffer which is undergoing writeback. It would stall in
a trace looking something like

[] __lock_buffer+0x2e/0x30
[] do_get_write_access+0x43f/0x4b0
[] jbd2_journal_get_write_access+0x2b/0x50
[] __ext4_journal_get_write_access+0x39/0x80
[] ext4_reserve_inode_write+0x78/0xa0
[] ext4_mark_inode_dirty+0x49/0x220
[] ext4_dirty_inode+0x41/0x60
[] __mark_inode_dirty+0x4e/0x2d0
[] update_time+0x79/0xc0
[] file_update_time+0x98/0x100
[] __generic_file_aio_write+0x17c/0x3b0
[] generic_file_aio_write+0x7a/0xf0
[] ext4_file_write+0x83/0xd0
[] do_sync_write+0xa3/0xe0
[] vfs_write+0xae/0x180
[] sys_write+0x4d/0x90
[] system_call_fastpath+0x1a/0x1f
[] 0xffffffffffffffff

Signed-off-by: Mel Gorman
Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2013-04-22 04:47:54 +0800

20 Apr, 2013

1 commit

28daf4fae jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset

The jbd2_alloc_handle() function is only called by new_handle(). So
this commit uses kmem_cache_zalloc() instead of
kmem_cache_alloc()/memset().
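
A user-space analogy, for illustration only: calloc() relates to malloc() followed by memset() the way kmem_cache_zalloc() relates to kmem_cache_alloc() followed by memset(). struct handle and both function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified handle. */
struct handle {
    int h_buffer_credits;
    int h_ref;
    void *h_transaction;
};

/* Before: allocate, then zero in a separate step. */
static struct handle *alloc_handle_two_step(void)
{
    struct handle *h = malloc(sizeof(*h));

    if (h)
        memset(h, 0, sizeof(*h));
    return h;
}

/* After: one call that returns already-zeroed memory. */
static struct handle *alloc_handle_zeroed(void)
{
    return calloc(1, sizeof(struct handle));
}
```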

Signed-off-by: Zheng Liu
Signed-off-by: "Theodore Ts'o"

Zheng Liu
2013-04-20 05:49:23 +0800

10 Apr, 2013

1 commit

d9dda78ba procfs: new helper - PDE_DATA(inode)

The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data. Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.

Signed-off-by: Al Viro

Al Viro
2013-04-10 02:13:32 +0800

04 Apr, 2013

2 commits

794446c69 jbd2: fix race between jbd2_journal_remove_checkpoint and ->j_commit_callback

The following race is possible:

[kjournald2]                                 other_task
jbd2_journal_commit_transaction()
  j_state = T_FINISHED;
  spin_unlock(&journal->j_list_lock);
                                             ->jbd2_journal_remove_checkpoint()
                                             ->jbd2_journal_free_transaction();
                                             ->kmem_cache_free(transaction)
  ->j_commit_callback(journal, transaction);
    -> USE_AFTER_FREE

WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107
Call Trace:
[] warn_slowpath_common+0xad/0xf0
[] warn_slowpath_fmt+0x46/0x50
[] ? ext4_journal_commit_callback+0x99/0xc0
[] __list_del_entry+0x1c0/0x250
[] ext4_journal_commit_callback+0x6f/0xc0
[] jbd2_journal_commit_transaction+0x23a6/0x2570
[] ? try_to_del_timer_sync+0x82/0xa0
[] ? del_timer_sync+0x91/0x1e0
[] kjournald2+0x19f/0x6a0
[] ? wake_up_bit+0x40/0x40
[] ? bit_spin_lock+0x80/0x80
[] kthread+0x10e/0x120
[] ? __init_kthread_worker+0x70/0x70
[] ret_from_fork+0x7c/0xb0
[] ? __init_kthread_worker+0x70/0x70

In order to demonstrate this issue one should mount ext4 with the
mount -o discard option on an SSD. This makes the callback take longer
and the race window becomes wider.

In order to fix this we should mark the transaction as finished only after
the callbacks have completed.

Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"
Cc: stable@vger.kernel.org

Dmitry Monakhov
2013-04-04 10:06:52 +0800
d76a3a771 ext4/jbd2: don't wait (forever) for stale tid caused by wraparound

In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, the tid number space might wrap,
causing tid_geq()'s calculations to fail.
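
To make the failure mode concrete, here is a user-space sketch of wraparound-aware tid comparison, modeled on the signed-difference idiom. The definitions below are written for illustration and are not copied from the kernel headers; the conversion to int32_t is assumed to wrap the usual two's-complement way.

```c
#include <stdint.h>

typedef uint32_t tid_t;

/* x is "after" y if the signed 32-bit difference is positive. */
static int tid_gt(tid_t x, tid_t y)
{
    int32_t difference = (int32_t)(x - y);

    return difference > 0;
}

/* x is "at or after" y if the signed 32-bit difference is non-negative. */
static int tid_geq(tid_t x, tid_t y)
{
    int32_t difference = (int32_t)(x - y);

    return difference >= 0;
}
```

Under these definitions, tid 5 compares as newer than 0xFFFFFFF0 because the counter merely wrapped. Once two tids are 2**31 or more apart, however, the sign of the difference flips, so a very stale tid appears to lie in the future; that is the case the commit message describes.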

Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().

Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time. To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.

As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started. This
should have a small (but probably not measurable) improvement for
ext4's scalability.

Signed-off-by: "Theodore Ts'o"
Reported-by: Ben Hutchings
Reported-by: George Barnett
Cc: stable@vger.kernel.org

Theodore Ts'o
2013-04-04 10:02:52 +0800

12 Mar, 2013

1 commit

ad56edad0 jbd2: fix use after free in jbd2_journal_dirty_metadata()

jbd2_journal_dirty_metadata() didn't get a reference to journal_head it
was working with. This is OK in most of the cases since the journal head
should be attached to a transaction but in rare occasions when we are
journalling data, __ext4_journalled_writepage() can race with
jbd2_journal_invalidatepage() stripping buffers from a page and thus
journal head can be freed under hands of jbd2_journal_dirty_metadata().

Fix the problem by getting own journal head reference in
jbd2_journal_dirty_metadata() (and also in jbd2_journal_set_triggers()
which can possibly have the same issue).

Reported-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
Cc: stable@vger.kernel.org

Jan Kara
2013-03-12 01:24:56 +0800

03 Mar, 2013

1 commit

df05c1b85 jbd2: fix ERR_PTR dereference in jbd2__journal_start

If start_this_handle() fails, the handle is set to an ERR_PTR() value
and must not be dereferenced.
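
The error-pointer idiom behind this bug can be sketched in user space. The lowercase helpers below are hypothetical re-creations modeled on the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(); start_handle() and journal_start() are invented for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static void *err_ptr(intptr_t error)
{
    return (void *)error;
}

/* Recover the errno from an error pointer. */
static intptr_t ptr_err(const void *ptr)
{
    return (intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses. */
static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct handle {
    int h_sync;
};

/* Hypothetical starter: returns a real handle or an encoded error. */
static struct handle *start_handle(int fail)
{
    static struct handle h;

    return fail ? err_ptr(-EROFS) : &h;
}

/* Check for an error pointer before touching any field of the handle. */
static int journal_start(int fail)
{
    struct handle *h = start_handle(fail);

    if (is_err(h))
        return (int)ptr_err(h);
    h->h_sync = 1; /* safe: h is a real pointer here */
    return 0;
}
```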

paging request at fffffffffffffff6
IP: [] jbd2__journal_start+0x18f/0x290
PGD 200e067 PUD 200f067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
CPU 0 journal commit I/O error

Pid: 2694, comm: fio Not tainted 3.8.0-rc3+ #79 /DQ67SW
RIP: 0010:[] [] jbd2__journal_start+0x18f/0x290
RSP: 0018:ffff880233b8ba58 EFLAGS: 00010292
RAX: 00000000ffffffe2 RBX: ffffffffffffffe2 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82128f48
RBP: ffff880233b8ba98 R08: 0000000000000000 R09: ffff88021440a6e0

Signed-off-by: Dmitry Monakhov
Signed-off-by: "Theodore Ts'o"

Dmitry Monakhov
2013-03-03 06:08:46 +0800

10 Feb, 2013

1 commit

b6e96d006 jbd2: use module parameters instead of debugfs for jbd_debug

There are multiple reasons to move away from debugfs. First of all,
we are only using it for a single parameter, and it is much more
complicated to set up (some 30 lines of code compared to 3), and one
more thing that might fail while loading the jbd2 module.

Secondly, as a module parameter it can be specified as a boot option if
jbd2 is built into the kernel, or as a parameter when the module is
loaded, and it can also be manipulated dynamically under
/sys/module/jbd2/parameters/jbd2_debug. So it is more flexible.

Ultimately we want to move away from using jbd_debug() towards
tracepoints, but for now this is still a useful simplification of the
code base.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2013-02-10 05:29:20 +0800

09 Feb, 2013

1 commit

343d9c283 jbd2: add tracepoints which provide per-handle statistics

Handles which stay open a long time are problematic when it comes time
to close down a transaction so it can be committed. These tracepoints
will help us determine which ones are the problematic ones, and to
validate whether changes make things better or worse.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2013-02-09 02:00:22 +0800

07 Feb, 2013

1 commit

9fff24aa2 jbd2: track request delay statistics

Track the delay between when we first request that the commit begin
and when it actually begins, so we can see how much of a gap exists.
In theory, this should just be the remaining scheduling quantum of
the thread which requested the commit (assuming it was not a
synchronous operation which triggered the commit request) plus
scheduling overhead; however, it's possible that real time processes
might prevent the kjournald thread from executing.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2013-02-07 11:30:23 +0800

30 Jan, 2013

1 commit

e7b04ac00 jbd2: don't wake kjournald unnecessarily

Don't send an extra wakeup to kjournald in the case where we
already have the proper target in j_commit_request, i.e. that
transaction has already been requested for commit.

commit deeeaf13 "jbd2: fix fsync() tid wraparound bug" changed
the logic leading to a wakeup, but it caused some extra wakeups
which were found to lead to a measurable performance regression.

Signed-off-by: Eric Sandeen
[tytso@mit.edu: reworked check to make it clearer]
Signed-off-by: "Theodore Ts'o"

Eric Sandeen
2013-01-30 13:39:28 +0800

03 Jan, 2013

1 commit

5439ca6b8 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bug fixes from Ted Ts'o:
"Various bug fixes for ext4. Perhaps the most serious bug fixed is one
which could cause file system corruptions when performing file punch
operations."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: avoid hang when mounting non-journal filesystems with orphan list
ext4: lock i_mutex when truncating orphan inodes
ext4: do not try to write superblock on ro remount w/o journal
ext4: include journal blocks in df overhead calcs
ext4: remove unaligned AIO warning printk
ext4: fix an incorrect comment about i_mutex
ext4: fix deadlock in journal_unmap_buffer()
ext4: split off ext4_journalled_invalidatepage()
jbd2: fix assertion failure in jbd2_journal_flush()
ext4: check dioread_nolock on remount
ext4: fix extent tree corruption caused by hole punch

Linus Torvalds
2013-01-03 01:57:34 +0800

26 Dec, 2012

1 commit

53e872681 ext4: fix deadlock in journal_unmap_buffer()

We cannot wait for transaction commit in journal_unmap_buffer()
because we hold page lock which ranks below transaction start. We
solve the issue by bailing out of journal_unmap_buffer() and
jbd2_journal_invalidatepage() with -EBUSY. Caller is then responsible
for waiting for transaction commit to finish and trying invalidation
again. Since the issue can happen only for a page straddling i_size, it
is simple enough to manually call jbd2_journal_invalidatepage() for
such page from ext4_setattr(), check the return value and wait if
necessary.

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2012-12-26 02:29:52 +0800

21 Dec, 2012

1 commit

d7961c7fa jbd2: fix assertion failure in jbd2_journal_flush()

The following race is possible between start_this_handle() and someone
calling jbd2_journal_flush().

Process A                                Process B
start_this_handle().
  if (journal->j_barrier_count) # false
  if (!journal->j_running_transaction) { #true
    read_unlock(&journal->j_state_lock);
                                         jbd2_journal_lock_updates()
                                         jbd2_journal_flush()
                                           write_lock(&journal->j_state_lock);
                                           if (journal->j_running_transaction) {
                                             # false
                                           ... wait for committing trans ...
                                           write_unlock(&journal->j_state_lock);
    ...
    write_lock(&journal->j_state_lock);
    if (!journal->j_running_transaction) { # true
      jbd2_get_transaction(journal, new_transaction);
    write_unlock(&journal->j_state_lock);
    goto repeat; # eventually blocks on j_barrier_count > 0
                                           ...
                                           J_ASSERT(!journal->j_running_transaction);
                                             # fails

We fix the race by rechecking j_barrier_count after reacquiring j_state_lock
in exclusive mode.

Reported-by: yjwsignal@empal.com
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
Cc: stable@vger.kernel.org

Jan Kara
2012-12-21 13:15:51 +0800

17 Dec, 2012

1 commit

36cd5c19c Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 update from Ted Ts'o:
"There are two major features for this merge window. The first is
inline data, which allows small files or directories to be stored in
the in-inode extended attribute area. (This requires that the file
system use inodes which are at least 256 bytes or larger; 128 byte
inodes do not have any room for in-inode xattrs.)

The second new feature is SEEK_HOLE/SEEK_DATA support. This is
enabled by the extent status tree patches, and this infrastructure
will be used to further optimize ext4 in the future.

Beyond that, we have the usual collection of code cleanups and bug
fixes."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (63 commits)
ext4: zero out inline data using memset() instead of empty_zero_page
ext4: ensure Inode flags consistency are checked at build time
ext4: Remove CONFIG_EXT4_FS_XATTR
ext4: remove unused variable from ext4_ext_in_cache()
ext4: remove redundant initialization in ext4_fill_super()
ext4: remove redundant code in ext4_alloc_inode()
ext4: use sync_inode_metadata() when syncing inode metadata
ext4: enable ext4 inline support
ext4: let fallocate handle inline data correctly
ext4: let ext4_truncate handle inline data correctly
ext4: evict inline data out if we need to strore xattr in inode
ext4: let fiemap work with inline data
ext4: let ext4_rename handle inline dir
ext4: let empty_dir handle inline dir
ext4: let ext4_delete_entry() handle inline data
ext4: make ext4_delete_entry generic
ext4: let ext4_find_entry handle inline data
ext4: create a new function search_dir
ext4: let ext4_readdir handle inline data
ext4: let add_dir_entry handle inline data properly
...

Linus Torvalds
2012-12-17 09:33:01 +0800

19 Nov, 2012

1 commit

48fc7f7e7 Fix misspellings of "whether" in comments.

"Whether" is misspelled in various comments across the tree; this
fixes them. No code changes.

Signed-off-by: Adam Buchbinder
Signed-off-by: Jiri Kosina

Adam Buchbinder
2012-11-19 21:31:35 +0800

09 Nov, 2012

1 commit

37be2f59d ext4: remove ext4_handle_release_buffer()

ext4_handle_release_buffer() was intended to remove journal
write access from a buffer, but it doesn't actually do anything
at all other than add a BUFFER_TRACE point, but it's not reliably
used for that either. Remove all the associated dead code.

Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Carlos Maiolino

Eric Sandeen
2012-11-09 00:22:46 +0800