04 May, 2013

1 commit


30 Apr, 2013

1 commit


29 Apr, 2013

1 commit


28 Mar, 2013

1 commit

  • In the case where an inode has a very stale transaction id (tid) in
    i_datasync_tid or i_sync_tid, it's possible that after a very large
    (2**31) number of transactions the tid number space might wrap, causing
    tid_geq()'s calculations to fail.

    Commit d9b0193 "jbd: fix fsync() tid wraparound bug" attempted to fix
    this problem, but it only avoided kjournald spinning forever by fixing
    the logic in jbd_log_start_commit().

    Signed-off-by: Jan Kara

    Jan Kara
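
    A minimal userspace sketch of the failure mode: tid_geq() below mirrors jbd's
    signed-difference comparison, while the surrounding demo is illustrative
    rather than kernel code.

    #include <stdio.h>

    typedef unsigned int tid_t;

    /* jbd compares tids via a signed difference, so the comparison is only
     * meaningful while the two values are within 2**31 of each other. */
    static int tid_geq(tid_t x, tid_t y)
    {
        int difference = (int)(x - y);
        return difference >= 0;
    }

    int main(void)
    {
        tid_t stale = 10;                     /* tid cached in an inode long ago */
        tid_t committed = stale + (1u << 31); /* journal tid 2**31 commits later */

        /* Prints 0: the long-since-committed stale tid now compares as being
         * "in the future", so code waiting on it can spin or block forever. */
        printf("tid_geq(committed, stale) = %d\n", tid_geq(committed, stale));
        return 0;
    }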
     

15 Jan, 2013

1 commit

  • Don't send an extra wakeup to kjournald in the case where we already have
    the proper target in j_commit_request, i.e. that transaction has already
    been requested for commit.

    Commit d9b0193 "jbd: fix fsync() tid wraparound bug" changed the logic
    leading to a wakeup, but it caused some extra wakeups which were found to
    lead to a measurable performance regression.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Jan Kara

    Eric Sandeen
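
    A self-contained sketch of the idea (names and structure are illustrative,
    not the literal patch): the wakeup is issued only when the request actually
    advances j_commit_request.

    #include <stdio.h>

    typedef unsigned int tid_t;

    struct journal_demo {
        tid_t commit_request;  /* highest tid for which a commit was requested */
        int wakeups;           /* stands in for wake_up(&journal->j_wait_commit) */
    };

    static void request_commit(struct journal_demo *j, tid_t target)
    {
        if ((int)(target - j->commit_request) <= 0)
            return;            /* already requested: no extra wakeup */
        j->commit_request = target;
        j->wakeups++;          /* kjournald gets poked only here */
    }

    int main(void)
    {
        struct journal_demo j = { .commit_request = 5, .wakeups = 0 };

        request_commit(&j, 6);               /* new target: one wakeup       */
        request_commit(&j, 6);               /* repeat request: no extra one */
        printf("wakeups: %d\n", j.wakeups);  /* prints 1 */
        return 0;
    }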
     

15 Aug, 2012

1 commit

  • This sequence:

    results in an IO error when unmounting the RO filesystem. The bug was
    introduced by:

    commit 9754e39c7bc51328f145e933bfb0df47cd67b6e9
    Author: Jan Kara
    Date: Sat Apr 7 12:33:03 2012 +0200

    jbd: Split updating of journal superblock and marking journal empty

    which lost some of the magic in journal_update_superblock() that used to
    test for a journal with no outstanding transactions.

    This is a port of a jbd2 fix by Eric Sandeen.

    CC: # 3.4.x
    Signed-off-by: Jan Kara

    Jan Kara
     

04 Aug, 2012

1 commit


16 May, 2012

3 commits

  • If the journal superblock is written only to the disk's caches and another
    transaction starts reusing the space of the transaction cleaned from the log,
    blocks of the new transaction can reach the disk before the journal
    superblock. If a power failure happens in that case, subsequent journal
    replay would still try to replay the old transaction, but some of its blocks
    may already be overwritten by the new transaction. For this reason we must
    use WRITE_FUA when updating the log tail, and we must first write the new
    log tail to disk and update the in-memory information only after that.

    Signed-off-by: Jan Kara

    Jan Kara
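
    A sketch of the ordering rule described above (types and the helper are
    hypothetical stand-ins; WRITE_FUA itself is the real block-layer flag meant
    here).

    #include <stdio.h>

    struct journal_demo {
        unsigned long tail_block;
        unsigned int  tail_sequence;
    };

    /* Stand-in for writing the journal superblock with WRITE_FUA and waiting
     * for completion, i.e. the new tail is on stable storage, not just in the
     * disk's write cache. */
    static void write_superblock_fua(unsigned int seq, unsigned long block)
    {
        printf("tail seq %u at block %lu is now on stable storage\n", seq, block);
    }

    static void update_log_tail(struct journal_demo *journal,
                                unsigned int new_seq, unsigned long new_block)
    {
        write_superblock_fua(new_seq, new_block); /* 1. durable on disk first   */
        journal->tail_sequence = new_seq;         /* 2. only then let the freed */
        journal->tail_block    = new_block;       /*    log space be reused     */
    }

    int main(void)
    {
        struct journal_demo j = { .tail_block = 100, .tail_sequence = 7 };
        update_log_tail(&j, 8, 164);
        return 0;
    }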
     
  • There are some log tail updates that are not protected by j_checkpoint_mutex.
    Some of these are harmless because they happen during startup or shutdown, but
    the updates in journal_commit_transaction() and journal_flush() can really
    race with other log tail updates (e.g. someone doing journal_flush() while
    someone else runs cleanup_journal_tail()). So protect all log tail updates
    with j_checkpoint_mutex.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • There are three cases of updating the journal superblock. In the first case
    we want to mark the journal as empty (setting s_sequence to 0), in the second
    case we want to update the log tail, and in the third case we want to update
    s_errno. Split these cases into separate functions. It makes the code slightly
    more straightforward, and later patches will make the distinction even more
    important.

    Signed-off-by: Jan Kara

    Jan Kara
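
    In outline, the split gives each caller a single-purpose helper along these
    lines (signatures paraphrased from the description above; the exact names in
    the patch may differ).

    typedef struct journal_s journal_t;   /* opaque here; defined by jbd */

    /* Case 1: mark the journal empty on disk (the s_sequence = 0 case). */
    void journal_mark_empty(journal_t *journal);

    /* Case 2: write out a new log tail (tail block and tail sequence). */
    void journal_update_sb_log_tail(journal_t *journal);

    /* Case 3: record s_errno so a journal abort is visible on the next mount. */
    void journal_update_sb_errno(journal_t *journal);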
     

11 Apr, 2012

1 commit

  • Currently we write out all journal buffers in WRITE_SYNC mode. This improves
    performance for fsync-heavy workloads but hinders performance when writes are
    mostly asynchronous: most noticeably, it slows down readers, and users
    complain about slow desktop response and the like.

    So submit writes as asynchronous in the normal case, and only submit writes as
    WRITE_SYNC if we detect that someone is waiting for the current transaction
    commit.

    I've gathered some numbers to back this change. The first is the read latency
    test. It measures the time to read 1 MB after several seconds of sleeping in
    the presence of streaming writes.

    Top 10 times (out of 90) in us:
         Before      After
        2131586     697473
        1709932     557487
        1564598     535642
        1480462     347573
        1478579     323153
        1408496     222181
        1388960     181273
        1329565     181070
        1252486     172832
        1223265     172278

    Average:
         619377      82180

    So the improvement in both maximum and average latency is massive.

    I've measured fsync throughput by:
    fs_mark -n 100 -t 1 -s 16384 -d /mnt/fsync/ -S 1 -L 4

    in the presence of a streaming reader. The numbers (fsyncs/s) are:
        Before    After
           9.9      6.3
           6.8      6.0
           6.3      6.2
           5.8      6.1

    So fsync performance seems unharmed by this change.

    Signed-off-by: Jan Kara

    Jan Kara
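
    The decision being described reduces to something like the following (a
    sketch only; the real code inspects whether anyone is waiting on the
    transaction being committed).

    #include <stdbool.h>

    enum submit_mode { SUBMIT_ASYNC, SUBMIT_WRITE_SYNC };

    /* Sketch: journal buffers go out asynchronously unless somebody is blocked
     * waiting for the current transaction commit (fsync and friends), in which
     * case low latency matters more than read-friendliness. */
    static enum submit_mode journal_submit_mode(bool commit_has_waiters)
    {
        return commit_has_waiters ? SUBMIT_WRITE_SYNC : SUBMIT_ASYNC;
    }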
     

22 Mar, 2012

1 commit

  • Pull power management updates for 3.4 from Rafael Wysocki:
    "Assorted extensions and fixes including:

    * Introduction of early/late suspend/hibernation device callbacks.
    * Generic PM domains extensions and fixes.
    * devfreq updates from Axel Lin and MyungJoo Ham.
    * Device PM QoS updates.
    * Fixes of concurrency problems with wakeup sources.
    * System suspend and hibernation fixes."

    * tag 'pm-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (43 commits)
    PM / Domains: Check domain status during hibernation restore of devices
    PM / devfreq: add relation of recommended frequency.
    PM / shmobile: Make MTU2 driver use pm_genpd_dev_always_on()
    PM / shmobile: Make CMT driver use pm_genpd_dev_always_on()
    PM / shmobile: Make TMU driver use pm_genpd_dev_always_on()
    PM / Domains: Introduce "always on" device flag
    PM / Domains: Fix hibernation restore of devices, v2
    PM / Domains: Fix handling of wakeup devices during system resume
    sh_mmcif / PM: Use PM QoS latency constraint
    tmio_mmc / PM: Use PM QoS latency constraint
    PM / QoS: Make it possible to expose PM QoS latency constraints
    PM / Sleep: JBD and JBD2 missing set_freezable()
    PM / Domains: Fix include for PM_GENERIC_DOMAINS=n case
    PM / Freezer: Remove references to TIF_FREEZE in comments
    PM / Sleep: Add more wakeup source initialization routines
    PM / Hibernate: Enable usermodehelpers in hibernate() error path
    PM / Sleep: Make __pm_stay_awake() delete wakeup source timers
    PM / Sleep: Fix race conditions related to wakeup source timer function
    PM / Sleep: Fix possible infinite loop during wakeup source destruction
    PM / Hibernate: print physical addresses consistently with other parts of kernel
    ...

    Linus Torvalds
     

20 Mar, 2012

1 commit


14 Mar, 2012

1 commit

  • With the latest and greatest changes to the freezer, I started seeing
    panics that were caused by jbd2 running post-process freezing and
    hitting the canary BUG_ON for non-TuxOnIce I/O submission. I've traced
    this back to a lack of set_freezable calls in both jbd and jbd2. Since
    they're clearly meant to be frozen (there are tests for freezing()), I
    submit the following patch to add the missing calls.

    Signed-off-by: Nigel Cunningham
    Acked-by: Jan Kara
    Signed-off-by: Rafael J. Wysocki

    Nigel Cunningham
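
    The missing piece is the usual freezable-kthread pattern; a simplified shape
    of the loop is sketched below (set_freezable() and try_to_freeze() are the
    real freezer APIs, while the loop itself and journal_should_stop() are
    condensed, hypothetical stand-ins).

    /* Without set_freezable(), kernel threads are ignored by the freezer, so
     * the freezing()/try_to_freeze() checks in the loop never trigger. */
    static int kjournald_shape(void *arg)
    {
        set_freezable();                    /* opt this kthread into freezing  */

        while (!journal_should_stop(arg)) { /* hypothetical stop condition     */
            /* ... commit work, then sleep on j_wait_commit ... */
            try_to_freeze();                /* park here during suspend/resume */
        }
        return 0;
    }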
     

10 Jan, 2012

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    ext2/3/4: delete unneeded includes of module.h
    ext{3,4}: Fix potential race when setversion ioctl updates inode
    udf: Mark LVID buffer as uptodate before marking it dirty
    ext3: Don't warn from writepage when readonly inode is spotted after error
    jbd: Remove j_barrier mutex
    reiserfs: Force inode evictions before umount to avoid crash
    reiserfs: Fix quota mount option parsing
    udf: Treat symlink component of type 2 as /
    udf: Fix deadlock when converting file from in-ICB one to normal one
    udf: Cleanup calling convention of inode_getblk()
    ext2: Fix error handling on inode bitmap corruption
    ext3: Fix error handling on inode bitmap corruption
    ext3: replace ll_rw_block with other functions
    ext3: NULL dereference in ext3_evict_inode()
    jbd: clear revoked flag on buffers before a new transaction started
    ext3: call ext3_mark_recovery_complete() when recovery is really needed

    Linus Torvalds
     

09 Jan, 2012

1 commit

  • The j_barrier mutex is used for serializing different journal lock
    operations. The problem with it is that e.g. the FIFREEZE ioctl results in a
    process leaving the kernel with the j_barrier mutex held, which makes lockdep
    freak out. Also, the hibernation code wants to freeze the filesystem, but then
    it cannot hibernate the system because that mutex is still held.

    So we remove the j_barrier mutex and use a direct wait on j_barrier_count
    instead. Since locking the journal is a rare operation, we don't have to care
    about fairness or such things.

    CC: Andrew Morton
    Acked-by: Joel Becker
    Signed-off-by: Jan Kara

    Jan Kara
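
    A paraphrase of the replacement scheme (reconstructed from the description,
    not copied from the diff; journal fields as in jbd, locking details
    simplified).

    /* Lockers no longer take a j_barrier mutex that stays held while the
     * filesystem is frozen.  Instead they wait until nobody else holds the
     * barrier and then bump j_barrier_count; unlocking decrements the count
     * and wakes the waiters. */
    static void journal_lock_updates_sketch(journal_t *journal)
    {
    repeat:
        wait_event(journal->j_wait_transaction_locked,
                   journal->j_barrier_count == 0);
        spin_lock(&journal->j_state_lock);
        if (journal->j_barrier_count > 0) {      /* lost the race, wait again */
            spin_unlock(&journal->j_state_lock);
            goto repeat;
        }
        ++journal->j_barrier_count;
        spin_unlock(&journal->j_state_lock);
        /* ... then wait for the running transaction's updates to drain ... */
    }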
     

22 Nov, 2011

1 commit

  • There is no reason to export two functions for entering the
    refrigerator. Calling refrigerator() instead of try_to_freeze()
    doesn't save anything noticeable or remove any race condition.

    * Rename refrigerator() to __refrigerator() and make it return bool
    indicating whether it scheduled out for freezing.

    * Update try_to_freeze() to return bool and relay the return value of
    __refrigerator() if freezing().

    * Convert all refrigerator() users to try_to_freeze().

    * Update documentation accordingly.

    * While at it, add might_sleep() to try_to_freeze().

    Signed-off-by: Tejun Heo
    Cc: Samuel Ortiz
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Steven Whitehouse
    Cc: Andrew Morton
    Cc: Jan Kara
    Cc: KONISHI Ryusuke
    Cc: Christoph Hellwig

    Tejun Heo
     

02 Nov, 2011

1 commit

  • I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
    mounting an fsfuzzed ext3 image. It turns out that the corrupted ext3
    image has s_first = 0 in the journal superblock, and the 0 is passed to
    journal->j_head in journal_reset(), then to blocknr in
    cleanup_journal_tail(), where the J_ASSERT finally fails.

    So validate s_first after reading the journal superblock from disk in
    journal_get_superblock() to ensure s_first is valid.

    The following script could reproduce it:

    fstype=ext3
    blocksize=1024
    img=$fstype.img
    offset=0
    found=0
    magic="c0 3b 39 98"

    dd if=/dev/zero of=$img bs=1M count=8
    mkfs -t $fstype -b $blocksize -F $img
    filesize=`stat -c %s $img`
    while [ $offset -lt $filesize ]
    do
        if od -j $offset -N 4 -t x1 $img | grep -i "$magic"; then
            echo "Found journal: $offset"
            found=1
            break
        fi
        offset=`echo "$offset+$blocksize" | bc`
    done

    if [ $found -ne 1 ]; then
        echo "Magic \"$magic\" not found"
        exit 1
    fi

    dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1

    mkdir -p ./mnt
    mount -o loop $img ./mnt

    Cc: Jan Kara
    Signed-off-by: Eryu Guan
    Signed-off-by: "Theodore Ts'o"

    Eryu Guan
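
    The added validation amounts to a range check on s_first when the journal
    superblock is read; roughly the following (reconstructed from the
    description, so the exact wording of the warning is illustrative).

    /* In journal_get_superblock(): a journal whose first usable block is 0
     * (or past the end of the journal) is corrupted and must be rejected,
     * otherwise j_head inherits the bogus value and cleanup_journal_tail()
     * later trips J_ASSERT(blocknr != 0). */
    if (be32_to_cpu(sb->s_first) == 0 ||
        be32_to_cpu(sb->s_first) >= journal->j_maxlen) {
        printk(KERN_WARNING "JBD: invalid start block of journal: %u\n",
               be32_to_cpu(sb->s_first));
        goto out;   /* fail journal load instead of mounting */
    }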
     

27 Jun, 2011

1 commit

  • journal_remove_journal_head() can oops when trying to access journal_head
    returned by bh2jh(). This is caused for example by the following race:

    TASK1                                      TASK2
    journal_commit_transaction()
      ...
      processing t_forget list
        __journal_refile_buffer(jh);
        if (!jh->b_transaction) {
          jbd_unlock_bh_state(bh);
                                               journal_try_to_free_buffers()
                                                 journal_grab_journal_head(bh)
                                                 jbd_lock_bh_state(bh)
                                                 __journal_try_to_free_buffer()
                                                 journal_put_journal_head(jh)
          journal_remove_journal_head(bh);

    journal_put_journal_head() in TASK2 sees that b_jcount == 0 and that the
    buffer is not part of any transaction, and thus frees the journal_head before
    TASK1 gets to doing so. Note that even the buffer_head can be released by
    try_to_free_buffers() after journal_put_journal_head(), which opens an even
    larger window for an oops (but I didn't see this happen in reality).

    Fix the problem by making transactions hold their own journal_head reference
    (in b_jcount). That way we don't have to remove the journal_head explicitly
    via journal_remove_journal_head() and instead just remove it when b_jcount
    drops to zero. The result of this is that [__]journal_refile_buffer(),
    [__]journal_unfile_buffer(), and __journal_remove_checkpoint() can free the
    journal_head, which needs modification of a few callers. Also we have to be
    careful because once the journal_head is removed, the buffer_head might be
    freed as well. So we have to get our own buffer_head reference where it
    matters.

    Signed-off-by: Jan Kara

    Jan Kara
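
    A condensed view of the new lifetime rule (paraphrased from the description;
    the real refcounting lives in journal_grab_journal_head() and
    journal_put_journal_head()).

    /* Every user of a journal_head, including a transaction that has the
     * buffer on one of its lists, holds a b_jcount reference.  Freeing is
     * centralized: the journal_head goes away only when the last reference
     * is dropped, so no path can yank it from under another. */
    void journal_put_journal_head(struct journal_head *jh)
    {
        struct buffer_head *bh = jh2bh(jh);

        jbd_lock_bh_journal_head(bh);
        J_ASSERT_JH(jh, jh->b_jcount > 0);
        --jh->b_jcount;
        if (!jh->b_jcount) {
            __journal_remove_journal_head(bh);  /* last ref: detach and free */
            jbd_unlock_bh_journal_head(bh);
            __brelse(bh);                       /* drop our buffer_head ref  */
        } else {
            jbd_unlock_bh_journal_head(bh);
        }
    }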
     

25 Jun, 2011

1 commit

  • This commit adds fixed tracepoints for jbd. They are based on the fixed
    tracepoints for jbd2, but the ones for collecting statistics are missing,
    since adding them would require a more intrusive patch and so should get its
    own commit if someone decides it is needed. There are also new tracepoints
    in __journal_drop_transaction() and journal_update_superblock().

    The list of jbd tracepoints:

    jbd_checkpoint
    jbd_start_commit
    jbd_commit_locking
    jbd_commit_flushing
    jbd_commit_logging
    jbd_drop_transaction
    jbd_end_commit
    jbd_do_submit_data
    jbd_cleanup_journal_tail
    jbd_update_superblock_end

    Signed-off-by: Lukas Czerner
    Cc: Jan Kara
    Signed-off-by: Jan Kara

    Lukas Czerner
     

17 May, 2011

1 commit

  • If an application program does not make any changes to the indirect
    blocks or extent tree, i_datasync_tid will not get updated. If there
    are enough commits (i.e., 2**31) such that tid_geq()'s calculations
    wrap, and there isn't a currently active transaction at the time of
    the fdatasync() call, this can end up triggering a BUG_ON in
    fs/jbd/commit.c:

    J_ASSERT(journal->j_running_transaction != NULL);

    It's pretty rare that this can happen, since it requires the use of
    fdatasync() plus *very* frequent and excessive use of fsync(). But
    with the right workload, it can.

    We fix this by replacing the use of tid_geq() with an equality test,
    since there's only one valid transaction id that is valid for us to
    start: namely, the currently running transaction (if it exists).

    CC: stable@kernel.org
    Reported-by: Martin_Zielinski@McAfee.com
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Jan Kara

    Ted Ts'o
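
    The shape of the fix, as described above, is to accept only the currently
    running transaction's tid instead of relying on tid_geq(); a paraphrase of
    the changed check (not the full patch) follows.

    /* In __log_start_commit(): the only tid we can meaningfully start a
     * commit for is the running transaction's, so test for equality rather
     * than ordering, which a wrapped tid space can get wrong. */
    if (journal->j_running_transaction &&
        journal->j_running_transaction->t_tid == target) {
        journal->j_commit_request = target;
        wake_up(&journal->j_wait_commit);
        return 1;
    }
    return 0;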
     

31 Mar, 2011

1 commit


01 Mar, 2011

1 commit


28 Oct, 2010

4 commits


18 Aug, 2010

1 commit

  • These flags (the SWRITE* variants) aren't real I/O types, but tell
    ll_rw_block to always lock the buffer instead of giving up on a failed
    trylock.

    Instead, add a new write_dirty_buffer helper that implements this semantic
    and use it from the existing SWRITE* callers. Note that the ll_rw_block
    code had a bug where it didn't promote writes to WRITE_SYNC_PLUG properly,
    which this patch fixes.

    In the ufs code, clean up the helper that used to call ll_rw_block to
    mirror sync_dirty_buffer, which is the function it implements for
    compound buffers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
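
    The new helper is essentially sync_dirty_buffer without the wait; roughly
    the following (reconstructed from memory of fs/buffer.c, so treat the
    details as approximate).

    /* Lock the buffer unconditionally (the old SWRITE semantic), clear the
     * dirty bit, and submit the write; callers that need completion still
     * wait on the buffer themselves. */
    void write_dirty_buffer(struct buffer_head *bh, int rw)
    {
        lock_buffer(bh);
        if (!test_clear_buffer_dirty(bh)) {
            unlock_buffer(bh);          /* nothing to write */
            return;
        }
        bh->b_end_io = end_buffer_write_sync;
        get_bh(bh);
        submit_bh(rw, bh);
    }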
     

21 Jul, 2010

1 commit


22 May, 2010

2 commits


23 Dec, 2009

1 commit


12 Nov, 2009

1 commit


11 Nov, 2009

1 commit


16 Sep, 2009

1 commit


21 Jul, 2009

1 commit

  • The function journal_write_metadata_buffer() calls jbd_unlock_bh_state(bh_in)
    too early; this could potentially allow another thread to call
    get_write_access on the buffer head, modify the data, and dirty it, allowing
    the wrong data to be written into the journal. Fortunately, if we lose this
    race, the only time this will actually cause filesystem corruption is if
    there is a system crash or other unclean shutdown before the next commit can
    take place.

    Signed-off-by: dingdinghua
    Acked-by: "Theodore Ts'o"
    Signed-off-by: Jan Kara

    dingdinghua
     

16 Jul, 2009

1 commit


03 Apr, 2009

1 commit


12 Feb, 2009

1 commit

  • journal_start_commit() returns 1 if either a transaction is committing or
    the function has queued a transaction commit. But it returns 0 if we
    raced with somebody queueing the transaction commit as well. This
    resulted in ext3_sync_fs() not functioning correctly (description from
    Arthur Jones): in the case of a data=ordered umount with pending long
    symlinks which are delayed due to a long list of other I/O on the backing
    block device, this causes the buffer associated with the long symlinks to
    not be moved to the inode dirty list in the second phase of fsync_super.
    Then, before they can be dirtied again, kjournald exits upon seeing the
    UMOUNT flag, and the dirty pages are never written to the backing block
    device, causing long symlink corruption and exposing new or previously
    freed block data to userspace.

    This can be reproduced with a script created by Eric Sandeen:

    #!/bin/bash

    umount /mnt/test2
    mount /dev/sdb4 /mnt/test2
    rm -f /mnt/test2/*
    dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
    touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
    ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename \
        /mnt/test2/link
    umount /mnt/test2
    mount /dev/sdb4 /mnt/test2
    ls /mnt/test2/

    This patch fixes journal_start_commit() to always return 1 when there's
    a transaction committing or queued for commit.

    Cc: Eric Sandeen
    Cc: Mike Snitzer
    Cc:
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
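
    After the fix, journal_start_commit() reports success for both the running
    and the already-committing cases; a sketch of that logic (field names per
    jbd, structure paraphrased rather than copied).

    /* Sketch: report success for both the running and committing cases so
     * callers such as ext3_sync_fs() always learn a tid worth waiting for. */
    int journal_start_commit_sketch(journal_t *journal, tid_t *ptid)
    {
        int ret = 0;

        spin_lock(&journal->j_state_lock);
        if (journal->j_running_transaction) {
            tid_t tid = journal->j_running_transaction->t_tid;

            __log_start_commit(journal, tid);   /* make sure it is queued */
            if (ptid)
                *ptid = tid;
            ret = 1;
        } else if (journal->j_committing_transaction) {
            if (ptid)
                *ptid = journal->j_committing_transaction->t_tid;
            ret = 1;                            /* already being committed */
        }
        spin_unlock(&journal->j_state_lock);
        return ret;
    }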
     

23 Oct, 2008

1 commit

  • When a checkpointing IO fails, the current JBD code doesn't check the error
    and continues journaling. This means the latest metadata can be lost from
    both the journal and the filesystem.

    This patch leaves the failed metadata blocks in the journal space and
    aborts journaling in the case of log_do_checkpoint(). To achieve this, we
    need to do:

    1. don't remove the failed buffer from the checkpoint list in the case of
       __try_to_free_cp_buf(), because it may be released or overwritten by a
       later transaction
    2. log_do_checkpoint() is the last chance: remove the failed buffer from
       the checkpoint list and abort the journal
    3. when checkpointing fails, don't update the journal super block to
       prevent the journaled contents from being cleaned; for safety, don't
       update j_tail and j_tail_sequence either
    4. when checkpointing fails, notify this error to the ext3 layer so that
       ext3 doesn't clear the needs_recovery flag, otherwise the journaled
       contents are ignored and cleaned in the recovery phase
    5. if the recovery fails, keep the needs_recovery flag
    6. prevent cleanup_journal_tail() from being called between
       __journal_drop_transaction() and journal_abort() (a race issue between
       journal_flush() and __log_wait_for_space())

    Signed-off-by: Hidehiro Kawai
    Acked-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
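
    The error handling described in points 2-4 boils down to propagating an I/O
    error out of the checkpoint path and aborting instead of advancing the tail;
    a condensed sketch (structure paraphrased, not the literal patch).

    /* In log_do_checkpoint(): buffers whose writeback failed are dropped from
     * the checkpoint list here (the last chance), and the journal is aborted
     * so the superblock/tail are not updated and ext3 keeps needs_recovery. */
    if (unlikely(buffer_write_io_error(bh)))
        result = -EIO;
    /* ... */
    if (result < 0)
        journal_abort(journal, result);   /* keep journaled data for recovery */
    else
        cleanup_journal_tail(journal);    /* only advance the tail on success */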