11 Feb, 2009

2 commits

  • If we race with commit code setting i_transaction to NULL, we could
    possibly dereference it. Proper locking requires the journal pointer
    (to access journal->j_list_lock), which we don't have. So we have to
    change the prototype of the function so that the filesystem passes us the
    journal pointer. Also add a more detailed comment about why the
    function jbd2_journal_begin_ordered_truncate() does what it does and
    how it should be used.
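    A minimal userspace sketch of the locking pattern this change enables
    (all names and types here are illustrative stand-ins, not the real jbd2
    API): i_transaction may be set to NULL by the commit code at any moment,
    so it must only be dereferenced under the journal's j_list_lock, which
    the function can only take once the filesystem hands it the journal
    pointer.

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct sketch_txn { int tid; };

    /* Illustrative stand-in for journal_t: the lock and the racy pointer
     * live together, so having the journal means we can lock safely. */
    struct sketch_journal {
            int j_list_lock;                  /* stand-in for the spinlock */
            struct sketch_txn *i_transaction; /* may be NULLed by commit code */
    };

    static int sketch_begin_ordered_truncate(struct sketch_journal *journal)
    {
            int tid = 0;

            journal->j_list_lock = 1;        /* "take" j_list_lock */
            if (journal->i_transaction)      /* safe: commit code also locks */
                    tid = journal->i_transaction->tid;
            journal->j_list_lock = 0;        /* "release" */
            return tid;                      /* 0: nothing to wait for */
    }
    ```

    Without the journal pointer, the NULL check and the dereference could
    not be made atomic with respect to the commit code clearing the field.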

    Thanks to Dan Carpenter for pointing out the suspicious code.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Acked-by: Joel Becker
    CC: linux-ext4@vger.kernel.org
    CC: ocfs2-devel@oss.oracle.com
    CC: mfasheh@suse.de
    CC: Dan Carpenter

    Jan Kara
     
  • The function jbd2_journal_start_commit() returns 1 if either a
    transaction is committing or the function has queued a transaction
    commit. But it returns 0 if we raced with somebody queueing the
    transaction commit as well. This resulted in ext4_sync_fs() not
    functioning correctly (description from Arthur Jones):

    In the case of a data=ordered umount with pending long symlinks
    which are delayed due to a long list of other I/O on the backing
    block device, this causes the buffer associated with the long
    symlinks to not be moved to the inode dirty list in the second
    phase of fsync_super. Then, before they can be dirtied again,
    kjournald exits, seeing the UMOUNT flag and the dirty pages are
    never written to the backing block device, causing long symlink
    corruption and exposing new or previously freed block data to
    userspace.

    This can be reproduced with a script created by Eric Sandeen:

    #!/bin/bash

    umount /mnt/test2
    mount /dev/sdb4 /mnt/test2
    rm -f /mnt/test2/*
    dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
    touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
    ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename \
        /mnt/test2/link
    umount /mnt/test2
    mount /dev/sdb4 /mnt/test2
    ls /mnt/test2/

    This patch fixes jbd2_journal_start_commit() to always return 1 when
    there's a transaction committing or queued for commit.
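    A simplified userspace sketch of the fixed decision logic (field and
    function names here are illustrative; the real jbd2_journal_start_commit()
    operates on journal_t under j_state_lock): the key change is that racing
    with someone who already queued the commit must still return 1.

    ```c
    #include <stddef.h>

    struct sketch_journal {
            int running_tid;     /* transaction accepting updates, 0 if none */
            int committing_tid;  /* transaction mid-commit, 0 if none */
            int commit_request;  /* highest tid already queued for commit */
    };

    /* Return 1 if a commit is running or queued (queueing it now if
     * needed); return 0 only when there is truly nothing to commit. */
    static int sketch_start_commit(struct sketch_journal *j)
    {
            if (j->committing_tid)
                    return 1;                /* commit already in progress */
            if (j->running_tid) {
                    if (j->commit_request != j->running_tid)
                            j->commit_request = j->running_tid; /* queue it */
                    return 1;  /* fixed: 1 even if someone queued it first */
            }
            return 0;
    }
    ```

    The pre-fix behavior returned 0 on the "already queued" path, which is
    exactly the race ext4_sync_fs() tripped over.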

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    CC: Eric Sandeen
    CC: linux-ext4@vger.kernel.org

    Jan Kara
     

12 Jan, 2009

1 commit

  • The following warning:

    fs/jbd2/journal.c: In function ‘jbd2_seq_info_show’:
    fs/jbd2/journal.c:850: warning: format ‘%lu’ expects type ‘long
    unsigned int’, but argument 3 has type ‘uint32_t’

    is caused by incorrect usage of do_div(), which modifies the dividend
    in place (leaving the quotient there) and returns the remainder. So not
    only would an incorrect value be displayed, but
    s->journal->j_average_commit_time would also be changed to a wrong
    value!

    Fix it by using div_u64 instead.
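    A userspace sketch of the pitfall (these are stand-in macros mimicking
    the kernel semantics, not the kernel's own do_div()/div_u64()): the
    do_div()-style macro clobbers its 64-bit dividend with the quotient and
    evaluates to the remainder, while the div_u64()-style helper is
    side-effect free and returns the quotient.

    ```c
    #include <stdint.h>

    /* Mimics kernel do_div() semantics: the 64-bit dividend is overwritten
     * with the quotient and the expression yields the 32-bit remainder.
     * (GCC statement-expression, as in the kernel's generic version.) */
    #define sketch_do_div(n, base) ({                       \
            uint32_t __rem = (uint32_t)((n) % (base));      \
            (n) = (n) / (base);                             \
            __rem;                                          \
    })

    /* div_u64()-style helper: returns the quotient, leaves the dividend
     * untouched -- the right tool for a read-only /proc display path. */
    static inline uint64_t sketch_div_u64(uint64_t dividend, uint32_t divisor)
    {
            return dividend / divisor;
    }
    ```

    Using the do_div()-style form in a stats printout both displays the
    wrong number and corrupts the stored average as a side effect.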

    Signed-off-by: Simon Holm Thøgersen
    Signed-off-by: "Theodore Ts'o"

    Simon Holm Thøgersen
     

09 Jan, 2009

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
    jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
    ext4: Remove "extents" mount option
    block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
    ext4: Make printk's consistently prefixed with "EXT4-fs: "
    ext4: Add sanity checks for the superblock before mounting the filesystem
    ext4: Add mount option to set kjournald's I/O priority
    jbd2: Submit writes to the journal using WRITE_SYNC
    jbd2: Add pid and journal device name to the "kjournald2 starting" message
    ext4: Add markers for better debuggability
    ext4: Remove code to create the journal inode
    ext4: provide function to release metadata pages under memory pressure
    ext3: provide function to release metadata pages under memory pressure
    add releasepage hooks to block devices which can be used by file systems
    ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
    ext4: Init the complete page while building buddy cache
    ext4: Don't allow new groups to be added during block allocation
    ext4: mark the blocks/inode bitmap beyond end of group as used
    ext4: Use new buffer_head flag to check uninit group bitmaps initialization
    ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
    ext4: code cleanup
    ...

    Linus Torvalds
     

07 Jan, 2009

2 commits

  • On a 32-bit system with CONFIG_LBD, getblk() can fail because the
    provided block number is too big. Add error checks so we fail
    gracefully if getblk() returns NULL (which can also happen on memory
    allocation failures).

    Thanks to David Maciejak from Fortinet's FortiGuard Global Security
    Research Team for reporting this bug.

    http://bugzilla.kernel.org/show_bug.cgi?id=12370

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    cc: stable@kernel.org

    Jan Kara
     
  • This code has been obsolete for quite some time, since the supported
    method for adding a journal inode is to use tune2fs (or to create a
    new filesystem with a journal via mke2fs or mkfs.ext4).

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

06 Jan, 2009

2 commits

  • Filesystems often need to do a compute-intensive operation on some
    metadata. If this operation is repeated many times, it can be very
    expensive. It would be much nicer if the operation could be performed
    once, just before the buffer goes to disk.

    This adds triggers to jbd2 buffer heads. Just before writing a metadata
    buffer to the journal, jbd2 will optionally call a commit trigger associated
    with the buffer. If the journal is aborted, an abort trigger will be
    called on any dirty buffers as they are dropped from pending
    transactions.

    ocfs2 will use this feature.

    Initially I tried to come up with a more generic trigger that could be
    used for non-buffer-related events like transaction completion. It
    doesn't fit nicely, because the information a buffer trigger needs
    (specific to a journal_head) isn't the same as what a transaction
    trigger needs (specific to a transaction_t or perhaps journal_t). So I
    implemented buffer trigger sets, with the understanding that
    journal- or transaction-wide triggers should be implemented separately.

    There is only one trigger set allowed per buffer. I can't think of any
    reason to attach more than one set. Contrast this with a journal or
    transaction in which multiple places may want to watch the entire
    transaction separately.

    The trigger sets are considered static allocation from the jbd2
    perspective. ocfs2 will just have one trigger set per block type,
    setting the same set on every bh of the same type.
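    A minimal userspace sketch of the shape of such a trigger set (names
    are illustrative stand-ins; the real jbd2 type is a struct of commit-
    and abort-time hooks attached one-per-buffer): the journal write path
    fires the commit hook just before the buffer goes to the journal.

    ```c
    #include <stddef.h>

    struct sketch_triggers;

    struct sketch_buf {
            char data[16];
            const struct sketch_triggers *triggers; /* at most one set */
    };

    struct sketch_triggers {
            /* called just before the buffer is written to the journal */
            void (*commit)(struct sketch_buf *buf);
            /* called on dirty buffers dropped from an aborted journal */
            void (*abort)(struct sketch_buf *buf);
    };

    static void sketch_write_to_journal(struct sketch_buf *buf)
    {
            if (buf->triggers && buf->triggers->commit)
                    buf->triggers->commit(buf); /* e.g. recompute a checksum */
            /* ... the actual journal write would follow ... */
    }

    /* A counting trigger set, statically allocated as the text describes:
     * one set per block type, shared by every bh of that type. */
    static int commit_calls;
    static void count_commit(struct sketch_buf *buf) { (void)buf; commit_calls++; }
    static const struct sketch_triggers counting = { .commit = count_commit };
    ```

    Since the sets are static and shared, attaching one costs a single
    pointer per buffer, matching the "one trigger set per buffer" rule.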

    Signed-off-by: Joel Becker
    Cc: "Theodore Ts'o"
    Cc:
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Xen doesn't report that barriers are not supported until buffer I/O is
    reported as completed, instead of when the buffer I/O is submitted.
    Add a check and a fallback codepath to journal_wait_on_commit_record()
    to detect this case, so that attempts to mount ext4 filesystems on
    LVM/devicemapper devices on Xen guests don't blow up with an "Aborting
    journal on device XXX"; "Remounting filesystem read-only" error.

    Thanks to Andreas Sundstrom for reporting this issue.

    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Theodore Ts'o
     

05 Jan, 2009

1 commit


04 Jan, 2009

2 commits


17 Dec, 2008

1 commit


26 Nov, 2008

1 commit

  • This patch removes the static sleep time in favor of a more
    self-optimizing approach, where we measure the average amount of time
    it takes to commit a transaction to disk and the amount of time a
    transaction has been running. If somebody does a sync write or an
    fsync(), traditionally we would sleep for 1 jiffy, which, depending on
    the value of HZ, could be a significant amount of time compared to how
    long it takes to commit a transaction to the underlying storage. With
    this patch, instead of sleeping for a jiffy, we check whether the
    amount of time this transaction has been running is less than the
    average commit time; if it is, we sleep for the delta using
    schedule_hrtimeout() to get a higher-precision sleep time. This
    greatly benefits high-end storage, where you could otherwise end up
    sleeping for longer than it takes to commit the transaction, sitting
    idle instead of letting the transaction commit. Keeping the sleep time
    to a minimum ensures you are always doing something.
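    The sleep-time decision reduces to a small calculation, sketched here
    in userspace (the function name is illustrative; the real code feeds
    the resulting delta to schedule_hrtimeout()):

    ```c
    #include <stdint.h>

    /* Sleep only for the time the average commit is still expected to
     * need, given how long this transaction has already been running.
     * Returns 0 when the transaction has run at least as long as the
     * average commit takes, i.e. commit now rather than sleeping. */
    static uint64_t sketch_commit_sleep_ns(uint64_t avg_commit_ns,
                                           uint64_t running_ns)
    {
            if (running_ns >= avg_commit_ns)
                    return 0;                  /* don't sleep at all */
            return avg_commit_ns - running_ns; /* high-resolution delta */
    }
    ```

    On fast storage where commits take far less than a jiffy, this delta
    is what keeps fsync() latency close to the device's real commit time.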

    Signed-off-by: Josef Bacik
    Signed-off-by: "Theodore Ts'o"

    Josef Bacik
     

07 Nov, 2008

2 commits

  • Avoid freeing the transaction in __jbd2_journal_drop_transaction() so
    the journal commit callback can run without holding j_list_lock, to
    avoid lock contention on this spinlock.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • Commit 23f8b79e introduced a regression because it assumed that if
    there were no transactions ready to be checkpointed, no progress
    could be made on making space available in the journal, and so the
    journal should be aborted. This assumption is false; it could be the
    case that simply calling jbd2_cleanup_journal_tail() will recover the
    necessary space, or, for small journals, the currently committing
    transaction could be responsible for chewing up the required space in
    the log, so we need to wait for the currently committing transaction
    to finish before trying to force a checkpoint operation.

    This patch fixes a bug reported by Mihai Harpau at:
    https://bugzilla.redhat.com/show_bug.cgi?id=469582

    This patch fixes a bug reported by François Valenduc at:
    http://bugzilla.kernel.org/show_bug.cgi?id=11840

    Signed-off-by: "Theodore Ts'o"
    Cc: Duane Griffin
    Cc: Toshiyuki Okajima

    Theodore Ts'o
     

05 Nov, 2008

1 commit


03 Nov, 2008

1 commit

  • jbd2_journal_init_inode() does not call jbd2_stats_proc_exit() on all
    failure paths after calling jbd2_stats_proc_init(). This leaves
    dangling references to the fs in proc.

    This patch fixes a bug reported by Sami Liedes at:
    http://bugzilla.kernel.org/show_bug.cgi?id=11493

    Signed-off-by: Sami Liedes
    Signed-off-by: "Theodore Ts'o"

    Sami Liedes
     

29 Oct, 2008

1 commit

  • The transaction can potentially get dropped if there are no buffers
    that need to be written. Make sure we call the commit callback before
    potentially deciding to drop the transaction. Also avoid
    dereferencing the commit_transaction pointer in the marker for the
    same reason.

    This patch fixes the bug reported by Eric Paris at:
    http://bugzilla.kernel.org/show_bug.cgi?id=11838

    Signed-off-by: "Theodore Ts'o"
    Acked-by: Eric Sandeen
    Tested-by: Eric Paris

    Theodore Ts'o
     

21 Oct, 2008

1 commit


17 Oct, 2008

1 commit

  • The multiblock allocator needs to be able to release blocks (and issue
    a blkdev discard request) when the transaction that freed those
    blocks is committed. Previously this was done via a polling mechanism
    when blocks were allocated or freed. A much better way of doing things
    is to create a jbd2 callback function and attach the list of blocks
    to be freed directly to the transaction structure.
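    A userspace sketch of the attach-and-callback shape (all names are
    illustrative; the real mechanism is the jbd2 commit callback, with
    ext4 keeping its freed-extent list on the transaction): the freed-block
    list rides on the transaction, and the callback runs once the commit
    completes.

    ```c
    #include <stddef.h>

    struct sketch_freed {
            unsigned long block;
            struct sketch_freed *next;   /* list hangs off the transaction */
    };

    struct sketch_txn {
            struct sketch_freed *freed_list;
            void (*commit_callback)(struct sketch_txn *txn);
    };

    static int discarded;   /* counts blocks released after commit */

    static void sketch_release_blocks(struct sketch_txn *txn)
    {
            struct sketch_freed *f;
            for (f = txn->freed_list; f; f = f->next)
                    discarded++;  /* here ext4 would issue blkdev discards */
    }

    static void sketch_commit(struct sketch_txn *txn)
    {
            /* the journal commit finishes first, then the callback fires */
            if (txn->commit_callback)
                    txn->commit_callback(txn);
    }
    ```

    The point of the callback is ordering: no block is reused or discarded
    until the transaction that freed it is durably committed.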

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

13 Oct, 2008

1 commit

  • If we fail to write metadata buffers to the journal space but succeed
    in writing the commit record, stale data can be written back to the
    filesystem as metadata during the recovery phase.

    To avoid this, when we fail to write out metadata buffers, abort the
    journal before writing the commit record.

    We can also avoid this kind of corruption by using the journal
    checksum feature, because it can detect invalid metadata blocks in the
    journal and keep them from being replayed. So we don't need to care
    about asynchronous commit record writeout when a checksum is in use.

    Signed-off-by: Hidehiro Kawai
    Signed-off-by: Theodore Ts'o

    Hidehiro Kawai
     

11 Oct, 2008

3 commits

  • If the journal doesn't abort when it gets an IO error in file data
    blocks, the file data corruption will spread silently. Because most
    applications and commands do buffered writes without fsync(), they
    don't notice the IO error. That's scary for mission-critical systems.
    On the other hand, if the journal aborts whenever it gets an IO error
    in file data blocks, the system easily becomes inoperable. So this
    patch introduces a filesystem option to determine whether the journal
    aborts or just calls printk() when it gets an IO error in file data.

    If you mount an ext4 fs with the data_err=abort option, it aborts on a
    file data write error. If you mount it with data_err=ignore, it
    doesn't abort but just calls printk(). data_err=ignore is the default.

    Here is the corresponding patch of the ext3 version:
    http://kerneltrap.org/mailarchive/linux-kernel/2008/9/9/3239374

    Signed-off-by: Hidehiro Kawai
    Signed-off-by: Theodore Ts'o

    Hidehiro Kawai
     
  • Currently, original metadata buffers are dirtied when they are
    unfiled, whether the journal has aborted or not. Eventually these
    buffers will be written back to the filesystem by pdflush. This
    means some metadata buffers are written to the filesystem without
    journaling if the journal aborts. So if a journal abort and a
    system crash happen at the same time, the filesystem would be left in
    an inconsistent state. Additionally, replaying journaled metadata
    can partly overwrite the latest metadata on the filesystem, because
    if the journal gets aborted, journaled metadata are preserved and
    replayed during the next mount so as not to lose uncheckpointed
    metadata. This would also break the consistency of the filesystem.

    This patch prevents original metadata buffers from being dirtied
    on abort by clearing BH_JBDDirty flag from those buffers. Thus,
    no metadata buffers are written to the filesystem without journaling.

    Signed-off-by: Hidehiro Kawai
    Signed-off-by: Theodore Ts'o

    Hidehiro Kawai
     
  • When a checkpointing IO fails, the current JBD2 code doesn't check the
    error and continues journaling. This means the latest metadata can be
    lost from both the journal and the filesystem.

    This patch leaves the failed metadata blocks in the journal space
    and aborts journaling in the case of jbd2_log_do_checkpoint().
    To achieve this, we need to:

    1. not remove the failed buffer from the checkpoint list in the case
    of __try_to_free_cp_buf(), because it may be released or overwritten
    by a later transaction
    2. in jbd2_log_do_checkpoint(), which is the last chance, remove the
    failed buffer from the checkpoint list and abort the journal
    3. when checkpointing fails, not update the journal superblock, to
    prevent the journaled contents from being cleaned; for safety, don't
    update j_tail and j_tail_sequence either
    4. when checkpointing fails, notify the ext4 layer of this error so
    that ext4 doesn't clear the needs_recovery flag; otherwise the
    journaled contents are ignored and cleaned in the recovery phase
    5. if the recovery fails, keep the needs_recovery flag
    6. prevent jbd2_cleanup_journal_tail() from being called between
    __jbd2_journal_drop_transaction() and jbd2_journal_abort()
    (a possible race between the jbd2_log_do_checkpoint()s called by
    jbd2_journal_flush() and __jbd2_log_wait_for_space())

    Signed-off-by: Hidehiro Kawai
    Signed-off-by: Theodore Ts'o

    Hidehiro Kawai
     

09 Oct, 2008

1 commit

  • The __jbd2_log_wait_for_space function sits in a loop checkpointing
    transactions until there is sufficient space free in the journal.
    However, if there are no transactions to be processed (e.g. because the
    free space calculation is wrong due to a corrupted filesystem) it will
    never progress.

    Check for space being required when no transactions are outstanding and
    abort the journal instead of endlessly looping.
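    The fixed exit condition can be sketched in a few lines of userspace C
    (everything here is simplified and illustrative, including the pretend
    space gain per checkpoint): keep checkpointing while space is short,
    but bail out instead of spinning when nothing is left to checkpoint.

    ```c
    enum sketch_result { SKETCH_OK, SKETCH_ABORTED };

    /* Loop shape of __jbd2_log_wait_for_space after the fix: if space is
     * still required but there are no checkpointable transactions (e.g. a
     * corrupted fs made the free-space calculation wrong), abort the
     * journal rather than looping forever. */
    static enum sketch_result sketch_wait_for_space(int *free_blocks,
                                                    int needed,
                                                    int *checkpointable_txns)
    {
            while (*free_blocks < needed) {
                    if (*checkpointable_txns == 0)
                            return SKETCH_ABORTED;  /* no progress possible */
                    (*checkpointable_txns)--;
                    *free_blocks += 8;  /* pretend a checkpoint freed space */
            }
            return SKETCH_OK;
    }
    ```

    The pre-fix loop lacked the inner no-transactions check, which is what
    turned a corrupted free-space count into an infinite loop.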

    This patch fixes the bug reported by Sami Liedes at:
    http://bugzilla.kernel.org/show_bug.cgi?id=10976

    Signed-off-by: Duane Griffin
    Cc: Sami Liedes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: "Theodore Ts'o"

    Duane Griffin
     

07 Oct, 2008

2 commits


06 Oct, 2008

1 commit


17 Sep, 2008

1 commit

  • Calculate the journal device name once and stash it away in the
    journal_s structure. This avoids needing to call bdevname()
    everywhere and reduces stack usage by not needing to allocate an
    on-stack buffer. In addition, we eliminate the '/' that can appear in
    device names (e.g. "cciss/c0d0p9" --- see kernel bugzilla #11321) that
    can cause problems when creating proc directory names, and include the
    inode number to support ocfs2 which creates multiple journals with
    different inode numbers.
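    A userspace sketch of building such a proc-safe name (the function
    name, the '!' substitution character, and the "name-inode" layout are
    assumptions for illustration; the real code stashes its result in
    journal_s):

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Build a proc-directory-safe journal name: append the inode number
     * so multiple journals (as in ocfs2) get distinct names, then replace
     * any '/' from the device name (e.g. "cciss/c0d0p9") since '/' cannot
     * appear in a proc entry name. */
    static void sketch_proc_name(char *dst, size_t len,
                                 const char *bdevname, unsigned long ino)
    {
            size_t i;

            snprintf(dst, len, "%s-%lu", bdevname, ino);
            for (i = 0; dst[i]; i++)
                    if (dst[i] == '/')
                            dst[i] = '!';
    }
    ```

    Computing this once at journal init also removes the per-call
    bdevname() stack buffer the old code needed.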

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

12 Aug, 2008

1 commit


11 Aug, 2008

2 commits


05 Aug, 2008

1 commit

  • Converting page lock to new locking bitops requires a change of page flag
    operation naming, so we might as well convert it to something nicer
    (!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).

    This also facilitates lockdeping of page lock.

    Signed-off-by: Nick Piggin
    Acked-by: KOSAKI Motohiro
    Acked-by: Peter Zijlstra
    Acked-by: Andrew Morton
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

01 Aug, 2008

1 commit

  • In ordered mode, the current jbd2 aborts the journal if a file data buffer
    has an error. But this behavior is unintended, and we found that it has
    been adopted accidentally.

    This patch undoes it and just calls printk() instead of aborting the
    journal. Unlike a similar patch for ext3/jbd, file data buffers are
    written via generic_writepages(). But we also need to set AS_EIO
    into their mappings because wait_on_page_writeback_range() clears
    AS_EIO before a user process sees it.

    Signed-off-by: Hidehiro Kawai
    Signed-off-by: "Theodore Ts'o"

    Hidehiro Kawai
     

27 Jul, 2008

1 commit


14 Jul, 2008

1 commit

    journal_try_to_free_buffers() could race with the jbd commit
    transaction when the latter is holding the buffer reference while
    waiting for the data buffer to flush to disk. If the caller of
    journal_try_to_free_buffers() tries hard to release the buffers, it
    will treat the failure as an error and return to the caller. We have
    seen direct IO fail due to this race. Some callers of releasepage()
    also expect the buffer to be dropped when
    releasepage()->journal_try_to_free_buffers() is passed the GFP_KERNEL
    mask.

    With this patch, if the caller passes GFP_KERNEL to indicate that this
    call can wait, then, in case try_to_free_buffers() fails, we wait for
    journal_commit_transaction() to finish committing the current
    committing transaction and then try to free those buffers again with
    the journal locked.

    Signed-off-by: Mingming Cao
    Reviewed-by: Badari Pulavarty
    Signed-off-by: "Theodore Ts'o"

    Mingming Cao
     

12 Jul, 2008

4 commits

  • This provides a new ordered mode implementation which gets rid of
    using buffer heads to enforce the ordering between a metadata change
    and the related data change. Instead, the new ordered mode keeps track
    of all of the inodes touched by each transaction on a list, and when
    that transaction is committed, it flushes all of the dirty pages for
    those inodes. In addition, the new ordered mode reverses the lock
    ordering of the page lock and transaction lock, which provides easier
    support for delayed allocation.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • Signed-off-by: Jan Kara

    Jan Kara
     
  • This patch adds necessary framework into JBD2 to be able to track inodes
    with each transaction and write-out their dirty data during transaction
    commit time.

    This new ordered mode brings all sorts of advantages, such as the
    possibility of getting rid of journal heads and buffer heads for data
    buffers in ordered mode, better ordering of writes on transaction
    commit, simplification of some JBD code, and no more anonymous pages
    when a truncate of data being committed happens. Also, with this new
    ordered mode, delayed allocation on ordered mode is much simpler.
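    The per-transaction inode tracking can be sketched as follows in
    userspace (structure and function names are illustrative stand-ins for
    the jbd2 framework, which also guards against double-adding an inode):
    each transaction keeps a list of touched inodes, and commit flushes
    their dirty data instead of tracking individual data buffer heads.

    ```c
    #include <stddef.h>

    struct sketch_inode {
            int dirty_pages;             /* pending data to write out */
            struct sketch_inode *next;   /* link on the transaction's list */
    };

    struct sketch_txn {
            struct sketch_inode *inodes; /* inodes touched by this txn */
    };

    static void sketch_txn_add_inode(struct sketch_txn *t,
                                     struct sketch_inode *i)
    {
            i->next = t->inodes;   /* real code avoids duplicate entries */
            t->inodes = i;
    }

    /* Commit time: write out every tracked inode's dirty data before the
     * metadata commits, enforcing the ordered-mode guarantee. */
    static int sketch_txn_commit(struct sketch_txn *t)
    {
            int flushed = 0;
            struct sketch_inode *i;

            for (i = t->inodes; i; i = i->next) {
                    flushed += i->dirty_pages;
                    i->dirty_pages = 0;
            }
            return flushed;
    }
    ```

    Tracking whole inodes rather than buffer heads is what removes the
    anonymous-page and journal-head bookkeeping the old mode required.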

    Signed-off-by: Jan Kara

    Jan Kara
     
  • Carlo Wood has demonstrated that it's possible to recover deleted
    files from the journal. Something that will make this easier is if we
    can put the time of the commit into the commit block.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o