Eric Lee / smarc-fsl-linux-kernel

25 Jul, 2011

2 commits

a035bff6b ocfs2: Add comment about orphan scanning

Add a comment that explains why the orphan scan scans all the slots.

Signed-off-by: Sunil Mushran

Sunil Mushran
2011-07-25 01:35:54 +0800
619c200de ocfs2: Clean up messages in the fs

Convert useful messages from ML_NOTICE to KERN_NOTICE to improve readability.

Signed-off-by: Sunil Mushran

Sunil Mushran
2011-07-25 01:34:54 +0800

14 May, 2011

1 commit

10b3dd761 ocfs2: Skip mount recovery for hard-ro mounts

Skip mount recovery for hard-ro mounts, which otherwise leads to an oops.

Signed-off-by: Sunil Mushran
Acked-by: Mark Fasheh
Signed-off-by: Joel Becker

Sunil Mushran
2011-05-14 02:27:14 +0800

31 Mar, 2011

1 commit

25985edce Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-03-31 22:26:23 +0800

07 Mar, 2011

1 commit

c1e8d35ef ocfs2: Remove EXIT from masklog.

mlog_exit is used to record the exit status of a function.
But because it is added to so many functions, enabling it
fills the system logs quickly and causes too much I/O.
So in practice no one can enable it on a production system, or even
on a test system.

This patch removes it or changes it:
1. If all the error paths already use mlog_errno, it is just removed.
Otherwise, it is replaced by mlog_errno.
2. If it is used to print some return value, it is replaced with
mlog(0,...).
mlog_exit_ptr is changed to mlog(0,...).
All those mlog(0,...) will be replaced with trace events later.

Signed-off-by: Tao Ma

Tao Ma
2011-03-07 16:43:21 +0800
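
The first conversion rule in the commit above can be sketched in userland C. This is only an illustrative model: demo_mlog_errno() and demo_commit() are hypothetical stand-ins invented for this example and are not the real ocfs2 masklog macros or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Counts error reports so the behaviour can be checked in a test. */
static int demo_errors_logged;

/* Hypothetical stand-in for mlog_errno(): report a failing status once. */
#define demo_mlog_errno(st) do {                                \
        demo_errors_logged++;                                   \
        fprintf(stderr, "%s:%d ERROR: status = %d\n",           \
                __func__, __LINE__, (st));                      \
} while (0)

/*
 * Rule 1 after the conversion: the error path already reports through
 * demo_mlog_errno(), so the exit-status logging at the end of the
 * function is simply dropped.
 */
static int demo_commit(int fail)
{
        int status = 0;

        if (fail) {
                status = -EIO;
                demo_mlog_errno(status);
                goto bail;
        }

        /* ... normal work would happen here ... */
bail:
        /* Previously: mlog_exit(status); now nothing is logged on exit. */
        return status;
}
```

With this shape, a successful call produces no log output at all, and a failing call produces exactly one error line.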

24 Feb, 2011

1 commit

b41079504 ocfs2: Remove masklog ML_JOURNAL.

Remove mlog(0) from fs/ocfs2/journal.c and the masklog JOURNAL.

Signed-off-by: Tao Ma

Tao Ma
2011-02-24 14:15:35 +0800

21 Feb, 2011

1 commit

ef6b689b6 ocfs2: Remove ENTRY from masklog.

ENTRY is used to record the entry of a function.
But because it is added to so many functions, enabling it
fills the system logs quickly and causes too much I/O.
So in practice no one can enable it on a production system, or even
on a test system.

So for mlog_entry_void, we just remove it.
For mlog_entry(...), we replace it with mlog(0,...), and these
will be replaced by trace events later.

Signed-off-by: Tao Ma

Tao Ma
2011-02-21 11:10:44 +0800

10 Sep, 2010

3 commits

17ae52115 ocfs2: Remove obsolete comments before ocfs2_start_trans.

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-09-10 23:40:18 +0800
f9c57ada3 ocfs2: Remove unused old_id in ocfs2_commit_cache.

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-09-10 23:40:08 +0800
3c3f20c98 ocfs2: Add some trace log for orphan scan.

The orphan scan worker currently has no trace log, so it is
very hard to tell whether it has finished or is blocked.
Add two mlog trace logs so that we can tell whether
the current orphan scan worker is blocked or not.
This helped when I analyzed an orphan scan bug.

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-09-10 23:35:51 +0800

08 Aug, 2010

1 commit

09dc942c2 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
ext4: Adding error check after calling ext4_mb_regular_allocator()
ext4: Fix dirtying of journalled buffers in data=journal mode
ext4: re-inline ext4_rec_len_(to|from)_disk functions
jbd2: Remove t_handle_lock from start_this_handle()
jbd2: Change j_state_lock to be a rwlock_t
jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop
ext4: Add mount options in superblock
ext4: force block allocation on quota_off
ext4: fix freeze deadlock under IO
ext4: drop inode from orphan list if ext4_delete_inode() fails
ext4: check to make make sure bd_dev is set before dereferencing it
jbd2: Make barrier messages less scary
ext4: don't print scary messages for allocation failures post-abort
ext4: fix EFBIG edge case when writing to large non-extent file
ext4: fix ext4_get_blocks references
ext4: Always journal quota file modifications
ext4: Fix potential memory leak in ext4_fill_super
ext4: Don't error out the fs if the user tries to make a file too big
ext4: allocate stripe-multiple IOs on stripe boundaries
ext4: move aio completion after unwritten extent conversion
...

Fix up conflicts in fs/ext4/inode.c as per Ted.

Fix up xfs conflicts as per earlier xfs merge.

Linus Torvalds
2010-08-08 04:03:53 +0800

04 Aug, 2010

1 commit

a931da6ac jbd2: Change j_state_lock to be a rwlock_t

Lockstat reports have shown that j_state_lock is a major source of
lock contention, especially on systems with more than 4 CPU cores. So
change it to be a read/write spinlock.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2010-08-04 09:35:12 +0800

16 Jul, 2010

1 commit

13ceef099 jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions

OCFS2 uses t_commit trigger to compute and store checksum of the just
committed blocks. When a buffer has b_frozen_data, checksum is computed
for it instead of b_data but this can result in an old checksum being
written to the filesystem in the following scenario:

1) transaction1 is opened
2) handle1 is opened
3) journal_access(handle1, bh)
- This sets jh->b_transaction to transaction1
4) modify(bh)
5) journal_dirty(handle1, bh)
6) handle1 is closed
7) start committing transaction1, opening transaction2
8) handle2 is opened
9) journal_access(handle2, bh)
- This copies off b_frozen_data to make it safe for transaction1 to commit.
jh->b_next_transaction is set to transaction2.
10) jbd2_journal_write_metadata() checksums b_frozen_data
11) the journal correctly writes b_frozen_data to the disk journal
12) handle2 is closed
- There was no dirty call for the bh on handle2, so it is never queued for
any more journal operation
13) Checkpointing finally happens, and it just spools the bh via normal buffer
writeback. This will write b_data, which was never triggered on and thus
contains a wrong (old) checksum.

This patch fixes the problem by calling the trigger at the moment data is
frozen for journal commit - i.e., either when b_frozen_data is created by
do_get_write_access or just before we write a buffer to the log if
b_frozen_data does not exist. We also rename the trigger to t_frozen as
that better describes when it is called.

Signed-off-by: Jan Kara
Signed-off-by: Mark Fasheh
Signed-off-by: Joel Becker

Jan Kara
2010-07-16 06:17:47 +0800
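
The stale-checksum scenario in the commit above can be modelled in a few lines of userland C. This is a simplified sketch, not jbd2 code: struct demo_buf, the XOR checksum, and the demo_* functions are all hypothetical, and the placement of the trigger just before the copy is an assumption about how "calling the trigger at the moment data is frozen" might look.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block image: some metadata plus a checksum stored in it. */
struct demo_buf {
        uint8_t payload;
        uint8_t csum;
};

/* Toy checksum, only for illustration. */
static uint8_t demo_csum(uint8_t payload)
{
        return (uint8_t)(payload ^ 0x5a);
}

/* Plays the role of the trigger: stamp the checksum into a block image. */
static void demo_trigger(struct demo_buf *b)
{
        b->csum = demo_csum(b->payload);
}

/* Old behaviour: copy first, then checksum only the frozen copy. */
static void demo_freeze_old(struct demo_buf *live, struct demo_buf *frozen)
{
        memcpy(frozen, live, sizeof(*frozen));
        demo_trigger(frozen);           /* live->csum stays stale */
}

/* Patched behaviour (as assumed here): trigger when the data is frozen. */
static void demo_freeze_new(struct demo_buf *live, struct demo_buf *frozen)
{
        demo_trigger(live);             /* live data gets the checksum */
        memcpy(frozen, live, sizeof(*frozen));
}

/* Returns nonzero when the stored checksum matches the payload. */
static int demo_csum_ok(const struct demo_buf *b)
{
        return b->csum == demo_csum(b->payload);
}
```

In the old ordering the frozen copy is valid but the live buffer, which later goes out through normal writeback, still carries the old checksum. In the new ordering both images agree.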

16 Jun, 2010

1 commit

40f165f41 ocfs2: Move orphan scan work to ocfs2_wq.

We used to run orphan scan work in the default work queue,
but there is a corner case that can deadlock the system.
The scenario is like this:
1. Set the heartbeat threshold to 200. This gives us a
good chance of running an orphan scan before the quorum decision.
2. Mount node 1.
3. After 1~2 minutes, mount node 2 (to make the bug easier
to reproduce, add maxcpus=1 to the kernel command line).
4. Node 1 does orphan scan work.
5. Node 2 does orphan scan work.
6. Node 1 does orphan scan work. After this, node 1 holds the orphan scan
lock while node 2 knows node 1 is the master.
7. ifdown eth2 on node 2 (eth2 is the interface used for the ocfs2 interconnect).

Now when node 2 begins its orphan scan, the system queue is blocked.

The root cause is that both orphan scan work and quorum decision work
use the system event work queue. Orphan scan can block the event
work queue (in dlm_wait_for_node_death), so that there
is no chance for quorum decision work to proceed.

This patch resolves it by moving orphan scan work to ocfs2_wq.

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-06-16 06:43:48 +0800

06 May, 2010

2 commits

c901fb007 ocfs2: Make ocfs2_extend_trans() really extend.

In ocfs2, we use ocfs2_extend_trans() to extend a journal handle's
blocks. But if jbd2_journal_extend() fails, it will only restart
with the new number of blocks. This tends to be awkward since
in most cases we want additional reserved blocks. It makes our code
harder to maintain since the caller can't be sure all the original
blocks will not be accessed and dirtied again. There are 15 callers
of ocfs2_extend_trans() in fs/ocfs2, and 12 of them have to add
h_buffer_credits before they call ocfs2_extend_trans(). This makes
ocfs2_extend_trans() really extend atop the original block count.

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2010-05-06 09:18:09 +0800
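
The behaviour described in the commit above, extending atop the original block count even when a restart is needed, can be sketched as follows. This is a userland model under stated assumptions: struct demo_handle, demo_extend(), demo_restart() and the `room` parameter are hypothetical stand-ins for the journal handle and the jbd2 calls, not their real signatures.

```c
#include <assert.h>

/* Hypothetical handle: only tracks how many block credits it holds. */
struct demo_handle {
        int credits;
};

/*
 * Stand-in for an in-place extend: succeeds (returns 0) only if the
 * running transaction has `room` for the extra blocks, else returns 1.
 */
static int demo_extend(struct demo_handle *h, int nblocks, int room)
{
        if (nblocks > room)
                return 1;
        h->credits += nblocks;
        return 0;
}

/* Stand-in for a restart: the handle comes back with exactly nblocks. */
static void demo_restart(struct demo_handle *h, int nblocks)
{
        h->credits = nblocks;
}

/* Extend on top of what the handle already had, in both code paths. */
static int demo_extend_trans(struct demo_handle *h, int nblocks, int room)
{
        int old_nblocks = h->credits;
        int status = demo_extend(h, nblocks, room);

        if (status > 0) {
                /* Restart with old + new, not just the new amount. */
                demo_restart(h, old_nblocks + nblocks);
                status = 0;
        }
        return status;
}
```

Either way the caller ends up with the original credits plus the requested ones, so it no longer has to add the current credit count itself before calling.
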
ec20cec7a ocfs2: Make ocfs2_journal_dirty() void.

jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0
since before the kernel moved to git. There is no point in checking
this error.

ocfs2_journal_dirty() has been faithfully returning the status since the
beginning. All over ocfs2, we have blocks of code checking this can't
fail status. In the past few years, we've tried to avoid adding these
checks, because they are pointless. But anyone who looks at our code
assumes they are needed.

Finally, ocfs2_journal_dirty() is made a void function. All error
checking is removed from other files. We'll BUG_ON() the status of
jbd2_journal_dirty_metadata() just in case they change it someday. They
won't.

Signed-off-by: Joel Becker

Joel Becker
2010-05-06 09:17:29 +0800
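
The shape of the change in the commit above can be illustrated with a small userland sketch. The names demo_dirty_metadata() and demo_journal_dirty() are hypothetical, and assert() stands in for the kernel's BUG_ON().

```c
#include <assert.h>

/* Stand-in for the lower-level call, which only ever returns 0. */
static int demo_dirty_metadata(int *dirty_flag)
{
        *dirty_flag = 1;
        return 0;
}

/*
 * The wrapper returns void: callers have nothing to check. The status
 * is still asserted, in case the lower layer ever changes.
 */
static void demo_journal_dirty(int *dirty_flag)
{
        int status = demo_dirty_metadata(dirty_flag);

        assert(status == 0);    /* plays the role of BUG_ON(status) */
        (void)status;           /* keeps builds with NDEBUG warning-free */
}
```

Callers then shrink from a call plus an error branch to a single statement.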

26 Jan, 2010

1 commit

2bd632165 ocfs2/trivial: Remove trailing whitespaces

Patch removes trailing whitespaces.

Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2010-01-26 11:20:51 +0800

04 Dec, 2009

1 commit

af901ca18 tree-wide: fix assorted typos all over the place

That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa
Signed-off-by: Jiri Kosina

André Goddard Rosa
2009-12-04 22:39:55 +0800

23 Sep, 2009

1 commit

93c97087a ocfs2: Add metaecc for ocfs2_refcount_block.

Add metaecc and journal trigger for ocfs2_refcount_block.

Signed-off-by: Tao Ma

Tao Ma
2009-09-23 11:09:26 +0800

05 Sep, 2009

2 commits

0cf2f7632 ocfs2: Pass struct ocfs2_caching_info to the journal functions.

The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions. Thus the
journal locks a metadata cache with the cache io_lock function. It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:50 +0800
8cb471e8f ocfs2: Take the inode out of the metadata read/write paths.

We are really passing the inode into the ocfs2_read/write_blocks()
functions to get at the metadata cache. This commit passes the cache
directly into the metadata block functions, divorcing them from the
inode.

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:48 +0800

09 Jul, 2009

1 commit

8b712cd58 ocfs2: Fixup orphan scan cleanup after failed mount

If the mount fails for any reason, ocfs2_dismount_volume calls
ocfs2_orphan_scan_stop. It requires that ocfs2_orphan_scan_init
be called to setup the mutex and work queues, but that doesn't
happen if the mount has failed and we oops accessing an uninitialized
work queue.

This patch splits the init and startup of the orphan scan, eliminating
the oops.

Signed-off-by: Jeff Mahoney
Signed-off-by: Joel Becker

Jeff Mahoney
2009-07-09 06:34:02 +0800

23 Jun, 2009

3 commits

df152c241 ocfs2: Disable orphan scanning for local and hard-ro mounts

Local and Hard-RO mounts do not need orphan scanning.

Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2009-06-23 05:24:55 +0800
3211949f8 ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init()

We don't access the LVB in our ocfs2_*_lock_res_init() functions.

Since the LVB can become invalid during some cluster recovery
operations, the dlmglue must be able to handle an uninitialized
LVB.

For the orphan scan lock, we initialize an uninitialized LVB with our
scan sequence number plus one. This starts a normal orphan scan
cycle.

Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2009-06-23 05:24:53 +0800
692684e19 ocfs2: Stop orphan scan as early as possible during umount

Currently if the orphan scan fires a tick before the user issues the umount,
the umount will wait for the queued orphan scan tasks to complete.

This patch makes the umount stop the orphan scan as early as possible so as
to reduce the probability of the queued tasks slowing down the umount.

Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2009-06-23 05:24:51 +0800

04 Jun, 2009

2 commits

15633a220 ocfs2 patch to track delayed orphan scan timer statistics

Patch to track delayed orphan scan timer statistics.

Modifies ocfs2_osb_dump to print the following:
Orphan Scan=> Local: 10 Global: 21 Last Scan: 67 seconds ago

Signed-off-by: Srinivas Eeda
Signed-off-by: Joel Becker

Srinivas Eeda
2009-06-04 10:14:31 +0800
83273932f ocfs2: timer to queue scan of all orphan slots

When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
before moving the dentry to the orphan directory. Other nodes that have
this dentry in cache have a PR on the same dentry lock. When the EX is
requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
during downconvert. The inode is finally deleted when the last node to iput
the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.

A problem arises if a node is forced to free dentry locks because of memory
pressure. If this happens, the node will no longer get downconvert
notifications for the dentries that have been unlinked on another node.
If it also happens that node is actively using the corresponding inode and
happens to be the one performing the last iput on that inode, it will fail
to delete the inode as it will not have the MAYBE_ORPHANED flag set.

This patch fixes this shortcoming by introducing a periodic scan of the
orphan directories to delete such inodes. Care has been taken to distribute
the workload across the cluster so that no one node has to perform the task
all the time.

Signed-off-by: Srinivas Eeda
Signed-off-by: Joel Becker

Srinivas Eeda
2009-06-04 10:14:31 +0800

04 Apr, 2009

3 commits

9140db04e ocfs2: recover orphans in offline slots during recovery and mount

During recovery, a node recovers orphans in its slot and the dead node(s). But
if the dead nodes were holding orphans in offline slots, they will be left
unrecovered.

If the dead node is the last one to die and is holding orphans in other slots
and is the first one to mount, then it only recovers its own slot, which
leaves orphans in offline slots.

This patch queues complete_recovery to clean orphans for all offline slots
during mount and node recovery.

Signed-off-by: Srinivas Eeda
Acked-by: Joel Becker
Signed-off-by: Mark Fasheh

Srinivas Eeda
2009-04-04 02:39:26 +0800
9b7895efa ocfs2: Add a name indexed b-tree to directory inodes

This patch makes use of Ocfs2's flexible btree code to add an additional
tree to directory inodes. The new tree stores an array of small,
fixed-length records in each leaf block. Each record stores a hash value,
and pointer to a block in the traditional (unindexed) directory tree where a
dirent with the given name hash resides. Lookup exclusively uses this tree
to find dirents, thus providing us with constant time name lookups.

Some of the hashing code was copied from ext3. Unfortunately, it has lots of
unfixed checkpatch errors. I left that as-is so that tracking changes would
be easier.

Signed-off-by: Mark Fasheh
Acked-by: Joel Becker

Mark Fasheh
2009-04-04 02:39:15 +0800
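
The leaf records described in the commit above, a name hash paired with a pointer to the block holding the dirent, can be sketched like this. It is a minimal model under assumptions: struct demo_dx_entry and both functions are hypothetical, and FNV-1a is swapped in as the hash purely for illustration (the commit says its hashing code was copied from ext3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-length leaf record: hash plus dirent block pointer. */
struct demo_dx_entry {
        uint32_t name_hash;
        uint64_t dirent_blk;
};

/* FNV-1a, used here only as an illustrative stand-in hash. */
static uint32_t demo_name_hash(const char *name)
{
        uint32_t h = 2166136261u;

        while (*name) {
                h ^= (uint8_t)*name++;
                h *= 16777619u;
        }
        return h;
}

/*
 * Scan one leaf for a record whose hash matches the name. Returns 0 and
 * stores the block number on a match, -1 otherwise. A match only names a
 * candidate block; the caller must still compare names inside it.
 */
static int demo_dx_lookup(const struct demo_dx_entry *leaf, size_t count,
                          const char *name, uint64_t *blk)
{
        uint32_t h = demo_name_hash(name);
        size_t i;

        for (i = 0; i < count; i++) {
                if (leaf[i].name_hash == h) {
                        *blk = leaf[i].dirent_blk;
                        return 0;
                }
        }
        return -1;
}
```

The point of the index is that a lookup touches a small array of fixed-size records and then one dirent block, instead of walking every block of the unindexed directory.
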
96a6c64b5 ocfs2: Move struct recovery_map to a header file

Move the definition of struct recovery_map from journal.c to journal.h. This
is preparation for the next patch.

Signed-off-by: Sunil Mushran
Signed-off-by: Mark Fasheh

Sunil Mushran
2009-04-04 02:39:14 +0800

06 Jan, 2009

9 commits

c175a518b ocfs2: Checksum and ECC for directory blocks.

Use the db_check field of ocfs2_dir_block_trailer to crc/ecc the
dirblocks.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2009-01-06 00:40:34 +0800
13723d00e ocfs2: Use metadata-specific ocfs2_journal_access_*() functions.

The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
commit triggers and allow us to compute metadata ecc right before the
buffers are written out. This commit provides ecc for inodes, extent
blocks, group descriptors, and quota blocks. It is not safe to use
extended attributes and metaecc at the same time yet.

The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
the type of block at their root. Before, it didn't matter, but now the
root block must use the appropriate ocfs2_journal_access_*() function.
To keep this abstract, the structures now have a pointer to the matching
journal_access function and a wrapper call to call it.

A few places use naked ocfs2_write_block() calls instead of adding the
blocks to the journal. We make sure to calculate their checksum and ecc
before the write.

Since we pass around the journal_access functions, let's typedef them
in ocfs2.h.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2009-01-06 00:40:32 +0800
50655ae9e ocfs2: Add journal_access functions with jbd2 triggers.

We create wrappers for ocfs2_journal_access() that are specific to the
type of metadata block. This allows us to associate jbd2 commit
triggers with the block. The triggers will compute metadata ecc in a
future commit.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2009-01-06 00:40:31 +0800
19ece546a ocfs2: Enable quota accounting on mount, disable on umount

Enable quota usage tracking on mount and disable it on umount. Also
add support for quota on and quota off quotactls and usrquota and
grpquota mount options. Add quota features among supported ones.

Signed-off-by: Jan Kara
Signed-off-by: Mark Fasheh

Jan Kara
2009-01-06 00:40:24 +0800
2205363dc ocfs2: Implement quota recovery

Implement functions for recovery after a crash. Functions just
read local quota file and sync info to global quota file.

Signed-off-by: Jan Kara
Signed-off-by: Mark Fasheh

Jan Kara
2009-01-06 00:40:24 +0800
90e86a63e ocfs2: Support nested transactions

OCFS2 can easily support nested transactions. We just have to
take care not to spoil statistics or acquire the semaphore unnecessarily.

Signed-off-by: Jan Kara
Signed-off-by: Mark Fasheh

Jan Kara
2009-01-06 00:40:23 +0800
53ef99cad ocfs2: Remove JBD compatibility layer

JBD2 is fully backwards compatible with JBD and it's been tested enough with
Ocfs2 that we can clean this code up now.

Signed-off-by: Mark Fasheh

Mark Fasheh
2009-01-06 00:36:55 +0800
10995aa24 ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks.

Random places in the code would check a dinode bh to see if it was
valid. Not only did they do different levels of validation, they
handled errors in different ways.

The previous commit unified inode block reads, validating all block
reads in the same place. Thus, these haphazard checks are no longer
necessary. Rather than eliminate them, however, we change them to
BUG_ON() checks. This ensures the assumptions remain true. All of the
code paths to these checks have been audited to ensure they come from a
validated inode read.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2009-01-06 00:36:52 +0800
b657c95c1 ocfs2: Wrap inode block reads in a dedicated function.

The ocfs2 code currently reads inodes off disk with a simple
ocfs2_read_block() call. Each place that does this has a different set
of sanity checks it performs. Some check only the signature. A couple
validate the block number (the block read vs di->i_blkno). A couple
others check for VALID_FL. Only one place validates i_fs_generation. A
couple check nothing. Even when an error is found, they don't all do
the same thing.

We wrap inode reading into ocfs2_read_inode_block(). This will validate
all the above fields, going readonly if they are invalid (they never
should be). ocfs2_read_inode_block_full() is provided for the places
that want to pass read_block flags. Every caller is passing a struct
inode with a valid ip_blkno, so we don't need a separate blkno argument
either.

We will remove the validation checks from the rest of the code in a
later commit, as they are no longer necessary.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2009-01-06 00:36:52 +0800
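
The idea of the commit above, funnelling every inode read through one function that applies all the sanity checks, can be sketched in userland C. This is a simplified model: struct demo_dinode, the DEMO_* constants and demo_validate_inode_block() are hypothetical and are not claimed to match the on-disk ocfs2 layout or the real validation routine.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Placeholder values chosen for this sketch only. */
#define DEMO_INODE_SIGNATURE "INODE01"
#define DEMO_VALID_FL        0x1u

/* Hypothetical, heavily reduced inode block. */
struct demo_dinode {
        char     signature[8];
        uint64_t blkno;
        uint32_t flags;
        uint32_t fs_generation;
};

/*
 * One place that performs every check listed in the commit message:
 * signature, block number, valid flag and filesystem generation.
 * Returns 0 if the block looks sane, -EINVAL otherwise.
 */
static int demo_validate_inode_block(const struct demo_dinode *di,
                                     uint64_t blkno_read,
                                     uint32_t fs_generation)
{
        if (memcmp(di->signature, DEMO_INODE_SIGNATURE,
                   sizeof(DEMO_INODE_SIGNATURE)) != 0)
                return -EINVAL;
        if (di->blkno != blkno_read)
                return -EINVAL;
        if (!(di->flags & DEMO_VALID_FL))
                return -EINVAL;
        if (di->fs_generation != fs_generation)
                return -EINVAL;
        return 0;
}
```

Because all callers share the one routine, every read gets the same level of validation and the same error handling.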

11 Nov, 2008

1 commit

ae0dff683 ocfs2: Set journal descriptor to NULL after journal shutdown

Set the journal descriptor to NULL after the journal is shut down.
This ensures that jbd2_journal_release_jbd_inode(), which removes the
jbd2 inode from txn lists, can be called safely from ocfs2_clear_inode()
even after the journal has been shut down.

Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Sunil Mushran
2008-11-11 01:51:47 +0800