Eric Lee / smarc-fsl-linux-kernel

19 May, 2010

2 commits

41841b0bc Merge branch 'discontig-bg' of git://oss.oracle.com/git/tma/linux-2.6 into ocfs2-merge-window

Joel Becker
2010-05-19 07:40:42 +0800
78f94673d Ocfs2: Optimize ocfs2 truncate to use ocfs2_remove_btree_range() instead.

Truncate is just a special case of punching holes (from the new i_size to
the end), so we can take advantage of the existing
ocfs2_remove_btree_range() to reduce the complexity and redundancy in
alloc.c. The goal here is to make truncate more generic and
straightforward.

Several functions only used by ocfs2_commit_truncate() will simply be
removed.

ocfs2_remove_btree_range() was originally used by the hole punching
code, which didn't take refcount trees into account (definitely a bug).
We therefore need to change that func a bit to handle refcount trees.
It must take the refcount lock, calculate and reserve blocks for
refcount tree changes, and decrease refcounts at the end. We replace
ocfs2_lock_allocators() here by adding a new func
ocfs2_reserve_blocks_for_rec_trunc() which accepts some extra blocks to
reserve. This will not hurt any other code using
ocfs2_remove_btree_range() (such as dir truncate and hole punching).

I merged the following steps into one patch since they may be
logically doing one thing, though I know it looks a little bit fat
to review.

1). Remove redundant code used by ocfs2_commit_truncate(), since we're
moving to ocfs2_remove_btree_range anyway.

2). Add a new func ocfs2_reserve_blocks_for_rec_trunc() that accepts
some extra blocks to reserve.

3). Change ocfs2_prepare_refcount_change_for_del() a bit to fit our
needs. It's safe to do this since it's only being called by
truncate.

4). Change ocfs2_remove_btree_range() a bit to take refcount case into
account.

5). Finally, we change ocfs2_commit_truncate() to call
ocfs2_remove_btree_range() in a proper way.

The patch has passed basic sanity testing; stress tests with heavier
workloads are still expected.

Based on this patch, fixing the punching holes bug will be fairly easy.

Signed-off-by: Tristan Ye
Acked-by: Mark Fasheh
Signed-off-by: Joel Becker

Tristan Ye
2010-05-19 03:25:10 +0800
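
The idea in 78f94673d, that truncate is hole punching from the new size to the end, can be illustrated with a toy model. This is a minimal sketch, not the ocfs2 code: `toy_file`, `toy_remove_range()` and `toy_truncate()` are hypothetical names, the file is modelled as a small array of allocation flags, and refcount trees, locking and journaling are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_CLUSTERS 64

/* Toy model (hypothetical): a file is a logical size in clusters plus
 * one flag per cluster saying whether it is backed by storage. */
struct toy_file {
    uint32_t size;
    unsigned char allocated[TOY_MAX_CLUSTERS];
};

/* Generic range removal: free every backed cluster in
 * [start, start + len) and return how many were freed. */
static uint32_t toy_remove_range(struct toy_file *f, uint32_t start,
                                 uint32_t len)
{
    uint32_t freed = 0;

    for (uint32_t i = start; i < start + len && i < TOY_MAX_CLUSTERS; i++) {
        if (f->allocated[i]) {
            f->allocated[i] = 0;
            freed++;
        }
    }
    return freed;
}

/* Truncate is the special case "remove from new_size to the end".
 * Only shrinking is handled in this sketch. */
static uint32_t toy_truncate(struct toy_file *f, uint32_t new_size)
{
    uint32_t freed;

    if (new_size >= f->size)
        return 0;
    freed = toy_remove_range(f, new_size, f->size - new_size);
    f->size = new_size;
    return freed;
}
```

A hole punch in this model would call `toy_remove_range()` with an interior range and leave `size` alone, which is why a single removal routine can serve both callers.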

06 May, 2010

4 commits

83f92318f ocfs2: Add dir_resv_level mount option

The default behavior for directory reservations stays the same, but we add a
mount option so people can tweak the size of directory reservations
according to their workloads.

Signed-off-by: Mark Fasheh
Signed-off-by: Joel Becker

Mark Fasheh
2010-05-06 09:18:07 +0800
e3b4a97db ocfs2: use allocation reservations for directory data

Use the reservations system for unindexed dir tree allocations. We don't
bother with the indexed tree as reads from it are mostly random anyway.
Directory reservations are marked separately, to allow the reservations code
a chance to optimize their window sizes. This patch allocates only 8 bits
for directory windows as they generally are not expected to grow as quickly
as file data. Future improvements to dir window sizing can trivially be
made.

Signed-off-by: Mark Fasheh

Mark Fasheh
2010-05-06 09:17:30 +0800
ec20cec7a ocfs2: Make ocfs2_journal_dirty() void.

jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0
since before the kernel moved to git. There is no point in checking
this error.

ocfs2_journal_dirty() has been faithfully returning the status since the
beginning. All over ocfs2, we have blocks of code checking this
can't-fail status. In the past few years, we've tried to avoid adding these
checks, because they are pointless. But anyone who looks at our code
assumes they are needed.

Finally, ocfs2_journal_dirty() is made a void function. All error
checking is removed from other files. We'll BUG_ON() the status of
jbd2_journal_dirty_metadata() just in case they change it someday. They
won't.

Signed-off-by: Joel Becker

Joel Becker
2010-05-06 09:17:29 +0800
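
The pattern described in ec20cec7a can be sketched in plain C as follows. This is an illustration under stated assumptions, not the kernel code: `toy_dirty_metadata()` stands in for a lower-layer call that always returns 0, `toy_journal_dirty()` is a hypothetical wrapper, and `assert()` plays the role that `BUG_ON()` plays in the commit.

```c
#include <assert.h>

/* Stand-in (hypothetical) for a lower-layer call that, by contract,
 * only ever returns 0. */
static int toy_dirty_metadata(int *dirty_flag)
{
    *dirty_flag = 1;
    return 0;
}

/* void wrapper: callers have no status to check. The can't-fail
 * assumption is asserted once, here, instead of at every call site. */
static void toy_journal_dirty(int *dirty_flag)
{
    int status = toy_dirty_metadata(dirty_flag);

    assert(status == 0);
    (void)status; /* keeps the build quiet when NDEBUG removes assert */
}
```

Callers shrink from a call plus an error branch to a single statement, which is the cleanup the commit message describes.
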
1ed9b777f ocfs2: ocfs2_claim_*() don't need an ocfs2_super argument.

They all take an ocfs2_alloc_context, which has the allocation inode.

Signed-off-by: Joel Becker
Signed-off-by: Tao Ma

Joel Becker
2010-05-06 13:59:06 +0800

26 Mar, 2010

1 commit

2b6cb576a ocfs2: Set suballoc_loc on allocated metadata.

Get the suballoc_loc from ocfs2_claim_new_inode() or
ocfs2_claim_metadata(). Store it on the appropriate field of the block
we just allocated.

Signed-off-by: Joel Becker

Joel Becker
2010-03-26 10:09:15 +0800

22 Mar, 2010

1 commit

74380c479 ocfs2: Free block to the right block group.

If the block we are going to free was allocated from
a discontiguous block group, we have to use suballoc_loc
to find the right group.

Signed-off-by: Tao Ma

Tao Ma
2010-03-22 14:20:18 +0800

06 Mar, 2010

1 commit

e213e26ab Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
quota: stop using QUOTA_OK / NO_QUOTA
dquot: cleanup dquot initialize routine
dquot: move dquot initialization responsibility into the filesystem
dquot: cleanup dquot drop routine
dquot: move dquot drop responsibility into the filesystem
dquot: cleanup dquot transfer routine
dquot: move dquot transfer responsibility into the filesystem
dquot: cleanup inode allocation / freeing routines
dquot: cleanup space allocation / freeing routines
ext3: add writepage sanity checks
ext3: Truncate allocated blocks if direct IO write fails to update i_size
quota: Properly invalidate caches even for filesystems with blocksize < pagesize
quota: generalize quota transfer interface
quota: sb_quota state flags cleanup
jbd: Delay discarding buffers in journal_unmap_buffer
ext3: quota_write cross block boundary behaviour
quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
quota: split out compat_sys_quotactl support from quota.c
quota: split out netlink notification support from quota.c
quota: remove invalid optimization from quota_sync_all
...

Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

Linus Torvalds
2010-03-06 05:20:53 +0800

05 Mar, 2010

1 commit

5dd4056db dquot: cleanup space allocation / freeing routines

Get rid of the alloc_space, free_space, reserve_space, claim_space and
release_rsv dquot operations - they are always called from the filesystem
and if a filesystem really needs its own (which none currently does)
it can just call into its own routine directly.

Move shared logic into the common __dquot_alloc_space,
dquot_claim_space_nodirty and __dquot_free_space low-level methods,
and rationalize the wrappers around it to move as much code as possible
into the common block for CONFIG_QUOTA vs not. Also rename
all these helpers to be named dquot_* instead of vfs_dq_*.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jan Kara

Christoph Hellwig
2010-03-05 07:20:28 +0800

27 Feb, 2010

1 commit

b89c54282 ocfs2: add extent block stealing for ocfs2 v5

This patch adds an extent block (metadata) stealing mechanism for
extent allocation. This mechanism is the same as inode stealing:
if there is no room in the slot-specific extent_alloc, we will try to
allocate an extent block from the next slot.

Signed-off-by: Tiger Yang
Acked-by: Tao Ma
Signed-off-by: Joel Becker

Tiger Yang
2010-02-27 07:41:07 +0800

05 Sep, 2009

6 commits

5e404e9ed ocfs2: Pass ocfs2_caching_info into ocfs_init_*_extent_tree().

With this commit, extent tree operations are divorced from inodes and
rely on ocfs2_caching_info. Phew!

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:08:13 +0800
cc79d8c19 ocfs2: ocfs2_insert_extent() no longer needs struct inode.

One more function down, no inode in the entire insert-extent chain.

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:08:09 +0800
facdb77f5 ocfs2: ocfs2_find_path() only needs the caching info

ocfs2_find_path and ocfs2_find_leaf() walk our btrees, reading extent
blocks. They need struct ocfs2_caching_info for that, but not struct
inode.

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:53 +0800
3d03a305d ocfs2: Pass ocfs2_caching_info to ocfs2_read_extent_block().

Extent blocks belong to btrees on more than just inodes, so we want to
pass the ocfs2_caching_info structure directly to
ocfs2_read_extent_block(). A number of places in alloc.c can now drop
struct inode from their argument list.

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:52 +0800
0cf2f7632 ocfs2: Pass struct ocfs2_caching_info to the journal functions.

The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions. Thus the
journal locks a metadata cache with the cache io_lock function. It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:50 +0800
8cb471e8f ocfs2: Take the inode out of the metadata read/write paths.

We are really passing the inode into the ocfs2_read/write_blocks()
functions to get at the metadata cache. This commit passes the cache
directly into the metadata block functions, divorcing them from the
inode.

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:48 +0800

04 Jun, 2009

1 commit

edd45c084 ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories

We use ordering ip_alloc_sem -> local alloc locks in ocfs2_write_begin().
So change lock ordering in ocfs2_extend_dir() and ocfs2_expand_inline_dir()
to also use this lock ordering.

Signed-off-by: Jan Kara
Acked-by: Mark Fasheh
Signed-off-by: Joel Becker

Jan Kara
2009-06-04 10:14:30 +0800

22 Apr, 2009

1 commit

0fba81374 ocfs2: Fix 2 warning during ocfs2 make.

fs/ocfs2/dir.c: In function ‘ocfs2_extend_dir’:
fs/ocfs2/dir.c:2700: warning: ‘ret’ may be used uninitialized in this function

fs/ocfs2/suballoc.c: In function ‘ocfs2_get_suballoc_slot_bit’:
fs/ocfs2/suballoc.c:2216: warning: comparison is always true due to limited range of data type

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker

Tao Ma
2009-04-22 07:23:39 +0800

08 Apr, 2009

1 commit

035a57112 ocfs2: Reserve 1 more cluster in expanding_inline_dir for indexed dir.

In ocfs2_expand_inline_dir, we calculate whether we need 1 extra
cluster if we can't store the dx inline in the root, and save it in
dx_alloc. So add it when we call ocfs2_reserve_clusters.

Signed-off-by: Tao Ma
Signed-off-by: Mark Fasheh

Tao Ma
2009-04-08 00:40:17 +0800

04 Apr, 2009

6 commits

1d46dc08d ocfs2: fix leaf start calculation in ocfs2_dx_dir_rebalance()

ocfs2_dx_dir_rebalance() is passed the block offset of a dx leaf which needs
rebalancing. Since we rebalance an entire cluster at a time however, this
function needs to calculate the beginning of that cluster, in blocks. The
calculation was wrong, which would result in a read of non-leaf blocks. Fix
the calculation by adding ocfs2_block_to_cluster_start() which is a more
straight-forward way of determining this.

Reported-by: Tristan Ye
Signed-off-by: Mark Fasheh

Mark Fasheh
2009-04-04 02:39:17 +0800
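
The calculation that 1d46dc08d fixes, finding the first block of the cluster that contains a given block, can be sketched as below. This is a standalone illustration, not the kernel helper: `toy_block_to_cluster_start()` is a hypothetical name, and the two size parameters are assumed to be log2 values with the cluster size at least as large as the block size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone helper: round a block offset down to the
 * first block of the cluster that contains it. Both sizes are passed
 * as log2 values (for example 12 for 4 KiB blocks). */
static uint64_t toy_block_to_cluster_start(uint64_t block,
                                           unsigned int clustersize_bits,
                                           unsigned int blocksize_bits)
{
    unsigned int bits;
    uint64_t cluster;

    assert(clustersize_bits >= blocksize_bits);
    bits = clustersize_bits - blocksize_bits; /* log2(blocks per cluster) */
    cluster = block >> bits;                  /* cluster holding the block */
    return cluster << bits;                   /* first block of that cluster */
}
```

With 4 KiB blocks and 32 KiB clusters there are 8 blocks per cluster, so block 13 maps back to block 8, the start of the second cluster.
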
e3a93c2db ocfs2: Add total entry count to dx_root_block

This little bit of extra accounting speeds up ocfs2_empty_dir()
dramatically by allowing us to short-circuit the full directory scan.

Signed-off-by: Mark Fasheh

Mark Fasheh
2009-04-04 02:39:16 +0800
e7c17e430 ocfs2: Introduce dir free space list

The only operation which doesn't get faster with directory indexing is
insert, which still has to walk the entire unindexed directory portion to
find a free block. This patch provides an improvement in directory insert
performance by maintaining a singly linked list of directory leaf blocks
which have space for additional dirents.

Signed-off-by: Mark Fasheh
Acked-by: Joel Becker

Mark Fasheh
2009-04-04 02:39:16 +0800
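
The free space list from e7c17e430 can be pictured with an in-memory sketch. Everything here is hypothetical (`toy_leaf`, `toy_dir`, `toy_add_free()`, `toy_find_space()`); the real list lives on disk and is maintained under the journal, which this sketch does not attempt to model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory model of a directory leaf block. */
struct toy_leaf {
    unsigned int free_bytes;    /* room left for new dirents */
    struct toy_leaf *free_next; /* next leaf that has free space */
};

struct toy_dir {
    struct toy_leaf *free_head; /* head of the singly linked free list */
};

/* Push a leaf with spare room onto the front of the free list. */
static void toy_add_free(struct toy_dir *dir, struct toy_leaf *leaf)
{
    leaf->free_next = dir->free_head;
    dir->free_head = leaf;
}

/* Insert path: walk only the leaves known to have space, instead of
 * scanning every leaf in the directory. */
static struct toy_leaf *toy_find_space(struct toy_dir *dir,
                                       unsigned int need)
{
    for (struct toy_leaf *l = dir->free_head; l != NULL; l = l->free_next) {
        if (l->free_bytes >= need)
            return l;
    }
    return NULL;
}
```

The cost of an insert then depends on how many leaves have free space rather than on the total size of the directory.
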
4ed8a6bb0 ocfs2: Store dir index records inline

Allow us to store a small number of directory index records in the
ocfs2_dx_root_block. This saves us a disk read on small to medium sized
directories (less than about 250 entries). The inline root is automatically
turned into a root block with extents if the directory size increases beyond
its capacity.

Signed-off-by: Mark Fasheh
Acked-by: Joel Becker

Mark Fasheh
2009-04-04 02:39:16 +0800
9b7895efa ocfs2: Add a name indexed b-tree to directory inodes

This patch makes use of Ocfs2's flexible btree code to add an additional
tree to directory inodes. The new tree stores an array of small,
fixed-length records in each leaf block. Each record stores a hash value,
and pointer to a block in the traditional (unindexed) directory tree where a
dirent with the given name hash resides. Lookup exclusively uses this tree
to find dirents, thus providing us with constant time name lookups.

Some of the hashing code was copied from ext3. Unfortunately, it has lots of
unfixed checkpatch errors. I left that as-is so that tracking changes would
be easier.

Signed-off-by: Mark Fasheh
Acked-by: Joel Becker

Mark Fasheh
2009-04-04 02:39:15 +0800
4a12ca3a0 ocfs2: Introduce dir lookup helper struct

Many directory manipulation calls pass around a tuple of dirent and its
containing buffer_head. Dir indexing has a bit more state, but instead of
adding yet more arguments to functions, we introduce 'struct
ocfs2_dir_lookup_result'. In this patch, it simply holds the same tuple, but
future patches will add more state.

Signed-off-by: Mark Fasheh
Acked-by: Joel Becker

Mark Fasheh
2009-04-04 02:39:15 +0800

06 Jan, 2009

8 commits

c175a518b ocfs2: Checksum and ECC for directory blocks.

Use the db_check field of ocfs2_dir_block_trailer to crc/ecc the
dirblocks.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2009-01-06 00:40:34 +0800
87d35a74b ocfs2: Add directory block trailers.

Future ocfs2 features metaecc and indexed directories need to store a
little bit of data in each dirblock. For compatibility, we place this
in a trailer at the end of the dirblock. The trailer presents itself as an
empty dirent, so that if the features are turned off, it can be reused
without requiring a tunefs scan.

This code adds the trailer and validates it when the block is read in.

[ Mark is the original author, but I reinserted this code before his
dir index work. -- Joel ]

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Mark Fasheh
2009-01-06 00:40:34 +0800
13723d00e ocfs2: Use metadata-specific ocfs2_journal_access_*() functions.

The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
commit triggers and allow us to compute metadata ecc right before the
buffers are written out. This commit provides ecc for inodes, extent
blocks, group descriptors, and quota blocks. It is not safe to use
extended attributes and metaecc at the same time yet.

The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
the type of block at their root. Before, it didn't matter, but now the
root block must use the appropriate ocfs2_journal_access_*() function.
To keep this abstract, the structures now have a pointer to the matching
journal_access function and a wrapper call to call it.

A few places use naked ocfs2_write_block() calls instead of adding the
blocks to the journal. We make sure to calculate their checksum and ecc
before the write.

Since we pass around the journal_access functions, let's typedef them
in ocfs2.h.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2009-01-06 00:40:32 +0800
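
The abstraction described in 13723d00e, where a tree structure carries a pointer to the journal access function that matches its root block, plus a wrapper to call it, might look roughly like this. All names are hypothetical and the signature is simplified; the actual typedef in ocfs2.h takes kernel-specific arguments that are not reproduced here.

```c
#include <assert.h>

/* Hypothetical block types, used only to record which accessor ran. */
enum toy_root_kind { TOY_ROOT_NONE, TOY_ROOT_INODE, TOY_ROOT_XATTR };

struct toy_block {
    enum toy_root_kind last_access;
};

/* One typedef for every per-type journal access function. */
typedef int (*toy_journal_access_fn)(struct toy_block *blk);

static int toy_journal_access_inode(struct toy_block *blk)
{
    blk->last_access = TOY_ROOT_INODE; /* inode-specific work goes here */
    return 0;
}

static int toy_journal_access_xattr(struct toy_block *blk)
{
    blk->last_access = TOY_ROOT_XATTR; /* xattr-specific work goes here */
    return 0;
}

/* The tree hides what kind of block sits at its root, so it stores
 * the matching accessor next to the root pointer. */
struct toy_extent_tree {
    struct toy_block *root;
    toy_journal_access_fn root_journal_access;
};

/* Wrapper: generic tree code calls this and never names the type. */
static int toy_et_root_journal_access(struct toy_extent_tree *et)
{
    return et->root_journal_access(et->root);
}
```

Generic code stays type-agnostic, while whoever initializes the tree picks the accessor once.
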
a90714c15 ocfs2: Add quota calls for allocation and freeing of inodes and space

Add quota calls for allocation and freeing of inodes and space, also update
estimates on number of needed credits for a transaction. Move out inode
allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called
outside of a transaction.

Signed-off-by: Jan Kara
Signed-off-by: Mark Fasheh

Jan Kara
2009-01-06 00:40:23 +0800
511308d90 ocfs2: Convert ocfs2_read_dir_block() to ocfs2_read_virt_blocks()

Now that we've centralized the ocfs2_read_virt_blocks() code, let's use
it in ocfs2_read_dir_block().

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2009-01-06 00:36:55 +0800
970e4936d ocfs2: Validate metadata only when it's read from disk.

Add an optional validation hook to ocfs2_read_blocks(). Now the
validation function is only called when a block was actually read off of
disk. It is not called when the buffer was in cache.

We add a buffer state bit BH_NeedsValidate to flag these buffers. It
must always be one higher than the last JBD2 buffer state bit.

The dinode, dirblock, extent_block, and xattr_block validators are
lifted to this scheme directly. The group_descriptor validator needs to
be split into two pieces. The first part only needs the gd buffer and
is passed to ocfs2_read_block(). The second part requires the dinode as
well, and is called every time. It's only 3 compares, so it's tiny.
This also allows us to clean up the non-fatal gd check used by resize.c.
It now has no magic argument.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2009-01-06 00:36:53 +0800
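
The behaviour described in 970e4936d, an optional validation hook that runs only when a block actually comes off disk, can be sketched with a toy cache. This is a simplification under stated assumptions: `toy_buf`, `toy_read_block()` and `toy_check_nonneg()` are hypothetical, the "disk" is just an integer argument, and the BH_NeedsValidate buffer state bit used by the real code is replaced by a plain `cached` flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached buffer: `cached` is set once the contents have
 * been read from "disk" and passed validation. */
struct toy_buf {
    int cached;
    int data;
};

typedef int (*toy_validate_fn)(const struct toy_buf *buf);

static int toy_validate_calls; /* counts how often the hook really runs */

/* Example validator (hypothetical): reject negative payloads. */
static int toy_check_nonneg(const struct toy_buf *buf)
{
    toy_validate_calls++;
    return buf->data < 0 ? -1 : 0;
}

/* Read a block. The optional hook runs only on a real read; a cache
 * hit returns immediately because the data was validated earlier. */
static int toy_read_block(struct toy_buf *buf, int on_disk,
                          toy_validate_fn validate)
{
    if (buf->cached)
        return 0;

    buf->data = on_disk; /* simulate the disk read */
    if (validate != NULL) {
        int rc = validate(buf);

        if (rc != 0)
            return rc; /* invalid blocks are not marked cached */
    }
    buf->cached = 1;
    return 0;
}
```

Reading the same buffer twice invokes the validator once, so repeated cache hits pay nothing for the checks.
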
a22305cc6 ocfs2: Wrap dirblock reads in a dedicated function.

We have ocfs2_bread() as a vestige of the original ext-based dir code.
It's only used by directories, though. Turn it into
ocfs2_read_dir_block(), with a prototype matching the other metadata
read functions. It's set up to validate dirblocks when the time comes.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2009-01-06 00:36:53 +0800
b657c95c1 ocfs2: Wrap inode block reads in a dedicated function.

The ocfs2 code currently reads inodes off disk with a simple
ocfs2_read_block() call. Each place that does this has a different set
of sanity checks it performs. Some check only the signature. A couple
validate the block number (the block read vs di->i_blkno). A couple
others check for VALID_FL. Only one place validates i_fs_generation. A
couple check nothing. Even when an error is found, they don't all do
the same thing.

We wrap inode reading into ocfs2_read_inode_block(). This will validate
all the above fields, going readonly if they are invalid (they never
should be). ocfs2_read_inode_block_full() is provided for the places
that want to pass read_block flags. Every caller is passing a struct
inode with a valid ip_blkno, so we don't need a separate blkno argument
either.

We will remove the validation checks from the rest of the code in a
later commit, as they are no longer necessary.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2009-01-06 00:36:52 +0800

15 Oct, 2008

5 commits

d4a8c93c8 ocfs2: Make cached block reads the common case.

ocfs2_read_blocks() currently requires the CACHED flag for cached I/O.
However, that's the common case. Let's flip it around and provide an
IGNORE_CACHE flag for the special users. This has the added benefit of
cleaning up the code some (ignore_cache takes on its special meaning
earlier in the loop).

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2008-10-15 02:58:22 +0800
5e0b3dec0 ocfs2: Kill the last naked wait_on_buffer() for cached reads.

ocfs2's cached buffer I/O goes through ocfs2_read_block(s)(). dir.c had
a naked wait_on_buffer() to wait for some readahead, but it should
use ocfs2_read_block() instead.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2008-10-15 02:58:11 +0800
07446dc72 ocfs2: Move ocfs2_bread() into dir.c

dir.c is the only place using ocfs2_bread(), so let's make it static to
that file.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2008-10-15 02:58:03 +0800
0fcaa56a2 ocfs2: Simplify ocfs2_read_block()

More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED.
Only six pass a different flag set. Rather than have every caller care,
let's make ocfs2_read_block() take no flags and always do a cached read.
The remaining six places can call ocfs2_read_blocks() directly.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2008-10-15 02:51:57 +0800
31d33073c ocfs2: Require an inode for ocfs2_read_block(s)().

Now that synchronous readers are using ocfs2_read_blocks_sync(), all
callers of ocfs2_read_blocks() are passing an inode. Use it
unconditionally. Since it's there, we don't need to pass the
ocfs2_super either.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2008-10-15 02:43:29 +0800

14 Oct, 2008

1 commit

a81cb88b6 ocfs2: Don't check for NULL before brelse()

This is pointless as brelse() already does the check.

Signed-off-by: Mark Fasheh

Mark Fasheh
2008-10-14 08:02:44 +0800
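
The reasoning in a81cb88b6 is that the release function already tolerates NULL, so a caller-side check is redundant. A user-space sketch of that shape, with the hypothetical names `toy_bh` and `toy_brelse()` standing in for the kernel's buffer head and brelse():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted buffer head. */
struct toy_bh {
    int refcount;
};

/* Release helper that accepts NULL, so callers never need to write
 * "if (bh) toy_brelse(bh);" themselves. */
static void toy_brelse(struct toy_bh *bh)
{
    if (bh == NULL)
        return;
    assert(bh->refcount > 0);
    bh->refcount--;
}
```

Cleanup paths can then call the helper unconditionally on every pointer, whether or not the earlier allocation or read succeeded.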