Eric Lee / smarc-fsl-linux-kernel

28 Jul, 2011

1 commit

a11f7e63c ocfs2: serialize unaligned aio

Fix a corruption that can happen when we have (two or more) outstanding
aios to an overlapping unaligned region. Ext4
(e9e3bcecf44c04b9e6b505fd8e2eb9cea58fb94d) and xfs recently had to fix
similar issues.

In our case, we can have an outstanding aio on a region, and if a write
comes in with some bytes overlapping the original aio, we may decide to
read that region into a page before continuing (typically because of
buffered-io fallback). Since we have no ordering guarantees with the
aio, we can read stale or bad data into the page and then write it back out.

If the i/o is page and block aligned, then we avoid this issue as there
won't be any need to read data from disk.

I took the same approach as Eric in the ext4 patch and introduced some
serialization of unaligned async direct i/o. I don't expect this to have an
effect on the most common cases of AIO. Unaligned aio will be slower
though, but that's far more acceptable than data corruption.
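The alignment test that decides which writes need this serialization can be sketched as below. This is a minimal userspace sketch: it assumes a power-of-two block size and uses the hypothetical name aio_is_unaligned(). The actual patch also has to make unaligned aio wait for earlier in-flight aio, and that part is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the ocfs2 function): returns true when a direct
 * write of `count` bytes at offset `pos` does not both start and end on a
 * block boundary. Such a write may need a read-modify-write of a partial
 * block, which is the case that must be serialized against other aio.
 * Assumes `blocksize` is a power of two.
 */
static bool aio_is_unaligned(uint64_t pos, uint64_t count, uint32_t blocksize)
{
    uint64_t mask = (uint64_t)blocksize - 1;

    return (pos & mask) != 0 || ((pos + count) & mask) != 0;
}
```

For example, a 4096-byte write at offset 512 on a 4 KiB-block file system is unaligned, but an 8192-byte write at offset 8192 is not.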

Signed-off-by: Mark Fasheh
Signed-off-by: Joel Becker

Mark Fasheh
2011-07-28 17:07:16 +0800

11 Sep, 2010

1 commit

5e98d4924 Track negative entries v3

Track negative dentries by recording the generation number of the parent
directory in d_fsdata. The parent directory's generation number is kept
in its inode_info and is incremented every time the lock on the
directory is dropped.

If the generation numbers of the parent directory and the negative dentry
match, there is no need to perform the revalidate; otherwise a revalidate
is forced. This improves performance in situations where nodes look for
the same non-existent file multiple times.
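The generation check can be modelled in a few lines. This is a hypothetical sketch: struct dir_info and struct neg_dentry are illustrative stand-ins for the directory's inode_info and the dentry's d_fsdata, not the real ocfs2 types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parent directory's in-memory inode info. */
struct dir_info {
    uint64_t lock_generation; /* bumped each time the directory lock is dropped */
};

/* Hypothetical stand-in for a negative dentry's d_fsdata. */
struct neg_dentry {
    uint64_t parent_generation; /* recorded when the lookup failed */
};

/* Called when the cluster lock on the directory is dropped. */
static void dir_lock_dropped(struct dir_info *dir)
{
    dir->lock_generation++;
}

/* Record the parent's current generation on a new negative dentry. */
static void neg_dentry_init(struct neg_dentry *d, const struct dir_info *dir)
{
    d->parent_generation = dir->lock_generation;
}

/* True when the negative dentry can be trusted without a revalidate. */
static bool neg_dentry_still_valid(const struct neg_dentry *d,
                                   const struct dir_info *dir)
{
    return d->parent_generation == dir->lock_generation;
}
```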

Thanks Mark for explaining the DLM sequence.

Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Joel Becker

Goldwyn Rodrigues
2010-09-11 00:18:15 +0800

10 Sep, 2010

1 commit

83fd9c7f6 Reorganize data elements to reduce struct sizes

Thanks for the comments. I have incorporated them all.

With CONFIG_OCFS2_FS_STATS enabled and CONFIG_DEBUG_LOCK_ALLOC disabled,
the struct sizes (before - after = bytes saved) now look like this:
ocfs2_write_ctxt: 2144 - 2136 = 8
ocfs2_inode_info: 1960 - 1848 = 112
ocfs2_journal: 168 - 160 = 8
ocfs2_lock_res: 336 - 304 = 32
ocfs2_refcount_tree: 512 - 472 = 40
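Savings like these typically come from reordering fields so that small members no longer force alignment padding between larger ones. The structs below are illustrative and are not the ocfs2 ones. On common ABIs where pointers are 4- or 8-byte aligned, the interleaved layout is larger than the reordered one. On x86-64 the sizes are 24 and 16 bytes.

```c
/* Illustrative only: the same three fields in two orders. */
struct toy_interleaved {
    char flag_a;   /* 1 byte, then padding up to pointer alignment */
    void *ptr;
    char flag_b;   /* 1 byte, then tail padding */
};

struct toy_reordered {
    void *ptr;     /* largest member first */
    char flag_a;   /* small members grouped together share one padded slot */
    char flag_b;
};
```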

Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Joel Becker

Goldwyn Rodrigues
2010-09-10 23:39:27 +0800

10 Aug, 2010

2 commits

45321ac54 Make ->drop_inode() just return whether inode needs to be dropped

... and let iput_final() do the actual eviction or retention
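The split of responsibilities can be modelled with a function-pointer hook. This is a hypothetical userspace sketch and every toy_ name is illustrative. The hook only answers "drop or keep?", and the generic final-put path acts on that answer.

```c
#include <stdbool.h>

struct toy_inode {
    unsigned int nlink;
    bool evicted;
    bool cached;
};

struct toy_super_ops {
    int (*drop_inode)(struct toy_inode *inode); /* nonzero: drop it */
};

/* A policy hook: unlinked inodes should be dropped. */
static int toy_generic_drop_inode(struct toy_inode *inode)
{
    return inode->nlink == 0;
}

/* The generic path does the actual eviction or retention. */
static void toy_iput_final(const struct toy_super_ops *ops,
                           struct toy_inode *inode)
{
    if (ops->drop_inode(inode))
        inode->evicted = true;
    else
        inode->cached = true;
}
```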

Signed-off-by: Al Viro

Al Viro
2010-08-10 04:48:35 +0800
066d92dcb convert ocfs2 to ->evict_inode()

Signed-off-by: Al Viro

Al Viro
2010-08-10 04:48:21 +0800

21 May, 2010

1 commit

03e62303c Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (47 commits)
ocfs2: Silence a gcc warning.
ocfs2: Don't retry xattr set in case value extension fails.
ocfs2:dlm: avoid dlm->ast_lock lockres->spinlock dependency break
ocfs2: Reset xattr value size after xa_cleanup_value_truncate().
fs/ocfs2/dlm: Use kstrdup
fs/ocfs2/dlm: Drop memory allocation cast
Ocfs2: Optimize punching-hole code.
Ocfs2: Make ocfs2_find_cpos_for_left_leaf() public.
Ocfs2: Fix hole punching to correctly do CoW during cluster zeroing.
Ocfs2: Optimize ocfs2 truncate to use ocfs2_remove_btree_range() instead.
ocfs2: Block signals for mkdir/link/symlink/O_CREAT.
ocfs2: Wrap signal blocking in void functions.
ocfs2/dlm: Increase o2dlm lockres hash size
ocfs2: Make ocfs2_extend_trans() really extend.
ocfs2/trivial: Code cleanup for allocation reservation.
ocfs2: make ocfs2_adjust_resv_from_alloc simple.
ocfs2: Make nointr a default mount option
ocfs2/dlm: Make o2dlm domain join/leave messages KERN_NOTICE
o2net: log socket state changes
ocfs2: print node # when tcp fails
...

Linus Torvalds
2010-05-21 22:20:17 +0800

06 May, 2010

1 commit

4fe370afa ocfs2: use allocation reservations during file write

Add a per-inode reservations structure and pass it through to the
reservations code.

Signed-off-by: Mark Fasheh

Mark Fasheh
2010-05-06 09:17:30 +0800

24 Apr, 2010

1 commit

d4cd1871c ocfs2: add OCFS2_INODE_SKIP_ORPHAN_DIR flag and honor it in the inode wipe code

Currently, in the error path of ocfs2_symlink and ocfs2_mknod, we just call
iput on the inode we failed with, but the inode wipe code will complain
because we didn't add the inode to the orphan dir. One solution would be to
lock the orphan dir for the entire transaction, but that's too heavy for a
rare error path. Instead, we add a flag, OCFS2_INODE_SKIP_ORPHAN_DIR, which
tells the inode wipe code that it won't find this inode in the orphan dir.
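The wipe-side check might look like the sketch below. Only the flag name OCFS2_INODE_SKIP_ORPHAN_DIR comes from the commit. Its value, struct toy_inode_info and toy_wipe_inode_check() are hypothetical.

```c
#include <stdbool.h>

#define OCFS2_INODE_SKIP_ORPHAN_DIR 0x1u /* value made up for illustration */

struct toy_inode_info {
    unsigned int ip_flags;
    bool in_orphan_dir;
};

/* 0: wipe may proceed; -1: inode should be orphaned but was not found. */
static int toy_wipe_inode_check(const struct toy_inode_info *oi)
{
    if (oi->ip_flags & OCFS2_INODE_SKIP_ORPHAN_DIR)
        return 0; /* failed symlink/mknod: never added to the orphan dir */
    return oi->in_orphan_dir ? 0 : -1;
}
```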

[ Merge fixes and comment style cleanups -Mark ]

Signed-off-by: Li Dongyang
Signed-off-by: Mark Fasheh

Li Dongyang
2010-04-24 02:03:49 +0800

05 Sep, 2009

6 commits

6136ca5f5 ocfs2: Drop struct inode from ocfs2_extent_tree_operations.

We can get to the inode from the caching information. Other parent
types don't need it.

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:57 +0800
292dd27ec ocfs2: move ip_created_trans to struct ocfs2_caching_info

Similar to ip_last_trans, ip_created_trans tracks the creation of a
journal-managed inode. Specifically, it records which transaction created
the inode, so the code can know whether the inode has ever been written
to disk.

This behavior is desirable for any journal-managed object. We move it
to struct ocfs2_caching_info as ci_created_trans so that any object
using ocfs2_caching_info can rely on this behavior.

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:49 +0800
66fb345dd ocfs2: move ip_last_trans to struct ocfs2_caching_info

We have the read side of metadata caching isolated to struct
ocfs2_caching_info, now we need the write side. This means the journal
functions. The journal only does a couple of things with struct inode.

This change moves the ip_last_trans field onto struct
ocfs2_caching_info as ci_last_trans. This field tells the journal
whether a pending journal flush is required.
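The two fields described in these commits can be modelled against a journal transaction counter. This is a hypothetical, simplified sketch and all toy_ names are illustrative. It assumes the journal's id advances on each commit, and it uses plain comparisons with no wraparound handling.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_journal {
    uint64_t trans_id; /* assumed to advance each time the journal commits */
};

struct toy_caching_info {
    uint64_t ci_created_trans; /* transaction that created the object */
    uint64_t ci_last_trans;    /* transaction that last modified it */
};

static void toy_journal_commit(struct toy_journal *j)
{
    j->trans_id++;
}

static void toy_set_new(struct toy_caching_info *ci, const struct toy_journal *j)
{
    ci->ci_created_trans = j->trans_id;
    ci->ci_last_trans = j->trans_id;
}

static void toy_set_last_trans(struct toy_caching_info *ci,
                               const struct toy_journal *j)
{
    ci->ci_last_trans = j->trans_id;
}

/* Never written to disk until a commit moves past the creating transaction. */
static bool toy_ci_is_new(const struct toy_caching_info *ci,
                          const struct toy_journal *j)
{
    return j->trans_id <= ci->ci_created_trans;
}

/* A flush is needed while the last modification has not been committed. */
static bool toy_ci_needs_flush(const struct toy_caching_info *ci,
                               const struct toy_journal *j)
{
    return j->trans_id <= ci->ci_last_trans;
}
```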

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:49 +0800
8cb471e8f ocfs2: Take the inode out of the metadata read/write paths.

We are really passing the inode into the ocfs2_read/write_blocks()
functions to get at the metadata cache. This commit passes the cache
directly into the metadata block functions, divorcing them from the
inode.

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:48 +0800
6e5a3d753 ocfs2: Change metadata caching locks to an operations structure.

We don't really want to cart around too many new fields on the
ocfs2_caching_info structure. So let's wrap all our access of the
parent object in a set of operations. One pointer on caching_info, and
more flexibility to boot.
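The "one pointer on caching_info" idea can be sketched as a const table of callbacks that generic cache code calls through. This is a hypothetical sketch. The struct and member names are illustrative and not guaranteed to match the ocfs2 definitions.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_caching_info;

/* The only pointer the cache keeps to its owner's behaviour. */
struct toy_caching_ops {
    uint64_t (*co_owner)(struct toy_caching_info *ci);
    void (*co_lock)(struct toy_caching_info *ci);
    void (*co_unlock)(struct toy_caching_info *ci);
};

struct toy_caching_info {
    const struct toy_caching_ops *ci_ops;
};

/* One possible owner: an inode-like object embedding the cache. */
struct toy_owner {
    uint64_t blkno;
    int lock_depth;
    struct toy_caching_info cache;
};

static struct toy_owner *toy_owner_of(struct toy_caching_info *ci)
{
    return (struct toy_owner *)((char *)ci - offsetof(struct toy_owner, cache));
}

static uint64_t toy_owner_blkno(struct toy_caching_info *ci) { return toy_owner_of(ci)->blkno; }
static void toy_owner_lock(struct toy_caching_info *ci) { toy_owner_of(ci)->lock_depth++; }
static void toy_owner_unlock(struct toy_caching_info *ci) { toy_owner_of(ci)->lock_depth--; }

static const struct toy_caching_ops toy_owner_ops = {
    .co_owner = toy_owner_blkno,
    .co_lock = toy_owner_lock,
    .co_unlock = toy_owner_unlock,
};

/* Generic cache code never needs to know what kind of object owns it. */
static uint64_t toy_cache_owner(struct toy_caching_info *ci)
{
    uint64_t owner;

    ci->ci_ops->co_lock(ci);
    owner = ci->ci_ops->co_owner(ci);
    ci->ci_ops->co_unlock(ci);
    return owner;
}
```

Adding another owner type only requires another ops table; struct toy_caching_info itself does not grow.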

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:48 +0800
47460d65a ocfs2: Make the ocfs2_caching_info structure self-contained.

We want to use the ocfs2_caching_info structure in places that are not
inodes. To do that, it can no longer rely on referencing the inode
directly.

This patch moves the flags to ocfs2_caching_info->ci_flags, stores
pointers to the parent's locks on the ocfs2_caching_info, and renames
the constants and flags to reflect its independent state.

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:47 +0800

04 Apr, 2009

2 commits

6ca497a83 ocfs2: fix rare stale inode errors when exporting via nfs

For NFS exporting, ocfs2_get_dentry() returns the dentry for a file handle.
ocfs2_get_dentry() may read from disk when the inode is not in memory,
without taking any cross-cluster lock. This can lead the file system to
load a stale inode.

This patch fixes the above problem.

The solution is that, when the inode is not in memory, we take the cluster
lock (PR) of the alloc inode that the inode in question was allocated from
(this forces the node that did the deletion to sync the alloc inode) before
reading out the inode itself. Then we check the bitmap in the group the
inode was allocated from to see whether its bit is clear. If it is clear,
the inode is stale. If the bit is set, we then check the generation as the
existing code does.

We have to read the inode in question from disk first to learn its alloc
slot and alloc bit. If it is not stale, we read it out using ocfs2_iget();
the second read should then be served from cache.
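The staleness decision can be sketched in isolation. This is a hypothetical sketch. It assumes little-endian bit numbering within each bitmap byte, and it omits the cluster locking and the disk reads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Test bit `bit` in a byte-addressed bitmap (LSB-first within each byte). */
static bool toy_test_bit(const uint8_t *bitmap, unsigned int bit)
{
    return (bitmap[bit / 8] >> (bit % 8)) & 1u;
}

/*
 * Stale if the inode's bit in its allocation group is clear (it was freed),
 * or if the bit is set but the generation differs (the slot was reused).
 */
static bool toy_inode_is_stale(const uint8_t *group_bitmap,
                               unsigned int alloc_bit,
                               uint32_t fh_generation,
                               uint32_t disk_generation)
{
    if (!toy_test_bit(group_bitmap, alloc_bit))
        return true;
    return fh_generation != disk_generation;
}
```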

We also have to add a per-superblock nfs_sync_lock to cover the lock on the
alloc inode and the lock on the inode in question, because ocfs2_get_dentry()
and ocfs2_delete_inode() take those locks in reverse order. nfs_sync_lock is
taken in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(),
so that multiple ocfs2_delete_inode() calls can run concurrently in the normal case.

[mfasheh@suse.com: build warning fixes and comment cleanups]
Signed-off-by: Wengang Wang
Acked-by: Joel Becker
Signed-off-by: Mark Fasheh

wengang wang
2009-04-04 02:39:25 +0800
138211515 ocfs2: Optimize inode allocation by remembering last group

In ocfs2, the inode block search looks for the "emptiest" inode
group to allocate from. So if an inode alloc file has many equally
(or almost equally) empty groups, new inodes will tend to get
spread out amongst them, which in turn can put them all over the
disk. This is undesirable because directory operations on conceptually
"nearby" inodes force a large number of seeks.

So we add ip_last_used_group to in-core directory inodes; it records
the last used allocation group. Another field, ip_last_used_slot,
is also added in case inode stealing happens. When claiming a new inode,
we pass in the directory's inode so that the allocation can use this
information.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
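A simplified group-selection policy in the spirit of this change might look like the sketch below. It is hypothetical. The real allocator works on group descriptors and also handles slot stealing, and the toy_group type and the -1 "unset" convention are illustrative.

```c
struct toy_group {
    unsigned int free_inodes;
};

/* Prefer the directory's last used group; otherwise pick the emptiest one. */
static int toy_pick_group(const struct toy_group *groups, int ngroups,
                          int last_used_group)
{
    int best = -1;
    unsigned int best_free = 0;

    if (last_used_group >= 0 && last_used_group < ngroups &&
        groups[last_used_group].free_inodes > 0)
        return last_used_group; /* keep related inodes close on disk */

    for (int i = 0; i < ngroups; i++) {
        if (groups[i].free_inodes > best_free) {
            best_free = groups[i].free_inodes;
            best = i;
        }
    }
    return best; /* -1 if every group is full */
}
```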

Signed-off-by: Tao Ma
Signed-off-by: Mark Fasheh

Tao Ma
2009-04-04 02:39:17 +0800

06 Jan, 2009

2 commits

9e33d69f5 ocfs2: Implementation of local and global quota file handling

For each quota type, each node has a local quota file. In this file it
stores changes users have made to disk usage via this node. Once in a while
this information is synced to the global file (and thus with other nodes) so
that limit enforcement at least approximately works.

Global quota files contain all the information about usage and limits. It's
mostly handled by the generic VFS code (which implements a trie of structures
inside a quota file). We only have to provide functions to convert structures
from the on-disk format to the in-memory one. We also have to provide wrappers
for various quota functions that start transactions and acquire the necessary
cluster locks before the actual IO is started.
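The "approximately works" trade-off can be modelled with a local delta that is periodically folded into a shared total. This is a hypothetical sketch. In ocfs2 the fold happens under cluster locks and inside a transaction, and here it is a plain function call.

```c
struct toy_global_quota { long long usage; long long limit; };
struct toy_local_quota  { long long delta; };

/* Charge usage on this node only. */
static void toy_local_charge(struct toy_local_quota *lq, long long bytes)
{
    lq->delta += bytes;
}

/* Periodic sync: fold this node's changes into the global file. */
static void toy_sync_to_global(struct toy_local_quota *lq,
                               struct toy_global_quota *gq)
{
    gq->usage += lq->delta;
    lq->delta = 0;
}

/* A node sees synced usage plus its own pending delta, nothing more. */
static int toy_would_exceed(const struct toy_local_quota *lq,
                            const struct toy_global_quota *gq, long long bytes)
{
    return gq->usage + lq->delta + bytes > gq->limit;
}
```

Until node A syncs, node B cannot see A's charges. That gap is why enforcement is only approximate.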

Signed-off-by: Jan Kara
Signed-off-by: Mark Fasheh

Jan Kara
2009-01-06 00:40:23 +0800
b657c95c1 ocfs2: Wrap inode block reads in a dedicated function.

The ocfs2 code currently reads inodes off disk with a simple
ocfs2_read_block() call. Each place that does this has a different set
of sanity checks it performs. Some check only the signature. A couple
validate the block number (the block read vs di->i_blkno). A couple
others check for VALID_FL. Only one place validates i_fs_generation. A
couple check nothing. Even when an error is found, they don't all do
the same thing.

We wrap inode reading into ocfs2_read_inode_block(). This will validate
all the above fields, going readonly if they are invalid (they never
should be). ocfs2_read_inode_block_full() is provided for the places
that want to pass read_block flags. Every caller is passing a struct
inode with a valid ip_blkno, so we don't need a separate blkno argument
either.

We will remove the validation checks from the rest of the code in a
later commit, as they are no longer necessary.
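A single validator that runs every check listed above might look like the sketch below. This is a hypothetical sketch. The struct layout, signature string, flag value and error enum are illustrative and are not taken from ocfs2_fs.h.

```c
#include <stdint.h>
#include <string.h>

#define TOY_INODE_SIGNATURE "INODE01" /* illustrative; 7 chars + NUL */
#define TOY_VALID_FL 0x1u             /* illustrative flag value */

struct toy_dinode {
    char i_signature[8];
    uint64_t i_blkno;
    uint32_t i_flags;
    uint32_t i_fs_generation;
};

enum toy_inode_err {
    TOY_INODE_OK = 0,
    TOY_INODE_BAD_SIGNATURE,
    TOY_INODE_BAD_BLKNO,
    TOY_INODE_NOT_VALID,
    TOY_INODE_BAD_GENERATION,
};

/* One place that runs every check, so all callers get the same answer. */
static enum toy_inode_err toy_validate_inode_block(const struct toy_dinode *di,
                                                   uint64_t blkno_read,
                                                   uint32_t fs_generation)
{
    if (memcmp(di->i_signature, TOY_INODE_SIGNATURE,
               sizeof(TOY_INODE_SIGNATURE)) != 0)
        return TOY_INODE_BAD_SIGNATURE;
    if (di->i_blkno != blkno_read)           /* block read vs di->i_blkno */
        return TOY_INODE_BAD_BLKNO;
    if (!(di->i_flags & TOY_VALID_FL))
        return TOY_INODE_NOT_VALID;
    if (di->i_fs_generation != fs_generation)
        return TOY_INODE_BAD_GENERATION;
    return TOY_INODE_OK;
}
```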

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2009-01-06 00:36:52 +0800

15 Oct, 2008

1 commit

07446dc72 ocfs2: Move ocfs2_bread() into dir.c

dir.c is the only place using ocfs2_bread(), so let's make it static to
that file.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2008-10-15 02:58:03 +0800

14 Oct, 2008

2 commits

2b4e30fbd ocfs2: Switch over to JBD2.

ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
limiting our maximum filesystem size.

It's a pretty trivial change. Most functions are just renamed. The
only functional change is moving to Jan's inode-based ordered data mode.
It's better, too.

Because JBD2 reads and writes JBD journals, this is compatible with any
existing filesystem. It can even interact with JBD-based ocfs2 as long
as the journal is formatted for JBD.

We provide a compatibility option so that paranoid people can still use
JBD for the time being. This will go away shortly.

[ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
ocfs2_truncate_for_delete(). --Mark ]

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2008-10-14 08:02:43 +0800
cf1d6c763 ocfs2: Add extended attribute support

This patch implements storing extended attributes either in the inode or in a
single external block. We only store EAs in-inode when blocksize > 512 and the
inode block has free space for them. When an EA's value is larger than 80
bytes, we store the value in a b-tree outside the inode or block.
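The placement rules can be written as two small predicates. This is a hypothetical sketch. The 512-byte and 80-byte thresholds come from the commit text. The enum, the function names and the free-space parameter are illustrative.

```c
#include <stddef.h>

#define TOY_MIN_BLOCKSIZE     512u
#define TOY_XATTR_INLINE_SIZE 80u

enum toy_xattr_home { TOY_XATTR_IN_INODE, TOY_XATTR_IN_BLOCK };

/* In-inode storage needs a block size above 512 and room in the inode. */
static enum toy_xattr_home toy_xattr_location(unsigned int blocksize,
                                              size_t inode_free_space,
                                              size_t entry_size)
{
    if (blocksize > TOY_MIN_BLOCKSIZE && entry_size <= inode_free_space)
        return TOY_XATTR_IN_INODE;
    return TOY_XATTR_IN_BLOCK;
}

/* Values longer than 80 bytes are kept in a separate value tree. */
static int toy_xattr_value_outside(size_t value_len)
{
    return value_len > TOY_XATTR_INLINE_SIZE;
}
```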

Signed-off-by: Tiger Yang
Signed-off-by: Mark Fasheh

Tiger Yang
2008-10-14 07:57:02 +0800

26 Jan, 2008

3 commits

5fa0613ea ocfs2: Silence false lockdep warnings

Create separate lockdep lock classes for system files' i_mutexes. They are
used to guard allocations and similar things and thus rank differently
than the i_mutex of a regular file or directory.

Signed-off-by: Jan Kara
Signed-off-by: Mark Fasheh

Jan Kara
2008-01-26 07:05:44 +0800
e63aecb65 ocfs2: Rename ocfs2_meta_[un]lock

Call this the "inode_lock" now, since it covers both data and meta data.
This patch makes no functional changes.

Signed-off-by: Mark Fasheh

Mark Fasheh
2008-01-26 06:46:01 +0800
c934a92d0 ocfs2: Remove data locks

The meta lock now covers both meta data and data, so this just removes the
now-redundant data lock.

Combining locks saves us a round of lock mastery per inode and one less lock
to ping between nodes during read/write.

We don't lose much: since meta locks were always held before a data lock
(and at the same level), ordered writeout mode (the default) ensured that
flushing for the meta data lock also pushed out data anyway.

Signed-off-by: Mark Fasheh

Mark Fasheh
2008-01-26 06:45:57 +0800

13 Oct, 2007

1 commit

15b1e36bd ocfs2: Structure updates for inline data

Add the disk, network and memory structures needed to support data in inode.

Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing
inline data.

A new inode field, i_dyn_features, is added to facilitate tracking of
dynamic inode state. Since it will be used often, we want to mirror it on
ocfs2_inode_info, and transfer it via the meta data lvb.
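A dynamic-features bit deciding whether data lives inline can be sketched as below. This is a hypothetical sketch. The flag value, the struct layout and the capacity field are illustrative stand-ins for ocfs2_dinode and ocfs2_inline_data.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_INLINE_DATA_FL 0x0001u /* illustrative value */

struct toy_inline_data {
    uint16_t id_count; /* bytes of inline space available in the inode block */
};

struct toy_dinode_inline {
    uint16_t i_dyn_features;
    struct toy_inline_data id;
};

static bool toy_has_inline_data(const struct toy_dinode_inline *di)
{
    return (di->i_dyn_features & TOY_INLINE_DATA_FL) != 0;
}

/* Data can stay inline only while it fits in the inode's inline area. */
static bool toy_write_fits_inline(const struct toy_dinode_inline *di,
                                  uint64_t new_size)
{
    return toy_has_inline_data(di) && new_size <= di->id.id_count;
}
```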

Signed-off-by: Mark Fasheh
Reviewed-by: Joel Becker

Mark Fasheh
2007-10-13 02:54:39 +0800

03 May, 2007

1 commit

6e4b0d569 [PATCH] Copy i_flags to ocfs2 inode flags on write

Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ocfs2-specific ip_attr. Hence, when someone sets these flags via a different
interface than ioctl, they are stored correctly.
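The propagation step amounts to a bit-by-bit translation that overwrites only the mirrored bits. This is a hypothetical sketch and all flag values are made up. Only the idea of copying S_APPEND, S_IMMUTABLE and similar flags into ip_attr comes from the commit.

```c
/* Illustrative VFS-style i_flags bits. */
#define TOY_S_APPEND     0x04u
#define TOY_S_IMMUTABLE  0x08u
#define TOY_S_NOATIME    0x40u

/* Illustrative filesystem attribute bits. */
#define TOY_APPEND_FL    0x0001u
#define TOY_IMMUTABLE_FL 0x0002u
#define TOY_NOATIME_FL   0x0004u

static unsigned int toy_get_inode_flags(unsigned int i_flags,
                                        unsigned int ip_attr)
{
    /* Clear the mirrored bits, keep everything else, then copy from i_flags. */
    ip_attr &= ~(TOY_APPEND_FL | TOY_IMMUTABLE_FL | TOY_NOATIME_FL);
    if (i_flags & TOY_S_APPEND)
        ip_attr |= TOY_APPEND_FL;
    if (i_flags & TOY_S_IMMUTABLE)
        ip_attr |= TOY_IMMUTABLE_FL;
    if (i_flags & TOY_S_NOATIME)
        ip_attr |= TOY_NOATIME_FL;
    return ip_attr;
}
```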

Signed-off-by: Jan Kara
Signed-off-by: Mark Fasheh

Jan Kara
2007-05-03 06:07:58 +0800

27 Apr, 2007

5 commits

834189788 ocfs2: Cache extent records

The extent map code was ripped out earlier because of an inability to deal
with holes. This patch adds back a simpler caching scheme requiring far less
code.

Our old extent map caching was designed back when meta data block caching in
Ocfs2 didn't work very well, resulting in many disk reads. These days our
metadata caching is much better, resulting in no unnecessary disk reads. As
a result, extent caching doesn't have to be as fancy, nor does it have to
cache as many extents. Keeping the last 3 extents seen should be sufficient
to give us a small performance boost on some streaming workloads.
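A three-entry, most-recently-used-first extent cache in this spirit is sketched below. This is a hypothetical sketch and ocfs2's actual data structure and replacement policy may differ. Inserts and hits move an entry to the front, and the oldest entry is dropped when the cache is full.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_EM_SLOTS 3

struct toy_extent { unsigned int cpos, phys, len; }; /* logical -> physical run */

struct toy_extent_cache {
    struct toy_extent e[TOY_EM_SLOTS]; /* e[0] is most recently used */
    int count;
};

static void toy_em_insert(struct toy_extent_cache *c, struct toy_extent ext)
{
    int n = c->count < TOY_EM_SLOTS ? c->count : TOY_EM_SLOTS - 1;

    memmove(&c->e[1], &c->e[0], (size_t)n * sizeof(c->e[0])); /* drop oldest if full */
    c->e[0] = ext;
    if (c->count < TOY_EM_SLOTS)
        c->count++;
}

static bool toy_em_lookup(struct toy_extent_cache *c, unsigned int cpos,
                          unsigned int *phys)
{
    for (int i = 0; i < c->count; i++) {
        struct toy_extent hit = c->e[i];

        if (cpos >= hit.cpos && cpos < hit.cpos + hit.len) {
            *phys = hit.phys + (cpos - hit.cpos);
            memmove(&c->e[1], &c->e[0], (size_t)i * sizeof(c->e[0]));
            c->e[0] = hit; /* move the hit to the front */
            return true;
        }
    }
    return false;
}
```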

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-04-27 06:10:40 +0800
8110b073a ocfs2: Fix up i_blocks calculation to know about holes

Older file systems which didn't support holes did a dumb calculation of
i_blocks based on i_size. This is no longer accurate, so fix things up to
take actual allocation into account.
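The difference between the two calculations shows up clearly on a sparse file. This is a hypothetical sketch that works in 512-byte sectors and assumes clusters of at least 512 bytes. With 4 KiB clusters, a 1 MiB file with only one cluster allocated would be charged 2048 sectors from i_size but only 8 sectors from its actual allocation.

```c
#include <stdint.h>

/* Old approach: derive sectors from i_size rounded up to whole clusters. */
static uint64_t toy_sectors_from_size(uint64_t i_size,
                                      unsigned int clustersize_bits)
{
    uint64_t csize = 1ull << clustersize_bits;
    uint64_t clusters = (i_size + csize - 1) >> clustersize_bits;

    return clusters << (clustersize_bits - 9);
}

/* New approach: count only clusters that are actually allocated. */
static uint64_t toy_sectors_from_alloc(uint32_t allocated_clusters,
                                       unsigned int clustersize_bits)
{
    return (uint64_t)allocated_clusters << (clustersize_bits - 9);
}
```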

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-04-27 06:07:40 +0800
363041a5f ocfs2: temporarily remove extent map caching

The code in extent_map.c is not prepared to deal with a subtree being
rotated between lookups. This can happen when filling holes in sparse files.
Instead of a lengthy patch to update the code (which would likely lose the
benefit of caching subtree roots), we remove most of the algorithms and
implement a simple path based lookup. A less ambitious extent caching scheme
will be added in a later patch.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-04-27 06:01:31 +0800
68e2b740c ocfs2: remove unused code

Remove node messaging code that becomes unused with the delete inode vote
removal.

[Removed even more cruft which I spotted during review --Mark]

Signed-off-by: Tiger Yang
Signed-off-by: Mark Fasheh

Tiger Yang
2007-04-27 05:40:16 +0800
500086300 ocfs2: Remove delete inode vote

Ocfs2 currently does cluster-wide node messaging to check the open state of
an inode during delete. This patch removes that mechanism in favor of an
inode cluster lock which is taken at shared read when an inode is first read
and dropped in clear_inode(). This allows a deleting node to test the
liveness of an inode by attempting to take an exclusive lock.
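The liveness test can be modelled in a single process, without a real DLM. This is a hypothetical sketch and all toy_ names are illustrative. Every node holding the inode in memory keeps a shared (PR) hold. The deleting node asks for exclusive (EX) without waiting, and the request succeeds only if no other node still holds the lock.

```c
struct toy_cluster_lock {
    int pr_holders; /* nodes that currently have the inode in memory */
    int ex_held;
};

/* Inode first read on a node: take a shared hold. */
static void toy_inode_first_read(struct toy_cluster_lock *l) { l->pr_holders++; }

/* clear_inode() on a node: drop the shared hold. */
static void toy_clear_inode(struct toy_cluster_lock *l) { l->pr_holders--; }

/* Non-blocking EX attempt; `own_pr` is 1 if the caller holds PR itself. */
static int toy_try_ex(struct toy_cluster_lock *l, int own_pr)
{
    if (l->ex_held || l->pr_holders - own_pr > 0)
        return 0; /* another node still has the inode: not safe to wipe */
    l->ex_held = 1;
    return 1;
}
```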

Signed-off-by: Tiger Yang
Signed-off-by: Mark Fasheh

Tiger Yang
2007-04-27 05:39:48 +0800

08 Dec, 2006

1 commit

e18b890bb [PATCH] slab: remove kmem_cache_t

Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#

set -e

for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done

The script was run like this

sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-12-08 00:39:25 +0800

02 Dec, 2006

2 commits

1fabe1481 ocfs2: Remove struct ocfs2_journal_handle in favor of handle_t

This is mostly a search and replace as ocfs2_journal_handle is now no more
than a container for a handle_t pointer.

ocfs2_commit_trans() becomes very straightforward, and we remove some
out-of-date comments / code.

Signed-off-by: Mark Fasheh

Mark Fasheh
2006-12-02 10:28:28 +0800
02928a71a ocfs2: remove unused ocfs2_handle_add_inode()

We can also delete the unused infrastructure which was once in place to
support this functionality. ocfs2_inode_private loses ip_handle and
ip_handle_list. ocfs2_journal_handle loses handle_list.

Signed-off-by: Mark Fasheh

Mark Fasheh
2006-12-02 10:27:55 +0800

25 Sep, 2006

1 commit

24c19ef40 ocfs2: Remove i_generation from inode lock names

OCFS2 puts inode meta data in the "lock value block" provided by the DLM.
Typically, i_generation is encoded in the lock name so that a deleted inode
and a new one in the same block don't share the same lvb.

Unfortunately, that scheme means that the read in ocfs2_read_locked_inode()
is potentially thrown away as soon as the meta data lock is taken - we
cannot encode the lock name without first knowing i_generation, which
requires a disk read.

This patch encodes i_generation in the inode meta data lvb, and removes the
value from the inode meta data lock name. This way, the read can be covered
by a lock, and at the same time we can distinguish between an up to date and
a stale LVB.

This will help cold-cache stat(2) performance in particular.

Since this patch changes the protocol version, we take the opportunity to do
a minor re-organization of two of the LVB fields.
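With the generation moved into the LVB, a reader can decide whether the cached metadata is trustworthy by comparing versions and generations. This is a hypothetical sketch. The field layout and names are illustrative and do not describe the ocfs2 wire format.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_meta_lvb {
    uint8_t version;     /* protocol version of the LVB layout */
    uint32_t generation; /* i_generation of the inode the LVB describes */
    uint64_t size;       /* example of other cached metadata */
};

/* Trust the LVB only if it has the expected layout and describes this inode. */
static bool toy_lvb_usable(const struct toy_meta_lvb *lvb,
                           uint8_t expected_version,
                           uint32_t inode_generation)
{
    return lvb->version == expected_version &&
           lvb->generation == inode_generation;
}
```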

Signed-off-by: Mark Fasheh

Mark Fasheh
2006-09-25 04:50:46 +0800

21 Sep, 2006

1 commit

ca4d147e6 ocfs2: add ext2 attributes

Support the immutable attribute and other ext2 attributes.

Some renaming and other minor fixes done by myself.

Signed-off-by: Herbert Poetzl
Signed-off-by: Mark Fasheh

Herbert Poetzl
2006-09-21 06:48:39 +0800

29 Jun, 2006

1 commit

f5e54d6e5 [PATCH] mark address_space_operations const

Same as we already do with the file operations: keep them in .rodata and
prevent people from doing runtime patching.
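The effect of const on an operations table can be shown in a few lines. This is a hypothetical sketch and the names are illustrative, not the kernel's address_space_operations. A const instance can live in read-only data. Assigning to one of its members fails to compile, and writing to it through a cast is undefined behavior.

```c
struct toy_aops {
    int (*readpage)(int page_index);
    int (*writepage)(int page_index);
};

static int toy_readpage(int page_index) { return page_index; }
static int toy_writepage(int page_index) { return -page_index; }

/* const: eligible for .rodata; methods cannot be swapped at runtime. */
static const struct toy_aops toy_fs_aops = {
    .readpage = toy_readpage,
    .writepage = toy_writepage,
};

/* toy_fs_aops.readpage = 0;  would be rejected: assignment of read-only member */
```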

Signed-off-by: Christoph Hellwig
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2006-06-29 05:59:04 +0800

04 Feb, 2006

1 commit

251b6eccb [OCFS2] Make ip_io_sem a mutex

ip_io_sem is now ip_io_mutex.

Signed-off-by: Mark Fasheh

Mark Fasheh
2006-02-04 05:47:19 +0800

04 Jan, 2006

1 commit

ccd979bdb [PATCH] OCFS2: The Second Oracle Cluster Filesystem

The OCFS2 file system module.

Signed-off-by: Mark Fasheh
Signed-off-by: Kurt Hackel

Mark Fasheh
2006-01-04 03:45:47 +0800