11 Sep, 2010

1 commit

  • This patch changes mutex_lock to a new subclass and
    adds a new inode lock subclass for the target inode;
    the previous locking caused the lockdep warning below.
    (A sketch of the subclass technique follows this entry.)

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.35+ #5
    ---------------------------------------------
    reflink/11086 is trying to acquire lock:
    (Meta){+++++.}, at: [] ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2]

    but task is already holding lock:
    (Meta){+++++.}, at: [] ocfs2_reflink_ioctl+0x5d3/0x1229 [ocfs2]

    other info that might help us debug this:
    6 locks held by reflink/11086:
    #0: (&sb->s_type->i_mutex_key#15/1){+.+.+.}, at: [] lookup_create+0x26/0x97
    #1: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [] ocfs2_reflink_ioctl+0x4d3/0x1229 [ocfs2]
    #2: (Meta){+++++.}, at: [] ocfs2_reflink_ioctl+0x5d3/0x1229 [ocfs2]
    #3: (&oi->ip_xattr_sem){+.+.+.}, at: [] ocfs2_reflink_ioctl+0x68b/0x1229 [ocfs2]
    #4: (&oi->ip_alloc_sem){+.+.+.}, at: [] ocfs2_reflink_ioctl+0x69a/0x1229 [ocfs2]
    #5: (&sb->s_type->i_mutex_key#15/2){+.+...}, at: [] ocfs2_reflink_ioctl+0x882/0x1229 [ocfs2]

    stack backtrace:
    Pid: 11086, comm: reflink Not tainted 2.6.35+ #5
    Call Trace:
    [] validate_chain+0x56e/0xd68
    [] ? mark_held_locks+0x49/0x69
    [] __lock_acquire+0x79a/0x7f1
    [] lock_acquire+0xc6/0xed
    [] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2]
    [] __ocfs2_cluster_lock+0x975/0xa0d [ocfs2]
    [] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2]
    [] ? ocfs2_wait_for_recovery+0x15/0x8a [ocfs2]
    [] ocfs2_inode_lock_full_nested+0x1ac/0xdc5 [ocfs2]
    [] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2]
    [] ? trace_hardirqs_on_caller+0x10b/0x12f
    [] ? debug_mutex_free_waiter+0x4f/0x53
    [] ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2]
    [] ? ocfs2_file_lock_res_init+0x66/0x78 [ocfs2]
    [] ? might_fault+0x40/0x8d
    [] ocfs2_ioctl+0x61a/0x656 [ocfs2]
    [] ? mntput_no_expire+0x1d/0xb0
    [] ? path_put+0x2c/0x31
    [] vfs_ioctl+0x2a/0x9d
    [] do_vfs_ioctl+0x45d/0x4ae
    [] ? _raw_spin_unlock+0x26/0x2a
    [] ? sysret_check+0x27/0x62
    [] sys_ioctl+0x57/0x7a
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
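    The fix relies on lockdep's lock-nesting annotations. Below is a minimal,
    illustrative sketch of that technique using the 2.6.35-era i_mutex (the
    subclass names and helpers are hypothetical, not the identifiers the patch
    adds): when two inodes of the same lock class must be held at once, each
    acquisition is annotated with a distinct subclass so lockdep does not
    report false recursion.

    #include <linux/fs.h>
    #include <linux/mutex.h>

    /* Hypothetical subclasses for a source/target inode pair. */
    enum demo_reflink_subclass {
            DEMO_LS_SOURCE = 0,     /* first (source) inode  */
            DEMO_LS_TARGET,         /* second (target) inode */
    };

    static void demo_lock_reflink_pair(struct inode *src, struct inode *tgt)
    {
            /* Same lock class, distinct subclasses: no lockdep splat. */
            mutex_lock_nested(&src->i_mutex, DEMO_LS_SOURCE);
            mutex_lock_nested(&tgt->i_mutex, DEMO_LS_TARGET);
    }

    static void demo_unlock_reflink_pair(struct inode *src, struct inode *tgt)
    {
            mutex_unlock(&tgt->i_mutex);
            mutex_unlock(&src->i_mutex);
    }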
     

23 Sep, 2009

1 commit


23 Jun, 2009

2 commits


04 Jun, 2009

1 commit

  • When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
    before moving the dentry to the orphan directory. Other nodes that have
    this dentry in cache have a PR on the same dentry lock. When the EX is
    requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
    during downconvert. The inode is finally deleted when the last node to iput
    the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.

    A problem arises if a node is forced to free dentry locks because of memory
    pressure. If this happens, the node will no longer get downconvert
    notifications for the dentries that have been unlinked on another node.
    If that node is also actively using the corresponding inode and happens to
    be the one performing the last iput on it, it will fail to delete the inode
    because the MAYBE_ORPHANED flag is not set.

    This patch fixes this shortcoming by introducing a periodic scan of the
    orphan directories to delete such inodes (see the sketch after this entry).
    Care has been taken to distribute the workload across the cluster so that
    no one node has to perform the task all the time.

    Signed-off-by: Srinivas Eeda
    Signed-off-by: Joel Becker

    Srinivas Eeda
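    A minimal sketch of the periodic-scan technique, assuming a re-arming
    delayed work item (all names, the interval, and the omitted scan body are
    hypothetical; in particular, how the work is spread across nodes is not
    shown):

    #include <linux/kernel.h>
    #include <linux/workqueue.h>
    #include <linux/jiffies.h>
    #include <linux/fs.h>

    #define DEMO_ORPHAN_SCAN_INTERVAL   (300 * HZ)  /* every 5 minutes */

    struct demo_orphan_scan {
            struct delayed_work     os_work;
            struct super_block      *os_sb;
    };

    static void demo_queue_orphan_scan(struct demo_orphan_scan *os)
    {
            schedule_delayed_work(&os->os_work, DEMO_ORPHAN_SCAN_INTERVAL);
    }

    static void demo_orphan_scan_worker(struct work_struct *work)
    {
            struct demo_orphan_scan *os =
                    container_of(work, struct demo_orphan_scan, os_work.work);

            /*
             * Walk each orphan directory and let the normal iput()/delete
             * path reclaim inodes that were unlinked remotely.  (Omitted.)
             */

            demo_queue_orphan_scan(os);     /* re-arm for the next pass */
    }

    static void demo_orphan_scan_init(struct demo_orphan_scan *os,
                                      struct super_block *sb)
    {
            os->os_sb = sb;
            INIT_DELAYED_WORK(&os->os_work, demo_orphan_scan_worker);
            demo_queue_orphan_scan(os);
    }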
     

04 Apr, 2009

1 commit

  • For NFS exporting, ocfs2_get_dentry() returns the dentry for a file
    handle. When the inode is not in memory, ocfs2_get_dentry() may read it
    from disk without taking any cross-cluster lock, which can leave the
    file system holding a stale inode.

    This patch fixes the above problem.

    The solution is that when the inode is not in memory, we take the cluster
    lock (PR) on the alloc inode from which the inode in question was
    allocated (this forces the node performing the deletion to sync the alloc
    inode) before reading the inode itself. We then check the bitmap of the
    group the inode was allocated from: if the bit is clear, the inode is
    stale; if the bit is set, we check the generation as the existing code
    does.

    We have to read the inode in question from disk first to learn its alloc
    slot and alloc bit. If it is not stale, we then read it via ocfs2_iget();
    that second read should come from cache.

    We also have to add a per-superblock nfs_sync_lock covering the lock on
    the alloc inode and the lock on the inode in question, because
    ocfs2_get_dentry() and ocfs2_delete_inode() take them in reverse order.
    nfs_sync_lock is taken in EX mode in ocfs2_get_dentry() and in PR mode in
    ocfs2_delete_inode(), so multiple ocfs2_delete_inode() calls can still
    run concurrently in the normal case. (A sketch of the lookup ordering
    follows this entry.)

    [mfasheh@suse.com: build warning fixes and comment cleanups]
    Signed-off-by: Wengang Wang
    Acked-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Wengang Wang
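    A minimal sketch of the lookup ordering described above; every helper
    below is a hypothetical stand-in, not the real ocfs2 function:

    #include <linux/fs.h>
    #include <linux/err.h>
    #include <linux/errno.h>
    #include <linux/types.h>

    /* Hypothetical stand-ins; declarations only. */
    void demo_nfs_sync_lock(struct super_block *sb, int ex);
    void demo_nfs_sync_unlock(struct super_block *sb, int ex);
    int demo_test_inode_bit(struct super_block *sb, u64 blkno);
    struct inode *demo_iget(struct super_block *sb, u64 blkno, u32 generation);

    static struct inode *demo_nfs_get_inode(struct super_block *sb,
                                            u64 blkno, u32 generation)
    {
            struct inode *inode = ERR_PTR(-ESTALE);

            demo_nfs_sync_lock(sb, 1);               /* EX: serialize vs delete */

            if (!demo_test_inode_bit(sb, blkno))     /* allocator PR + bitmap   */
                    goto out;                        /* bit clear => stale fh   */

            inode = demo_iget(sb, blkno, generation); /* cached read + gen check */
    out:
            demo_nfs_sync_unlock(sb, 1);
            return inode;
    }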
     

06 Jan, 2009

1 commit

  • For each quota type, each node has a local quota file. In this file it
    stores the changes users have made to disk usage via this node. Once in a
    while this information is synced to the global file (and thus with other
    nodes) so that limit enforcement at least approximately works.

    Global quota files contain all the information about usage and limits. They
    are mostly handled by the generic VFS code (which implements a trie of
    structures inside a quota file). We only have to provide functions to
    convert structures from the on-disk format to the in-memory one (see the
    sketch after this entry), plus wrappers around the various quota operations
    that start transactions and acquire the necessary cluster locks before the
    actual IO is started.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
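    A minimal sketch of the on-disk to in-memory conversion wrapper mentioned
    above. The field layout is hypothetical, not the real ocfs2 global-quota
    record; the point is the fixed-endian on-disk struct and its conversion:

    #include <linux/types.h>
    #include <asm/byteorder.h>

    struct demo_disk_dqblk {                /* on-disk, little-endian */
            __le64  dqb_spacelimit;         /* block hard limit, bytes */
            __le64  dqb_spaceused;          /* current usage, bytes    */
            __le64  dqb_inodelimit;         /* inode hard limit        */
            __le64  dqb_inodeused;          /* inodes in use           */
    };

    struct demo_mem_dqblk {                 /* in-memory, CPU-endian */
            u64     spacelimit;
            u64     spaceused;
            u64     inodelimit;
            u64     inodeused;
    };

    static void demo_disk2mem_dqblk(struct demo_mem_dqblk *m,
                                    const struct demo_disk_dqblk *d)
    {
            m->spacelimit = le64_to_cpu(d->dqb_spacelimit);
            m->spaceused  = le64_to_cpu(d->dqb_spaceused);
            m->inodelimit = le64_to_cpu(d->dqb_inodelimit);
            m->inodeused  = le64_to_cpu(d->dqb_inodeused);
    }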
     

18 Apr, 2008

4 commits

  • We define the ocfs2_stack_plugin structure to represent a stack driver.
    The o2cb stack code is split into stack_o2cb.c. This becomes the
    ocfs2_stack_o2cb.ko module.

    The stackglue generic functions are similarly split into the
    ocfs2_stackglue.ko module. This module now provides an interface to
    register drivers. The ocfs2_stack_o2cb driver registers itself. As
    part of this interface, ocfs2_stackglue can load drivers on demand.
    This is accomplished in ocfs2_cluster_connect() (see the registration
    sketch after this entry).

    ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
    If a hangup is pending, it will not release the driver module and will
    let _hangup() do that.

    Signed-off-by: Joel Becker

    Joel Becker
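    A minimal, illustrative sketch of the driver-registration and on-demand
    loading pattern described above (the struct layout and function names are
    simplified and hypothetical, not the real stackglue interface):

    #include <linux/module.h>
    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <linux/string.h>
    #include <linux/kmod.h>

    struct demo_stack_plugin {
            const char              *sp_name;
            struct module           *sp_owner;
            struct list_head        sp_list;
    };

    static LIST_HEAD(demo_stack_drivers);
    static DEFINE_SPINLOCK(demo_stack_lock);

    /* Called by a stack driver from its module init. */
    int demo_stack_glue_register(struct demo_stack_plugin *plugin)
    {
            spin_lock(&demo_stack_lock);
            list_add(&plugin->sp_list, &demo_stack_drivers);
            spin_unlock(&demo_stack_lock);
            return 0;
    }
    EXPORT_SYMBOL_GPL(demo_stack_glue_register);

    /*
     * Connect-time lookup: if the requested driver has not registered yet,
     * ask userspace to load its module and then search the list.
     */
    struct demo_stack_plugin *demo_stack_glue_find(const char *name)
    {
            struct demo_stack_plugin *p;

            request_module("demo_stack_%s", name);  /* load on demand */

            spin_lock(&demo_stack_lock);
            list_for_each_entry(p, &demo_stack_drivers, sp_list) {
                    if (!strcmp(p->sp_name, name) &&
                        try_module_get(p->sp_owner)) {
                            spin_unlock(&demo_stack_lock);
                            return p;
                    }
            }
            spin_unlock(&demo_stack_lock);
            return NULL;
    }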
     
  • The stack glue initialization function needs a better name so that it can be
    used cleanly when stackglue becomes a module.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • This step introduces a cluster stack agnostic API for initializing and
    exiting. fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
    connect to the stack. It is all handled in stackglue.c.

    heartbeat.c no longer needs to know how it gets called.
    ocfs2_do_node_down() is now a clean recovery trigger.

    The big gotcha is the ordering of initializations and de-initializations done
    underneath ocfs2_cluster_connect(). ocfs2_dlm_init() used to do all
    o2dlm initialization in one block. Thus, the o2dlm functionality of
    ocfs2_cluster_connect() is very straightforward. ocfs2_dlm_shutdown(),
    however, did a few things between de-registration of the eviction
    callback and actually shutting down the domain. Now de-registration and
    shutdown of the domain are wrapped within the single
    ocfs2_cluster_disconnect() call. I've checked the code paths to make
    sure we can safely tear down things in ocfs2_dlm_shutdown() before
    calling ocfs2_cluster_disconnect(). The filesystem has already set
    itself to ignore the callback.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • This is the first in a series of patches to isolate ocfs2 from the
    underlying cluster stack. Here we wrap the dlm locking functions with
    ocfs2-specific calls. Because ocfs2 always uses the same dlm lock status
    callbacks, we can eliminate the callbacks from the filesystem-visible
    functions (see the wrapper sketch after this entry).

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
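    A minimal sketch of that wrapping, assuming hypothetical names throughout
    (this is not the real o2dlm or dlmglue API). Because the callback pair is
    fixed, callers of the wrapper never see stack-specific callback types:

    #include <linux/types.h>

    struct demo_lksb;                       /* opaque lock status block */

    typedef void (*demo_ast_t)(void *astarg);
    typedef void (*demo_bast_t)(void *astarg, int level);

    /* The one callback pair the filesystem ever uses. */
    void demo_locking_ast(void *astarg);
    void demo_blocking_ast(void *astarg, int level);

    /* Hypothetical stand-in for the underlying DLM's lock call. */
    int demo_dlmlock(int mode, struct demo_lksb *lksb, u32 flags,
                     const char *name, unsigned int namelen,
                     demo_ast_t ast, void *astarg, demo_bast_t bast);

    /* The stack-agnostic call the rest of the filesystem uses. */
    static int demo_cluster_lock(int mode, struct demo_lksb *lksb, u32 flags,
                                 const char *name, unsigned int namelen,
                                 void *astarg)
    {
            return demo_dlmlock(mode, lksb, flags, name, namelen,
                                demo_locking_ast, astarg, demo_blocking_ast);
    }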
     

04 Mar, 2008

1 commit

  • This patch contains the following cleanups that are now possible:
    - make the following needlessly global functions static:
    - dlmglue.c:ocfs2_process_blocked_lock()
    - heartbeat.c:ocfs2_node_map_init()
    - #if 0 the following unused global function plus support functions:
    - heartbeat.c:ocfs2_node_map_is_only()

    Signed-off-by: Adrian Bunk
    Signed-off-by: Mark Fasheh

    Adrian Bunk
     

07 Feb, 2008

1 commit

  • Currently, when ocfs2 nodes connect via TCP, they advertise their
    compatibility level. If the versions do not match, two nodes cannot speak
    to each other and they disconnect. As a result, this provides no forward or
    backwards compatibility.

    This patch implements a simple protocol negotiation at the dlm level by
    introducing a major/minor version number scheme for entities that
    communicate. Specifically, o2dlm has a major/minor version for interaction
    with o2dlm on other nodes, and ocfs2 itself has a major/minor version for
    interacting with the filesystem on other nodes.

    This will allow rolling upgrades of ocfs2 clusters when changes to the
    locking or network protocols can be done in a backwards compatible manner.
    In those cases, only the minor number is changed and the negotiated protocol
    minor is returned from dlm join (see the sketch after this entry). In the
    far less likely event that a required protocol change makes backwards
    compatibility impossible, we simply bump the major number.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
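    A minimal sketch of the major/minor negotiation described above (struct
    and function names are hypothetical, not the structures the patch adds):

    #include <linux/kernel.h>
    #include <linux/errno.h>
    #include <linux/types.h>

    struct demo_protocol_version {
            u8 pv_major;
            u8 pv_minor;
    };

    /*
     * Returns 0 and writes the agreed version into *result, or -EINVAL when
     * the majors differ (incompatible).  Minors negotiate down to the lower
     * value, which is what permits rolling upgrades for backwards-compatible
     * changes.
     */
    static int demo_negotiate_version(const struct demo_protocol_version *ours,
                                      const struct demo_protocol_version *theirs,
                                      struct demo_protocol_version *result)
    {
            if (ours->pv_major != theirs->pv_major)
                    return -EINVAL;

            result->pv_major = ours->pv_major;
            result->pv_minor = min(ours->pv_minor, theirs->pv_minor);
            return 0;
    }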
     

26 Jan, 2008

4 commits

  • This adds a new dlmglue lock type which is intended to back flock()
    requests.

    Since these locks are driven from userspace, usage rules are much more
    liberal than for the typical ocfs2 internal cluster lock. As a result, we
    can't make use of most dlmglue features - lock caching and lock level
    optimizations in particular. Additionally, userspace is free to deadlock
    itself, so we have to deal with that in the same way as the rest of the
    kernel - by allowing a signal to abort a lock request (see the sketch
    after this entry).

    In order to keep ocfs2_cluster_lock() complexity down, ocfs2_file_lock()
    does its own dlm coordination. We still use the same helper functions,
    though, so duplicated code is kept to a minimum.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
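    A minimal sketch of the signal-abort handling, under the assumption that
    the names and the cancel helper are hypothetical (this is not the real
    ocfs2_file_lock() implementation):

    #include <linux/completion.h>
    #include <linux/errno.h>

    struct demo_file_lock {
            struct completion fl_granted;   /* completed by the locking AST */
    };

    int demo_cancel_lock_request(struct demo_file_lock *fl);  /* stand-in */

    static int demo_file_lock_wait(struct demo_file_lock *fl)
    {
            int ret = wait_for_completion_interruptible(&fl->fl_granted);

            if (ret == -ERESTARTSYS) {
                    /* A signal arrived: withdraw the in-flight request. */
                    demo_cancel_lock_request(fl);
                    return -ERESTARTSYS;
            }
            return 0;
    }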
     
  • Call this the "inode_lock" now, since it covers both data and meta data.
    This patch makes no functional changes.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • The meta lock now covers both meta data and data, so this just removes the
    now-redundant data lock.

    Combining locks saves us a round of lock mastery per inode and one less lock
    to ping between nodes during read/write.

    We don't lose much: since meta locks were always held before a data lock
    (and at the same level), ordered writeout mode (the default) ensured that
    flushing for the meta data lock also pushed out data anyway.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • The node maps that are set/unset by these votes are no longer relevant, thus
    we can remove the mount and umount votes. Since those are the last two
    remaining votes, we can also remove the entire vote infrastructure.

    The vote thread has been renamed to the downconvert thread, and the small
    amount of functionality related to managing it has been moved into
    fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     

13 Oct, 2007

1 commit

  • Add the disk, network and memory structures needed to support data in inode.

    Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing
    inline data.

    A new inode field, i_dyn_features, is added to facilitate tracking of
    dynamic inode state. Since it will be used often, we want to mirror it in
    ocfs2_inode_info and transfer it via the meta data lvb (an approximate
    layout sketch follows this entry).

    Signed-off-by: Mark Fasheh
    Reviewed-by: Joel Becker

    Mark Fasheh
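    An approximate, illustrative layout in the spirit of what the commit
    describes (not the exact on-disk definition; names and the feature bit
    are hypothetical):

    #include <linux/types.h>

    #define DEMO_INLINE_DATA_FL     0x0001  /* hypothetical i_dyn_features bit */

    struct demo_inline_data {
            __le16  id_count;       /* bytes of inline data that fit here */
            __le16  id_reserved;
            __u8    id_data[];      /* the file contents themselves */
    };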
     

03 May, 2007

1 commit

  • This patch makes the following needlessly global functions static:
    - aops.c: ocfs2_write_data_page()
    - dlmglue.c: ocfs2_dump_meta_lvb_info()
    - file.c: ocfs2_set_inode_size()

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Mark Fasheh

    Adrian Bunk
     

27 Apr, 2007

1 commit

  • Ocfs2 currently does cluster-wide node messaging to check the open state of
    an inode during delete. This patch removes that mechanism in favor of an
    inode cluster lock which is taken at shared read when an inode is first read
    and dropped in clear_inode(). This allows a deleting node to test the
    liveness of an inode by attempting to take an exclusive lock (see the
    sketch after this entry).

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
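    A minimal sketch of that liveness test, assuming a hypothetical
    non-blocking "try exclusive" helper (this is not the real ocfs2 open-lock
    API):

    #include <linux/fs.h>

    int demo_try_open_lock_ex(struct inode *inode);  /* stand-in: 0 on success */

    static int demo_inode_in_use_elsewhere(struct inode *inode)
    {
            /*
             * Every node holds the open lock at a shared level while the
             * inode is in memory, so an immediately granted EX means no
             * other node still has it open.
             */
            return demo_try_open_lock_ex(inode) != 0;
    }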
     

02 Dec, 2006

3 commits


25 Sep, 2006

4 commits

  • OCFS2 puts inode meta data in the "lock value block" provided by the DLM.
    Typically, i_generation is encoded in the lock name so that a deleted inode
    and a new one in the same block don't share the same lvb.

    Unfortunately, that scheme means that the read in ocfs2_read_locked_inode()
    is potentially thrown away as soon as the meta data lock is taken - we
    cannot encode the lock name without first knowing i_generation, which
    requires a disk read.

    This patch encodes i_generation in the inode meta data lvb and removes the
    value from the inode meta data lock name. This way, the read can be covered
    by a lock, and at the same time we can distinguish between an up-to-date and
    a stale LVB (see the sketch after this entry).

    This will help cold-cache stat(2) performance in particular.

    Since this patch changes the protocol version, we take the opportunity to do
    a minor re-organization of two of the LVB fields.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
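    An approximate, illustrative sketch of the idea (the layout and names are
    hypothetical, not the real ocfs2 meta lvb):

    #include <linux/types.h>
    #include <asm/byteorder.h>

    struct demo_meta_lvb {
            __u8    lvb_version;            /* bumped when the layout changes */
            __u8    lvb_reserved[3];
            __be32  lvb_igeneration;        /* generation of the cached inode */
            __be64  lvb_isize;
            /* ... remaining cached inode fields ... */
    };

    /*
     * Usable only if the layout version matches what this kernel speaks and
     * the generation matches the inode actually read from disk; otherwise
     * the LVB is stale.
     */
    static int demo_lvb_is_usable(const struct demo_meta_lvb *lvb,
                                  __u8 expected_version, u32 i_generation)
    {
            return lvb->lvb_version == expected_version &&
                   be32_to_cpu(lvb->lvb_igeneration) == i_generation;
    }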
     
  • When i_generation is removed from the lockname, this will help us determine
    whether a meta data lvb has information that is in sync with the local
    struct inode.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • lvb_version doesn't need to be a whole 32 bits. Make it an 8 bit field to
    free up some space. This should be backwards compatible until we use one of
    the fields, in which case we'd bump the lvb version anyway.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Replace the dentry vote mechanism with a cluster lock which covers a set
    of dentries. This allows us to force d_delete() only on nodes which actually
    care about an unlink.

    Every node that does a ->lookup() gets a read-only lock on the dentry.
    During an unlink, the unlinking node requests an exclusive lock, forcing
    the other nodes that care about that dentry to d_delete() it (see the
    sketch after this entry). The effect is that we retain a very lightweight
    ->d_revalidate() and at the same time make large improvements to the
    average-case performance of the ocfs2 unlink and rename operations.

    This patch adds the cluster lock type which OCFS2 can attach to
    dentries. A small number of fs/ocfs2/dcache.c functions are stubbed
    out so that this change can compile.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
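    A minimal sketch of the downconvert side of this scheme, with hypothetical
    names (only the d_delete() call is a real kernel API):

    #include <linux/dcache.h>

    /*
     * Called when another node requests the exclusive level of the dentry
     * lock, i.e. it is unlinking the name.  Only nodes that actually have
     * the dentry cached reach this path, which is what keeps the common
     * case cheap.
     */
    static void demo_dentry_downconvert(struct dentry *dentry)
    {
            /* Unhash the now-stale name; a later lookup hits the cluster. */
            d_delete(dentry);
    }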
     

21 Sep, 2006

1 commit


04 Jan, 2006

1 commit