Doug / smarc-fsl-linux-kernel | Embedian Git Server

08 May, 2013

1 commit

41003a7bc aio: remove retry-based AIO

This removes the retry-based AIO infrastructure now that nothing in tree
is using it.

We want to remove retry-based AIO because it is fundamentally unsafe.
It retries IO submission from a kernel thread that has only assumed the
mm of the submitting task. All other task_struct references in the IO
submission path will see the kernel thread, not the submitting task.
This design flaw means that nothing of any meaningful complexity can use
retry-based AIO.

This removes all the code and data associated with the retry machinery.
The most significant benefit of this is the removal of the locking
around the unused run list in the submission path.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kent Overstreet
Signed-off-by: Zach Brown
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Acked-by: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zach Brown
2013-05-08 09:38:27 +0800

26 Feb, 2013

1 commit

94f2f1423 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull user namespace and namespace infrastructure changes from Eric W Biederman:
"This set of changes starts with a few small enhancements to the user
namespace: reboot support, allowing more arbitrary mappings, and
support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
user namespace root.

I do my best to document that, if you care about limiting your
unprivileged users, you will need to enable memory control groups
when user namespace support is enabled.

There is a minor bug fix to prevent overflowing the stack if someone
creates way too many user namespaces.

The bulk of the changes are a continuation of the kuid/kgid push down
work through the filesystems. These changes make using uids and gids
typesafe which ensures that these filesystems are safe to use when
multiple user namespaces are in use. The filesystems converted for
3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
changes for these filesystems were a little more involved so I split
the changes into smaller hopefully obviously correct changes.

XFS is the only filesystem that remains. I was hoping I could get
that in this release so that user namespace support would be enabled
with an allyesconfig or an allmodconfig but it looks like the xfs
changes need another couple of days before they are ready."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
cifs: Enable building with user namespaces enabled.
cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
cifs: Convert struct cifs_sb_info to use kuids and kgids
cifs: Modify struct smb_vol to use kuids and kgids
cifs: Convert struct cifsFileInfo to use a kuid
cifs: Convert struct cifs_fattr to use kuid and kgids
cifs: Convert struct tcon_link to use a kuid.
cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
cifs: Convert from a kuid before printing current_fsuid
cifs: Use kuids and kgids SID to uid/gid mapping
cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
cifs: Override unmappable incoming uids and gids
nfsd: Enable building with user namespaces enabled.
nfsd: Properly compare and initialize kuids and kgids
nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
nfsd: Modify nfsd4_cb_sec to use kuids and kgids
nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
nfsd: Convert nfsxdr to use kuids and kgids
nfsd: Convert nfs3xdr to use kuids and kgids
...

Linus Torvalds
2013-02-26 08:00:49 +0800

22 Feb, 2013

1 commit

3278bb748 ocfs2: unlock super lock if lockres refresh failed

If the lockres refresh fails, the super lock is never released, which
causes some processes on other cluster nodes to hang forever.

Signed-off-by: Junxiao Bi
Cc: Joel Becker
Cc: Mark Fasheh
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Junxiao Bi
2013-02-22 09:22:19 +0800

13 Feb, 2013

1 commit

03ab30f73 ocfs2: convert between kuids and kgids and DLM locks

Convert between the uids and gids stored in the on-the-wire format of dlm
locks aka struct ocfs2_meta_lvb and kuids and kgids stored in
inode->i_uid and inode->i_gid.

Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-02-13 22:00:57 +0800
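
The conversion described in the commit above can be pictured with a small userspace sketch. Everything prefixed with toy_ is a hypothetical stand-in, not the kernel's definition: toy_kuid_t mimics a typesafe id wrapper, and toy_meta_lvb holds only a uid field, assumed here to be big-endian on the wire. The kernel conversion also involves a user namespace mapping, which this sketch leaves out.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Typesafe wrapper: a struct cannot be silently mixed with a plain integer. */
typedef struct { uint32_t val; } toy_kuid_t;

/* Hypothetical stand-in for the uid field of the on-wire lock value block. */
struct toy_meta_lvb {
    uint32_t lvb_iuid;  /* assumed big-endian on the wire */
};

/* Hypothetical stand-in for the in-memory inode owner field. */
struct toy_inode {
    toy_kuid_t i_uid;
};

/* In-memory id -> wire format. */
static void toy_pack_uid(struct toy_meta_lvb *lvb, const struct toy_inode *inode)
{
    assert(lvb != NULL && inode != NULL);
    lvb->lvb_iuid = htonl(inode->i_uid.val);
}

/* Wire format -> in-memory id. */
static void toy_unpack_uid(struct toy_inode *inode, const struct toy_meta_lvb *lvb)
{
    assert(lvb != NULL && inode != NULL);
    inode->i_uid = (toy_kuid_t){ .val = ntohl(lvb->lvb_iuid) };
}
```

Because toy_kuid_t is a struct, assigning a raw integer to i_uid is a compile error, which is the kind of type safety the kuid/kgid work aims for.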

04 Jul, 2012

2 commits

a75e9ccab ocfs2: use spinlock irqsave for downconvert lock.patch

When the ocfs2dc thread holds the dc_task_lock spinlock and receives a soft IRQ,
it deadlocks itself trying to take the same spinlock in
ocfs2_wake_downconvert_thread. Below is the stack snippet.

The patch disables interrupts when acquiring dc_task_lock spinlock.

ocfs2_wake_downconvert_thread
ocfs2_rw_unlock
ocfs2_dio_end_io
dio_complete
.....
bio_endio
req_bio_endio
....
scsi_io_completion
blk_done_softirq
__do_softirq
do_softirq
irq_exit
do_IRQ
ocfs2_downconvert_thread
[kthread]

Signed-off-by: Srinivas Eeda
Signed-off-by: Joel Becker

Srinivas Eeda
2012-07-04 14:27:15 +0800
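
A toy userspace model of the self-deadlock fixed by a75e9ccab, offered only as a sketch. It does not use real spinlocks or interrupts: struct toy_cpu and all toy_ functions are hypothetical, and "deadlock" is just counted rather than actually spinning. The point it illustrates is that an interrupt delivered while the thread holds the lock finds the lock taken on the same CPU, whereas disabling interrupts around the critical section defers the interrupt until the lock is free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_cpu {
    bool irqs_on;        /* can interrupts be delivered right now? */
    bool irq_pending;    /* an interrupt arrived while they were off */
    bool lock_held;      /* toy stand-in for dc_task_lock */
    int  self_deadlocks; /* times the interrupt path found the lock held */
};

/* Interrupt-side path: needs the same lock the thread may already hold. */
static void toy_wake_from_irq(struct toy_cpu *cpu)
{
    if (cpu->lock_held) {
        /* A real spinlock would spin here forever on the same CPU. */
        cpu->self_deadlocks++;
        return;
    }
    cpu->lock_held = true;   /* take the lock, do the wakeup ... */
    cpu->lock_held = false;  /* ... and release it again */
}

/* An interrupt runs immediately only while interrupts are enabled. */
static void toy_raise_irq(struct toy_cpu *cpu)
{
    if (!cpu->irqs_on) {
        cpu->irq_pending = true;
        return;
    }
    toy_wake_from_irq(cpu);
}

/* Thread-side critical section with an interrupt arriving in the middle. */
static int toy_thread_section(struct toy_cpu *cpu, bool disable_irqs)
{
    assert(cpu != NULL && !cpu->lock_held);
    bool saved = cpu->irqs_on;
    if (disable_irqs)
        cpu->irqs_on = false;      /* the "irqsave" half */
    cpu->lock_held = true;
    toy_raise_irq(cpu);            /* interrupt arrives while lock is held */
    cpu->lock_held = false;
    cpu->irqs_on = saved;          /* the "irqrestore" half */
    if (cpu->irqs_on && cpu->irq_pending) {
        cpu->irq_pending = false;
        toy_wake_from_irq(cpu);    /* deferred interrupt runs, lock is free */
    }
    return cpu->self_deadlocks;
}
```

Calling toy_thread_section() with disable_irqs set to false records one self-deadlock; with it set to true the interrupt is deferred and none is recorded.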
16865b7c4 ocfs2: Misplaced parens in unlikley

Fix misplaced parentheses

Signed-off-by: Roel Kluin
Signed-off-by: Joel Becker

roel
2012-07-04 14:27:13 +0800

02 Dec, 2011

1 commit

0a4ebed78 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits)
ocfs2: avoid unaligned access to dqc_bitmap
ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
ocfs2: honor O_(D)SYNC flag in fallocate
ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
ocfs2: send correct UUID to cleancache initialization
ocfs2: Commit transactions in error cases -v2
ocfs2: make direntry invalid when deleting it
fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
ocfs2: Avoid livelock in ocfs2_readpage()
ocfs2: serialize unaligned aio
ocfs2: Implement llseek()
ocfs2: Fix ocfs2_page_mkwrite()
ocfs2: Add comment about orphan scanning
ocfs2: Clean up messages in the fs
ocfs2/cluster: Cluster up now includes network connections too
ocfs2/cluster: Add new function o2net_fill_node_map()
ocfs2/cluster: Fix output in file elapsed_time_in_ms
ocfs2/dlm: dlmlock_remote() needs to account for remastery
ocfs2/dlm: Take inflight reference count for remotely mastered resources too
ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()
...

Linus Torvalds
2011-12-02 06:55:34 +0800

02 Nov, 2011

1 commit

bfe868486 filesystems: add set_nlink()

Replace remaining direct i_nlink updates with a new set_nlink()
updater function.

Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Christoph Hellwig

Miklos Szeredi
2011-11-02 19:53:43 +0800

01 Jun, 2011

1 commit

03efed8a2 ocfs2: Bugfix for hard readonly mount

ocfs2 cannot currently mount a device that is readonly at the media
("hard readonly"). Fix the broken places.
see detail: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1322

[ Description edited -- Joel ]

Signed-off-by: Tiger Yang
Reviewed-by: Sunil Mushran
Signed-off-by: Joel Becker

Tiger Yang
2011-06-01 10:03:44 +0800

29 Mar, 2011

1 commit

99bdc3880 Merge branch 'mlog_replace_for_39' of git://repo.or.cz/taoma-kernel into ocfs2-merge-window-fix

Joel Becker
2011-03-29 00:44:26 +0800

07 Mar, 2011

1 commit

c1e8d35ef ocfs2: Remove EXIT from masklog.

mlog_exit is used to record the exit status of a function.
But because it is added in so many functions, if we enable it,
the system logs get filled up quickly and cause too much I/O.
So actually no one can enable it for a production system or even
for a test.

This patch just tries to remove it or change it. So:
1. If all the error paths already use mlog_errno, it is just removed.
Otherwise, it is replaced by mlog_errno.
2. If it is used to print some return value, it is replaced with
mlog(0,...).
mlog_exit_ptr is changed to mlog(0,...).
All those mlog(0,...) will be replaced with trace events later.

Signed-off-by: Tao Ma

Tao Ma
2011-03-07 16:43:21 +0800

21 Feb, 2011

1 commit

ef6b689b6 ocfs2: Remove ENTRY from masklog.

ENTRY is used to record the entry of a function.
But because it is added in so many functions, if we enable it,
the system logs get filled up quickly and cause too much I/O.
So actually no one can enable it for a production system or even
for a test.

So for mlog_entry_void, we just remove it.
For mlog_entry(...), we replace it with mlog(0,...); these calls
will be replaced by trace events later.

Signed-off-by: Tao Ma

Tao Ma
2011-02-21 11:10:44 +0800

20 Feb, 2011

1 commit

5bc970e80 ocfs2: Use hrtimer to track ocfs2 fs lock stats

Patch makes use of the hrtimer to track times in ocfs2 lock stats.

The patch is a bit involved to ensure no additional impact on the memory
footprint. The size of ocfs2_inode_cache remains 1280 bytes on 32-bit systems.

A related change was to modify the unit of the max wait time from nanosec to
microsec allowing us to track max time larger than 4 secs. This change
necessitated the bumping of the output version in the debugfs file,
locking_state, from 2 to 3.

Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2011-02-20 19:56:07 +0800
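
The "larger than 4 secs" remark in 5bc970e80 follows from simple arithmetic if the max wait time is kept in a 32-bit counter, which is an assumption of this sketch rather than something the message states. The helper below is hypothetical and only computes how many whole seconds fit at a given resolution.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000ULL
#define TOY_USEC_PER_SEC 1000000ULL

/*
 * Largest whole number of seconds a 32-bit counter can represent when it
 * counts in units of 1/units_per_sec seconds. The 32-bit width is assumed.
 */
static uint64_t toy_max_seconds(uint64_t units_per_sec)
{
    assert(units_per_sec > 0);
    return (uint64_t)UINT32_MAX / units_per_sec;
}
```

Under that assumption, toy_max_seconds(TOY_NSEC_PER_SEC) is 4 and toy_max_seconds(TOY_USEC_PER_SEC) is 4294, so switching the unit to microseconds raises the ceiling from about four seconds to over an hour.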

11 Sep, 2010

1 commit

5e98d4924 Track negative entries v3

Track negative dentries by recording the generation number of the parent
directory in d_fsdata. The generation number for the parent directory is
recorded in the inode_info, which increments every time the lock on the
directory is dropped.

If the generation numbers of the parent directory and the negative dentry
match, there is no need to perform the revalidate; otherwise a revalidate
is forced. This improves performance in situations where nodes look for
the same non-existent file multiple times.

Thanks Mark for explaining the DLM sequence.

Signed-off-by: Goldwyn Rodrigues
Signed-off-by: Joel Becker

Goldwyn Rodrigues
2010-09-11 00:18:15 +0800
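
The generation check described in 5e98d4924 can be sketched in a few lines of userspace C. The structures and function names here are hypothetical simplifications: toy_dir stands in for the parent directory's inode info and toy_neg_dentry for the value kept in d_fsdata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parent directory's per-inode info. */
struct toy_dir {
    unsigned long lock_gen;  /* bumped every time the directory lock is dropped */
};

/* Hypothetical stand-in for a negative dentry and its d_fsdata. */
struct toy_neg_dentry {
    const struct toy_dir *parent;
    unsigned long seen_gen;  /* parent generation when the lookup failed */
};

/* Remember the parent's generation at the time of the failed lookup. */
static void toy_record_negative(struct toy_neg_dentry *d, const struct toy_dir *parent)
{
    assert(d != NULL && parent != NULL);
    d->parent = parent;
    d->seen_gen = parent->lock_gen;
}

/* Dropping the directory lock invalidates earlier negative lookups. */
static void toy_drop_dir_lock(struct toy_dir *dir)
{
    assert(dir != NULL);
    dir->lock_gen++;
}

/* Equal generations: the cached "does not exist" answer can be trusted. */
static bool toy_needs_revalidate(const struct toy_neg_dentry *d)
{
    assert(d != NULL && d->parent != NULL);
    return d->seen_gen != d->parent->lock_gen;
}
```

Repeated lookups of the same missing name skip the revalidate until toy_drop_dir_lock() runs, after which the next lookup is forced to revalidate.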

20 Jul, 2010

1 commit

33fa1d909 fs/ocfs2: Remove unnecessary casts of private_data

Signed-off-by: Joe Perches
Acked-by: Joel Becker
Signed-off-by: Jiri Kosina

Joe Perches
2010-07-20 23:20:08 +0800

22 May, 2010

1 commit

ae4f6ef13 ocfs2: Avoid unnecessary block mapping when refreshing quota info

The position of global quota file info does not change. So we do not have
to do logical -> physical block translation every time we reread it from
disk. Thus we can also avoid taking ip_alloc_sem.

Acked-by: Joel Becker
Signed-off-by: Jan Kara

Jan Kara
2010-05-22 01:30:46 +0800

08 Mar, 2010

1 commit

318ae2edc Merge branch 'for-next' into for-linus

Conflicts:
Documentation/filesystems/proc.txt
arch/arm/mach-u300/include/mach/debug-macro.S
drivers/net/qlge/qlge_ethtool.c
drivers/net/qlge/qlge_main.c
drivers/net/typhoon.c

Jiri Kosina
2010-03-08 23:55:37 +0800

28 Feb, 2010

1 commit

9b915181a ocfs2: Use a separate masklog for AST and BASTs

This patch adds a new masklog and uses it to allow tracing ASTs and BASTs
in the dlmglue layer. This has been found to be very useful in debugging
cluster locking issues.

Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2010-02-28 11:57:06 +0800

27 Feb, 2010

3 commits

553b5eb91 ocfs2: Pass the locking protocol into ocfs2_cluster_connect().

Inside the stackglue, the locking protocol structure is hanging off of
the ocfs2_cluster_connection. This takes it one further; the locking
protocol is passed into ocfs2_cluster_connect(). Now different cluster
connections can have different locking protocols with distinct asts.
Note that all locking protocols have to keep their maximum protocol
version in lock-step.

With the protocol structure set in ocfs2_cluster_connect(), there is no
need for the stackglue to have a static pointer to a specific protocol
structure. We can change initialization to only pass in the maximum
protocol version.

Signed-off-by: Joel Becker

Joel Becker
2010-02-27 07:41:17 +0800
c0e413385 ocfs2: Attach the connection to the lksb

We're going to want it in the ast functions, so we convert union
ocfs2_dlm_lksb to struct ocfs2_dlm_lksb and let it carry the connection.

Signed-off-by: Joel Becker

Joel Becker
2010-02-27 07:41:14 +0800
a796d2862 ocfs2: Pass lksbs back from stackglue ast/bast functions.

The stackglue ast and bast functions tried to maintain the fiction that
their arguments were void pointers. In reality, stack_user.c had to
know that the argument was an ocfs2_lock_res in order to get the status
off of the lksb. That's ugly.

This changes stackglue to always pass the lksb as the argument to ast
and bast functions. The caller can always use container_of() to get the
ocfs2_lock_res or user_dlm_lock_res. The net effect to the caller is
zero. They still get back the lockres in their ast. stackglue gets
cleaner, and now can use the lksb itself.

Signed-off-by: Joel Becker

Joel Becker
2010-02-27 07:41:14 +0800
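
The container_of() pattern that a796d2862 relies on can be shown in standalone C. This is a minimal sketch: toy_container_of is a simplified userspace version of the macro, and toy_lksb and toy_lock_res are hypothetical stand-ins for the lksb and the lock resource that embeds it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of container_of(): member pointer -> owner. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the lock status block. */
struct toy_lksb {
    int status;
};

/* Hypothetical stand-in for a lock resource that embeds its lksb. */
struct toy_lock_res {
    int id;
    struct toy_lksb lksb;  /* embedded by value, not pointed to */
};

/* Recover the enclosing lock resource from the lksb handed to a callback. */
static struct toy_lock_res *toy_lockres_from_lksb(struct toy_lksb *lksb)
{
    assert(lksb != NULL);
    return toy_container_of(lksb, struct toy_lock_res, lksb);
}

/* ast-style callback: receives only the lksb, still reaches the lockres. */
static int toy_ast(struct toy_lksb *lksb)
{
    struct toy_lock_res *res = toy_lockres_from_lksb(lksb);
    return res->id;
}
```

The callback signature names the lksb type instead of a void pointer, and each caller picks its own enclosing type when it applies the macro.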

09 Feb, 2010

1 commit

3ad2f3fbb tree-wide: Assorted spelling fixes

In particular, several occurrences of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.

Signed-off-by: Daniel Mack
Cc: Joe Perches
Cc: Junio C Hamano
Signed-off-by: Jiri Kosina

Daniel Mack
2010-02-09 18:13:56 +0800

04 Feb, 2010

1 commit

079b80578 ocfs2: Plugs race between the dc thread and an unlock ast message

This patch plugs a race between the downconvert thread and an unlock ast message.
Specifically, after the downconvert worker has done its task, the dc thread needs
to check whether an unlock ast made the downconvert moot.

Reported-by: David Teigland
Signed-off-by: Sunil Mushran
Acked-by: Mark Fasheh
Signed-off-by: Joel Becker

Sunil Mushran
2010-02-04 09:26:03 +0800

03 Feb, 2010

4 commits

db0f6ce69 ocfs2: Remove overzealous BUG_ON during blocked lock processing

During blocked lock processing, we should consider the possibility that the
lock is no longer blocking.

Joel Becker assisted in fixing this issue.

Reported-by: David Teigland
Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2010-02-03 15:51:16 +0800
0d74125a6 ocfs2: Do not downconvert if the lock level is already compatible

During upconvert, if the master were to send a BAST, dlmglue will detect the
upconversion in process and send a cancel convert to the master. Upon receiving
the AST for the cancel convert, it will re-process the lock resource to determine
whether it needs downconverting. Say, the up was from PR to EX and the BAST was
for EX. After the cancel convert, it will need to downconvert to NL.

However, if the node was originally upconverting from NL to EX, then there would
be no reason to downconvert (assuming the same message sequence).

This patch makes dlmglue consider the possibility that the current lock level
is already compatible and that downconverting is not required.

Joel Becker assisted in fixing this issue.

Fixes ossbz#1178
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1178

Reported-by: Coly Li
Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2010-02-03 15:51:14 +0800
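
The compatibility check described in 0d74125a6 can be sketched as follows. The enum and function names are hypothetical, and only the three levels mentioned in the message (NL, PR, EX) are modeled. The idea is to compute the highest level that is still compatible with the blocking request and downconvert only if the currently held level is above it.

```c
#include <assert.h>
#include <stdbool.h>

/* Only the three levels named in the commit message, ordered by strength. */
enum toy_level { TOY_NL = 0, TOY_PR = 1, TOY_EX = 2 };

/* Highest level we may keep while another node wants 'blocking'. */
static enum toy_level toy_highest_compat(enum toy_level blocking)
{
    switch (blocking) {
    case TOY_EX: return TOY_NL;  /* an exclusive request tolerates only NL */
    case TOY_PR: return TOY_PR;  /* readers can share with readers */
    default:     return TOY_EX;  /* NL blocks nothing */
    }
}

/* Downconvert only if the held level is stronger than what is compatible. */
static bool toy_needs_downconvert(enum toy_level held, enum toy_level blocking)
{
    assert(held <= TOY_EX && blocking <= TOY_EX);
    return held > toy_highest_compat(blocking);
}
```

With a BAST for EX, a node that falls back to PR after the cancel convert must still downconvert to NL, while a node that was upconverting from NL is already compatible and has nothing to do.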
a19128260 ocfs2: Prevent a livelock in dlmglue

There is a possibility of a livelock in __ocfs2_cluster_lock(). If a node were
to get an ast for an upconvert request, followed immediately by a bast,
there is a small window where the fs may downconvert the lock before the
process requesting the upconvert is able to take the lock.

This patch adds a new flag to indicate that the upconvert is still in
progress and that the dc thread should not downconvert it right now.

Wengang Wang and Joel Becker
contributed heavily to this patch.

Reported-by: David Teigland
Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2010-02-03 15:51:13 +0800
0b94a909e ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast

During bast, set the OCFS2_LOCK_BLOCKED flag only if the lock needs to be
downconverted.

Signed-off-by: Wengang Wang
Acked-by: Sunil Mushran
Acked-by: Mark Fasheh
Signed-off-by: Joel Becker

Wengang Wang
2010-02-03 15:50:55 +0800

26 Jan, 2010

1 commit

2bd632165 ocfs2/trivial: Remove trailing whitespaces

Patch removes trailing whitespaces.

Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2010-01-26 11:20:51 +0800

04 Dec, 2009

1 commit

af901ca18 tree-wide: fix assorted typos all over the place

That is "success", "unknown", "through", "performance", "[re|un]mapping",
"access", "default", "reasonable", "[con]currently", "temperature",
"channel", "[un]used", "application", "example", "hierarchy", "therefore",
"[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa
Signed-off-by: Jiri Kosina

André Goddard Rosa
2009-12-04 22:39:55 +0800

23 Sep, 2009

3 commits

d92bc5127 dlmglue.c: add missed mlog lines

This patch adds the missed mlog_exit() and mlog_exit_void() lines when routines
return.

Signed-off-by: Coly Li
Acked-by: Mark Fasheh
Signed-off-by: Joel Becker

Coly Li
2009-09-23 16:54:47 +0800
8dec98edf ocfs2: Add new refcount tree lock resource in dlmglue.

refcount tree lock resource is used to protect refcount
tree read/write among multiple nodes.

Signed-off-by: Tao Ma

Tao Ma
2009-09-23 11:09:28 +0800
a43384813 ocfs2: Abstract caching info checkpoint.

In meta downconvert, we need to checkpoint the metadata in an inode.
For refcount tree, we also need it. So abstract the process out.

Signed-off-by: Tao Ma

Tao Ma
2009-09-23 11:09:27 +0800

05 Sep, 2009

2 commits

0cf2f7632 ocfs2: Pass struct ocfs2_caching_info to the journal functions.

The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions. Thus the
journal locks a metadata cache with the cache io_lock function. It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:50 +0800
8cb471e8f ocfs2: Take the inode out of the metadata read/write paths.

We are really passing the inode into the ocfs2_read/write_blocks()
functions to get at the metadata cache. This commit passes the cache
directly into the metadata block functions, divorcing them from the
inode.

Signed-off-by: Joel Becker

Joel Becker
2009-09-05 07:07:48 +0800

23 Jun, 2009

4 commits

cb25797d4 ocfs2: Add lockdep annotations

Add lockdep support to OCFS2. The support also covers all of the cluster
locks except for open locks, journal locks, and local quotafile locks. These
are special because they are acquired for a node, not for a particular process
and lockdep cannot deal with this type of locking.

Signed-off-by: Jan Kara
Signed-off-by: Joel Becker

Jan Kara
2009-06-23 05:34:26 +0800
df152c241 ocfs2: Disable orphan scanning for local and hard-ro mounts

Local and Hard-RO mounts do not need orphan scanning.

Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2009-06-23 05:24:55 +0800
3211949f8 ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init()

We don't access the LVB in our ocfs2_*_lock_res_init() functions.

Since the LVB can become invalid during some cluster recovery
operations, the dlmglue must be able to handle an uninitialized
LVB.

For the orphan scan lock, we initialize an uninitialized LVB with our
scan sequence number plus one. This starts a normal orphan scan
cycle.

Signed-off-by: Sunil Mushran
Signed-off-by: Joel Becker

Sunil Mushran
2009-06-23 05:24:53 +0800
1c520dfbf ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.

The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and
the DLM cannot reconstruct its state. Clients of the DLM need to know
this.

ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses
track of the state. This is not a standard behavior, but ocfs2 has
always relied on it. Thus, an o2dlm LVB is always "valid".

ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When
fs/dlm loses track of an LVB's state, it sets a flag
(DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of
the LVB may be garbage or merely stale.

ocfs2 doesn't want to try to guess at the validity of the stale LVB.
Instead, it should be checking the VALNOTVALID flag. As this is the
'standard' way of treating LVBs, we will promote this behavior.

We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when
the LVB is valid. o2dlm will always return valid, while fs/dlm will
check VALNOTVALID.

Signed-off-by: Joel Becker
Acked-by: Mark Fasheh

Joel Becker
2009-06-23 05:24:30 +0800
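
The dispatch described in 1c520dfbf can be sketched with a small ops table. All toy_ names are hypothetical, and the numeric value given to TOY_SBF_VALNOTVALID is illustrative only. One backend always reports the LVB as valid, mirroring the o2dlm behaviour, and the other inspects the not-valid flag on the status block, mirroring fs/dlm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag value; stands in for DLM_SBF_VALNOTVALID. */
#define TOY_SBF_VALNOTVALID 0x02u

/* Hypothetical stand-in for the lock status block. */
struct toy_lksb {
    unsigned int flags;
};

/* Per-stack operations; only the validity hook is modeled. */
struct toy_stack_ops {
    bool (*lvb_valid)(const struct toy_lksb *lksb);
};

/* o2dlm-like backend: zeroes a lost LVB, so it always counts as valid. */
static bool toy_o2dlm_lvb_valid(const struct toy_lksb *lksb)
{
    (void)lksb;
    return true;
}

/* fs/dlm-like backend: valid unless the not-valid flag is set. */
static bool toy_fsdlm_lvb_valid(const struct toy_lksb *lksb)
{
    return !(lksb->flags & TOY_SBF_VALNOTVALID);
}

static const struct toy_stack_ops toy_o2dlm_ops = { .lvb_valid = toy_o2dlm_lvb_valid };
static const struct toy_stack_ops toy_fsdlm_ops = { .lvb_valid = toy_fsdlm_lvb_valid };

/* Glue entry point: returns true when the LVB contents can be trusted. */
static bool toy_dlm_lvb_valid(const struct toy_stack_ops *ops, const struct toy_lksb *lksb)
{
    assert(ops != NULL && ops->lvb_valid != NULL && lksb != NULL);
    return ops->lvb_valid(lksb);
}
```

A caller would check toy_dlm_lvb_valid() before reading anything out of the LVB and fall back to a full refresh when it returns false.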

04 Jun, 2009

1 commit

83273932f ocfs2: timer to queue scan of all orphan slots

When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
before moving the dentry to the orphan directory. Other nodes that have
this dentry in cache have a PR on the same dentry lock. When the EX is
requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
during downconvert. The inode is finally deleted when the last node to iput
the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.

A problem arises if a node is forced to free dentry locks because of memory
pressure. If this happens, the node will no longer get downconvert
notifications for the dentries that have been unlinked on another node.
If it also happens that node is actively using the corresponding inode and
happens to be the one performing the last iput on that inode, it will fail
to delete the inode as it will not have the MAYBE_ORPHANED flag set.

This patch fixes this shortcoming by introducing a periodic scan of the
orphan directories to delete such inodes. Care has been taken to distribute
the workload across the cluster so that no one node has to perform the task
all the time.

Signed-off-by: Srinivas Eeda
Signed-off-by: Joel Becker

Srinivas Eeda
2009-06-04 10:14:31 +0800

04 Apr, 2009

1 commit

6ca497a83 ocfs2: fix rare stale inode errors when exporting via nfs

For nfs exporting, ocfs2_get_dentry() returns the dentry for fh.
ocfs2_get_dentry() may read from disk when the inode is not in memory,
without any cross-cluster lock. This leads to the file system loading a
stale inode.

This patch fixes the above problem.

The solution is that, when the inode is not in memory, we take the cluster
lock (PR) of the alloc inode from which the inode in question was allocated
(this causes the node on which the deletion is done to sync the alloc inode)
before reading out the inode itself. Then we check the bitmap in the group
(the one the inode in question was allocated from) to see if the bit is clear.
If it is clear, the inode is stale. If the bit is set, we then check the
generation as the existing code does.

We have to read out the inode in question from disk first to know its alloc
slot and alloc bit. If it is not stale, we read it out using ocfs2_iget().
The second read should then be from cache.

We also have to add a per-superblock nfs_sync_lock to cover the lock for the
alloc inode and that for the inode in question. This is because ocfs2_get_dentry()
and ocfs2_delete_inode() lock them in reverse order. nfs_sync_lock is taken
in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(), so
that multiple ocfs2_delete_inode() calls can run concurrently in the normal case.

[mfasheh@suse.com: build warning fixes and comment cleanups]
Signed-off-by: Wengang Wang
Acked-by: Joel Becker
Signed-off-by: Mark Fasheh

wengang wang
2009-04-04 02:39:25 +0800
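
The staleness test described in 6ca497a83 comes down to two checks made in order. The sketch below is a hypothetical userspace rendering: the group bitmap is modeled as a plain byte array, and the function and parameter names are invented for illustration. Cluster locking and disk reads are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Test one bit in a byte-array bitmap (bit 0 is the low bit of byte 0). */
static bool toy_bit_set(const uint8_t *bitmap, unsigned int bit)
{
    return (bitmap[bit / 8] >> (bit % 8)) & 1u;
}

/*
 * Decide whether a file handle refers to a stale inode.
 * Step 1: a clear allocation bit means the inode was freed, so it is stale.
 * Step 2: with the bit set, fall back to comparing generations.
 */
static bool toy_inode_is_stale(const uint8_t *group_bitmap, unsigned int alloc_bit,
                               uint32_t disk_generation, uint32_t handle_generation)
{
    assert(group_bitmap != NULL);
    if (!toy_bit_set(group_bitmap, alloc_bit))
        return true;
    return disk_generation != handle_generation;
}
```

The generation comparison still matters when the bit is set, because the slot may have been freed and reused for a different inode since the handle was issued.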