Eric Lee / smarc-fsl-linux-kernel

08 Dec, 2011

1 commit

be655596b ceph: use i_ceph_lock instead of i_lock

We have been using i_lock to protect all kinds of data structures in the
ceph_inode_info struct, including lists of inodes that we need to iterate
over while avoiding races with inode destruction. That requires grabbing
a reference to the inode while holding the list lock, but igrab() now
takes i_lock to check the inode flags.

Changing the list lock ordering would be a painful process.

However, using a ceph-specific i_ceph_lock in the ceph inode instead of
i_lock is a simple mechanical change and avoids the ordering constraints
imposed by igrab().

Reported-by: Amon Ott
Signed-off-by: Sage Weil

Sage Weil
2011-12-08 02:46:44 +0800

27 Jul, 2011

2 commits

41b02e1f9 ceph: explicitly reference rename old_dentry parent dir in request

We carry a pin on the parent directory for the rename source and dest
dentries. For the source it's r_locked_dir; we need to explicitly
reference the old_dentry parent as well, since the dentry's d_parent may
change between when the request was created and pinned and when it is
freed.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:31:14 +0800

2f90b852e ceph: ignore lease mask

The lease mask is no longer used (and it changed a while back). Instead,
use a non-zero duration to indicate that there is a lease being issued.
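
A minimal sketch of that check, assuming a simplified stand-in for the lease record carried in an MDS reply. The struct and helper names are illustrative only and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for a lease record in an MDS reply. */
struct lease_info {
	uint16_t mask;        /* legacy field; no longer consulted */
	uint32_t duration_ms; /* 0 means no lease was issued */
};

/* A lease counts as issued only when its duration is non-zero. */
static bool lease_is_issued(const struct lease_info *l)
{
	return l->duration_ms != 0;
}
```

With this rule, a record with any mask value and a zero duration is treated as "no lease".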

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:28:25 +0800

25 May, 2011

1 commit

db3540522 ceph: fix cap flush race reentrancy

In e9964c10 we changed cap flushing to do a delicate dance because some
inodes on the cap_dirty list could be in a migrating state (got EXPORT but
not IMPORT) in which we couldn't actually flush and move from
dirty->flushing, breaking the while (!empty) { process first } loop
structure. It worked for a single sync thread, but was not reentrant and
triggered infinite loops when multiple syncers came along.

Instead, move inodes with dirty caps to a separate cap_dirty_migrating list
when in the limbo export-but-no-import state, allowing us to go back to
the simple loop structure (which was reentrant). This is cleaner and more
robust.

Audited the cap_dirty users and this looks fine:
list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we
have dirty caps (which list we're on is irrelevant) and list_del_init()
calls still do the right thing.
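
The two-list scheme can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the kernel code: the list helpers imitate the kernel's list_head, and inode_info, mds_client, mark_migrating(), finish_migration() and flush_dirty_caps() are hypothetical, trimmed-down names. Locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

/* Hypothetical, trimmed-down inode and client state. */
struct inode_info {
	struct list_head i_dirty_item; /* on cap_dirty or cap_dirty_migrating */
	bool flushed;
};

struct mds_client {
	struct list_head cap_dirty;
	struct list_head cap_dirty_migrating;
};

/* EXPORT seen, IMPORT not yet: park the inode away from the flush loop. */
static void mark_migrating(struct mds_client *mdsc, struct inode_info *ci)
{
	if (!list_empty(&ci->i_dirty_item)) {
		list_del_init(&ci->i_dirty_item);
		list_add_tail(&ci->i_dirty_item, &mdsc->cap_dirty_migrating);
	}
}

/* IMPORT arrived: the inode can be flushed again. */
static void finish_migration(struct mds_client *mdsc, struct inode_info *ci)
{
	if (!list_empty(&ci->i_dirty_item)) {
		list_del_init(&ci->i_dirty_item);
		list_add_tail(&ci->i_dirty_item, &mdsc->cap_dirty);
	}
}

/* The simple loop: each pass removes the first entry, so it terminates. */
static int flush_dirty_caps(struct mds_client *mdsc)
{
	int flushed = 0;

	while (!list_empty(&mdsc->cap_dirty)) {
		struct list_head *first = mdsc->cap_dirty.next;
		struct inode_info *ci = (struct inode_info *)
			((char *)first - offsetof(struct inode_info, i_dirty_item));

		list_del_init(&ci->i_dirty_item); /* dirty -> flushing */
		ci->flushed = true;
		flushed++;
	}
	return flushed;
}
```

Because a migrating inode is never on cap_dirty, the loop cannot get stuck on an entry it is unable to flush, and list_empty(&ci->i_dirty_item) still answers "is this inode dirty?" regardless of which list holds it.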

Signed-off-by: Sage Weil

Sage Weil
2011-05-25 02:52:12 +0800

13 Jan, 2011

2 commits

4af25fdda ceph: drop redundant r_mds field

The r_mds field is redundant, since we can find the same information at
r_session->s_mds, and when r_session is NULL then r_mds is meaningless.

Signed-off-by: Sage Weil

Sage Weil
2011-01-13 07:15:13 +0800

14303d20f ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS

This implements the DIRLAYOUTHASH protocol feature, which passes the dir
layout over the wire from the MDS. This gives the client knowledge
of the correct hash function to use for mapping dentries among dir
fragments.

Note that if this feature is _not_ present on the client but is on the
MDS, the client may misdirect requests. Misdirected requests get forwarded,
which degrades performance. It may also result in inaccurate NFS filehandle
generation, which will prevent fh resolution when the inode is not present
in the client cache and the parent directories have been fragmented.

Signed-off-by: Sage Weil

Sage Weil
2011-01-13 07:15:13 +0800

02 Dec, 2010

1 commit

25933abdd ceph: Handle file locks in replies from the MDS.

Previously the kernel client incorrectly assumed everything was a directory.

Signed-off-by: Herb Shiu
Acked-by: Greg Farnum
Signed-off-by: Sage Weil

Herb Shiu
2010-12-02 06:22:27 +0800

08 Nov, 2010

1 commit

cb4276cca ceph: fix uid/gid on resent mds requests

MDS requests can be rebuilt and resent in non-process context, but were
filling in uid/gid from current_fsuid/gid. Put that information in the
request struct on request setup.

This fixes incorrect (and root) uid/gid getting set for requests that
are forwarded between MDSs, usually due to metadata migrations.
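
A rough sketch of the idea, assuming simplified stand-ins for the caller's credentials, the request and the on-wire request head. All type and function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the caller's credentials and an MDS request. */
struct creds { uint32_t fsuid, fsgid; };

struct mds_request {
	uint32_t r_uid; /* captured once, in process context */
	uint32_t r_gid;
};

struct request_head { uint32_t caller_uid, caller_gid; };

/* Request setup runs in the caller's context, so its credentials are valid. */
static void request_init(struct mds_request *req, const struct creds *current)
{
	req->r_uid = current->fsuid;
	req->r_gid = current->fsgid;
}

/* (Re)building the message may run from a worker; use only the saved values. */
static void build_request_head(const struct mds_request *req,
			       struct request_head *head)
{
	head->caller_uid = req->r_uid;
	head->caller_gid = req->r_gid;
}
```

Rebuilding the head any number of times, from any context, then yields the same uid/gid that the original caller had.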

Signed-off-by: Sage Weil

Sage Weil
2010-11-08 23:29:05 +0800

21 Oct, 2010

1 commit

3d14c5d2b ceph: factor out libceph from Ceph file system

This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph. This
is mostly a matter of moving files around. However, a few key pieces
of the interface change as well:

- ceph_client becomes ceph_fs_client and ceph_client, where the latter
  captures the mon and osd clients, and the fs_client gets the mds client
  and file system specific pieces.
- Mount option parsing and debugfs setup are correspondingly broken into
  two pieces.
- The mon client gets a generic handler callback for otherwise unknown
  messages (mds map, in this case).
- The basic supported/required feature bits can be expanded (and are by
  ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.

Signed-off-by: Sage Weil

Yehuda Sadeh
2010-10-21 06:37:28 +0800

23 Aug, 2010

1 commit

f3c60c591 ceph: fix multiple mds session shutdown

The use of a completion when waiting for session shutdown during umount is
inappropriate, given the complexity of the condition. For multiple MDSs,
this resulted in the umount thread spinning, often preventing the session
close message from being processed.

Switch to a waitqueue and define a condition helper. This cleans things
up nicely.
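
A condition helper of this kind might look like the sketch below. It is an assumption-laden illustration: the fixed-size session table, MAX_SESSIONS and the struct layouts are invented for the example. In the kernel, a waiter would typically sleep on a waitqueue with a macro such as wait_event_timeout(), passing a helper like this as the condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SESSIONS 4 /* arbitrary size for the sketch */

struct mds_session { int s_mds; };

struct mds_client {
	/* A slot becomes NULL once that session has closed. */
	struct mds_session *sessions[MAX_SESSIONS];
};

/* Condition helper: true once no MDS session remains open. */
static bool done_closing_sessions(const struct mds_client *mdsc)
{
	for (size_t i = 0; i < MAX_SESSIONS; i++)
		if (mdsc->sessions[i])
			return false;
	return true;
}
```

Unlike a completion, which is signalled once, the condition is re-evaluated on every wakeup, so it stays correct however many sessions close and in whatever order.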

Signed-off-by: Sage Weil

Sage Weil
2010-08-23 06:04:43 +0800

02 Aug, 2010

4 commits

e55b71f80 ceph: handle ESTALE properly; on receipt send to authority if it wasn't ...

Signed-off-by: Greg Farnum
Signed-off-by: Sage Weil

Greg Farnum
2010-08-02 11:11:41 +0800

154f42c2c ceph: connect to export targets on cap export

When we get a cap EXPORT message, make sure we are connected to all export
targets to ensure we can handle the matching IMPORT.

Signed-off-by: Sage Weil

Sage Weil
2010-08-02 11:11:41 +0800

37151668b ceph: do caps accounting per mds_client

Caps related accounting is now being done per mds client instead
of just being global. This prepares ground work for a later revision
of the caps preallocated reservation list.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-08-02 11:11:40 +0800

ee6b272b9 ceph: drop unused argument

Signed-off-by: Sage Weil

Sage Weil
2010-08-02 11:11:39 +0800

17 Jul, 2010

1 commit

e979cf503 ceph: do not include cap/dentry releases in replayed messages

Strip the cap and dentry releases from replayed messages. They can
cause the shared state to get out of sync because they were generated
(with the request message) earlier, and no longer reflect the current
client state.
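
One way to picture the stripping step, assuming the releases are appended to the end of the request payload and that the offset at which they start was recorded when the message was built. The request_msg layout and prepare_replay() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified request message: a head followed by a payload. */
struct request_msg {
	uint16_t num_releases;   /* cap/dentry release records in the payload */
	size_t   front_len;      /* bytes of payload to send */
	size_t   release_offset; /* where the release records start */
};

/* Before a replay, drop the releases: they describe state that may be stale. */
static void prepare_replay(struct request_msg *msg)
{
	if (msg->num_releases == 0)
		return;
	msg->num_releases = 0;
	msg->front_len = msg->release_offset; /* truncate at the releases */
}
```

The rest of the request is resent unchanged; only the trailing release records are cut off.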

Signed-off-by: Sage Weil

Sage Weil
2010-07-17 01:30:18 +0800

11 Jun, 2010

2 commits

2b2300d62 ceph: try to send partial cap release on cap message on missing inode

If we have enough memory to allocate a new cap release message, do so, so
that we can send a partial release message immediately. This keeps us from
making the MDS wait when the cap release it needs is in a partially full
release message.

If we fail because of ENOMEM, oh well, they'll just have to wait a bit
longer.

Signed-off-by: Sage Weil

Sage Weil
2010-06-11 04:30:25 +0800

3d7ded4d8 ceph: release cap on import if we don't have the inode

If we get an IMPORT that gives us a cap, but we don't have the inode, queue
a release (and try to send it immediately) so that the MDS doesn't get
stuck waiting for us.

Signed-off-by: Sage Weil

Sage Weil
2010-06-11 04:30:07 +0800

18 May, 2010

3 commits

167c9e352 ceph: use common helper for aborted dir request invalidation

We invalidate I_COMPLETE and dentry leases in two places: on an aborted mds
request and on request replay. Use a common helper to avoid duplicate code.

Signed-off-by: Sage Weil

Sage Weil
2010-05-18 06:25:40 +0800

b4556396f ceph: fix race between aborted requests and fill_trace

When we abort requests we need to prevent fill_trace et al from doing
anything that relies on locks held by the VFS caller. This fixes a race
between the reply handler and the abort code, ensuring that we continue
holding the dir mutex until the reply handler completes.

Signed-off-by: Sage Weil

Sage Weil
2010-05-18 01:25:45 +0800

e1518c7c0 ceph: clean up mds reply, error handling

We would occasionally BUG out in the reply handler because r_reply was
nonzero, due to a race with ceph_mdsc_do_request temporarily setting
r_reply to an ERR_PTR value. This is unnecessary, messy, and also wrong
in the EIO case.

Clean up by consistently using r_err for errors and r_reply for messages.
Also fix the abort logic to trigger consistently for all errors that return
to the caller early (e.g., EIO from timeout case). If an abort races with
a reply, use the result from the reply.

Also fix locking for r_err, r_reply update in the reply handler.
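
The split between r_err and r_reply can be illustrated with the following sketch. It assumes a trimmed-down request and leaves out the locking mentioned above; reply_msg, r_got_result, handle_reply() and abort_request() are simplified or hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct reply_msg { int result; };

/* Hypothetical, trimmed-down request: errors and messages are kept apart. */
struct mds_request {
	int r_err;                 /* 0 or a negative errno */
	struct reply_msg *r_reply; /* NULL or a real message, never an error */
	bool r_got_result;
};

/* Reply handler: record the message and its result. */
static void handle_reply(struct mds_request *req, struct reply_msg *msg)
{
	req->r_reply = msg;
	req->r_err = msg->result;
	req->r_got_result = true;
}

/* Caller gave up early (timeout, signal, ...): a reply that raced in wins. */
static int abort_request(struct mds_request *req, int err)
{
	if (!req->r_got_result)
		req->r_err = err;
	return req->r_err;
}
```

Since r_reply only ever holds NULL or a real message, the reply handler never has to guess whether a non-NULL value is an encoded error.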

Signed-off-by: Sage Weil

Sage Weil
2010-05-18 01:25:44 +0800

18 Feb, 2010

1 commit

7c1332b8c ceph: fix iterate_caps removal race

We need to be able to iterate over all caps on a session with a
possibly slow callback on each cap. To allow this, we used to
prevent cap reordering while we were iterating. However, we were
not safe from races with removal: removing the 'next' cap would
invalidate the next pointer held by list_for_each_entry_safe
and cause a lockup or similar badness.

Instead, we keep an iterator pointer in the session pointing to
the current cap. As before, we avoid reordering. For removal,
if the cap isn't the current cap we are iterating over, we are
fine. If it is, we clear cap->ci (to mark the cap as pending
removal) but leave it in the session list. In iterate_caps, we
can safely finish removal and get the next cap pointer.

While we're at it, clean up put_cap to not take a cap reservation
context, as it was never used.
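
The iterator-pointer technique can be sketched as below. This is a single-threaded illustration with hypothetical, simplified types (no spinlocks, no inode references); visit_and_drop() is an example callback invented to exercise both hazards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down cap: ci == NULL marks "removal pending". */
struct cap {
	struct cap *prev, *next; /* circular list through the session's head */
	void *ci;                /* owning inode; opaque in this sketch */
	int id;
};

struct session {
	struct cap head;          /* list sentinel */
	struct cap *cap_iterator; /* cap the iterator is currently visiting */
	int nr_caps;
};

static void session_init(struct session *s)
{
	s->head.prev = s->head.next = &s->head;
	s->cap_iterator = NULL;
	s->nr_caps = 0;
}

static void add_cap(struct session *s, struct cap *c, void *ci, int id)
{
	c->ci = ci;
	c->id = id;
	c->prev = s->head.prev;
	c->next = &s->head;
	s->head.prev->next = c;
	s->head.prev = c;
	s->nr_caps++;
}

static void unlink_cap(struct session *s, struct cap *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
	c->prev = c->next = c;
	s->nr_caps--;
}

/* Removal: defer the unlink if the iterator is parked on this cap. */
static void remove_cap(struct session *s, struct cap *c)
{
	c->ci = NULL;
	if (s->cap_iterator != c)
		unlink_cap(s, c);
}

typedef void (*cap_cb)(struct session *s, struct cap *c, void *arg);

static void iterate_caps(struct session *s, cap_cb cb, void *arg)
{
	struct cap *c = s->head.next;

	while (c != &s->head) {
		struct cap *cur = c;

		s->cap_iterator = cur;
		cb(s, cur, arg);     /* may remove any cap, including cur */
		c = cur->next;       /* read next only after the callback ran */
		if (cur->ci == NULL) /* removal was deferred: finish it now */
			unlink_cap(s, cur);
	}
	s->cap_iterator = NULL;
}

/* Example callback: log the visit, then remove the successor and this cap. */
struct visit_log { int ids[8]; int n; };

static void visit_and_drop(struct session *s, struct cap *c, void *arg)
{
	struct visit_log *log = arg;

	if (log->n < 8)
		log->ids[log->n++] = c->id;
	if (c->next != &s->head)
		remove_cap(s, c->next);
	remove_cap(s, c);
}
```

The next pointer is only read after the callback returns, and the current cap stays linked until then, so neither removing the current cap nor removing its successor leaves the iterator holding a dangling pointer.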

Signed-off-by: Sage Weil

Sage Weil
2010-02-18 02:02:47 +0800

17 Feb, 2010

2 commits

a105f00cf ceph: use rbtree for snap_realms

Switch from radix tree to rbtree for snap realms. This is much more
appropriate given that realm keys are few and far between.

Signed-off-by: Sage Weil

Sage Weil
2010-02-17 14:01:09 +0800

44ca18f26 ceph: use rbtree for mds requests

The rbtree is a more appropriate data structure than a radix_tree. It
avoids extra memory usage and simplifies the code.

It also fixes a bug where the debugfs 'mdsc' file wasn't including the
most recent mds request.

Signed-off-by: Sage Weil

Sage Weil
2010-02-17 14:01:08 +0800

26 Jan, 2010

1 commit

5b1daecd5 ceph: properly handle aborted mds requests

Previously, if the MDS request was interrupted, we would unregister the
request and ignore any reply. This could cause the caps or other cache
state to become out of sync. (For instance, aborting dbench and doing
rm -r on clients would complain about a non-empty directory because the
client didn't realize its aborted file create request completed.)

Even if we don't unregister, we still can't process the reply normally because
we are no longer holding the caller's locks (like the dir i_mutex).

So, mark aborted operations with r_aborted, and in the reply handler, be
sure to process all the caps. Do not process the namespace changes,
though, since we no longer will hold the dir i_mutex. The dentry lease
state can also be ignored as it's more forgiving.
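
In outline, the reply handler's behaviour for aborted requests could look like this sketch. The counters in client_state are hypothetical placeholders for the real cache updates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the client's cached state. */
struct client_state {
	int caps_applied;      /* capability updates taken from replies */
	int namespace_updates; /* dentry/inode linkage changes applied */
};

struct mds_request { bool r_aborted; };

/* Caps are always processed; namespace changes only with the locks held. */
static void handle_reply(struct client_state *st, const struct mds_request *req)
{
	st->caps_applied++; /* keeps caps in sync with the MDS */

	if (req->r_aborted)
		return; /* the caller is gone, and with it the dir i_mutex */

	st->namespace_updates++;
}
```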

Signed-off-by: Sage Weil

Sage Weil
2010-01-26 03:49:51 +0800

24 Dec, 2009

1 commit

5dacf0912 ceph: do not touch_caps while iterating over caps list

To avoid confusing iterate_session_caps(), flag the session while we are
iterating so that __touch_cap does not rearrange items on the list.

All other modifiers of session->s_caps do so under the protection of
s_mutex.

Signed-off-by: Sage Weil

Sage Weil
2009-12-24 00:17:14 +0800

08 Dec, 2009

1 commit

153c8e6bf ceph: use kref for struct ceph_mds_request

Signed-off-by: Sage Weil

Sage Weil
2009-12-08 04:31:09 +0800

19 Nov, 2009

2 commits

4e7a5dcd1 ceph: negotiate authentication protocol; implement AUTH_NONE protocol

When we open a monitor session, we send an initial AUTH message listing
the auth protocols we support, our entity name, and (possibly) a previously
assigned global_id. The monitor chooses a protocol and responds with an
initial message.

Initially implement AUTH_NONE, a dummy protocol that provides no security,
but works within the new framework. It generates 'authorizers' that are
used when connecting to (mds, osd) services that simply state our entity
name and global_id.

This is a wire protocol change.
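
A toy model of the negotiation described above, under loud assumptions: the protocol ids are placeholders rather than wire values, the "first mutually supported protocol" policy is a guess at how a monitor might choose, and every name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative protocol ids for the sketch; not taken from the wire format. */
enum { PROTO_UNKNOWN = 0, PROTO_NONE = 1 };

/* Initial AUTH message: what the client supports and who it claims to be. */
struct auth_hello {
	const int *protocols;   /* supported protocols, in preference order */
	size_t num_protocols;
	const char *entity_name;
	uint64_t global_id;     /* 0 if none was assigned previously */
};

/* Monitor side (assumed policy): first client protocol it also supports. */
static int monitor_choose_protocol(const struct auth_hello *hello,
				   const int *supported, size_t num_supported)
{
	for (size_t i = 0; i < hello->num_protocols; i++)
		for (size_t j = 0; j < num_supported; j++)
			if (hello->protocols[i] == supported[j])
				return hello->protocols[i];
	return PROTO_UNKNOWN;
}

/* An AUTH_NONE-style authorizer only states identity; it proves nothing. */
struct none_authorizer {
	const char *entity_name;
	uint64_t global_id;
};

static struct none_authorizer build_none_authorizer(const char *name,
						    uint64_t global_id)
{
	struct none_authorizer a = { .entity_name = name, .global_id = global_id };
	return a;
}
```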

Signed-off-by: Sage Weil

Sage Weil
2009-11-19 08:19:57 +0800

5f44f1426 ceph: handle errors during osd client init

Unwind initialization if we get ENOMEM during client initialization.

Signed-off-by: Sage Weil

Sage Weil
2009-11-19 07:02:36 +0800

13 Nov, 2009

1 commit

039934b89 ceph: build cleanly without CONFIG_DEBUG_FS

Signed-off-by: Sage Weil

Sage Weil
2009-11-13 07:56:51 +0800

11 Nov, 2009

1 commit

cdac83031 ceph: remove recon_gen logic

We don't get explicit confirmation that our caps were reconnected,
nor do we necessarily want to pay that cost. So, take all this code out
for now.

Signed-off-by: Sage Weil

Sage Weil
2009-11-11 08:03:53 +0800

10 Nov, 2009

1 commit

685f9a5d1 ceph: do not confuse stale and dead (unreconnected) caps

We were using the cap_gen to track both stale caps (caps that timed out
due to temporarily losing touch with the mds) and dead caps that did not
reconnect after an MDS failure. Introduce a recon_gen counter to track
reconnections to restarted MDSs and kill dead caps based on that instead.

Rename gen to cap_gen while we're at it to make it more clear which is
which.

Signed-off-by: Sage Weil

Sage Weil
2009-11-10 04:06:07 +0800

07 Oct, 2009

1 commit

2f2dc0534 ceph: MDS client

The MDS (metadata server) client is responsible for submitting
requests to the MDS cluster and parsing the response. We decide which
MDS to submit each request to based on cached information about the
current partition of the directory hierarchy across the cluster. A
stateful session is opened with each MDS before we submit requests to
it, and a mutex is used to control the ordering of messages within
each session.

An MDS request may generate two responses. The first indicates the
operation was a success and returns any result. A second reply is
sent when the operation commits to disk. Note that locking on the MDS
ensures that the results of updates are visible only to the updating
client before the operation commits. Requests are linked to the
containing directory so that an fsync will wait for them to commit.
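
The two-reply lifecycle can be sketched as a small state machine. The names below (dir_info, unsafe_ops, got_unsafe, got_safe and the helpers) are hypothetical simplifications; the real client tracks considerably more state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down directory state. */
struct dir_info {
	int unsafe_ops; /* requests applied but not yet committed */
};

struct mds_request {
	struct dir_info *dir;
	bool got_unsafe; /* first reply seen: result is available */
	bool got_safe;   /* second reply seen: committed to disk */
};

/* Handle one reply; 'safe' says whether the MDS has committed the operation. */
static void handle_reply(struct mds_request *req, bool safe)
{
	if (!safe) {
		if (!req->got_unsafe && !req->got_safe) {
			req->got_unsafe = true;
			req->dir->unsafe_ops++; /* fsync on the dir waits for this */
		}
		return;
	}
	if (req->got_safe)
		return; /* duplicate commit notification */
	req->got_safe = true;
	if (req->got_unsafe)
		req->dir->unsafe_ops--;
}

/* The caller can proceed as soon as any reply has arrived. */
static bool request_has_result(const struct mds_request *req)
{
	return req->got_unsafe || req->got_safe;
}

/* fsync on the directory may return once nothing is left uncommitted. */
static bool dir_fsync_done(const struct dir_info *dir)
{
	return dir->unsafe_ops == 0;
}
```

A request that only ever receives a committed reply never touches the directory's counter, which matches the "may generate two responses" wording above.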

If an MDS fails and/or recovers, we resubmit requests as needed. We
also reconnect existing capabilities to a recovering MDS to
reestablish that shared session state. Old dentry leases are
invalidated.

Signed-off-by: Sage Weil

Sage Weil
2009-10-07 02:31:09 +0800