Eric Lee / smarc-fsl-linux-kernel

14 Jan, 2011

1 commit

a17031542 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: fix cleanup when trying to mount inexistent image
net/ceph: make ceph_msgr_wq non-reentrant
ceph: fsc->*_wq's aren't used in memory reclaim path
ceph: Always free allocated memory in osdmap_decode()
ceph: Makefile: Remove unnessary code
ceph: associate requests with opening sessions
ceph: drop redundant r_mds field
ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS
ceph: add dir_layout to inode

Linus Torvalds
2011-01-14 02:25:24 +0800

13 Jan, 2011

6 commits

01e6acc4e ceph: fsc->*_wq's aren't used in memory reclaim path

fsc->*_wq's aren't depended upon during memory reclaim. Convert to
alloc_workqueue() w/o WQ_MEM_RECLAIM.

Signed-off-by: Tejun Heo
Cc: Sage Weil
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Sage Weil

Tejun Heo
2011-01-13 07:15:14 +0800
582c86e69 ceph: Makefile: Remove unnessary code

Remove the if/else conditional: the code is in mainline, so there is no
need for it.

Also change the Makefile to use -y instead of -objs, because -objs is
deprecated and not mentioned in Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent
Signed-off-by: Sage Weil

Tracey Dent
2011-01-13 07:15:13 +0800
dc69e2e9f ceph: associate requests with opening sessions

Associate requests with sessions that aren't yet open. This makes the
debugfs mdsc request list more informative.

Signed-off-by: Sage Weil

Sage Weil
2011-01-13 07:15:13 +0800
4af25fdda ceph: drop redundant r_mds field

The r_mds field is redundant, since we can find the same information at
r_session->s_mds, and when r_session is NULL then r_mds is meaningless.

Signed-off-by: Sage Weil

Sage Weil
2011-01-13 07:15:13 +0800
14303d20f ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS

This implements the DIRLAYOUTHASH protocol feature, which passes the dir
layout over the wire from the MDS. This gives the client knowledge
of the correct hash function to use for mapping dentries among dir
fragments.

Note that if this feature is _not_ present on the client but is on the
MDS, the client may misdirect requests. This will result in a forward
and degrade performance. It may also result in inaccurate NFS filehandle
generation, which will prevent fh resolution when the inode is not present
in the client cache and the parent directories have been fragmented.

Signed-off-by: Sage Weil

Sage Weil
2011-01-13 07:15:13 +0800
6c0f3af72 ceph: add dir_layout to inode

Add a ceph_dir_layout to the inode, and calculate dentry hash values based
on the parent directory's specified dir_hash function. This is needed
because the old default Linux dcache hash function is extremely weak and
leads to a poor distribution of files among dir fragments.

Signed-off-by: Sage Weil

Sage Weil
2011-01-13 07:15:12 +0800
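As a rough userspace illustration of the idea in the commit above (not the kernel code), the sketch below picks the dentry hash function from a per-directory selector instead of hard-coding one. The selector constants, the two stand-in hash functions (a byte sum, and `zlib.crc32` as a generic better-mixing hash) and the modulo mapping onto fragments are all assumptions made for this example; the real client uses Ceph's own hash functions and fragment tree.

```python
import zlib
from collections import Counter

# Hypothetical selector values, used only in this sketch.
DIR_HASH_WEAK = 1
DIR_HASH_STRONG = 2


def weak_hash(name: bytes) -> int:
    """Deliberately weak stand-in: the sum of the bytes."""
    return sum(name) & 0xFFFFFFFF


def strong_hash(name: bytes) -> int:
    """Stand-in for a better-mixing hash (CRC32 is only an example)."""
    return zlib.crc32(name) & 0xFFFFFFFF


def dentry_hash(dir_hash: int, name: bytes) -> int:
    """Hash a dentry name with the function chosen by the parent directory."""
    if dir_hash == DIR_HASH_WEAK:
        return weak_hash(name)
    if dir_hash == DIR_HASH_STRONG:
        return strong_hash(name)
    raise ValueError(f"unknown dir_hash {dir_hash}")


def spread(dir_hash: int, names, nfrags: int = 4) -> Counter:
    """Count how many names land in each of nfrags buckets (modulo mapping)."""
    return Counter(dentry_hash(dir_hash, n) % nfrags for n in names)
```

With the weak stand-in, names that are anagrams of each other (for example `b"ab"` and `b"ba"`) always collide, which is the kind of poor distribution the commit message is concerned with.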

07 Jan, 2011

8 commits

b74c79e99 fs: provide rcu-walk aware permission i_ops

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:29 +0800
34286d666 fs: rcu-walk aware d_revalidate method

Require filesystems be aware of .d_revalidate being called in rcu-walk
mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
-ECHILD from all implementations.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:29 +0800
fb045adb9 fs: dcache reduce branches in lookup path

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:28 +0800
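To make the effect of the two sed expressions in the commit above easier to see, here is a small Python rendering of the same substitutions. The regexes are transcribed from the command in the commit message; the function name `convert_line` and the sample input lines are made up for illustration. The `git grep -E` part of the original pipeline only selects which files to rewrite and is not reproduced here.

```python
import re

# The two sed expressions from the commit message, as Python regexes.
ARROW = re.compile(r"([^\t ]*)->d_op = (.*);")
DOT = re.compile(r"([^\t ]*)\.d_op = (.*);")


def convert_line(line: str) -> str:
    """Rewrite a direct d_op assignment into a d_set_d_op() call."""
    # ptr->d_op = ops;   becomes   d_set_d_op(ptr, ops);
    line = ARROW.sub(r"d_set_d_op(\1, \2);", line)
    # obj.d_op = ops;    becomes   d_set_d_op(&obj, ops);
    line = DOT.sub(r"d_set_d_op(&\1, \2);", line)
    return line


if __name__ == "__main__":
    # Hypothetical source lines, not taken from the kernel tree.
    print(convert_line("\tdentry->d_op = &example_dentry_ops;"))
    print(convert_line("\troot.d_op = &example_dentry_ops;"))
```

Lines that do not assign to `d_op` pass through unchanged.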
fa0d7e3de fs: icache RCU free inodes

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.

The downside of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:26 +0800
b5c84bf6f fs: dcache remove dcache_lock

dcache_lock no longer protects anything. Remove it.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:23 +0800
2fd6b7f50 fs: dcache scale subdirs

Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
using dcache_lock for these anyway (eg. using i_mutex).

Note: if we change the locking rule in future so that ->d_child protection is
provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
But it would be an exception to an otherwise regular locking scheme, so we'd
have to see some good results. Probably not worthwhile.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:21 +0800
da5029563 fs: dcache scale d_unhashed

Protect d_unhashed(dentry) condition with d_lock. This means keeping
DCACHE_UNHASHED bit in synch with hash manipulations.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:21 +0800
b7ab39f63 fs: dcache scale dentry refcount

Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:21 +0800
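The point of the commit above, a plain counter guarded by the same per-object lock as the rest of the object's state, can be modeled in userspace roughly as follows. This is a toy sketch, not the dcache code: the class, the method names `get`, `put` and `try_free`, and the `freed` flag are hypothetical. What it tries to show is that "check the count is zero, then act" happens under one lock hold, so a zero count cannot be raised in between.

```python
import threading


class Dentry:
    """Toy dentry: a plain integer refcount guarded by a per-object lock."""

    def __init__(self) -> None:
        self.d_lock = threading.Lock()
        self.d_count = 0      # plain int, only touched while holding d_lock
        self.freed = False

    def get(self) -> bool:
        """Take a reference unless the dentry has already been freed."""
        with self.d_lock:
            if self.freed:
                return False
            self.d_count += 1
            return True

    def put(self) -> None:
        """Drop a reference."""
        with self.d_lock:
            assert self.d_count > 0
            self.d_count -= 1

    def try_free(self) -> bool:
        """Free only if unreferenced; check and action share one lock hold."""
        with self.d_lock:
            if self.d_count != 0:
                return False
            self.freed = True
            return True
```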

18 Dec, 2010

2 commits

b6aa5901c ceph: mark user pages dirty on direct-io reads

For read operation, we have to set the argument _write_ of get_user_pages
to 1 since we will write data to pages. Also, we need to SetPageDirty before
releasing these pages.

Signed-off-by: Henry C Chang
Signed-off-by: Sage Weil

Henry C Chang
2010-12-18 01:54:40 +0800
92cf76523 ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport

The fh_to_dentry etc. methods use ceph_init_dentry(), which assumes that
d_parent is defined. It isn't for those callers, so check!

Signed-off-by: Sage Weil

Sage Weil
2010-12-18 01:53:48 +0800

16 Dec, 2010

1 commit

ab226e21a ceph: fix direct-io on non-page-aligned buffers

The user buffer may be 512-byte aligned, not page-aligned. We were
assuming the buffer was page-aligned and only accounting for
non-page-aligned io offsets.

Signed-off-by: Henry C Chang
Signed-off-by: Sage Weil

Henry C Chang
2010-12-16 12:46:16 +0800
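The arithmetic behind the commit above can be sketched as follows: a buffer that is only 512-byte aligned starts part-way into a page, so the same length can touch one more page than a page-aligned buffer would. The 4 KiB page size and the helper names are assumptions for this example; the kernel's own helpers may differ.

```python
PAGE_SHIFT = 12                  # assumes 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def offset_in_page(start: int) -> int:
    """Byte offset of start within its page."""
    return start & (PAGE_SIZE - 1)


def pages_spanned(start: int, length: int) -> int:
    """Number of pages touched by the byte range [start, start + length)."""
    if length == 0:
        return 0
    first = start >> PAGE_SHIFT
    last = (start + length - 1) >> PAGE_SHIFT
    return last - first + 1


if __name__ == "__main__":
    # Page-aligned buffer versus a buffer that is only 512-byte aligned.
    print(pages_spanned(0x2000, PAGE_SIZE), pages_spanned(0x2200, PAGE_SIZE))
```

Accounting only for the file offset, and not for the buffer's own offset within its first page, undercounts in the second case.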

07 Dec, 2010

1 commit

1cd275f60 ceph: fix ioctl magic

The ioctl magic was inadvertently changed in 571dba52.

Signed-off-by: Sage Weil

Sage Weil
2010-12-07 01:45:22 +0800

02 Dec, 2010

4 commits

a5b10629e ceph: Behave better when handling file lock replies.

Fill in the local lock with response data if appropriate,
and don't call posix_lock_file when reading locks.

Signed-off-by: Herb Shiu
Acked-by: Greg Farnum
Signed-off-by: Sage Weil

Herb Shiu
2010-12-02 06:22:34 +0800
637ae8d54 ceph: pass lock information by struct file_lock instead of as individual params.

Signed-off-by: Herb Shiu
Acked-by: Greg Farnum
Signed-off-by: Sage Weil

Herb Shiu
2010-12-02 06:22:34 +0800
25933abdd ceph: Handle file locks in replies from the MDS.

Previously the kernel client incorrectly assumed everything was a directory.

Signed-off-by: Herb Shiu
Acked-by: Greg Farnum
Signed-off-by: Sage Weil

Herb Shiu
2010-12-02 06:22:27 +0800
884ea8927 ceph: avoid possible null deref in readdir after dir llseek

last may be NULL, but we dereference it in the else branch without
checking. Normally it doesn't trigger because last == NULL when fpos == 2,
but it could happen on a newly opened dir if the user seeks forward.

Reported-by: Dan Carpenter
Signed-off-by: Sage Weil

Sage Weil
2010-12-02 06:15:31 +0800

20 Nov, 2010

1 commit

76db8ac45 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: fix readdir EOVERFLOW on 32-bit archs
ceph: fix frag offset for non-leftmost frags
ceph: fix dangling pointer
ceph: explicitly specify page alignment in network messages
ceph: make page alignment explicit in osd interface
ceph: fix comment, remove extraneous args
ceph: fix update of ctime from MDS
ceph: fix version check on racing inode updates
ceph: fix uid/gid on resent mds requests
ceph: fix rdcache_gen usage and invalidate
ceph: re-request max_size if cap auth changes
ceph: only let auth caps update max_size
ceph: fix open for write on clustered mds
ceph: fix bad pointer dereference in ceph_fill_trace
ceph: fix small seq message skipping
Revert "ceph: update issue_seq on cap grant"

Linus Torvalds
2010-11-20 07:32:22 +0800

19 Nov, 2010

1 commit

3105c19c4 ceph: fix readdir EOVERFLOW on 32-bit archs

One of the readdir filldir_t callers was passing the raw ceph 64-bit ino
instead of the hashed 32-bit one, producing an EOVERFLOW in the filler
callback. Fix this by calling the ceph_vino_to_ino() helper to do the
conversion.

Reported-by: Jan Smets
Tested-by: Jan Smets
Signed-off-by: Sage Weil

Sage Weil
2010-11-19 01:15:07 +0800
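The overflow described in the commit above comes from handing a 64-bit inode number to a callback that can only represent 32 bits. A minimal sketch of reducing the value first is shown below. The XOR fold of the two halves is an assumption chosen for illustration and is not necessarily the exact computation performed by ceph_vino_to_ino(); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def fits_in_32_bits(value: int) -> bool:
    """True if value can be stored in a 32-bit inode field without overflow."""
    return 0 <= value <= MASK32


def fold_ino_to_32(ino64: int) -> int:
    """Fold a 64-bit inode number into 32 bits by XOR-ing its two halves.

    Assumption for this sketch only; see the lead-in above.
    """
    return (ino64 ^ (ino64 >> 32)) & MASK32


if __name__ == "__main__":
    raw = 0x10000000001               # does not fit in 32 bits
    print(fits_in_32_bits(raw), fits_in_32_bits(fold_ino_to_32(raw)))
```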

18 Nov, 2010

1 commit

451a3c24b BKL: remove extraneous #include <smp_lock.h>

The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann
Signed-off-by: Linus Torvalds

Arnd Bergmann
2010-11-18 00:59:32 +0800

12 Nov, 2010

2 commits

7b88dadc1 ceph: fix frag offset for non-leftmost frags

We start at offset 2 for the leftmost frag, and 0 for subsequent frags.
When we reach the end (rightmost), we go back to 2. This fixes readdir on
fragmented (large) directories.

Signed-off-by: Sage Weil

Sage Weil
2010-11-12 08:48:59 +0800
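The offset rule stated in the commit above (start at 2 in the leftmost fragment, at 0 in every later one) can be written down as a small sketch. The function names and the list-of-sizes input are hypothetical; real fragments are identified by the fragment tree, not by a list index.

```python
LEFTMOST_START = 2   # starting offset in the leftmost fragment, per the commit text


def frag_start_offset(is_leftmost: bool) -> int:
    """Starting readdir offset within a directory fragment."""
    return LEFTMOST_START if is_leftmost else 0


def walk_offsets(frag_sizes):
    """Yield (frag_index, offset) for each entry, walking frags left to right.

    frag_sizes is a list of entry counts, one per fragment (hypothetical input).
    """
    for index, count in enumerate(frag_sizes):
        start = frag_start_offset(index == 0)
        for i in range(count):
            yield index, start + i


if __name__ == "__main__":
    print(list(walk_offsets([2, 2])))
```

Using 2 as the start for every fragment, rather than only the leftmost, would skip or misnumber entries in later fragments.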
a1629c3b2 ceph: fix dangling pointer

Clear fi->last_name when it's freed. The only caller is rewinddir() (or
equivalent lseek).

Signed-off-by: Sage Weil

Sage Weil
2010-11-12 07:24:06 +0800

10 Nov, 2010

2 commits

b7495fc2f ceph: make page alignment explicit in osd interface

We used to infer alignment of IOs within a page based on the file offset,
which assumed they matched. This broke with direct IO that was not aligned
to pages (e.g., 512-byte aligned IO). We were also trusting the alignment
specified in the OSD reply, which could have been adjusted by the server.

Explicitly specify the page alignment when setting up OSD IO requests.

Signed-off-by: Sage Weil

Sage Weil
2010-11-10 04:43:12 +0800
e98b6fed8 ceph: fix comment, remove extraneous args

The offset/length arguments aren't used.

Signed-off-by: Sage Weil

Sage Weil
2010-11-10 04:24:53 +0800

09 Nov, 2010

2 commits

d8672d64b ceph: fix update of ctime from MDS

The client can have a newer ctime than the MDS due to AUTH_EXCL and
XATTR_EXCL caps as well; update the check in ceph_fill_file_time
appropriately.

This fixes cases where ctime/mtime goes backward under the right sequence
of local updates (e.g. chmod) and mds replies (e.g. subsequent stat that
goes to the MDS).

Signed-off-by: Sage Weil

Sage Weil
2010-11-09 01:24:34 +0800
8bd59e018 ceph: fix version check on racing inode updates

We may get updates on the same inode from multiple MDSs; generally we only
pay attention if the update is newer than what we already have. The
exception is when an MDS sends unstable information, in which case we
always update.

The old > check got this wrong when our version was odd (e.g. 3) and the
reply version was even (e.g. 2): the older stale (v2) info would be
applied. Fixed and clarified the comment.

Signed-off-by: Sage Weil

Sage Weil
2010-11-09 01:23:12 +0800
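The odd/even case in the commit above can be reproduced with a two-line model. This is a sketch under assumptions, not the kernel check: it assumes the low bit of the locally held version is masked off before comparing and that a reply version of 0 never causes a skip. Only the comparison operator differs between the two functions, which matches the commit's description of "the old > check".

```python
def skip_update_old(have: int, reply: int) -> bool:
    """Pre-fix behaviour as modeled here: strict '>' comparison."""
    return reply > 0 and (have & ~1) > reply


def skip_update_fixed(have: int, reply: int) -> bool:
    """Post-fix behaviour as modeled here: '>=' comparison."""
    return reply > 0 and (have & ~1) >= reply


if __name__ == "__main__":
    # The example from the commit message: we hold v3, the reply carries v2.
    print(skip_update_old(3, 2), skip_update_fixed(3, 2))
```

In this model the old check does not skip the v2 reply, so the stale information is applied, while the fixed check skips it.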

08 Nov, 2010

6 commits

cb4276cca ceph: fix uid/gid on resent mds requests

MDS requests can be rebuilt and resent in non-process context, but were
filling in uid/gid from current_fsuid/gid. Put that information in the
request struct on request setup.

This fixes incorrect (and root) uid/gid getting set for requests that
are forwarded between MDSs, usually due to metadata migrations.

Signed-off-by: Sage Weil

Sage Weil
2010-11-08 23:29:05 +0800
cd045cb42 ceph: fix rdcache_gen usage and invalidate

We used to use rdcache_gen to indicate whether we "might" have cached
pages. Now we just look at the mapping to determine that. However, some
old behavior remains from that transition.

First, rdcache_gen == 0 no longer means we have no pages. That can happen
at any time (presumably when we carry FILE_CACHE). We should not reset it
to zero, and we should not check that it is zero.

That means that the only purpose for rdcache_revoking is to resolve races
between new issues of FILE_CACHE and an async invalidate. If they are
equal, we should invalidate. On success, we decrement rdcache_revoking,
so that it is no longer equal to rdcache_gen. Similarly, if we succeed
in doing a sync invalidate, set revoking = gen - 1. (This is a small
optimization to avoid doing unnecessary invalidate work and does not
affect correctness.)

Signed-off-by: Sage Weil

Sage Weil
2010-11-08 23:29:05 +0800
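The interplay of the two counters in the commit above can be modeled roughly as below. Two parts come from the commit text: invalidate only when rdcache_revoking equals rdcache_gen and decrement rdcache_revoking afterwards, and set revoking = gen - 1 after a successful sync invalidate. The rest is assumed for the sketch: that a new issue of FILE_CACHE increments rdcache_gen, and that queuing an async invalidate copies rdcache_gen into rdcache_revoking. The class and method names are hypothetical.

```python
class CacheState:
    """Toy model of the rdcache_gen / rdcache_revoking pair."""

    def __init__(self) -> None:
        self.rdcache_gen = 0
        self.rdcache_revoking = 0
        self.invalidations = 0

    def issue_file_cache(self) -> None:
        """Assumption: a new issue of FILE_CACHE bumps the generation."""
        self.rdcache_gen += 1

    def queue_revoke(self) -> None:
        """Assumption: queuing an async invalidate records the generation."""
        self.rdcache_revoking = self.rdcache_gen

    def async_invalidate(self) -> bool:
        """Invalidate only if no new FILE_CACHE issue raced with the revoke."""
        if self.rdcache_revoking != self.rdcache_gen:
            return False
        self.invalidations += 1
        self.rdcache_revoking -= 1    # no longer equal to rdcache_gen
        return True

    def sync_invalidate(self) -> None:
        """A successful synchronous invalidate sets revoking = gen - 1."""
        self.invalidations += 1
        self.rdcache_revoking = self.rdcache_gen - 1
```

If FILE_CACHE is issued again between `queue_revoke()` and `async_invalidate()`, the two counters differ and the invalidate is skipped.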
feb4cc9bb ceph: re-request max_size if cap auth changes

If the auth cap migrates to another MDS, clear requested_max_size so that
we resend any pending max_size increase requests. This fixes potential
hangs on writes that extend a file and race with a cap migration between
MDSs.

Signed-off-by: Sage Weil

Sage Weil
2010-11-08 01:39:23 +0800
912a9b031 ceph: only let auth caps update max_size

Only the auth MDS has a meaningful max_size value for us, so only update it
in fill_inode if we're being issued an auth cap. Otherwise, a random
stat result from a non-auth MDS can clobber a meaningful max_size, get
the client<->mds cap state out of sync, and make writes hang.

Specifically, even if the client re-requests a larger max_size (which it
will), the MDS won't respond because as far as it knows we already have a
sufficiently large value.

Signed-off-by: Sage Weil

Sage Weil
2010-11-08 01:39:21 +0800
7421ab804 ceph: fix open for write on clustered mds

Normally when we open a file we already have a cap, and simply update the
wanted set. However, if we open a file for write, but don't have an auth
cap, that doesn't work; we need to open a new cap with the auth MDS. Only
reuse existing caps if we are opening for read or the existing cap is auth.

Signed-off-by: Sage Weil

Sage Weil
2010-11-08 01:07:15 +0800
d8b16b3d1 ceph: fix bad pointer dereference in ceph_fill_trace

We dereference *in a few lines down, but only set it on rename. It is
apparently pretty rare for this to trigger, but I have been hitting it
with clustered MDSs.

Signed-off-by: Sage Weil

Sage Weil
2010-11-08 00:40:43 +0800

29 Oct, 2010

1 commit

a7f9fb205 convert ceph

Signed-off-by: Al Viro

Al Viro
2010-10-29 16:17:18 +0800

28 Oct, 2010

1 commit

2f56f56ad Revert "ceph: update issue_seq on cap grant"

This reverts commit d91f2438d881514e4a923fd786dbd94b764a9440.

The intent of issue_seq is to distinguish between mds->client messages that
(re)create the cap and those that do not, which means we should _only_ be
updating that value in the create paths. By updating it in handle_cap_grant,
we reset it to zero, which then breaks release.

The larger question is what workload/problem made me think it should be
updated here...

Signed-off-by: Sage Weil

Sage Weil
2010-10-28 12:05:54 +0800