Eric Lee / linux-smarc-t335x-v3.2

10 Sep, 2011

1 commit

0d20fbbe8 Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client

* 'for-linus' of git://ceph.newdream.net/git/ceph-client:
libceph: fix leak of osd structs during shutdown
ceph: fix memory leak
ceph: fix encoding of ino only (not relative) paths
libceph: fix msgpool

Linus Torvalds
2011-09-10 06:48:34 +0800

23 Aug, 2011

1 commit

259a187ad ceph: fix memory leak

kfree does not clean up indirect allocations in
ceph_fs_client and ceph_options (e.g. snapdir_name).

Signed-off-by: Noah Watkins
Signed-off-by: Sage Weil

Noah Watkins
2011-08-23 04:06:59 +0800
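
The leak pattern described in this commit can be sketched in userspace C. This is a minimal illustration, not the actual fix: `demo_options` and its helpers are hypothetical names, the struct layout is assumed, and `free()` stands in for `kfree()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an options struct that owns a heap string. */
struct demo_options {
    char *snapdir_name;   /* indirect allocation owned by the struct */
    int   flags;
};

/* Allocate the struct and its owned string; returns NULL on failure. */
struct demo_options *demo_options_create(const char *snapdir)
{
    struct demo_options *opt;

    assert(snapdir != NULL);
    opt = calloc(1, sizeof(*opt));
    if (!opt)
        return NULL;
    opt->snapdir_name = malloc(strlen(snapdir) + 1);
    if (!opt->snapdir_name) {
        free(opt);
        return NULL;
    }
    strcpy(opt->snapdir_name, snapdir);
    return opt;
}

/*
 * Free the members first, then the container. Calling free(opt) alone
 * would leak snapdir_name, which is the class of bug described above.
 */
void demo_options_destroy(struct demo_options *opt)
{
    if (!opt)
        return;
    free(opt->snapdir_name);
    free(opt);
}
```

Routing every teardown path through one destroy helper keeps the ownership rules in a single place.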

16 Aug, 2011

1 commit

795858dbd ceph: fix encoding of ino only (not relative) paths

A 'path' consists of a starting ino and relative component. Encode even
when there is no relative component. This is primarily needed by the
NFS reexport code.

Signed-off-by: Sage Weil

Sage Weil
2011-08-16 04:03:56 +0800

27 Jul, 2011

21 commits

ba5b56cb3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
ceph: document unlocked d_parent accesses
ceph: explicitly reference rename old_dentry parent dir in request
ceph: document locking for ceph_set_dentry_offset
ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug
ceph: protect d_parent access in ceph_d_revalidate
ceph: protect access to d_parent
ceph: handle racing calls to ceph_init_dentry
ceph: set dir complete frag after adding capability
rbd: set blk_queue request sizes to object size
ceph: set up readahead size when rsize is not passed
rbd: cancel watch request when releasing the device
ceph: ignore lease mask
ceph: fix ceph_lookup_open intent usage
ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC
ceph: fix bad parent_inode calc in ceph_lookup_open
ceph: avoid carrying Fw cap during write into page cache
libceph: don't time out osd requests that haven't been received
ceph: report f_bfree based on kb_avail rather than diffing.
ceph: only queue capsnap if caps are dirty
ceph: fix snap writeback when racing with writes
...

Linus Torvalds
2011-07-27 04:38:50 +0800
d79698da3 ceph: document unlocked d_parent accesses

For the most part we don't care about racing with rename when directing
MDS requests; either the old or new parent is fine. Document that, and
do some minor cleanup.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:31:26 +0800
41b02e1f9 ceph: explicitly reference rename old_dentry parent dir in request

We carry a pin on the parent directory for the rename source and dest
dentries. For the source it's r_locked_dir; we need to explicitly
reference the old_dentry parent as well, since the dentry's d_parent may
change between when the request was created and pinned and when it is
freed.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:31:14 +0800
4f1772645 ceph: document locking for ceph_set_dentry_offset

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:31:08 +0800
e5f86dc37 ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug

Have caller pass in a safely-obtained reference to the parent directory
for calculating a dentry's hash value.

While we're here, simplify the flow through ceph_encode_fh() so that there
is a single exit point and cleanup.

Also fix a bug with the dentry hash calculation: calculate the hash for the
dentry we were given, not its parent.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:30:55 +0800
bf1c6aca9 ceph: protect d_parent access in ceph_d_revalidate

Protect d_parent with d_lock. Carry a reference. Simplify the flow so
that there is a single exit point and cleanup.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:30:43 +0800
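
The "carry a reference, single exit point" shape mentioned in this commit can be sketched as follows. This is a userspace approximation under stated assumptions: `demo_node`, `demo_get`, and `demo_put` are hypothetical, the plain integer refcount is not thread-safe, and the lock that would protect the parent pointer is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a dentry. */
struct demo_node {
    int refcount;
    int valid;
    struct demo_node *parent;
};

struct demo_node *demo_get(struct demo_node *n)
{
    n->refcount++;
    return n;
}

void demo_put(struct demo_node *n)
{
    assert(n->refcount > 0);
    n->refcount--;
}

/*
 * Returns 1 if the node and its parent are both valid, 0 otherwise.
 * Every path after the reference is taken leaves through "out", so the
 * reference is dropped exactly once.
 */
int demo_revalidate(struct demo_node *node)
{
    struct demo_node *parent;
    int ret = 0;

    assert(node != NULL && node->parent != NULL);
    /* In the kernel the parent pointer would be read under d_lock. */
    parent = demo_get(node->parent);

    if (!node->valid)
        goto out;
    if (!parent->valid)
        goto out;
    ret = 1;
out:
    demo_put(parent);
    return ret;
}
```
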
5f21c96dd ceph: protect access to d_parent

d_parent is protected by d_lock: use it when looking up a dentry's parent
directory inode. Also take a reference and drop it in the caller to avoid
a use-after-free.

Reported-by: Al Viro
Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:30:29 +0800
48d0cbd12 ceph: handle racing calls to ceph_init_dentry

The ->lookup() and prepopulate_readdir() callers are working with unhashed
dentries, so we don't have to worry. The export.c callers, though, need
to initialize something they got back from d_obtain_alias() and are
potentially racing with other callers. Make sure we don't return unless
the dentry is properly initialized (by us or someone else).

Reported-by: Al Viro
Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:30:15 +0800
dfabbed6f ceph: set dir complete frag after adding capability

Currently ceph_add_cap clears the complete bit if we are newly issued the
FILE_SHARED cap, which is normally the case for a newly issued cap on a new
directory. That means we clear the just-set bit. Move the check that sets
the flag to after the cap is added/updated.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:30:02 +0800
e98522274 ceph: set up readahead size when rsize is not passed

This should improve the default read performance, as without it
readahead is practically disabled.

Signed-off-by: Yehuda Sadeh

Yehuda Sadeh
2011-07-27 02:29:14 +0800
2f90b852e ceph: ignore lease mask

The lease mask is no longer used (and it changed a while back). Instead,
use a non-zero duration to indicate that there is a lease being issued.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:28:25 +0800
468640e32 ceph: fix ceph_lookup_open intent usage

We weren't properly calling lookup_instantiate_filp when setting up the
lookup intent, which could lead to file leakage on errors. So:

- use separate helper for the hidden snapdir translation, immediately
  following the mds request
- use ceph_finish_lookup for the final dentry/return value dance in the
  exit path
- lookup_instantiate_filp on success

Reported-by: Al Viro
Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:28:11 +0800
9bae113a0 ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC

We only need to put these on the directory unsafe list if they have
side effects that fsync(2) should flush out.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:27:59 +0800
acda76578 ceph: fix bad parent_inode calc in ceph_lookup_open

We were always getting NULL here because the intent file f_dentry is always
NULL at this point, which means we were always passing NULL to
ceph_mdsc_do_request. In reality, this was fine, since this isn't
currently ever a write operation that needs to get strung on the dir's
unsafe list.

Use the dir explicitly, and only pass it if this open has side-effects that
a dir fsync should flush.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:27:48 +0800
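
The rule shared by this commit and 9bae113a0 (pass the directory only when the open has side effects a dir fsync should flush) can be sketched as a small predicate. The helper names are hypothetical and the opaque `dir` pointer is a stand-in for the directory inode; this is not the kernel code.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Hypothetical predicate: an open has side effects worth flushing only
 * if it may create or truncate a file.
 */
int demo_open_has_side_effects(int flags)
{
    return (flags & (O_CREAT | O_TRUNC)) != 0;
}

/*
 * Return the directory to attach the request to, or NULL when the open
 * is read-only in effect and needs no tracking.
 */
const void *demo_unsafe_dir(const void *dir, int flags)
{
    assert(dir != NULL);
    return demo_open_has_side_effects(flags) ? dir : NULL;
}
```
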
d8de9ab63 ceph: avoid carrying Fw cap during write into page cache

The generic_file_aio_write call may block on balance_dirty_pages while we
flush data to the OSDs. If we hold a reference to the FILE_WR cap during
that interval revocation by the MDS (e.g., to do a stat(2)) may be very
slow.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:27:34 +0800
8f04d4227 ceph: report f_bfree based on kb_avail rather than diffing.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Greg Farnum

Greg Farnum
2011-07-27 02:27:06 +0800
e77dc3e9c ceph: only queue capsnap if caps are dirty

We used to go into this branch if i_wrbuffer_ref_head was non-zero. This
was an ancient check from before we were careful about dealing with all
kinds of caps (and not just dirty pages). It is cleaner to only queue a
capsnap if there is an actual dirty cap. If we are racing with...
something...we will end up here with ci->i_wrbuffer_refs but no dirty
caps.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:26:41 +0800
af0ed569d ceph: fix snap writeback when racing with writes

There are two problems that come up when we try to queue a capsnap while a
write is in progress:

- The FILE_WR cap is held, but not yet dirty, so we may queue a capsnap
  with dirty == 0. That will crash later in __ceph_flush_snaps(). Or
  on the FILE_WR cap if a write is in progress.
- We may not have i_head_snapc set, which causes problems pretty quickly.
  Look to the snaprealm in this case.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:26:31 +0800
9cfa1098d ceph: use flag bit for at_end readdir flag

This saves us a word of memory per file.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:26:18 +0800
4918b6d14 ceph: add F_SYNC file flag to force sync (non-O_DIRECT) io

This allows us to force IO through the sync path which you normally only
get when multiple clients are reading/writing to the same file or by
mounting with -o sync. Among other things, this lets test programs verify
correctness with a single mount.

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:26:07 +0800
252c6728d ceph: add flags field to file_info

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:25:27 +0800

21 Jul, 2011

3 commits

02c24a821 fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers

Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2. For correctness' sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,

Acked-by: Jan Kara
Signed-off-by: Josef Bacik
Signed-off-by: Al Viro

Josef Bacik
2011-07-21 08:47:59 +0800
06222e491 fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek

This converts everybody to handle SEEK_HOLE/SEEK_DATA properly. In some cases
we just return -EINVAL, in others we do the normal generic thing, and in others
we're simply making sure that the proper due diligence is done. For example
in NFS/CIFS we need to make sure the file size is updated properly for the
SEEK_HOLE and SEEK_DATA case, but since it calls the generic llseek stuff itself
that is all we have to do. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Al Viro

Josef Bacik
2011-07-21 08:47:58 +0800
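
For a filesystem that tracks no hole information, one plausible reading of "the normal generic thing" is to treat the whole file as data with a single virtual hole at end of file. The sketch below models that behaviour as a pure function; the `demo_` names and the local whence enum are assumptions, and this is not the kernel's llseek code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical whence values local to this sketch. */
enum demo_whence { DEMO_SEEK_DATA, DEMO_SEEK_HOLE };

/*
 * Treat the whole file as data with one virtual hole at end of file.
 * Returns the new offset or a negative errno value.
 */
long long demo_seek_hole_data(long long offset, long long size,
                              enum demo_whence whence)
{
    assert(size >= 0);
    if (offset < 0 || offset >= size)
        return -ENXIO;           /* nothing to find at or past EOF */
    switch (whence) {
    case DEMO_SEEK_DATA:
        return offset;           /* every byte before EOF is data */
    case DEMO_SEEK_HOLE:
        return size;             /* the only hole starts at EOF */
    }
    return -EINVAL;
}
```

For network filesystems the important input is `size`: it has to be current before this logic runs, which matches the NFS/CIFS remark above.
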
b85fd6bdc don't open-code parent_ino() in assorted ->readdir()

Signed-off-by: Al Viro

Al Viro
2011-07-21 08:47:54 +0800

20 Jul, 2011

5 commits

a127e0af5 ceph: LOOKUP_OPEN is set only when it's the last component

Signed-off-by: Al Viro

Al Viro
2011-07-20 13:43:59 +0800
8a5e929dd don't transliterate lower bits of ->intent.open.flags to FMODE_...

->create() instances are much happier that way...

Signed-off-by: Al Viro

Al Viro
2011-07-20 13:43:52 +0800
10556cb21 ->permission() sanitizing: don't pass flags to ->permission()

not used by the instances anymore.

Signed-off-by: Al Viro

Al Viro
2011-07-20 13:43:24 +0800
2830ba7f3 ->permission() sanitizing: don't pass flags to generic_permission()

redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of
them removes that bit.

Signed-off-by: Al Viro

Al Viro
2011-07-20 13:43:22 +0800
178ea7352 kill check_acl callback of generic_permission()

its value depends only on inode and does not change; we might as
well store it in ->i_op->check_acl and be done with that.

Signed-off-by: Al Viro

Al Viro
2011-07-20 13:43:16 +0800

17 Jul, 2011

1 commit

1b71fe2ef ceph analog of cifs build_path_from_dentry() race fix

... unfortunately, cifs bug got copied. Fix is essentially the same.

Signed-off-by: Al Viro

Al Viro
2011-07-17 11:43:58 +0800

14 Jun, 2011

2 commits

d7f124f12 ceph: fix sync and dio writes across stripe boundaries

We were iterating across stripe boundaries properly, but not moving the
write buffer pointer forward. This caused us to rewrite the same data
after the break. Fix by adjusting the data pointer forward, and
recalculating the io and buffer alignment after the break.

Signed-off-by: Sage Weil

Sage Weil
2011-06-14 07:26:22 +0800
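
The bug class described in this commit (splitting a write at stripe boundaries without advancing the source pointer) can be illustrated with a small userspace loop. The function below is a hypothetical sketch in which a flat buffer stands in for the object store; it is not the kernel write path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy len bytes from data into dst starting at byte offset pos,
 * splitting the write at every multiple of stripe_size. Returns the
 * number of chunks written.
 */
size_t demo_striped_write(char *dst, size_t pos, const char *data,
                          size_t len, size_t stripe_size)
{
    size_t chunks = 0;

    assert(dst != NULL && data != NULL && stripe_size > 0);
    while (len > 0) {
        /* Bytes left before the next stripe boundary. */
        size_t room = stripe_size - (pos % stripe_size);
        size_t n = len < room ? len : room;

        memcpy(dst + pos, data, n);
        pos  += n;
        data += n;   /* without this, each chunk rewrites the same bytes */
        len  -= n;
        chunks++;
    }
    return chunks;
}
```

Recomputing `room` from the updated `pos` on every pass is the userspace counterpart of recalculating alignment after the break.
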
773e9b442 ceph: fix page alignment corrections

dd if=/dev/urandom of=/mnt/fs_depot/dd10 bs=500 seek=8388 count=1
dd if=/mnt/fs_depot/dd10 of=/root/dd10out bs=500 skip=8388 count=1

Reported-by: Henry C Chang
Signed-off-by: Sage Weil

Sage Weil
2011-06-14 07:26:10 +0800

08 Jun, 2011

4 commits

0c1f91f27 ceph: unwind canceled flock state

If we request a lock and then abort (e.g., ^C), we need to send a matching
unlock request to the MDS to unwind our lock attempt to avoid indefinitely
blocking other clients.

Reported-by: Brian Chrisman
Signed-off-by: Sage Weil

Sage Weil
2011-06-08 12:36:45 +0800
0e98728fa ceph: fix ENOENT logic in striped_read

Getting ENOENT is equivalent to reading 0 bytes. Make that correction
before setting up the hit_stripe and was_short flags.

Fixes the following case:
dd if=/dev/zero of=/mnt/fs_depot/dd3 bs=1 seek=1048576 count=0
dd if=/mnt/fs_depot/dd3 of=/root/ddout1 skip=8 bs=500 count=2 iflag=direct

Reported-by: Henry C Chang
Signed-off-by: Sage Weil

Sage Weil
2011-06-08 12:34:16 +0800
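
The ordering this commit describes (normalize ENOENT to a zero-byte read first, then derive the flags) can be sketched as below. Both helpers are hypothetical, and the "short read" condition is an assumed definition for illustration rather than the kernel's exact expression.

```c
#include <assert.h>
#include <errno.h>

/*
 * A missing object is a hole, so it reads as zero bytes rather than as
 * an error. Other results pass through unchanged.
 */
long demo_normalize_read_result(long ret)
{
    if (ret == -ENOENT)
        return 0;
    return ret;
}

/* Flag computed after normalization, so a missing object counts as short. */
int demo_was_short(long ret, long wanted)
{
    assert(wanted >= 0);
    ret = demo_normalize_read_result(ret);
    return ret >= 0 && ret < wanted;
}
```
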
c3cd62839 ceph: fix short sync reads from the OSD

If we get a short read from the OSD because the object is small, we need to
zero the remainder of the buffer. For O_DIRECT reads, the attempted range
is not trimmed to i_size by the VFS, so we were actually looping
indefinitely.

Fix by trimming by i_size, and then unconditionally zeroing the trailing
range.

Reported-by: Jeff Wu
Signed-off-by: Sage Weil

Sage Weil
2011-06-08 12:34:14 +0800
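
The two steps in this fix (trim the request to the file size, then zero whatever the short read left unfilled) can be sketched in userspace. The function and parameter names are hypothetical; `file_size` plays the role of i_size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * buf holds the result of a read that asked for `wanted` bytes at offset
 * `pos` but received only `got`. Returns the byte count to report.
 */
size_t demo_finish_short_read(char *buf, size_t pos, size_t wanted,
                              size_t got, size_t file_size)
{
    assert(buf != NULL && got <= wanted);
    /* Trim: never report bytes past end of file. */
    if (pos >= file_size)
        return 0;
    if (wanted > file_size - pos)
        wanted = file_size - pos;
    /* Zero the trailing range inside the file that the read missed. */
    if (got < wanted)
        memset(buf + got, 0, wanted - got);
    return wanted;
}
```

Because the trimmed length is always reported in full, a caller looping until the request is satisfied cannot spin on a range past end of file.
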
70b666c3b ceph: use ihold when we already have an inode ref

We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock. This avoids adding new and unnecessary
locking dependencies.

Signed-off-by: Sage Weil

Sage Weil
2011-06-08 12:34:11 +0800

25 May, 2011

1 commit

db3540522 ceph: fix cap flush race reentrancy

In e9964c10 we changed cap flushing to do a delicate dance because some
inodes on the cap_dirty list could be in a migrating state (got EXPORT but
not IMPORT) in which we couldn't actually flush and move from
dirty->flushing, breaking the while (!empty) { process first } loop
structure. It worked for a single sync thread, but was not reentrant and
triggered infinite loops when multiple syncers came along.

Instead, move inodes with dirty caps to a separate cap_dirty_migrating list
when in the limbo export-but-no-import state, allowing us to go back to
the simple loop structure (which was reentrant). This is cleaner and more
robust.

Audited the cap_dirty users and this looks fine:
list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we
have dirty caps (which list we're on is irrelevant) and list_del_init()
calls still do the right thing.

Signed-off-by: Sage Weil

Sage Weil
2011-05-25 02:52:12 +0800
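
The two-list approach described in this commit can be sketched with a minimal circular list modelled on the kernel's list_head. Everything prefixed `demo_` is hypothetical, there is no locking, and setting `flushed` stands in for the real move from dirty to flushing.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct demo_list { struct demo_list *prev, *next; };

void demo_list_init(struct demo_list *l) { l->prev = l->next = l; }
int demo_list_empty(const struct demo_list *l) { return l->next == l; }

void demo_list_del_init(struct demo_list *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    demo_list_init(e);
}

void demo_list_add_tail(struct demo_list *e, struct demo_list *head)
{
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

/* Hypothetical inode; dirty_item must stay the first member (see below). */
struct demo_inode {
    struct demo_list dirty_item;
    int flushed;
};

/* Export began: park the inode where the flush loop will not see it. */
void demo_begin_migration(struct demo_inode *in, struct demo_list *migrating)
{
    assert(!demo_list_empty(&in->dirty_item));
    demo_list_del_init(&in->dirty_item);
    demo_list_add_tail(&in->dirty_item, migrating);
}

/* Simple "while not empty, process first" loop; returns inodes flushed. */
int demo_flush_dirty(struct demo_list *dirty)
{
    int n = 0;

    while (!demo_list_empty(dirty)) {
        /* Valid cast because dirty_item is the first member. */
        struct demo_inode *in = (struct demo_inode *)dirty->next;

        demo_list_del_init(&in->dirty_item);
        in->flushed = 1;
        n++;
    }
    return n;
}
```

Each pass removes the head entry, so the loop terminates for any number of callers working on their own turn, and an inode's `dirty_item` is non-empty on either list, which matches the audit note above.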