Doug / smarc-fsl-linux-kernel | Embedian Git Server

14 Jun, 2011

2 commits

d7f124f12 ceph: fix sync and dio writes across stripe boundaries

We were iterating across stripe boundaries properly, but not moving the
write buffer pointer forward. This caused us to rewrite the same data
after the break. Fix by adjusting the data pointer forward, and
recalculating the io and buffer alignment after the break.

Signed-off-by: Sage Weil

Sage Weil
2011-06-14 07:26:22 +0800
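
The stripe-boundary fix in d7f124f12 can be illustrated with a small userspace sketch. It is not the kernel code: the names `striped_write`, `write_chunk`, and `STRIPE_SIZE` are hypothetical, and the "file" is just a byte array. The point it shows is that the data pointer must advance together with the file position after each chunk.

```c
#include <stddef.h>
#include <string.h>

#define STRIPE_SIZE 8u  /* assumption: tiny stripe size, for illustration only */

/* Hypothetical stand-in for one object write that stays inside a stripe. */
static void write_chunk(char *file, size_t pos, const char *data, size_t len)
{
    memcpy(file + pos, data, len);
}

/* Write len bytes from buf at offset pos, splitting at stripe boundaries. */
static size_t striped_write(char *file, size_t pos, const char *buf, size_t len)
{
    size_t written = 0;

    while (len > 0) {
        /* Bytes left before the next stripe boundary. */
        size_t room = STRIPE_SIZE - (pos % STRIPE_SIZE);
        size_t chunk = len < room ? len : room;

        write_chunk(file, pos, buf, chunk);

        pos += chunk;
        buf += chunk;   /* without this step the same data is rewritten */
        len -= chunk;
        written += chunk;
    }
    return written;
}
```

Dropping the `buf += chunk` line reproduces the symptom described in the commit message: the bytes after the boundary repeat the start of the buffer.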

773e9b442 ceph: fix page alignment corrections

dd if=/dev/urandom of=/mnt/fs_depot/dd10 bs=500 seek=8388 count=1
dd if=/mnt/fs_depot/dd10 of=/root/dd10out bs=500 skip=8388 count=1

Reported-by: Henry C Chang
Signed-off-by: Sage Weil

Sage Weil
2011-06-14 07:26:10 +0800

08 Jun, 2011

3 commits

0e98728fa ceph: fix ENOENT logic in striped_read

Getting ENOENT is equivalent to reading 0 bytes. Make that correction
before setting up the hit_stripe and was_short flags.

Fixes the following case:
dd if=/dev/zero of=/mnt/fs_depot/dd3 bs=1 seek=1048576 count=0
dd if=/mnt/fs_depot/dd3 of=/root/ddout1 skip=8 bs=500 count=2 iflag=direct

Reported-by: Henry C Chang
Signed-off-by: Sage Weil

Sage Weil
2011-06-08 12:34:16 +0800
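
A rough sketch of the ordering that 0e98728fa describes, written as standalone C rather than kernel code. The `hit_stripe` and `was_short` flag names come from the commit message; the function `classify_read`, the struct `read_flags`, and the parameters `asked` and `left` are assumptions made for this example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct read_flags {
    long ret;         /* normalized result: bytes read or negative errno */
    bool hit_stripe;  /* request was trimmed to end at a stripe boundary */
    bool was_short;   /* fewer bytes came back than were asked for */
};

/*
 * ret:   raw result of one object read
 * asked: bytes requested from this object (already trimmed to the stripe)
 * left:  bytes still wanted by the caller overall
 */
static struct read_flags classify_read(long ret, size_t asked, size_t left)
{
    struct read_flags f;

    /* A missing object is a hole: treat it as a zero-byte read first. */
    if (ret == -ENOENT)
        ret = 0;

    /* Only now derive the flags, so they see the corrected value. */
    f.ret = ret;
    f.hit_stripe = asked < left;
    f.was_short = ret >= 0 && (size_t)ret < asked;
    return f;
}
```

If the flags were computed before the `-ENOENT` correction, a missing object would not be recognized as a short read.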

c3cd62839 ceph: fix short sync reads from the OSD

If we get a short read from the OSD because the object is small, we need to
zero the remainder of the buffer. For O_DIRECT reads, the attempted range
is not trimmed to i_size by the VFS, so we were actually looping
indefinitely.

Fix by trimming by i_size, and then unconditionally zeroing the trailing
range.

Reported-by: Jeff Wu
Signed-off-by: Sage Weil

Sage Weil
2011-06-08 12:34:14 +0800
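
The trim-then-zero approach from c3cd62839 might look like the following in userspace C. This is a sketch under assumptions, not the kernel implementation: `finish_short_read` and its parameters are hypothetical names, and `i_size` is passed in as a plain number.

```c
#include <stddef.h>
#include <string.h>

/*
 * buf:    destination buffer for a read of `want` bytes at offset `pos`
 * got:    bytes the object store actually returned (may be short)
 * i_size: current file size
 * Returns the number of bytes the caller should report as read.
 */
static size_t finish_short_read(char *buf, size_t pos, size_t want,
                                size_t got, size_t i_size)
{
    /* Trim the attempted range to the file size. */
    if (pos >= i_size)
        return 0;
    if (want > i_size - pos)
        want = i_size - pos;
    if (got > want)
        got = want;

    /* Unconditionally zero the gap between the short result and the end. */
    memset(buf + got, 0, want - got);
    return want;
}
```

Because the return value covers the zeroed tail, a caller looping until the range is satisfied always makes progress and cannot spin on a short object.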

70b666c3b ceph: use ihold when we already have an inode ref

We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock. This avoids adding new and unnecessary
locking dependencies.

Signed-off-by: Sage Weil

Sage Weil
2011-06-08 12:34:11 +0800

05 May, 2011

1 commit

fca65b4ad ceph: do not call __mark_dirty_inode under i_lock

The __mark_dirty_inode helper now takes i_lock as of 250df6ed. Fix the
one ceph caller that held i_lock (__ceph_mark_dirty_caps) to return the
flags value so that its callers can do it outside of i_lock.

Signed-off-by: Sage Weil

Sage Weil
2011-05-05 03:56:45 +0800

22 Mar, 2011

2 commits

49bcb9323 ceph: add request to the tail of unsafe write list

In sync_write_wait(), we assume that the newest request is at the
tail of the unsafe write list. We should maintain the semantics here.

Signed-off-by: Henry C Chang
Signed-off-by: Sage Weil

Henry C Chang
2011-03-22 03:24:25 +0800
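
To make the "newest request is at the tail" invariant from 49bcb9323 concrete, here is a minimal userspace imitation of a circular doubly linked list. It mirrors the semantics of the kernel's list helpers but is a reimplementation for illustration; `struct request`, `list_init`, `list_insert`, and `newest_request` are hypothetical names.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

struct request {
    int id;
    struct list_head item;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link entry between two neighbours that are already adjacent. */
static void list_insert(struct list_head *entry,
                        struct list_head *prev, struct list_head *next)
{
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
}

/* Append: the new entry becomes the last one before the head. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    list_insert(entry, head->prev, head);
}

/* A waiter that assumes "newest is last" only has to look at head->prev. */
static struct request *newest_request(struct list_head *head)
{
    if (head->prev == head)
        return NULL;  /* empty list */
    return (struct request *)((char *)head->prev -
                              offsetof(struct request, item));
}
```

Inserting at the head instead would leave the oldest request at the tail, so a waiter reading `head->prev` would wait on the wrong request.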

78a255654 ceph: remove request from unsafe list if it is canceled/timed out

This fixes a list corruption warning like this:

------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x68/0x81()
Hardware name: X8DTU
list_add corruption. prev->next should be next (ffff880618931250), but was (null). (prev=ffff880c188b9130).
Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ceph libceph libcrc32c sunrpc ipv6 fuse igb i2c_i801 ioatdma i2c_core iTCO_wdt iTCO_vendor_support joydev dca serio_raw usb_storage [last unloaded: scsi_wait_scan]
Pid: 10977, comm: smbd Tainted: G W 2.6.32.23-170.Elaster.xendom0.fc12.x86_64 #1
Call Trace:
[] warn_slowpath_common+0x7c/0x94
[] warn_slowpath_fmt+0x41/0x43
[] __list_add+0x68/0x81
[] ceph_aio_write+0x614/0x8a2 [ceph]
[] do_sync_write+0xe8/0x125
[] ? autoremove_wake_function+0x0/0x39
[] ? selinux_file_permission+0x5c/0xb3
[] ? security_file_permission+0x16/0x18
[] vfs_write+0xae/0x10b
[] sys_pwrite64+0x5a/0x76
[] system_call_fastpath+0x16/0x1b
---[ end trace 08573eb9f07ff6f4 ]---

Signed-off-by: Henry C Chang
Signed-off-by: Sage Weil

Henry C Chang
2011-03-22 03:24:24 +0800

18 Dec, 2010

1 commit

b6aa5901c ceph: mark user pages dirty on direct-io reads

For read operations, we have to set the _write_ argument of get_user_pages
to 1, since we will write data to the pages. We also need to SetPageDirty
before releasing these pages.

Signed-off-by: Henry C Chang
Signed-off-by: Sage Weil

Henry C Chang
2010-12-18 01:54:40 +0800

16 Dec, 2010

1 commit

ab226e21a ceph: fix direct-io on non-page-aligned buffers

The user buffer may be 512-byte aligned, not page-aligned. We were
assuming the buffer was page-aligned and only accounting for
non-page-aligned io offsets.

Signed-off-by: Henry C Chang
Signed-off-by: Sage Weil

Henry C Chang
2010-12-16 12:46:16 +0800
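
The effect of a 512-byte-aligned user buffer, as described in ab226e21a, shows up in a simple page-count calculation. The sketch below is standalone C with an assumed 4096-byte page size; `pages_spanned` and `SKETCH_PAGE_SIZE` are hypothetical names, not kernel symbols.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Number of pages touched by a buffer of len bytes starting at addr. */
static size_t pages_spanned(uintptr_t addr, size_t len)
{
    /* Offset of the first byte inside its page. */
    size_t off = (size_t)(addr & (SKETCH_PAGE_SIZE - 1));

    if (len == 0)
        return 0;
    /* Round (off + len) up to whole pages. */
    return (off + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

A page-sized I/O that starts 512 bytes into a page touches two pages, not one. Code that assumes a page-aligned buffer under-counts in exactly that case.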

10 Nov, 2010

2 commits

b7495fc2f ceph: make page alignment explicit in osd interface

We used to infer alignment of IOs within a page based on the file offset,
which assumed they matched. This broke with direct IO that was not aligned
to pages (e.g., 512-byte aligned IO). We were also trusting the alignment
specified in the OSD reply, which could have been adjusted by the server.

Explicitly specify the page alignment when setting up OSD IO requests.

Signed-off-by: Sage Weil

Sage Weil
2010-11-10 04:43:12 +0800

e98b6fed8 ceph: fix comment, remove extraneous args

The offset/length arguments aren't used.

Signed-off-by: Sage Weil

Sage Weil
2010-11-10 04:24:53 +0800

08 Nov, 2010

1 commit

7421ab804 ceph: fix open for write on clustered mds

Normally when we open a file we already have a cap, and simply update the
wanted set. However, if we open a file for write, but don't have an auth
cap, that doesn't work; we need to open a new cap with the auth MDS. Only
reuse existing caps if we are opening for read or the existing cap is auth.

Signed-off-by: Sage Weil

Sage Weil
2010-11-08 01:07:15 +0800

21 Oct, 2010

1 commit

3d14c5d2b ceph: factor out libceph from Ceph file system

This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph. This
is mostly a matter of moving files around. However, a few key pieces
of the interface change as well:

- ceph_client becomes ceph_fs_client and ceph_client, where the latter
captures the mon and osd clients, and the fs_client gets the mds client
and file system specific pieces.
- Mount option parsing and debugfs setup is correspondingly broken into
two pieces.
- The mon client gets a generic handler callback for otherwise unknown
messages (mds map, in this case).
- The basic supported/required feature bits can be expanded (and are by
ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.

Signed-off-by: Sage Weil

Yehuda Sadeh
2010-10-21 06:37:28 +0800

07 Oct, 2010

1 commit

936aeb5c4 ceph: fix list_add usage on unsafe_writes list

Fix argument order.

Signed-off-by: Henry C Chang
Signed-off-by: Sage Weil

Henry C Chang
2010-10-07 23:00:23 +0800

04 Aug, 2010

1 commit

213c99ee0 ceph: whitespace cleanup

Signed-off-by: Sage Weil

Sage Weil
2010-08-04 01:25:11 +0800

03 Aug, 2010

1 commit

40819f6fb ceph: add flock/fcntl lock support

Implement flock inode operation to support advisory file locking. All
lock/unlock operations are synchronous with the MDS. Lock state is
sent when reconnecting to a recovering MDS to restore the shared lock
state.

Signed-off-by: Greg Farnum
Signed-off-by: Sage Weil

Greg Farnum
2010-08-03 07:10:53 +0800

02 Aug, 2010

3 commits

cd84db6e4 ceph: code cleanup

Mainly fixing minor issues reported by sparse.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-08-02 11:11:40 +0800

2962507ca ceph: perform lazy reads when file mode and caps permit

If the file mode is marked as "lazy," perform cached/buffered reads when
the caps permit it. Adjust the rdcache_gen and invalidation logic
accordingly so that we manage our cache based on the FILE_CACHE -or-
FILE_LAZYIO cap bits.

Signed-off-by: Sage Weil

Sage Weil
2010-08-02 11:11:39 +0800

33caad324 ceph: perform lazy writes when file mode and caps permit

If we have marked a file as "lazy" (using the ceph ioctl), perform buffered
writes when the MDS caps allow it.

Signed-off-by: Sage Weil

Sage Weil
2010-08-02 11:11:39 +0800

28 Jul, 2010

1 commit

03066f234 ceph: use complete_all and wake_up_all

This fixes an issue triggered by running concurrent syncs. One of the syncs
would go through while the other would just hang indefinitely. In any case, we
never actually want to wake a single waiter, so the *_all functions should
be used.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-07-28 04:11:17 +0800

30 May, 2010

2 commits

b612a0553 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: clean up on forwarded aborted mds request
ceph: fix leak of osd authorizer
ceph: close out mds, osd connections before stopping auth
ceph: make lease code DN specific
fs/ceph: Use ERR_CAST
ceph: renew auth tickets before they expire
ceph: do not resend mon requests on auth ticket renewal
ceph: removed duplicated #includes
ceph: avoid possible null dereference
ceph: make mds requests killable, not interruptible
sched: add wait_for_completion_killable_timeout

Linus Torvalds
2010-05-30 23:56:39 +0800

7e34bc524 fs/ceph: Use ERR_CAST

Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes the
purpose of the operation clearer; it otherwise looks like a no-op.

In the case of fs/ceph/inode.c, ERR_CAST is not needed, because the type of
the returned value is the same as the type of the enclosing function.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
type T;
T x;
identifier f;
@@

T f (...) { }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// </smpl>

Signed-off-by: Julia Lawall
Signed-off-by: Sage Weil

Julia Lawall
2010-05-30 00:12:41 +0800
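
For readers unfamiliar with the idiom in 7e34bc524, the following userspace imitation shows what ERR_CAST expresses. The helpers reuse the kernel's names but are reimplemented here for illustration (using intptr_t rather than the kernel's long); `find_inode`, `find_dentry`, and the two structs are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(intptr_t error) { return (void *)error; }

/* Recover the errno value from such a pointer. */
static inline intptr_t PTR_ERR(const void *ptr) { return (intptr_t)ptr; }

/* Error pointers occupy the top MAX_ERRNO addresses. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Pass an error pointer on under a different pointer type. */
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }

struct inode  { int ino; };
struct dentry { int id; };

static struct inode *find_inode(int ino)
{
    static struct inode root = { 1 };
    return ino == 1 ? &root : ERR_PTR(-ENOENT);
}

static struct dentry *find_dentry(int ino)
{
    static struct dentry d = { 1 };
    struct inode *inode = find_inode(ino);

    if (IS_ERR(inode))
        return ERR_CAST(inode);  /* same error, different pointer type */
    return &d;
}
```

When the failing pointer already has the function's return type, it can simply be returned as-is, which is the fs/ceph/inode.c case the message mentions.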

24 May, 2010

1 commit

6e188240e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (59 commits)
ceph: reuse mon subscribe message instead of allocated anew
ceph: avoid resending queued message to monitor
ceph: Storage class should be before const qualifier
ceph: all allocation functions should get gfp_mask
ceph: specify max_bytes on readdir replies
ceph: cleanup pool op strings
ceph: Use kzalloc
ceph: use common helper for aborted dir request invalidation
ceph: cope with out of order (unsafe after safe) mds reply
ceph: save peer feature bits in connection structure
ceph: resync headers with userland
ceph: use ceph. prefix for virtual xattrs
ceph: throw out dirty caps metadata, data on session teardown
ceph: attempt mds reconnect if mds closes our session
ceph: clean up send_mds_reconnect interface
ceph: wait for mds OPEN reply to indicate reconnect success
ceph: only send cap releases when mds is OPEN|HUNG
ceph: dicard cap releases on mds restart
ceph: make mon client statfs handling more generic
ceph: drop src address(es) from message header [new protocol feature]
...

Linus Torvalds
2010-05-24 22:37:52 +0800

22 May, 2010

1 commit

8018ab057 sanitize vfs_fsync calling conventions

Now that the last user passing a NULL file pointer is gone we can remove
the redundant dentry argument and associated hacks inside vfs_fsync_range.

The next step will be removing the dentry argument from ->fsync, but given
the luck with the last round of method prototype changes I'd rather
defer this until after the main merge window.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-05-22 06:31:21 +0800

18 May, 2010

4 commits

34d23762d ceph: all allocation functions should get gfp_mask

This is essential, as for the rados block device we'll need
to run in different contexts that need flags other than
GFP_NOFS.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-05-18 06:25:42 +0800

a79832f26 ceph: make ceph_msg_new return NULL on failure; clean up, fix callers

Returning ERR_PTR(-ENOMEM) is useless extra work. Return NULL on failure
instead, and fix up the callers (about half of which were wrong anyway).

Signed-off-by: Sage Weil

Sage Weil
2010-05-18 06:25:18 +0800

640ef79d2 ceph: use ceph_sb_to_client instead of ceph_client

ceph_sb_to_client and ceph_client are identical, so we should drop one.
The function name ceph_client is easily confused with "struct ceph_client",
while ceph_sb_to_client is clearer, so switch all callers to
ceph_sb_to_client.

-static inline struct ceph_client *ceph_client(struct super_block *sb)
-{
- return sb->s_fs_info;
-}

Signed-off-by: Cheng Renquan
Signed-off-by: Sage Weil

Cheng Renquan
2010-05-18 06:25:17 +0800

31459fe4b ceph: use __page_cache_alloc and add_to_page_cache_lru

Following Nick Piggin's patches in btrfs, pagecache pages should be
allocated with __page_cache_alloc, so they obey pagecache memory
policies.

Also, use add_to_page_cache_lru instead of a private pagevec where
applicable.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-05-18 06:25:12 +0800

04 May, 2010

1 commit

5c6a2cdb4 ceph: fix direct io truncate offset

truncate_inode_pages_range wants the end offset to align with the last byte
in a page.

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:25 +0800
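
One way to obtain an end offset that "aligns with the last byte in a page", as 5c6a2cdb4 requires, is to set all the low bits of the offset. The sketch below assumes 4096-byte pages; `page_end_inclusive` and `SKETCH_PAGE_SIZE` are hypothetical names, and the exact expression used in the kernel fix is not shown on this page.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)  /* assumption: 4 KiB pages */

/*
 * Return the inclusive offset of the last byte of the page that
 * contains `off`. Works because the page size is a power of two.
 */
static uint64_t page_end_inclusive(uint64_t off)
{
    return off | (SKETCH_PAGE_SIZE - 1);
}
```

For example, any offset from 4096 to 8191 maps to 8191, so the byte after the returned offset always starts a new page.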

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

02 Mar, 2010

1 commit

195d3ce2c ceph: return EBADF if waiting for caps on closed file

Verify the file is actually open for the given caps when we are
waiting for caps. This ensures we will wake up and return EBADF
if another thread closes the file out from under us.

Note that EBADF is also the correct return code from write(2)
when called on a file handle opened for reading (although the
vfs should catch that).

Signed-off-by: Sage Weil

Sage Weil
2010-03-02 07:28:00 +0800

24 Feb, 2010

1 commit

88d892a37 ceph: don't clobber write return value when using O_SYNC

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-02-24 06:26:36 +0800

12 Feb, 2010

3 commits

6a026589b ceph: fix sync read eof check deadlock

If a sync read gets a short result from the OSD, it may need to do a
getattr to see if it is short due to reaching end-of-file. The getattr
was being done while holding a reference to FILE_RD, which can lead to
a deadlock if the MDS is revoking that capability bit and can't process
the getattr until it does.

We fix this by setting a flag if EOF size validation is needed, and doing
the getattr in ceph_aio_read, after the RD cap ref is dropped. If the
read needs to be continued, we loop and continue traversing the file.

Signed-off-by: Sage Weil

Sage Weil
2010-02-12 03:48:53 +0800

29065a513 ceph: sync read/write considers page cache

In the cases where we either do a sync read or a write, we
need to make sure that everything in the page cache is flushed.
In the case of a sync write we invalidate the relevant pages,
so that subsequent read/write reflects the new data written.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-02-12 03:48:51 +0800

972f0d3ab ceph: fix short synchronous reads

Zeroing of holes was not done correctly: page_off was miscalculated and
zeroing the tail did not adjust the 'read' value to include the zeroed
portion.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-02-12 03:48:49 +0800

07 Jan, 2010

1 commit

6a4ef4810 ceph: fix copy_user_to_page_vector()

The function was broken when more than one page was involved, which
broke the ceph sync_write case.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2010-01-07 08:05:20 +0800
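
The multi-page case that 6a4ef4810 refers to can be sketched in plain C as a copy from a flat buffer into an array of fixed-size pages. This is not the kernel function: it uses memcpy instead of copying from user space, a deliberately tiny page size, and the hypothetical names `copy_to_page_vector` and `SKETCH_PAGE_SIZE`.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16u  /* assumption: tiny pages, for illustration */

/*
 * Copy len bytes from data into the page array, starting at byte
 * offset off from the beginning of pages[0]. Returns bytes copied.
 */
static size_t copy_to_page_vector(char **pages, size_t off,
                                  const char *data, size_t len)
{
    size_t i = off / SKETCH_PAGE_SIZE;   /* index of the current page */
    size_t po = off % SKETCH_PAGE_SIZE;  /* offset inside that page */
    size_t left = len;

    while (left > 0) {
        size_t room = SKETCH_PAGE_SIZE - po;
        size_t n = left < room ? left : room;

        memcpy(pages[i] + po, data, n);
        data += n;
        left -= n;
        po += n;
        if (po == SKETCH_PAGE_SIZE) {
            /* Page is full: continue at the start of the next one. */
            po = 0;
            i++;
        }
    }
    return len;
}
```

The per-page bookkeeping (advance the source, reset the in-page offset, move to the next page) is what has to be right once a copy spans more than one page.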

05 Nov, 2009

1 commit

6a18be16f ceph: fix sparse endian warning

Use the __le macro, even though for -1 it doesn't matter.

Signed-off-by: Sage Weil

Sage Weil
2009-11-05 08:36:12 +0800
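
The remark in 6a18be16f that the conversion "doesn't matter" for -1 follows from the fact that an all-ones value reads the same in either byte order. A small standalone check, using a hypothetical `swab32` helper written for this example:

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swab32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & UINT32_C(0x0000ff00)) |
           ((v << 8) & UINT32_C(0x00ff0000)) |
           (v << 24);
}
```

Swapping 0xffffffff yields 0xffffffff, while a value such as 0x12345678 becomes 0x78563412. Using the endian-annotated macro anyway keeps the type checker (sparse) satisfied.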

07 Oct, 2009

1 commit

124e68e74 ceph: file operations

File open and close operations, and read and write methods that ensure
we have obtained the proper capabilities from the MDS cluster before
performing IO on a file. We take references on held capabilities for
the duration of the read/write to avoid prematurely releasing them
back to the MDS.

We implement two main paths for read and write: one that is buffered
(and uses generic_aio_{read,write}), and one that is fully synchronous
and blocking (operating either on a __user pointer or, if O_DIRECT,
directly on user pages).

Signed-off-by: Sage Weil

Sage Weil
2009-10-07 02:31:08 +0800