Eric Lee / smarc-fsl-linux-kernel

12 Nov, 2011

1 commit

224736d91 libceph: Allocate larger oid buffer in request msgs

The ceph_osd_request struct allocates a 40-byte buffer for object names.
RBD image names can be up to 96 chars long (100 with the .rbd suffix),
so the object name for the image gets truncated and the subsequent map
fails.

Increase the oid buffer in request messages, in order to avoid the
truncation.
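The truncation described above can be reproduced in user space with a minimal sketch. Only the 40-byte size, the 96-character limit and the .rbd suffix come from the commit message; the helper names and the 128-byte comparison size are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OLD_OID_BUF_LEN 40    /* size named in the commit message */
#define BIG_OID_BUF_LEN 128   /* hypothetical larger size, not the patch's value */

/*
 * Format "<image>.rbd" into buf.
 * Returns 1 if the whole name fit, 0 if snprintf had to truncate it.
 */
static int format_rbd_oid(char *buf, size_t buf_len, const char *image)
{
	int n;

	assert(buf_len > 0);
	n = snprintf(buf, buf_len, "%s.rbd", image);
	return n >= 0 && (size_t)n < buf_len;
}

/* Try the longest allowed image name (96 chars) against a buffer size. */
static int longest_name_fits(size_t buf_len)
{
	char image[97];
	char oid[BIG_OID_BUF_LEN];

	assert(buf_len <= sizeof(oid));
	memset(image, 'a', 96);
	image[96] = '\0';
	return format_rbd_oid(oid, buf_len, image);
}
```

Calling longest_name_fits(OLD_OID_BUF_LEN) reports a truncated name, while longest_name_fits(BIG_OID_BUF_LEN) reports that the full 100-character name was stored.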

Signed-off-by: Stratos Psomadakis
Signed-off-by: Sage Weil

Stratos Psomadakis
2011-11-12 01:50:19 +0800

29 Oct, 2011

1 commit

97d2eb13a Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client

* 'for-linus' of git://ceph.newdream.net/git/ceph-client:
libceph: fix double-free of page vector
ceph: fix 32-bit ino numbers
libceph: force resend of osd requests if we skip an osdmap
ceph: use kernel DNS resolver
ceph: fix ceph_monc_init memory leak
ceph: let the set_layout ioctl set single traits
Revert "ceph: don't truncate dirty pages in invalidate work thread"
ceph: replace leading spaces with tabs
libceph: warn on msg allocation failures
libceph: don't complain on msgpool alloc failures
libceph: always preallocate mon connection
libceph: create messenger with client
ceph: document ioctls
ceph: implement (optional) max read size
ceph: rename rsize -> rasize
ceph: make readpages fully async

Linus Torvalds
2011-10-29 07:42:18 +0800

26 Oct, 2011

2 commits

b61c27636 libceph: don't complain on msgpool alloc failures

The pool allocation failures are masked by the pool; there is no need to
spam the console about them. (That's the whole point of having the pool
in the first place.)

Mark msg allocations whose failure is safely handled as such.

Signed-off-by: Sage Weil

Sage Weil
2011-10-26 07:10:15 +0800
6ab00d465 libceph: create messenger with client

This simplifies the init/shutdown paths, and makes client->msgr available
during the rest of the setup process.

Signed-off-by: Sage Weil

Sage Weil
2011-10-26 07:10:15 +0800

15 Sep, 2011

2 commits

e060c3843 Merge branch 'master' into for-next

Fast-forward merge with Linus to be able to merge patches
based on a more recent version of the tree.

Jiri Kosina
2011-09-15 21:08:18 +0800
e81b15168 Remove unneeded version.h includes from include/

It was pointed out by 'make versioncheck' that some includes of
linux/version.h are not needed in include/.
This patch removes them.

When I last posted the patch, the ceph bit was ACK'ed by Sage Weil, so
I've added that below.

The pwc-ioctl change generated quite a bit of discussion about V4L version
numbers in general, but as far as I can tell, no consensus was reached on
what the long-term solution should be, so in the meantime I think we
could start by just removing the unneeded include, which is why I'm
resending the patch with that hunk still included.

Signed-off-by: Jesper Juhl
Acked-by: Sage Weil
Signed-off-by: Jiri Kosina

Jesper Juhl
2011-09-15 20:57:06 +0800

27 Jul, 2011

2 commits

ba5b56cb3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
ceph: document unlocked d_parent accesses
ceph: explicitly reference rename old_dentry parent dir in request
ceph: document locking for ceph_set_dentry_offset
ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug
ceph: protect d_parent access in ceph_d_revalidate
ceph: protect access to d_parent
ceph: handle racing calls to ceph_init_dentry
ceph: set dir complete frag after adding capability
rbd: set blk_queue request sizes to object size
ceph: set up readahead size when rsize is not passed
rbd: cancel watch request when releasing the device
ceph: ignore lease mask
ceph: fix ceph_lookup_open intent usage
ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC
ceph: fix bad parent_inode calc in ceph_lookup_open
ceph: avoid carrying Fw cap during write into page cache
libceph: don't time out osd requests that haven't been received
ceph: report f_bfree based on kb_avail rather than diffing.
ceph: only queue capsnap if caps are dirty
ceph: fix snap writeback when racing with writes
...

Linus Torvalds
2011-07-27 04:38:50 +0800
4cf9d5446 libceph: don't time out osd requests that haven't been received

Keep track of when an outgoing message is ACKed (i.e., the server fully
received it and, presumably, queued it for processing). Time out OSD
requests only if it's been too long since they've been received.

This prevents timeouts and connection thrashing when the OSDs are simply
busy and are throttling the requests they read off the network.
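A minimal sketch of that timeout rule, assuming a per-request ACK flag and timestamp. The struct, its fields and the function name are hypothetical illustrations, not the libceph data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-request state; times are in arbitrary ticks. */
struct sketch_request {
	bool acked;               /* server confirmed it received the message */
	unsigned long ack_stamp;  /* when that ACK arrived */
};

/*
 * A request can only time out once the server has ACKed it, and the
 * clock starts at the ACK rather than at the time of sending.
 */
static bool sketch_request_timed_out(const struct sketch_request *req,
				     unsigned long now, unsigned long timeout)
{
	assert(req);
	if (!req->acked)
		return false;   /* not yet read off the network: keep waiting */
	return now - req->ack_stamp > timeout;
}
```

With this rule, a request that a busy OSD has not yet read never counts against the timeout, however long ago it was sent.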

Reviewed-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Sage Weil
2011-07-27 02:27:24 +0800

21 Jul, 2011

1 commit

497888cf6 treewide: fix potentially dangerous trailing ';' in #defined values/expressions

All these are instances of
#define NAME value;
or
#define NAME(params_opt) value;

These of course fail to build when used in contexts like
if(foo $OP NAME)
while(bar $OP NAME)
and may silently generate the wrong code in contexts such as
foo = NAME + 1; /* foo = value; + 1; */
bar = NAME - 1; /* bar = value; - 1; */
baz = NAME & quux; /* baz = value; & quux; */

Reported on comp.lang.c,
Message-ID:
Initial analysis of the dangers provided by Keith Thompson in that thread.

There are many more instances of more complicated macros having unnecessary
trailing semicolons, but this pile seems to be all of the cases of simple
values suffering from the problem. (Simple values are the ones likely to be
used in one of the contexts above; more complicated macros aren't.)
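The silent miscompilation case can be seen in a small sketch. The macro and function names are hypothetical; a compiler may warn about the stray expression statement but will still build the code.

```c
#include <assert.h>

#define BAD_LIMIT 10;    /* trailing ';' becomes part of every expansion */
#define GOOD_LIMIT 10

static int bad_plus_one(void)
{
	int foo;

	foo = BAD_LIMIT + 1;    /* expands to: foo = 10; + 1; */
	return foo;
}

static int good_plus_one(void)
{
	int foo;

	foo = GOOD_LIMIT + 1;   /* expands to: foo = 10 + 1; */
	return foo;
}

/* The "+ 1" is dropped in the bad case because it forms its own statement. */
static void check_macros(void)
{
	assert(bad_plus_one() == 10);
	assert(good_plus_one() == 11);
}
```

Using BAD_LIMIT inside an if or while condition would instead fail to compile, which is the louder and safer of the two failure modes.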

Signed-off-by: Phil Carmody
Signed-off-by: Jiri Kosina

Phil Carmody
2011-07-21 20:10:00 +0800

25 May, 2011

1 commit

3c454cf21 ceph: use LOOKUPINO to make unconnected nfs fh more reliable

If we are unable to locate an inode by ino, ask the MDS using the new
LOOKUPINO command.

Signed-off-by: Sage Weil

Sage Weil
2011-05-25 02:52:05 +0800

30 Mar, 2011

1 commit

8323c3aa7 ceph: Move secret key parsing earlier.

This confines the base64 logic to mount option parsing,
and prepares us for replacing the homebrew key management with the
kernel key retention service.

Signed-off-by: Tommi Virtanen
Signed-off-by: Sage Weil

Tommi Virtanen
2011-03-30 03:11:16 +0800

23 Mar, 2011

1 commit

a40c4f10e libceph: add lingering request and watch/notify event framework

Lingering requests are requests that are sent to the OSD normally but
remain tracked after they complete successfully. This keeps the OSD
connection open and resends the original request if the object moves to
another OSD. The OSD can then send notification messages back to us
if another client initiates a notify.

This framework will be used by RBD so that the client gets notification
when a snapshot is created by another node or tool.

Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil

Yehuda Sadeh
2011-03-23 02:33:55 +0800

22 Mar, 2011

3 commits

80456f867 ceph: move readahead default to fs/ceph from libceph

Signed-off-by: Sage Weil

Sage Weil
2011-03-22 03:24:23 +0800
483fac714 ceph: update common header files

This updates the common header files used by the different ceph
related modules. Specifically it adds definitions required by
the rbd watch/notify feature.

Signed-off-by: Yehuda Sadeh

Yehuda Sadeh
2011-03-22 03:24:21 +0800
6f6c70067 libceph: fix osd request queuing on osdmap updates

If we send a request to osd A, and the request's pg remaps to osd B and
then back to A in quick succession, we need to resend the request to A. The
old code was only calling kick_requests after processing all incremental
maps in a message, so it was very possible to not resend a request that
needed to be resent. This would make the osd eventually time out (at least
with the current default of osd timeouts enabled).

The correct approach is to scan requests on every map incremental. This
patch refactors the kick code in a few ways:
- all requests are either on req_lru (in flight), req_unsent (ready to
send), or req_notarget (currently map to no up osd)
- mapping always done by map_request (previous map_osds)
- if the mapping changes, we requeue. requests are resent only after all
map incrementals are processed.
- some osd reset code is moved out of kick_requests into a separate
function
- the "kick this osd" functionality is moved to kick_osd_requests, as it
is unrelated to scanning for request->pg->osd mapping changes
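The A to B to A case can be sketched as follows. This is a simplified user-space model under stated assumptions: the struct, the function names and the integer OSD ids are hypothetical, and only the idea of rescanning on every incremental and resending afterwards comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_req {
	int osd;        /* OSD the request currently maps to */
	bool requeue;   /* set when the mapping changed at any incremental */
};

/* Re-evaluate the mapping against one incremental map. */
static void sketch_map_request(struct sketch_req *req, int new_osd)
{
	if (new_osd != req->osd) {
		req->osd = new_osd;
		req->requeue = true;
	}
}

/* New approach: scan on every incremental, decide about resending afterwards. */
static bool sketch_scan_each(int start_osd, const int *targets, size_t n)
{
	struct sketch_req req = { .osd = start_osd, .requeue = false };
	size_t i;

	for (i = 0; i < n; i++)
		sketch_map_request(&req, targets[i]);
	return req.requeue;
}

/* Old approach as described: only the state after the last map is compared. */
static bool sketch_scan_last(int start_osd, const int *targets, size_t n)
{
	assert(n > 0);
	return targets[n - 1] != start_osd;
}

/* Mapping sequence from the commit message: A (0) -> B (1) -> A (0). */
static bool sketch_a_b_a_needs_resend(bool per_incremental)
{
	static const int targets[] = { 1, 0 };

	return per_incremental ? sketch_scan_each(0, targets, 2)
			       : sketch_scan_last(0, targets, 2);
}
```

Here sketch_a_b_a_needs_resend(true) reports that the request must be resent, while the end-of-message comparison in sketch_a_b_a_needs_resend(false) misses it.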

Signed-off-by: Sage Weil

Sage Weil
2011-03-22 03:24:19 +0800

05 Mar, 2011

2 commits

e76661d0a libceph: fix msgr keepalive flag

There was some broken keepalive code using a dead variable. Shift to using
the proper bit flag.

Signed-off-by: Sage Weil

Sage Weil
2011-03-05 04:24:31 +0800
60bf8bf88 libceph: fix msgr backoff

With commit f363e45f we replaced a bunch of hacky workqueue mutual
exclusion logic with the WQ_NON_REENTRANT flag. One piece of fallout is
that the exponential backoff breaks in certain cases:

* con_work attempts to connect.
* we get an immediate failure, and the socket state change handler queues
immediate work.
* con_work calls con_fault, we decide to back off, but can't queue delayed
work.

In this case, we add a BACKOFF bit to make con_work reschedule delayed work
next time it runs (which should be immediately).
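The sequence above can be modelled in user space as a rough sketch. Everything here is a hypothetical stand-in: the struct, the flag value, the delay constants and the queueing helper only imitate the behaviour described and are not the messenger code or the workqueue API.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_BACKOFF    (1u << 0)   /* hypothetical flag bit */
#define SKETCH_BASE_DELAY 1u          /* arbitrary ticks */
#define SKETCH_MAX_DELAY  64u

struct sketch_con {
	unsigned int flags;
	unsigned int delay;            /* current backoff delay */
	bool work_pending;             /* work is already queued */
	unsigned int scheduled_delay;  /* delay of the queued work, 0 if immediate */
};

/* Stand-in for queueing delayed work; fails if work is already pending. */
static bool sketch_queue_delayed(struct sketch_con *con, unsigned int delay)
{
	if (con->work_pending)
		return false;
	con->work_pending = true;
	con->scheduled_delay = delay;
	return true;
}

/* On failure: grow the delay, and record the backoff if queueing fails. */
static void sketch_con_fault(struct sketch_con *con)
{
	con->delay = con->delay ? con->delay * 2 : SKETCH_BASE_DELAY;
	if (con->delay > SKETCH_MAX_DELAY)
		con->delay = SKETCH_MAX_DELAY;
	if (!sketch_queue_delayed(con, con->delay))
		con->flags |= SKETCH_BACKOFF;
}

/* Work function: honour a recorded backoff before doing anything else. */
static bool sketch_con_work(struct sketch_con *con)
{
	con->work_pending = false;   /* this run consumes the queued work */
	if (con->flags & SKETCH_BACKOFF) {
		con->flags &= ~SKETCH_BACKOFF;
		sketch_queue_delayed(con, con->delay);
		return true;         /* only rescheduled */
	}
	return false;                /* a connect attempt would happen here */
}

/* Replay the failing sequence and return the delay that ends up scheduled. */
static unsigned int sketch_backoff_scenario(void)
{
	struct sketch_con con = { 0 };

	con.work_pending = true;     /* state change handler queued immediate work */
	sketch_con_fault(&con);      /* cannot queue delayed work */
	assert(con.flags & SKETCH_BACKOFF);
	sketch_con_work(&con);       /* immediate run reschedules with the delay */
	return con.scheduled_delay;
}
```

In this model the immediate run does no connection work; it only converts the recorded BACKOFF bit into delayed work, so the backoff delay is preserved.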

Signed-off-by: Sage Weil

Sage Weil
2011-03-05 04:24:28 +0800

13 Jan, 2011

2 commits

f363e45fd net/ceph: make ceph_msgr_wq non-reentrant

The ceph messenger code does a rather complex dance around the
multithreaded workqueue to make sure the same work item isn't executed
concurrently on different CPUs. This restriction can be provided by the
workqueue with WQ_NON_REENTRANT.

Make ceph_msgr_wq a non-reentrant workqueue with the default concurrency
level and remove the QUEUED/BUSY logic.

* This removes backoff handling in con_work() but it couldn't reliably
block execution of con_work() to begin with - queue_con() can be
called after the work started but before BUSY is set. It seems that
it was an optimization for a rather cold path and can be safely
removed.

* The number of concurrent work items is bounded by the number of
connections, and connections are independent of each other. With
the default concurrency level, different connections will be
executed independently.

Signed-off-by: Tejun Heo
Cc: Sage Weil
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Sage Weil

Tejun Heo
2011-01-13 07:15:14 +0800
6c0f3af72 ceph: add dir_layout to inode

Add a ceph_dir_layout to the inode, and calculate dentry hash values based
on the parent directory's specified dir_hash function. This is needed
because the old default Linux dcache hash function is extremely weak and
leads to a poor distribution of files among dir fragments.

Signed-off-by: Sage Weil

Sage Weil
2011-01-13 07:15:12 +0800

18 Dec, 2010

1 commit

b6aa5901c ceph: mark user pages dirty on direct-io reads

For a read operation, we have to set the _write_ argument of get_user_pages
to 1, since we will write data to the pages. We also need to call
SetPageDirty before releasing these pages.

Signed-off-by: Henry C Chang
Signed-off-by: Sage Weil

Henry C Chang
2010-12-18 01:54:40 +0800

10 Nov, 2010

3 commits

c5c6b19d4 ceph: explicitly specify page alignment in network messages

The alignment used for reading data into or out of pages used to be taken
from the data_off field in the message header. This only worked as long
as the page alignment matched the object offset, breaking direct io to
non-page aligned offsets.

Instead, explicitly specify the page alignment next to the page vector
in the ceph_msg struct, and use that instead of the message header (which
probably shouldn't be trusted). The alloc_msg callback is responsible for
filling in this field properly when it sets up the page vector.

Signed-off-by: Sage Weil

Sage Weil
2010-11-10 04:43:17 +0800
b7495fc2f ceph: make page alignment explicit in osd interface

We used to infer alignment of IOs within a page based on the file offset,
which assumed they matched. This broke with direct IO that was not aligned
to pages (e.g., 512-byte aligned IO). We were also trusting the alignment
specified in the OSD reply, which could have been adjusted by the server.

Explicitly specify the page alignment when setting up OSD IO requests.
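The mismatch can be illustrated with a small sketch, assuming 4096-byte pages. The names and the example addresses are hypothetical; the point is only that the offset within a page derived from the file offset need not equal the one derived from the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this example */

/* Offset of a position (file offset or buffer address) within its page. */
static unsigned int sketch_offset_in_page(uint64_t pos)
{
	return (unsigned int)(pos & (SKETCH_PAGE_SIZE - 1));
}

/* Number of pages touched by len bytes starting at the given alignment. */
static size_t sketch_pages_spanned(unsigned int page_align, size_t len)
{
	assert(len > 0);
	assert(page_align < SKETCH_PAGE_SIZE);
	return (page_align + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

For a direct IO at file offset 1024 into a buffer at address 0x10200, the file offset suggests an alignment of 1024 while the buffer actually starts 512 bytes into its page, so the alignment has to be passed explicitly. A 4096-byte transfer at alignment 512 spans two pages.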

Signed-off-by: Sage Weil

Sage Weil
2010-11-10 04:43:12 +0800
e98b6fed8 ceph: fix comment, remove extraneous args

The offset/length arguments aren't used.

Signed-off-by: Sage Weil

Sage Weil
2010-11-10 04:24:53 +0800

21 Oct, 2010

3 commits

571dba52a ceph: add CEPH_MDS_OP_SETDIRLAYOUT and associated ioctl.

Signed-off-by: Sage Weil

Greg Farnum
2010-10-21 06:38:23 +0800
ac0b74d8a ceph: add pagelist_reserve, pagelist_truncate, pagelist_set_cursor

These facilitate preallocation of pages so that we can encode into the pagelist
in an atomic context.

Signed-off-by: Greg Farnum
Signed-off-by: Sage Weil

Greg Farnum
2010-10-21 06:38:16 +0800
3d14c5d2b ceph: factor out libceph from Ceph file system

This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph. This
is mostly a matter of moving files around. However, a few key pieces
of the interface change as well:

- ceph_client becomes ceph_fs_client and ceph_client, where the latter
captures the mon and osd clients, and the fs_client gets the mds client
and file system specific pieces.
- Mount option parsing and debugfs setup are correspondingly broken into
two pieces.
- The mon client gets a generic handler callback for otherwise unknown
messages (mds map, in this case).
- The basic supported/required feature bits can be expanded (and are by
ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.

Signed-off-by: Sage Weil

Yehuda Sadeh
2010-10-21 06:37:28 +0800