11 Jan, 2012

1 commit


12 Nov, 2011

1 commit

  • The ceph_osd_request struct allocates a 40-byte buffer for object names.
    RBD image names can be up to 96 chars long (100 with the .rbd suffix),
    which results in the object name for the image being truncated, and a
    subsequent map failure.

    Increase the oid buffer in request messages, in order to avoid the
    truncation.

    Signed-off-by: Stratos Psomadakis
    Signed-off-by: Sage Weil

    Stratos Psomadakis
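
    The truncation described above can be reproduced outside the kernel. This
    is a minimal user-space sketch, not the kernel code: format_rbd_oid() is a
    hypothetical helper, and only the 40-byte figure comes from the commit
    message. The caller picks the buffer size, so the same function shows both
    the failing and the fixed case.

```c
#include <stdio.h>
#include <string.h>

#define OLD_OID_LEN 40  /* buffer size named in the commit message */

/*
 * Hypothetical helper: format "<image>.rbd" into buf.
 * Returns 1 if the whole name fit, 0 if snprintf() had to truncate it.
 */
static int format_rbd_oid(char *buf, size_t buflen, const char *image)
{
    int n = snprintf(buf, buflen, "%s.rbd", image);

    /* snprintf() returns the length it wanted to write, excluding the NUL. */
    return n >= 0 && (size_t)n < buflen;
}
```

    With a 40-byte buffer, any image name longer than 35 characters loses part
    of its name or its .rbd suffix, so a later lookup refers to a different
    object name than the one that was intended.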
     

26 Oct, 2011

2 commits


17 Sep, 2011

1 commit

  • The r_req_lru_item list node moves between several lists, and that cycle
    is not directly related to (and does not begin with) __register_request().
    Initialize it in the request constructor, not __register_request(). This
    fixes later badness (below) when OSDs restart underneath an rbd mount.

    Crashes we've seen due to this include:

    [ 213.974288] kernel BUG at net/ceph/messenger.c:2193!

    and

    [ 144.035274] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
    [ 144.035278] IP: [] con_work+0x1463/0x2ce0 [libceph]

    Signed-off-by: Sage Weil

    Sage Weil
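
    Why the constructor is the safer place to initialize the node can be
    sketched with a tiny stand-in for the kernel's list_head. All names here
    (list_node, request_alloc, lru_item) are illustrative assumptions, not the
    libceph definitions. The point is that a self-linked node can be unlinked
    or moved at any time, while a zeroed node cannot.

```c
#include <stdlib.h>

/* Minimal stand-in for the kernel's struct list_head (illustrative only). */
struct list_node {
    struct list_node *next, *prev;
};

static void list_node_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

static int list_node_unlinked(const struct list_node *n)
{
    return n->next == n;
}

static void list_node_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink and re-initialize, so the node can be unlinked again or moved. */
static void list_node_del_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_node_init(n);
}

struct request {
    struct list_node lru_item;  /* plays the role of r_req_lru_item */
};

/* Constructor: the node is valid from the moment the request exists. */
static struct request *request_alloc(void)
{
    struct request *req = calloc(1, sizeof(*req));

    if (req)
        list_node_init(&req->lru_item);
    return req;
}
```

    If the node were only initialized at registration time, a
    list_node_del_init() issued before registration would dereference the NULL
    pointers left by calloc().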
     

01 Sep, 2011

1 commit


27 Jul, 2011

1 commit

  • Keep track of when an outgoing message is ACKed (i.e., the server fully
    received it and, presumably, queued it for processing). Time out OSD
    requests only if too much time has passed since the server received them.

    This prevents timeouts and connection thrashing when the OSDs are simply
    busy and are throttling the requests they read off the network.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
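
    The timeout rule above can be sketched as a single predicate. This is a
    simplified illustration under stated assumptions: the struct, its field
    names, and the convention that an ack stamp of 0 means "not yet ACKed" are
    all hypothetical, and time is an abstract tick counter rather than jiffies.

```c
#include <stdbool.h>

struct tracked_request {
    unsigned long sent_stamp;  /* tick at which we queued the message */
    unsigned long ack_stamp;   /* tick at which the server ACKed it; 0 = not yet (assumed) */
};

/*
 * A request may time out only once the server has ACKed it and more than
 * `timeout` ticks have passed since that ACK.
 */
static bool request_timed_out(const struct tracked_request *req,
                              unsigned long now, unsigned long timeout)
{
    if (req->ack_stamp == 0)
        return false;  /* not fully received yet: the OSD is busy, not dead */
    return now - req->ack_stamp > timeout;
}
```

    A request that was sent long ago but has not been ACKed is therefore left
    alone, which is what avoids resetting connections to OSDs that are merely
    throttling their reads.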
     

14 Jun, 2011

1 commit


08 Jun, 2011

1 commit


25 May, 2011

1 commit


20 May, 2011

2 commits


04 May, 2011

1 commit


15 Apr, 2011

1 commit


08 Apr, 2011

1 commit


07 Apr, 2011

1 commit

  • Fix the request transition from linger -> normal request. The key is to
    preserve r_osd and requeue on the same OSD. Reregister as a normal request,
    add the request to the proper queues, then unregister the linger. Fix the
    unregister helper to avoid clearing r_osd (and also simplify the parallel
    check in __unregister_request()).

    Reported-by: Henry Chang
    Signed-off-by: Sage Weil

    Sage Weil
     

31 Mar, 2011

1 commit


30 Mar, 2011

1 commit

  • We should only clear r_osd if we are registered as neither a linger nor a
    regular request. We may unregister as a linger while still registered as
    a regular request (e.g., in reset_osd). Incorrectly clearing r_osd there
    leads to a null pointer dereference in __send_request.

    Also simplify the parallel check in __unregister_request() where we just
    removed r_osd_item and know it's empty.

    Signed-off-by: Sage Weil

    Sage Weil
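
    The rule for clearing r_osd can be sketched with two registration flags.
    This is a simplified model, not the libceph code: the boolean flags stand
    in for the list memberships the real code checks, and the function names
    are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct osd {
    int id;
};

struct request {
    bool registered_regular;  /* stand-in for membership on the OSD's request list */
    bool registered_linger;   /* stand-in for membership on the OSD's linger list */
    struct osd *osd;          /* plays the role of r_osd */
};

/* Drop the linger registration; keep osd if still a regular request. */
static void unregister_linger(struct request *req)
{
    req->registered_linger = false;
    if (!req->registered_regular)
        req->osd = NULL;
}

/* Drop the regular registration; keep osd if still registered as a linger. */
static void unregister_regular(struct request *req)
{
    req->registered_regular = false;
    if (!req->registered_linger)
        req->osd = NULL;
}
```

    Either helper may run first. The pointer is cleared only by whichever call
    removes the last remaining registration.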
     

29 Mar, 2011

1 commit


27 Mar, 2011

1 commit


23 Mar, 2011

1 commit

  • Lingering requests are requests that are sent to the OSD normally but
    that we keep tracking after the request completes successfully. This keeps the OSD
    connection open and resends the original request if the object moves to
    another OSD. The OSD can then send notification messages back to us
    if another client initiates a notify.

    This framework will be used by RBD so that the client gets notification
    when a snapshot is created by another node or tool.

    Signed-off-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Yehuda Sadeh
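
    The difference between a normal and a lingering request can be sketched as
    a small state model. Everything here is an illustrative assumption (the
    struct, the linger flag, and the handler names); it only models the
    "stay registered and resend when the object moves" behavior, not
    notifications.

```c
#include <stdbool.h>

struct request {
    bool linger;      /* keep tracking after a successful reply */
    bool registered;  /* still tracked by the client */
    int osd;          /* OSD the request was last sent to */
    int sends;        /* number of times the request was sent */
};

static void send_request(struct request *req, int osd)
{
    req->osd = osd;
    req->registered = true;
    req->sends++;
}

/* A successful reply retires a normal request but not a lingering one. */
static void handle_reply(struct request *req)
{
    if (!req->linger)
        req->registered = false;
}

/* When the object moves to another OSD, resend anything still tracked. */
static void handle_map_change(struct request *req, int new_osd)
{
    if (req->registered && new_osd != req->osd)
        send_request(req, new_osd);
}
```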
     

22 Mar, 2011

1 commit

  • If we send a request to osd A, and the request's pg remaps to osd B and
    then back to A in quick succession, we need to resend the request to A. The
    old code was only calling kick_requests after processing all incremental
    maps in a message, so it was very possible to not resend a request that
    needed to be resent. This would make the osd eventually time out (at least
    with the current default of osd timeouts enabled).

    The correct approach is to scan requests on every map incremental. This
    patch refactors the kick code in a few ways:
    - all requests are either on req_lru (in flight), req_unsent (ready to
      send), or req_notarget (currently map to no up osd)
    - mapping is always done by map_request (previously map_osds)
    - if the mapping changes, we requeue. requests are resent only after all
      map incrementals are processed.
    - some osd reset code is moved out of kick_requests into a separate
      function
    - the "kick this osd" functionality is moved to kick_osd_requests, as it
      is unrelated to scanning for request->pg->osd mapping changes

    Signed-off-by: Sage Weil

    Sage Weil
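
    The A -> B -> A case can be sketched with a toy model of the three queues.
    This is a simplified illustration, not the osd_client code: the struct,
    the enum, and the idea of passing the post-incremental target OSDs as an
    array are assumptions made for the example. Only the names map_request,
    req_lru, req_unsent, and req_notarget come from the commit message.

```c
#include <stddef.h>

#define NO_OSD (-1)

enum req_queue { REQ_LRU, REQ_UNSENT, REQ_NOTARGET };

struct request {
    int osd;               /* OSD the request is currently mapped to */
    enum req_queue queue;  /* which of the three lists it sits on */
};

/*
 * Remap one request against the latest map. If the target changed,
 * requeue it. Returns 1 if the request now needs to be (re)sent.
 */
static int map_request(struct request *req, int new_osd)
{
    if (new_osd == req->osd)
        return 0;
    req->osd = new_osd;
    req->queue = (new_osd == NO_OSD) ? REQ_NOTARGET : REQ_UNSENT;
    return 1;
}

/*
 * Scan after every incremental. targets[i] is the OSD the request's pg
 * maps to once incremental i has been applied. Sending is left to the
 * caller, after all incrementals have been processed.
 */
static int process_incrementals(struct request *req,
                                const int *targets, size_t n)
{
    int needs_send = 0;

    for (size_t i = 0; i < n; i++)
        needs_send |= map_request(req, targets[i]);
    return needs_send;
}
```

    With targets {B, A}, the per-incremental scan requeues the request even
    though it ends up on the OSD it started on. A single scan against the
    final map sees no change and would leave the request unsent.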
     

10 Nov, 2010

2 commits

  • The alignment used for reading data into or out of pages used to be taken
    from the data_off field in the message header. This only worked as long
    as the page alignment matched the object offset, breaking direct io to
    non-page aligned offsets.

    Instead, explicitly specify the page alignment next to the page vector
    in the ceph_msg struct, and use that instead of the message header (which
    probably shouldn't be trusted). The alloc_msg callback is responsible for
    filling in this field properly when it sets up the page vector.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • We used to infer alignment of IOs within a page based on the file offset,
    which assumed they matched. This broke with direct IO that was not aligned
    to pages (e.g., 512-byte aligned IO). We were also trusting the alignment
    specified in the OSD reply, which could have been adjusted by the server.

    Explicitly specify the page alignment when setting up OSD IO requests.

    Signed-off-by: Sage Weil

    Sage Weil
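
    Why the file offset is a poor proxy for page alignment can be shown with a
    little arithmetic. This is a user-space sketch under stated assumptions:
    the 4096-byte page size and both helper names are illustrative, not kernel
    definitions.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the example */

/* Offset of an address (or file offset) within its page. */
static unsigned int page_offset_of(uint64_t addr)
{
    return (unsigned int)(addr & (SKETCH_PAGE_SIZE - 1));
}

/* Number of pages touched by `len` bytes starting `page_align` bytes into a page. */
static uint64_t pages_spanned(unsigned int page_align, uint64_t len)
{
    return (page_align + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

    For a direct IO of 4096 bytes at file offset 512 into a page-aligned
    buffer, the file offset suggests an alignment of 512 and two pages, while
    the buffer actually starts at offset 0 and occupies one page. Passing the
    buffer's alignment explicitly removes the mismatch.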
     

21 Oct, 2010

1 commit

  • This factors out protocol and low-level storage parts of ceph into a
    separate libceph module living in net/ceph and include/linux/ceph. This
    is mostly a matter of moving files around. However, a few key pieces
    of the interface change as well:

    - ceph_client becomes ceph_fs_client and ceph_client, where the latter
      captures the mon and osd clients, and the fs_client gets the mds client
      and file system specific pieces.
    - Mount option parsing and debugfs setup are correspondingly broken into
      two pieces.
    - The mon client gets a generic handler callback for otherwise unknown
      messages (mds map, in this case).
    - The basic supported/required feature bits can be expanded (and are by
      ceph_fs_client).

    No functional change, aside from some subtle error handling cases that got
    cleaned up in the refactoring process.

    Signed-off-by: Sage Weil

    Yehuda Sadeh