01 Nov, 2011

1 commit


26 Oct, 2011

3 commits

  • Change ceph_parse_ips to take either names given as
    IP addresses or standard hostnames (e.g. localhost).
    The DNS lookup is done using the dns_resolver facility
    similar to its use in AFS, NFS, and CIFS.

    This patch defines CONFIG_CEPH_LIB_USE_DNS_RESOLVER
    that controls if this feature is on or off.

    Signed-off-by: Noah Watkins
    Signed-off-by: Sage Weil

    Noah Watkins
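
The "literal IP address first, hostname lookup second" order described in this commit can be sketched in userspace C. This is only an illustration: the kernel patch uses the dns_resolver facility, whereas the sketch below swaps in `getaddrinfo()`, and the function name `parse_ip_or_hostname` is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical userspace analogue, not the kernel's ceph_parse_ips().
 * Try the name as a literal IPv4/IPv6 address first; only if that fails,
 * fall back to a resolver lookup (getaddrinfo here, as a stand-in for
 * the kernel's dns_resolver). Returns 0 on success, -1 on failure.
 */
static int parse_ip_or_hostname(const char *name, struct sockaddr_storage *out)
{
    struct in_addr a4;
    struct in6_addr a6;
    struct addrinfo hints, *res = NULL;

    memset(out, 0, sizeof(*out));

    if (inet_pton(AF_INET, name, &a4) == 1) {
        struct sockaddr_in *in4 = (struct sockaddr_in *)out;
        in4->sin_family = AF_INET;
        in4->sin_addr = a4;
        return 0;
    }
    if (inet_pton(AF_INET6, name, &a6) == 1) {
        struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)out;
        in6->sin6_family = AF_INET6;
        in6->sin6_addr = a6;
        return 0;
    }

    /* Not a literal address: treat it as a hostname. */
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(name, NULL, &hints, &res) != 0 || res == NULL)
        return -1;

    memcpy(out, res->ai_addr, res->ai_addrlen);
    freeaddrinfo(res);
    return 0;
}
```

Trying the literal forms first means a plain IP address never triggers a lookup, which is presumably why the feature can be compiled out without affecting existing address-only configurations.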
     
  • Any non-masked msg allocation failure should generate a warning and stack
    trace to the console. All of these need to eventually be replaced by
    safe preallocation or msgpools.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • The pool allocation failures are masked by the pool; there is no need to
    spam the console about them. (That's the whole point of having the pool
    in the first place.)

    Mark msg allocations whose failure is safely handled as such.

    Signed-off-by: Sage Weil

    Sage Weil
     

17 Sep, 2011

1 commit

  • Commit 4cf9d544631c recorded when an outgoing ceph message was ACKed,
    in order to avoid unnecessary connection resets when an OSD is busy.

    However, ack_stamp is uninitialized, so there is a window between
    when the message is sent and when it is ACKed in which handle_timeout()
    interprets the uninitialized value as an expired timeout, and resets
    the connection unnecessarily.

    Close the window by initializing ack_stamp.

    Signed-off-by: Jim Schutt
    Signed-off-by: Sage Weil

    Jim Schutt
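
The window described in this commit can be illustrated with a small model. It is a sketch under assumptions: times are plain tick counts rather than jiffies, a value of 0 is assumed to mean "not ACKed yet", and the names `msg_init`, `msg_acked` and `msg_timed_out` are hypothetical rather than taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified message; not the kernel's struct ceph_msg. */
struct msg {
    unsigned long ack_stamp;   /* tick of the ACK; 0 = not ACKed (assumed) */
};

/* Initialize the stamp so the timeout check never reads garbage. */
static void msg_init(struct msg *m)
{
    m->ack_stamp = 0;
}

/* Record the ACK time; assumes the tick counter is non-zero here. */
static void msg_acked(struct msg *m, unsigned long now)
{
    m->ack_stamp = now;
}

/*
 * A message counts as timed out only if it has been ACKed and more than
 * `timeout` ticks have passed since then. A message that was sent but
 * not yet ACKed is never treated as expired.
 */
static bool msg_timed_out(const struct msg *m, unsigned long now,
                          unsigned long timeout)
{
    if (m->ack_stamp == 0)
        return false;
    return now - m->ack_stamp > timeout;
}
```

Without the explicit `msg_init()` step, `ack_stamp` would hold whatever was in memory, and the comparison in `msg_timed_out()` could report an expired timeout for a message that is still in flight.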
     

27 Jul, 2011

1 commit

  • Keep track of when an outgoing message is ACKed (i.e., the server fully
    received it and, presumably, queued it for processing). Time out OSD
    requests only if it's been too long since they've been received.

    This prevents timeouts and connection thrashing when the OSDs are simply
    busy and are throttling the requests they read off the network.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
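
Recording the ACK time could look roughly like the following. This is a minimal sketch, assuming a flat array of sent messages and a cumulative ACK number; the struct and the function `record_acks` are hypothetical and do not mirror the kernel's list handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a message that has been written to the socket. */
struct sent_msg {
    uint64_t seq;              /* sequence number assigned when sent */
    unsigned long ack_stamp;   /* tick of the ACK; 0 until ACKed (assumed) */
};

/*
 * Stamp every sent message covered by the cumulative ACK `ack` that has
 * not been stamped before. Returns how many messages were newly stamped.
 */
static size_t record_acks(struct sent_msg *sent, size_t n,
                          uint64_t ack, unsigned long now)
{
    size_t stamped = 0;

    for (size_t i = 0; i < n; i++) {
        if (sent[i].seq <= ack && sent[i].ack_stamp == 0) {
            sent[i].ack_stamp = now;
            stamped++;
        }
    }
    return stamped;
}
```

A timeout check can then measure elapsed time from `ack_stamp` instead of from the send time, so a busy OSD that has not yet read the request off the network is not penalized.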
     

20 May, 2011

5 commits


04 May, 2011

1 commit


05 Mar, 2011

3 commits

  • The standby logic used to be pretty dependent on the work requeueing
    behavior that changed when we switched to WQ_NON_REENTRANT. It was also
    very fragile.

    Restructure things so that:
    - We clear WRITE_PENDING when we set STANDBY. This ensures we will
    requeue work when we wake up later.
    - con_work backs off if STANDBY is set. There is nothing to do if we are
    in standby.
    - clear_standby() helper is called by both con_send() and con_keepalive(),
    the two actions that can wake us up again. Move the connect_seq++
    logic here.

    Signed-off-by: Sage Weil

    Sage Weil
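
The three rules in this commit can be modelled with plain bit flags. The sketch below is an assumption-laden illustration: the flag names, `struct conn`, and the `work_queued` counter are hypothetical stand-ins, and real queueing is replaced by incrementing a counter.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the kernel uses its own bit definitions. */
enum {
    CON_WRITE_PENDING = 1 << 0,
    CON_STANDBY       = 1 << 1,
};

struct conn {
    unsigned int flags;
    unsigned int connect_seq;
    unsigned int work_queued;   /* stand-in for queueing con_work */
};

/* Rule 1: clear WRITE_PENDING when entering STANDBY. */
static void enter_standby(struct conn *c)
{
    c->flags &= ~(unsigned int)CON_WRITE_PENDING;
    c->flags |= CON_STANDBY;
}

/* Rule 3: one helper wakes the connection and bumps connect_seq. */
static void clear_standby(struct conn *c)
{
    if (c->flags & CON_STANDBY) {
        c->flags &= ~(unsigned int)CON_STANDBY;
        c->connect_seq++;
    }
}

/* Rule 2: the worker backs off while STANDBY is set. */
static bool con_work(struct conn *c)
{
    if (c->flags & CON_STANDBY)
        return false;
    c->flags &= ~(unsigned int)CON_WRITE_PENDING;
    return true;
}

/* Sending wakes the connection, then queues work if none is pending. */
static void con_send(struct conn *c)
{
    clear_standby(c);
    if (!(c->flags & CON_WRITE_PENDING)) {
        c->flags |= CON_WRITE_PENDING;
        c->work_queued++;
    }
}
```

If `enter_standby()` left WRITE_PENDING set, the check in `con_send()` would conclude that work is already queued and the connection would never wake up, which is the failure the first rule guards against.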
     
  • There was some broken keepalive code using a dead variable. Shift to using
    the proper bit flag.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • With commit f363e45f we replaced a bunch of hacky workqueue mutual
    exclusion logic with the WQ_NON_REENTRANT flag. One piece of fallout is
    that the exponential backoff breaks in certain cases:

    * con_work attempts to connect.
    * we get an immediate failure, and the socket state change handler queues
    immediate work.
    * con_work calls con_fault, we decide to back off, but can't queue delayed
    work.

    In this case, we add a BACKOFF bit to make con_work reschedule delayed work
    next time it runs (which should be immediately).

    Signed-off-by: Sage Weil

    Sage Weil
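
The BACKOFF idea can be sketched as follows: the fault handler only records that a delay is wanted, and the worker turns that into delayed work the next time it runs. The delay constants, `struct conn`, and the return-value convention are assumptions made for this illustration, not values from the patch.

```c
/* Hypothetical flag bit and delay bounds (in arbitrary ticks). */
enum { CON_BACKOFF = 1 << 0 };

#define BASE_DELAY 1u    /* assumed initial delay */
#define MAX_DELAY  64u   /* assumed upper bound */

struct conn {
    unsigned int flags;
    unsigned int delay;           /* current backoff delay */
    unsigned int delayed_queued;  /* stand-in for queueing delayed work */
};

/*
 * Fault path: grow the delay exponentially up to MAX_DELAY. Delayed work
 * cannot be queued from here, so just set BACKOFF for the worker.
 */
static void con_fault(struct conn *c)
{
    if (c->delay == 0)
        c->delay = BASE_DELAY;
    else if (c->delay < MAX_DELAY)
        c->delay *= 2;
    c->flags |= CON_BACKOFF;
}

/*
 * Worker: if BACKOFF is set, reschedule itself as delayed work and do
 * nothing else. Returns the delay that was scheduled, or 0 if the
 * worker went on to do normal work.
 */
static unsigned int con_work(struct conn *c)
{
    if (c->flags & CON_BACKOFF) {
        c->flags &= ~(unsigned int)CON_BACKOFF;
        c->delayed_queued++;
        return c->delay;
    }
    return 0;
}
```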
     

04 Mar, 2011

1 commit

  • If we mark the connection CLOSED we will give up trying to reconnect to
    this server instance. That is appropriate for things like a protocol
    version mismatch that won't change until the server is restarted, at which
    point we'll get a new addr and reconnect. An authorization failure like
    this is probably due to the server not properly rotating its secret keys,
    however, and should be treated as transient so that the normal backoff and
    retry behavior kicks in.

    Signed-off-by: Sage Weil

    Sage Weil
     

26 Jan, 2011

2 commits

  • Pass errors from writing to the socket up the stack. If we get -EAGAIN,
    return 0 from the helper to simplify the callers' checks.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • If we get EAGAIN when trying to read from the socket, it is not an error.
    Return 0 from the helper in this case to simplify the error handling cases
    in the caller (indirectly, try_read).

    Fix try_read to pass any error to its caller (con_work) instead of almost
    always returning 0. This lets us respond to things like socket
    disconnects.

    Signed-off-by: Sage Weil

    Sage Weil
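
The "EAGAIN is not an error" convention used by both commits above can be sketched with an ordinary userspace socket. The helper name `sock_recv_nowait` is hypothetical, and the sketch uses `recv()` with `MSG_DONTWAIT` rather than the kernel's socket API.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical read helper. Returns the number of bytes read (> 0),
 * 0 if the socket simply has nothing to read right now, or a negative
 * errno value for a real error. In this simplified sketch an orderly
 * peer shutdown also shows up as 0.
 */
static int sock_recv_nowait(int fd, void *buf, size_t len)
{
    ssize_t r = recv(fd, buf, len, MSG_DONTWAIT);

    if (r >= 0)
        return (int)r;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;            /* would block: not an error for callers */
    return -errno;           /* pass every other error up the stack */
}
```

Callers then need only a single `ret < 0` check to detect real failures such as a disconnect, instead of special-casing EAGAIN at every call site. A write-side helper would follow the same pattern.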
     

13 Jan, 2011

1 commit

  • ceph messenger code does a rather complex dancing around multithread
    workqueue to make sure the same work item isn't executed concurrently
    on different CPUs. This restriction can be provided by workqueue with
    WQ_NON_REENTRANT.

    Make ceph_msgr_wq a non-reentrant workqueue with the default concurrency
    level and remove the QUEUED/BUSY logic.

    * This removes backoff handling in con_work() but it couldn't reliably
    block execution of con_work() to begin with - queue_con() can be
    called after the work started but before BUSY is set. It seems that
    it was an optimization for a rather cold path and can be safely
    removed.

    * The number of concurrent work items is bound by the number of
    connections and connections are independent from each other. With
    the default concurrency level, different connections will be
    executed independently.

    Signed-off-by: Tejun Heo
    Cc: Sage Weil
    Cc: ceph-devel@vger.kernel.org
    Signed-off-by: Sage Weil

    Tejun Heo
     

14 Dec, 2010

1 commit


10 Nov, 2010

1 commit

  • The alignment used for reading data into or out of pages used to be taken
    from the data_off field in the message header. This only worked as long
    as the page alignment matched the object offset, breaking direct io to
    non-page aligned offsets.

    Instead, explicitly specify the page alignment next to the page vector
    in the ceph_msg struct, and use that instead of the message header (which
    probably shouldn't be trusted). The alloc_msg callback is responsible for
    filling in this field properly when it sets up the page vector.

    Signed-off-by: Sage Weil

    Sage Weil
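
Why the alignment within the first page matters can be shown with a little arithmetic. This is a sketch under assumptions: a 4096-byte page size, and hypothetical helper names `page_alignment` and `pages_for` that are not taken from the patch.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for illustration */

/* Offset of the data within its first page. */
static unsigned int page_alignment(uint64_t off)
{
    return (unsigned int)(off & (SKETCH_PAGE_SIZE - 1));
}

/*
 * Number of pages touched by `len` bytes starting at byte offset `off`.
 * Assumes len > 0.
 */
static unsigned int pages_for(uint64_t off, uint64_t len)
{
    uint64_t first = off / SKETCH_PAGE_SIZE;
    uint64_t end = (off + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

    return (unsigned int)(end - first);
}
```

For example, 4096 bytes starting 512 bytes into a page span two pages, while the same length at alignment 0 fits in one. If the receiver took the alignment from an object offset that does not match the buffer's real position in its pages, it would copy data to the wrong place, which is the direct I/O breakage the commit describes.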
     

02 Nov, 2010

1 commit

  • If the client gets out of sync with the server message sequence number, we
    normally skip low seq messages (ones we already received). The skip code
    was also incrementing the expected seq, such that all subsequent messages
    also appeared old and got skipped, eventually causing a timeout on the osd
    connection. This resulted in some lagging requests and console messages
    like

    [233480.882885] ceph: skipping osd22 10.138.138.13:6804 seq 2016, expected 2017
    [233480.882919] ceph: skipping osd22 10.138.138.13:6804 seq 2017, expected 2018
    [233480.882963] ceph: skipping osd22 10.138.138.13:6804 seq 2018, expected 2019
    [233480.883488] ceph: skipping osd22 10.138.138.13:6804 seq 2019, expected 2020
    [233485.219558] ceph: skipping osd22 10.138.138.13:6804 seq 2020, expected 2021
    [233485.906595] ceph: skipping osd22 10.138.138.13:6804 seq 2021, expected 2022
    [233490.379536] ceph: skipping osd22 10.138.138.13:6804 seq 2022, expected 2023
    [233495.523260] ceph: skipping osd22 10.138.138.13:6804 seq 2023, expected 2024
    [233495.923194] ceph: skipping osd22 10.138.138.13:6804 seq 2024, expected 2025
    [233500.534614] ceph: tid 6023602 timed out on osd22, will reset osd

    Reported-by: Theodore Ts'o
    Signed-off-by: Sage Weil

    Sage Weil
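
The corrected skip behaviour can be sketched as a small sequence check. The names `rx_state` and `rx_check_seq` are hypothetical, and the separate result for a gap in the sequence is an assumption added for completeness rather than something stated in the commit message.

```c
#include <stdint.h>

enum rx_action { RX_ACCEPT, RX_SKIP_OLD, RX_BAD_SEQ };

/* Hypothetical receive state: sequence number of the last accepted message. */
struct rx_state {
    uint64_t in_seq;
};

/*
 * Decide what to do with an incoming message numbered `seq`.
 * An old message is skipped WITHOUT advancing in_seq; only an accepted
 * message moves the expected sequence number forward.
 */
static enum rx_action rx_check_seq(struct rx_state *s, uint64_t seq)
{
    int64_t delta = (int64_t)(seq - s->in_seq);

    if (delta < 1)
        return RX_SKIP_OLD;   /* already received: leave in_seq alone */
    if (delta > 1)
        return RX_BAD_SEQ;    /* gap in the sequence (assumed handling) */

    s->in_seq = seq;
    return RX_ACCEPT;
}
```

With the state from the log above (last accepted seq 2016), a replayed seq 2016 is skipped once and seq 2017 is then accepted normally, instead of every later message being reported as old.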
     

21 Oct, 2010

1 commit

  • This factors out protocol and low-level storage parts of ceph into a
    separate libceph module living in net/ceph and include/linux/ceph. This
    is mostly a matter of moving files around. However, a few key pieces
    of the interface change as well:

    - ceph_client becomes ceph_fs_client and ceph_client, where the latter
    captures the mon and osd clients, and the fs_client gets the mds client
    and file system specific pieces.
    - Mount option parsing and debugfs setup is correspondingly broken into
    two pieces.
    - The mon client gets a generic handler callback for otherwise unknown
    messages (mds map, in this case).
    - The basic supported/required feature bits can be expanded (and are by
    ceph_fs_client).

    No functional change, aside from some subtle error handling cases that got
    cleaned up in the refactoring process.

    Signed-off-by: Sage Weil

    Yehuda Sadeh