03 Jan, 2013

1 commit

  • Pull Ceph fixes from Sage Weil:
    "Two of Alex's patches deal with a race when resetting server
    connections for open RBD images, one demotes some non-fatal BUGs to
    WARNs, and my patch fixes a protocol feature bit failure path."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    libceph: fix protocol feature mismatch failure path
    libceph: WARN, don't BUG on unexpected connection states
    libceph: always reset osds when kicking
    libceph: move linger requests sooner in kick_requests()

    Linus Torvalds
     

28 Dec, 2012

4 commits

  • We should not set con->state to CLOSED here; that happens in
    ceph_fault() in the caller, where it first asserts that the state
    is not yet CLOSED. Avoids a BUG when the features don't match.

    Since the fail_protocol() has become a trivial wrapper, replace
    calls to it with direct calls to reset_connection().

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
     
  • A number of assertions in the ceph messenger are implemented with
    BUG_ON(), killing the system if connection's state doesn't match
    what's expected. At this point our state model is (evidently) not
    well understood enough for these assertions to trigger a BUG().
    Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...)
    so we learn about these issues without killing the machine.

    We now recognize that a connection fault can occur due to a socket
    closure at any time, regardless of the state of the connection. So
    there is really nothing we can assert about the state of the
    connection at that point, so eliminate that assertion.

    Reported-by: Ugis
    Tested-by: Ugis
    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     
  • When ceph_osdc_handle_map() is called to process a new osd map,
    kick_requests() is called to ensure all affected requests are
    updated if necessary to reflect changes in the osd map. This
    happens in two cases: whenever an incremental map update is
    processed; and when a full map update (or the last one if there is
    more than one) gets processed.

    In the former case, the kick_requests() call is followed immediately
    by a call to reset_changed_osds() to ensure any connections to osds
    affected by the map change are reset. But for full map updates
    this isn't done.

    Both cases should be doing this osd reset.

    Rather than duplicating the reset_changed_osds() call, move it into
    the end of kick_requests().

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     
  • The kick_requests() function is called by ceph_osdc_handle_map()
    when an osd map change has been indicated. Its purpose is to
    re-queue any request whose target osd is different from what it
    was when it was originally sent.

    It is structured as two loops, one for incomplete but registered
    requests, and a second for handling completed linger requests.
    As a special case, in the first loop if a request marked to linger
    has not yet completed, it is moved from the request list to the
    linger list. This is a quick and dirty way to have the second
    loop handle sending the request along with all the other linger
    requests.

    Because of the way it's done now, however, this quick and dirty
    solution can result in these incomplete linger requests never
    getting re-sent as desired. The problem lies in the fact that
    the second loop only arranges for a linger request to be sent
    if it appears its target osd has changed. This is the proper
    handling for *completed* linger requests (it avoids issuing
    the same linger request twice to the same osd).

    But although the linger requests added to the list in the first loop
    may have been sent, they have not yet completed, so they need to be
    re-sent regardless of whether their target osd has changed.

    The first required fix is to avoid calling __map_request()
    on any incomplete linger request. Otherwise the subsequent
    __map_request() call in the second loop will find the target osd
    has not changed and will therefore not re-send the request.

    Second, we need to be sure that a sent but incomplete linger request
    gets re-sent. If the target osd is the same with the new osd map as
    it was when the request was originally sent, this won't happen.
    This can be fixed through careful handling when we move these
    requests from the request list to the linger list, by unregistering
    the request *before* it is registered as a linger request. This
    works because a side-effect of unregistering the request is to make
    the request's r_osd pointer be NULL, and *that* will ensure the
    second loop actually re-sends the linger request.

    Processing of such a request is done at that point, so continue with
    the next one once it's been moved.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
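The reordering described in the last 28 Dec, 2012 commit above (handling incomplete linger requests in kick_requests()) can be illustrated with a small userspace model. It is a sketch, not libceph code: `struct sk_req`, `sk_map_target()` and the other `sk_` names are hypothetical, an osd id of -1 stands in for a NULL `r_osd`, and the new osd map is reduced to a fixed modulo placement. It contrasts mapping an incomplete linger request in the first loop (the old behaviour) with unregistering it first so that `r_osd` is cleared.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_NO_OSD (-1)  /* stands in for a NULL r_osd pointer */

/* Hypothetical, heavily simplified request; not struct ceph_osd_request. */
struct sk_req {
    int id;
    int r_osd;          /* osd the request was last sent to */
    bool linger;
    bool completed;
};

/* Hypothetical placement of a request under the new osd map. */
static int sk_map_target(const struct sk_req *req)
{
    return req->id % 3;
}

/* Second loop: (re)send a linger request only if its target changed. */
static bool sk_linger_loop_sends(struct sk_req *req)
{
    int target = sk_map_target(req);

    if (req->r_osd == target)
        return false;   /* same osd: avoids duplicating a completed linger */
    req->r_osd = target;
    return true;
}

/* Old first-loop handling: the request is mapped before it is moved. */
static void sk_move_to_linger_old(struct sk_req *req)
{
    assert(req->linger && !req->completed);
    req->r_osd = sk_map_target(req);    /* like calling __map_request() */
}

/* Fixed handling: unregistering first clears r_osd; no mapping here. */
static void sk_move_to_linger_fixed(struct sk_req *req)
{
    assert(req->linger && !req->completed);
    req->r_osd = SK_NO_OSD;             /* side effect of unregistering */
}
```

For an incomplete linger request with id 4 that was already sent to osd 1, the target is unchanged (4 % 3 == 1). After the old handling, `sk_linger_loop_sends()` returns false and the request is never re-sent; after the fixed handling it returns true.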
     

21 Dec, 2012

6 commits

  • Pull Ceph update from Sage Weil:
    "There are a few different groups of commits here. The largest is
    Alex's ongoing work to enable the coming RBD features (cloning,
    striping). There is some cleanup in libceph that goes along with it.

    Cyril and David have fixed some problems with NFS reexport (leaking
    dentries and page locks), and there is a batch of patches from Yan
    fixing problems with the fs client when running against a clustered
    MDS. There are a few bug fixes mixed in for good measure, many of
    which will be going to the stable trees once they're upstream.

    My apologies for the late pull. There is still a gremlin in the rbd
    map/unmap code and I was hoping to include the fix for that as well,
    but we haven't been able to confirm the fix is correct yet; I'll send
    that in a separate pull once it's nailed down."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (68 commits)
    rbd: get rid of rbd_{get,put}_dev()
    libceph: register request before unregister linger
    libceph: don't use rb_init_node() in ceph_osdc_alloc_request()
    libceph: init event->node in ceph_osdc_create_event()
    libceph: init osd->o_node in create_osd()
    libceph: report connection fault with warning
    libceph: socket can close in any connection state
    rbd: don't use ENOTSUPP
    rbd: remove linger unconditionally
    rbd: get rid of RBD_MAX_SEG_NAME_LEN
    libceph: avoid using freed osd in __kick_osd_requests()
    ceph: don't reference req after put
    rbd: do not allow remove of mounted-on image
    libceph: Unlock unprocessed pages in start_read() error path
    ceph: call handle_cap_grant() for cap import message
    ceph: Fix __ceph_do_pending_vmtruncate
    ceph: Don't add dirty inode to dirty list if caps is in migration
    ceph: Fix infinite loop in __wake_requests
    ceph: Don't update i_max_size when handling non-auth cap
    bdi_register: add __printf verification, fix arg mismatch
    ...

    Linus Torvalds
     
  • In kick_requests(), we need to register the request before we
    unregister the linger request. Otherwise the unregister will
    reset the request's osd pointer to NULL.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     
  • The red-black node in the ceph osd request structure is initialized
    in ceph_osdc_alloc_request() using rb_init_node(). We do need to
    initialize this, because in __unregister_request() we call
    RB_EMPTY_NODE(), which expects the node it's checking to have
    been initialized. But rb_init_node() is apparently overkill, and
    may in fact be on its way out. So use RB_CLEAR_NODE() instead.

    For a little more background, see this commit:
    4c199a93 rbtree: empty nodes have no color

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     
  • The red-black node in the ceph osd event structure is not
    initialized in ceph_osdc_create_event(). Because this node can
    be the subject of a RB_EMPTY_NODE() call later on, we should ensure
    the node is initialized properly for that.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     
  • The red-black node in the ceph osd structure is not initialized
    in create_osd(). Because this node can be the subject of a
    RB_EMPTY_NODE() call later on, we should ensure the node is
    initialized properly for that. Add a call to RB_CLEAR_NODE() to
    initialize it.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     
  • When a connection's socket disconnects, or if there's a protocol
    error of some kind on the connection, a fault is signaled and
    the connection is reset (closed and reopened, basically). We
    currently get an error message on the log whenever this occurs.

    A ceph connection will attempt to reestablish a socket connection
    repeatedly if a fault occurs. This means that these error messages
    will get repeatedly added to the log, which is undesirable.

    Change the error message to be a warning, so they don't get
    logged by default.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
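The three red-black node commits above rely on the convention introduced by commit 4c199a93 (described under 09 Oct, 2012): a node counts as empty when its parent pointer refers to the node itself. The sketch below copies that convention into userspace under `sk_` names so it can run outside the kernel; the struct layouts are simplified assumptions. It shows why a freshly allocated, zero-filled node is not recognised as empty, which is what the RB_CLEAR_NODE() calls in create_osd() and ceph_osdc_create_event() address.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified node: the low bit of __rb_parent_color holds the colour. */
struct sk_rb_node {
    unsigned long __rb_parent_color;
    struct sk_rb_node *rb_right;
    struct sk_rb_node *rb_left;
};

/* Same convention as the kernel's rbtree.h: "empty" means parent == self. */
#define SK_RB_EMPTY_NODE(node) \
    ((node)->__rb_parent_color == (unsigned long)(node))
#define SK_RB_CLEAR_NODE(node) \
    ((node)->__rb_parent_color = (unsigned long)(node))

/* Hypothetical stand-in for struct ceph_osd. */
struct sk_osd {
    int o_osd;
    struct sk_rb_node o_node;
};

/* Like create_osd() after the fix: mark the node empty explicitly. */
static struct sk_osd *sk_create_osd(int id)
{
    struct sk_osd *osd = calloc(1, sizeof(*osd));

    if (!osd)
        return NULL;
    osd->o_osd = id;
    SK_RB_CLEAR_NODE(&osd->o_node);
    return osd;
}
```

Zero-filled memory leaves the parent field at 0, so without the explicit clear SK_RB_EMPTY_NODE() would report the node as linked into a tree.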
     

18 Dec, 2012

2 commits

  • A connection's socket can close for any reason, independent of the
    state of the connection (and irrespective of the connection
    mutex). As a result, the connection can be in pretty much any state
    at the time its socket is closed.

    Handle those other cases at the top of con_work(). Pull this whole
    block of code into a separate function to reduce the clutter.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     
  • In __unregister_linger_request(), the request is being removed
    from the osd client's req_linger list only when the request
    has a non-null osd pointer. It should be done whether or not
    the request currently has an osd.

    This is most likely a non-issue because I believe the request
    will always have an osd when this function is called.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
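The first 18 Dec, 2012 commit stops assuming which state a connection is in when its socket closes. A minimal userspace sketch of that idea: a flag set asynchronously by the socket callback is consumed at the top of the work function, and every state is handled by recording a message instead of asserting. The state names are modelled loosely on libceph's connection states, and `struct sk_con`, `sk_con_sock_closed()` and the message text are assumptions, not the messenger's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum sk_con_state {
    SK_CLOSED, SK_PREOPEN, SK_CONNECTING, SK_NEGOTIATING, SK_OPEN, SK_STANDBY
};

/* One message per state; no state is treated as "impossible". */
static const char *const sk_closed_msg[] = {
    [SK_CLOSED]      = "socket closed (con state CLOSED)",
    [SK_PREOPEN]     = "socket closed (con state PREOPEN)",
    [SK_CONNECTING]  = "socket closed (con state CONNECTING)",
    [SK_NEGOTIATING] = "socket closed (con state NEGOTIATING)",
    [SK_OPEN]        = "socket closed (con state OPEN)",
    [SK_STANDBY]     = "socket closed (con state STANDBY)",
};

struct sk_con {
    enum sk_con_state state;
    bool sock_closed;       /* set by the socket callback, outside the mutex */
    const char *error_msg;
};

/*
 * Called at the top of the work function with the connection mutex held.
 * Returns true if a socket closure was pending; the caller then faults.
 */
static bool sk_con_sock_closed(struct sk_con *con)
{
    if (!con->sock_closed)
        return false;
    con->sock_closed = false;
    con->error_msg = sk_closed_msg[con->state];
    return true;
}
```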
     

17 Dec, 2012

2 commits

  • If an osd has no requests and no linger requests, __reset_osd()
    will just remove it with a call to __remove_osd(). That drops
    a reference to the osd, and therefore the osd may have been freed
    by the time __reset_osd() returns. That function offers no
    indication this may have occurred, and as a result the osd will
    continue to be used even when it's no longer valid.

    Change __reset_osd() so it returns an error (ENODEV) when it
    deletes the osd being reset. And change __kick_osd_requests() so it
    returns immediately (before referencing osd again) if __reset_osd()
    returns *any* error.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     
  • In __unregister_request(), there is a call to list_del_init()
    referencing a request that was the subject of a call to
    ceph_osdc_put_request() on the previous line. This is not
    safe, because the request structure could have been freed
    by the time we reach the list_del_init().

    Fix this by reversing the order of these lines.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
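The __reset_osd() change in the first 17 Dec, 2012 commit follows a general rule: a function that may drop the last reference to an object must tell its caller so. The userspace model below uses a plain reference count; `struct sk_osd`, its fields and the `sk_` functions are hypothetical simplifications, and reopening the connection is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified osd with a manual reference count. */
struct sk_osd {
    int refs;
    int nr_requests;
    int nr_linger;
    int kicks;          /* how many times requests were re-sent */
};

static void sk_put_osd(struct sk_osd *osd)
{
    assert(osd->refs > 0);
    if (--osd->refs == 0)
        free(osd);
}

/* Like __reset_osd() after the fix: report removal with -ENODEV. */
static int sk_reset_osd(struct sk_osd *osd)
{
    if (osd->nr_requests == 0 && osd->nr_linger == 0) {
        sk_put_osd(osd);    /* like __remove_osd(): osd may now be freed */
        return -ENODEV;
    }
    /* ... close and reopen the connection (omitted) ... */
    return 0;
}

/* Like __kick_osd_requests(): stop on any error before touching osd. */
static int sk_kick_osd_requests(struct sk_osd *osd)
{
    int err = sk_reset_osd(osd);

    if (err)
        return err;
    osd->kicks++;
    return 0;
}
```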
     

13 Dec, 2012

1 commit

  • This would reset a connection with any OSD that had an outstanding
    request that was taking more than N seconds. The idea was that if the
    OSD was buggy, the client could compensate by resending the request.

    In reality, this only served to hide server bugs, and we haven't
    actually seen such a bug in quite a while. Moreover, the userspace
    client code never did this.

    More importantly, often the request is taking a long time because the
    OSD is trying to recover, or overloaded, and killing the connection
    and retrying would only make the situation worse by giving the OSD
    more work to do.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
     

01 Nov, 2012

1 commit


30 Oct, 2012

1 commit

  • Ensure that we set the err value correctly so that we do not pass a 0
    value to ERR_PTR and confuse the calling code. (In particular,
    osd_client.c handle_map() will BUG(!newmap)).

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
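The 30 Oct, 2012 fix matters because of how the kernel encodes errors in pointers: ERR_PTR(0) is simply NULL, which IS_ERR() does not treat as an error, so a caller that checks only IS_ERR() is handed NULL instead of an error code. The sketch below re-creates those helpers in userspace under `sk_` names, following the definitions in the kernel's include/linux/err.h, and shows a decoder that sets `err` before every failure exit. `sk_decode_map()` and `struct sk_map` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define SK_MAX_ERRNO 4095

/* Userspace copies of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() helpers. */
static void *sk_err_ptr(long error) { return (void *)error; }
static long sk_ptr_err(const void *ptr) { return (long)ptr; }
static int sk_is_err(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-SK_MAX_ERRNO;
}

struct sk_map { int epoch; };

/* Hypothetical decoder: err is assigned on every path that reaches "bad". */
static struct sk_map *sk_decode_map(const unsigned char *buf, size_t len)
{
    struct sk_map *map;
    int err = -EINVAL;          /* default for malformed input */

    if (len < 1)
        goto bad;
    map = malloc(sizeof(*map));
    if (!map) {
        err = -ENOMEM;
        goto bad;
    }
    map->epoch = buf[0];
    return map;
bad:
    return sk_err_ptr(err);     /* never ERR_PTR(0), i.e. never NULL */
}
```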
     

29 Oct, 2012

1 commit

  • Pull Ceph fixes from Sage Weil:
    "There are two fixes in the messenger code, one that can trigger a NULL
    dereference, and one that fixes an error in refcounting (an extra put).
    There is also a trivial fix in the fs client code that is triggered by
    NFS reexport."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: fix dentry reference leak in encode_fh()
    libceph: avoid NULL kref_put when osd reset races with alloc_msg
    rbd: reset BACKOFF if unable to re-queue

    Linus Torvalds
     

27 Oct, 2012

1 commit

  • The ceph_con_in_msg_alloc() method calls the ->alloc_msg() helper, which
    may return NULL. It also drops con->mutex while it allocates a message,
    which means that the connection state may change (e.g., get closed). If
    that happens, we clean up and bail out. Avoid calling ceph_msg_put() on
    a NULL return value and triggering a crash.

    This was observed when an ->alloc_msg() call races with a timeout that
    resends a zillion messages and resets the connection, and ->alloc_msg()
    returns NULL (because the request was resent to another target).

    Fixes http://tracker.newdream.net/issues/3342

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
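The 27 Oct, 2012 fix comes down to guarding a release call on a pointer that a callback is allowed to leave NULL. In the userspace sketch below, dropping and retaking `con->mutex` is only marked by comments, a racing reset is simulated by a hypothetical `->alloc_msg()` implementation that closes the connection and returns NULL, and the assert() in `sk_msg_put()` stands in for the crash a NULL put causes. All `sk_` names are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

enum sk_state { SK_OPEN, SK_CLOSED };

struct sk_msg { int refs; };

struct sk_con {
    enum sk_state state;
    struct sk_msg *(*alloc_msg)(struct sk_con *con);  /* may return NULL */
    struct sk_msg *in_msg;
};

static void sk_msg_put(struct sk_msg *msg)
{
    assert(msg != NULL);        /* a NULL put is a crash in the real code */
    if (--msg->refs == 0)
        free(msg);
}

/* Simulates the race: the connection is reset while allocating. */
static struct sk_msg *sk_alloc_racing_reset(struct sk_con *con)
{
    con->state = SK_CLOSED;
    return NULL;                /* e.g. the request was resent elsewhere */
}

static struct sk_msg *sk_alloc_ok(struct sk_con *con)
{
    struct sk_msg *msg = calloc(1, sizeof(*msg));

    (void)con;
    if (msg)
        msg->refs = 1;
    return msg;
}

static int sk_in_msg_alloc(struct sk_con *con)
{
    struct sk_msg *msg;

    /* con->mutex is dropped here ... */
    msg = con->alloc_msg(con);
    /* ... and retaken here; the state may have changed meanwhile. */
    if (con->state != SK_OPEN) {
        if (msg)                /* the fix: only put a real message */
            sk_msg_put(msg);
        return -EAGAIN;
    }
    con->in_msg = msg;
    return 0;
}
```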
     

25 Oct, 2012

1 commit

  • The ceph_con_in_msg_alloc() method drops con->mutex while it allocates a
    message. If that races with a timeout that resends a zillion messages and
    resets the connection, and the ->alloc_msg() method returns a NULL message,
    it will call ceph_msg_put(NULL) and BUG.

    Fix by only calling put if msg is non-NULL.

    Fixes http://tracker.newdream.net/issues/3142

    Signed-off-by: Sage Weil

    Sage Weil
     

15 Oct, 2012

1 commit

  • Pull module signing support from Rusty Russell:
    "module signing is the highlight, but it's an all-over David Howells frenzy..."

    Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.

    * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
    X.509: Fix indefinite length element skip error handling
    X.509: Convert some printk calls to pr_devel
    asymmetric keys: fix printk format warning
    MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
    MODSIGN: Make mrproper should remove generated files.
    MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
    MODSIGN: Use the same digest for the autogen key sig as for the module sig
    MODSIGN: Sign modules during the build process
    MODSIGN: Provide a script for generating a key ID from an X.509 cert
    MODSIGN: Implement module signature checking
    MODSIGN: Provide module signing public keys to the kernel
    MODSIGN: Automatically generate module signing keys if missing
    MODSIGN: Provide Kconfig options
    MODSIGN: Provide gitignore and make clean rules for extra files
    MODSIGN: Add FIPS policy
    module: signature checking hook
    X.509: Add a crypto key parser for binary (DER) X.509 certificates
    MPILIB: Provide a function to read raw data into an MPI
    X.509: Add an ASN.1 decoder
    X.509: Add simple ASN.1 grammar compiler
    ...

    Linus Torvalds
     

10 Oct, 2012

3 commits

  • This patch defines a single function, queue_con_delay(), to call
    queue_delayed_work() for a connection. It basically generalizes
    what was previously queue_con() by adding the delay argument.
    queue_con() is now a simple helper that passes 0 for its delay.
    queue_con_delay() returns 0 if it queued work or an errno if it
    did not for some reason.

    If con_work() finds the BACKOFF flag set for a connection, it now
    calls queue_con_delay() to handle arranging to start again after a
    delay.

    Note about connection reference counts: con_work() only ever gets
    called as a work item function. At the time that work is scheduled,
    a reference to the connection is acquired, and the corresponding
    con_work() call is then responsible for dropping that reference
    before it returns.

    Previously, the backoff handling inside con_work() silently handed
    off its reference to delayed work it scheduled. Now that
    queue_con_delay() is used, a new reference is acquired for the
    newly-scheduled work, and the original reference is dropped by the
    con->ops->put() call at the end of the function.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     
  • Both ceph_fault() and con_work() include handling for imposing a
    delay before doing further processing on a faulted connection.
    The latter is used only if ceph_fault() is unable to queue the delayed
    work itself.

    Instead, just let con_work() always be responsible for implementing
    the delay. After setting up the delay value, set the BACKOFF flag
    on the connection unconditionally and call queue_con() to ensure
    con_work() will get called to handle it.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     
  • If ceph_fault() is unable to queue work after a delay, it sets the
    BACKOFF connection flag so con_work() will attempt to do so.

    In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't
    result in newly-queued work, it simply ignores this condition and
    proceeds as if no backoff delay were desired. There are two
    problems with this--one of which is a bug.

    The first problem is simply that the intended behavior is to back
    off, and if we aren't able to queue the work item to run after a delay
    we're not doing that.

    The only reason queue_delayed_work() won't queue work is if the
    provided work item is already queued. In the messenger, this
    means that con_work() is already scheduled to be run again. So
    if we simply set the BACKOFF flag again when this occurs, we know
    the next con_work() call will again attempt to hold off activity
    on the connection until after the delay.

    The second problem--the bug--is a leak of a reference count. If
    queue_delayed_work() returns 0 in con_work(), con->ops->put() drops
    the connection reference held on entry to con_work(). However,
    processing is (was) allowed to continue, and at the end of the
    function a second con->ops->put() is called.

    This patch fixes both problems.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
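The reference-counting rule spelled out in the first 10 Oct, 2012 commit (each queued run of con_work() owns one reference and drops it before returning) can be modelled in userspace as below. `sk_queue_delayed_work()` stands in for queue_delayed_work() and only records whether the work item is already pending; the structure, flag and function names are hypothetical, the possibility that taking a reference fails is ignored, and real timing is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified connection. */
struct sk_con {
    int refs;
    bool work_pending;      /* a run of sk_con_work() is already queued */
    bool backoff;           /* models the BACKOFF flag */
    unsigned long delay;    /* delay requested for the pending run */
};

static void sk_get(struct sk_con *con) { con->refs++; }
static void sk_put(struct sk_con *con) { assert(con->refs > 0); con->refs--; }

/* Stand-in for queue_delayed_work(): false if the item is already queued. */
static bool sk_queue_delayed_work(struct sk_con *con, unsigned long delay)
{
    if (con->work_pending)
        return false;
    con->work_pending = true;
    con->delay = delay;
    return true;
}

/* Like queue_con_delay(): take a reference for the newly queued work. */
static int sk_queue_con_delay(struct sk_con *con, unsigned long delay)
{
    sk_get(con);
    if (!sk_queue_delayed_work(con, delay)) {
        sk_put(con);        /* not queued: give the new reference back */
        return -EBUSY;
    }
    return 0;
}

/* One run of the work function; owns one reference on entry. */
static void sk_con_work(struct sk_con *con, unsigned long backoff_delay)
{
    if (con->backoff) {
        con->backoff = false;
        if (sk_queue_con_delay(con, backoff_delay) != 0)
            con->backoff = true;    /* the already-queued run backs off */
        goto done;
    }
    /* ... normal socket processing (omitted) ... */
done:
    sk_put(con);            /* drop the entry reference exactly once */
}
```

With `refs == 1` and nothing pending, a backoff run queues itself and ends with one reference, now held by the queued work. With `refs == 2` and a run already pending, queueing fails with -EBUSY, `backoff` is set again, and `refs` ends at 1, the reference still owned by the pending run.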
     

09 Oct, 2012

1 commit

  • Empty nodes have no color. We can make use of this property to simplify
    the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros. Also,
    we can get rid of the rb_init_node function which had been introduced by
    commit 88d19cf37952 ("timers: Add rb_init_node() to allow for stack
    allocated rb nodes") to avoid some issue with the empty node's color not
    being initialized.

    I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
    doing there, though. axboe introduced them in commit 10fd48f2376d
    ("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev"). The way I
    see it, the 'empty node' abstraction is only used by rbtree users to
    flag nodes that they haven't inserted in any rbtree, so asking the
    predecessor or successor of such nodes doesn't make any sense.

    One final rb_init_node() caller was recently added in sysctl code to
    implement faster sysctl name lookups. This code doesn't make use of
    RB_EMPTY_NODE at all, and from what I could see it only called
    rb_init_node() under the mistaken assumption that such initialization was
    required before node insertion.

    [sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Cc: John Stultz
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

08 Oct, 2012

1 commit

  • Give the key type the opportunity to preparse the payload prior to the
    instantiation and update routines being called. This is done with the
    provision of two new key type operations:

    int (*preparse)(struct key_preparsed_payload *prep);
    void (*free_preparse)(struct key_preparsed_payload *prep);

    If the first operation is present, then it is called before key creation (in
    the add/update case) or before the key semaphore is taken (in the update and
    instantiate cases). The second operation is called to clean up if the first
    was called.

    preparse() is given the opportunity to fill in the following structure:

    struct key_preparsed_payload {
        char *description;
        void *type_data[2];
        void *payload;
        const void *data;
        size_t datalen;
        size_t quotalen;
    };

    Before the preparser is called, the first three fields will have been cleared,
    the payload pointer and size will be stored in data and datalen and the default
    quota size from the key_type struct will be stored into quotalen.

    The preparser may parse the payload in any way it likes and may store data in
    the type_data[] and payload fields for use by the instantiate() and update()
    ops.

    The preparser may also propose a description for the key by attaching it as a
    string to the description field. This can be used by passing a NULL or ""
    description to the add_key() system call or the key_create_or_update()
    function. This cannot work with request_key(), as that requires the description
    to tell the upcall about the key to be created.

    This, for example, permits keys that store PGP public keys to generate their own
    name from the user ID and public key fingerprint in the key.

    The instantiate() and update() operations are then modified to look like this:

    int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
    int (*update)(struct key *key, struct key_preparsed_payload *prep);

    and the new payload data is passed in *prep, whether or not it was preparsed.

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
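The flow described in the 08 Oct, 2012 commit, where the core calls preparse(), passes the result to instantiate(), and calls free_preparse() afterwards, is sketched below in userspace. Only `struct key_preparsed_payload` follows the layout quoted in the commit; `struct sk_key`, `struct sk_key_type`, `sk_key_create()` and the example key type, which proposes the text before the first `:` of its payload as the description, are hypothetical. Quota accounting, locking and the update() path are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Field layout as quoted in the commit message (userspace copy). */
struct key_preparsed_payload {
    char *description;
    void *type_data[2];
    void *payload;
    const void *data;
    size_t datalen;
    size_t quotalen;
};

struct sk_key { char *description; void *payload; };

struct sk_key_type {
    size_t def_datalen;
    int (*preparse)(struct key_preparsed_payload *prep);
    void (*free_preparse)(struct key_preparsed_payload *prep);
    int (*instantiate)(struct sk_key *key, struct key_preparsed_payload *prep);
};

static char *sk_strndup(const char *s, size_t n)
{
    char *p = malloc(n + 1);

    if (p) {
        memcpy(p, s, n);
        p[n] = '\0';
    }
    return p;
}

/* Example type: payload "name:secret" proposes "name" as the description. */
static int sk_preparse(struct key_preparsed_payload *prep)
{
    const char *start = prep->data;
    const char *colon = memchr(start, ':', prep->datalen);

    if (!colon)
        return -1;
    prep->description = sk_strndup(start, (size_t)(colon - start));
    prep->payload = sk_strndup(start, prep->datalen);
    return prep->description && prep->payload ? 0 : -1;
}

static void sk_free_preparse(struct key_preparsed_payload *prep)
{
    free(prep->description);
    free(prep->payload);        /* NULL if instantiate() took ownership */
}

static int sk_instantiate(struct sk_key *key, struct key_preparsed_payload *prep)
{
    key->payload = prep->payload;
    prep->payload = NULL;       /* ownership moves to the key */
    return 0;
}

/* Core side: a NULL or "" description lets the preparser name the key. */
static int sk_key_create(const struct sk_key_type *type, struct sk_key *key,
                         const char *desc, const void *data, size_t len)
{
    struct key_preparsed_payload prep;
    int ret = 0;

    memset(&prep, 0, sizeof(prep));
    prep.data = data;
    prep.datalen = len;
    prep.quotalen = type->def_datalen;
    if (type->preparse)
        ret = type->preparse(&prep);
    if (ret == 0) {
        if (!desc || !*desc)
            desc = prep.description;
        key->description = desc ? sk_strndup(desc, strlen(desc)) : NULL;
        ret = key->description ? type->instantiate(key, &prep) : -1;
    }
    if (type->preparse && type->free_preparse)
        type->free_preparse(&prep);
    return ret;
}
```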
     

02 Oct, 2012

5 commits


22 Sep, 2012

1 commit

  • In write_partial_msg_pages(), pages need to be kmapped in order to
    perform a CRC-32c calculation on them. As an artifact of the way
    this code used to be structured, the kunmap() call was separated
    from the kmap() call and both were done conditionally. But the
    conditions under which the kmap() and kunmap() calls were made
    differed, so there was a chance a kunmap() call would be done on a
    page that had not been mapped.

    The symptom of this was tripping a BUG() in kunmap_high() when
    pkmap_count[nr] became 0.

    Reported-by: Bryan K. Wright
    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
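The 22 Sep, 2012 bug is a map/unmap pair guarded by two different conditions. A minimal userspace sketch of the corrected shape keeps both calls under one condition. `sk_kmap()` and `sk_kunmap()` are hypothetical stand-ins that only count live mappings (the counter plays the role of pkmap_count), and `sk_crc32c()` is a plain bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78), not the kernel's crc32c() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static int sk_live_maps;    /* stands in for pkmap_count bookkeeping */

static unsigned char *sk_kmap(unsigned char *page)
{
    sk_live_maps++;
    return page;
}

static void sk_kunmap(unsigned char *page)
{
    (void)page;
    assert(sk_live_maps > 0);   /* unmapping an unmapped page is the bug */
    sk_live_maps--;
}

/* Bitwise CRC-32C update (reflected, polynomial 0x82F63B78). */
static uint32_t sk_crc32c(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Checksum part of a page: map and unmap under the same condition. */
static uint32_t sk_crc_page_part(unsigned char *page, size_t off, size_t len,
                                 bool do_crc, uint32_t crc)
{
    if (do_crc) {
        unsigned char *base = sk_kmap(page);

        crc = sk_crc32c(crc, base + off, len);
        sk_kunmap(page);
    }
    return crc;
}
```

With an initial value of 0xFFFFFFFF and a final inversion, this CRC gives the standard CRC-32C check value 0xE3069283 for the string "123456789"; whether libceph uses the same seed and final inversion is not assumed here.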
     

22 Aug, 2012

1 commit

  • Because the Ceph client messenger uses a non-blocking connect, it is
    possible for the sending of the client banner to race with the
    arrival of the banner sent by the peer.

    When ceph_sock_state_change() notices the connect has completed, it
    schedules work to process the socket via con_work(). During this
    time the peer is writing its banner, and arrival of the peer banner
    races with con_work().

    If con_work() calls try_read() before the peer banner arrives, there
    is nothing for it to do, after which con_work() calls try_write() to
    send the client's banner. In this case Ceph's protocol negotiation
    can complete successfully.

    The server-side messenger immediately sends its banner and addresses
    after accepting a connect request, *before* actually attempting to
    read or verify the banner from the client. As a result, it is
    possible for the banner from the server to arrive before con_work()
    calls try_read(). If that happens, try_read() will read the banner
    and prepare protocol negotiation info via prepare_write_connect().
    prepare_write_connect() calls con_out_kvec_reset(), which discards
    the as-yet-unsent client banner. Next, con_work() calls
    try_write(), which sends the protocol negotiation info rather than
    the banner that the peer is expecting.

    The result is that the peer sees an invalid banner, and the client
    reports "negotiation failed".

    Fix this by moving con_out_kvec_reset() out of
    prepare_write_connect() to its callers at all locations except the
    one where the banner might still need to be sent.

    [elder@inktak.com: added note about server-side behavior]

    Signed-off-by: Jim Schutt
    Reviewed-by: Alex Elder

    Jim Schutt
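The 22 Aug, 2012 fix moves a "reset the output queue" step out of a shared helper and into the callers that actually want it. The userspace sketch below models the outgoing kvec list as a small array of labels. `sk_prepare_write_connect()` no longer resets; `sk_process_banner()`, the path taken when the peer's banner arrives first, appends behind any unsent client banner, while `sk_retry_connect()` is a hypothetical example of a caller that does reset. All names are assumptions.

```c
#include <assert.h>
#include <string.h>

#define SK_MAX_OUT 4

/* Simplified outgoing queue; each entry labels one pending kvec. */
struct sk_out {
    const char *kvec[SK_MAX_OUT];
    int nr;
};

static void sk_out_reset(struct sk_out *out) { out->nr = 0; }

static void sk_out_push(struct sk_out *out, const char *what)
{
    assert(out->nr < SK_MAX_OUT);
    out->kvec[out->nr++] = what;
}

static void sk_prepare_write_banner(struct sk_out *out)
{
    sk_out_push(out, "banner");
}

/* After the fix: appends only; resetting is the caller's decision. */
static void sk_prepare_write_connect(struct sk_out *out)
{
    sk_out_push(out, "connect");
}

/* Peer banner read first: our own banner may still be unsent. */
static void sk_process_banner(struct sk_out *out)
{
    sk_prepare_write_connect(out);
}

/* Hypothetical later caller where nothing queued needs to survive. */
static void sk_retry_connect(struct sk_out *out)
{
    sk_out_reset(out);
    sk_prepare_write_connect(out);
}
```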
     

21 Aug, 2012

1 commit

  • The debugfs directory includes the cluster fsid and our unique global_id.
    We need to delay the initialization of the debug entry until we have
    learned both the fsid and our global_id from the monitor or else the
    second client can't create its debugfs entry and will fail (and multiple
    client instances aren't properly reflected in debugfs).

    Reported by: Yan, Zheng
    Signed-off-by: Sage Weil
    Reviewed-by: Yehuda Sadeh

    Sage Weil
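The ordering constraint in the 21 Aug, 2012 commit can be shown with a small sketch: the per-client debugfs directory name is built from both the fsid and the global_id, so creating it must wait until both are known. The `<fsid>.client<global_id>` format below is an assumption, as are `struct sk_client` and `sk_maybe_init_debugfs()`; nothing here touches real debugfs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical client state, filled in as monitor replies arrive. */
struct sk_client {
    bool have_fsid;
    char fsid[37];              /* textual UUID */
    bool have_global_id;
    long long global_id;
    bool debugfs_ready;
    char debugfs_name[64];
};

/* Returns -EAGAIN until both identifiers are known, then names the dir once. */
static int sk_maybe_init_debugfs(struct sk_client *c)
{
    if (c->debugfs_ready)
        return 0;
    if (!c->have_fsid || !c->have_global_id)
        return -EAGAIN;         /* too early: names could collide */
    snprintf(c->debugfs_name, sizeof(c->debugfs_name), "%s.client%lld",
             c->fsid, c->global_id);
    c->debugfs_ready = true;
    return 0;
}
```

Two clients of the same cluster share the fsid but get different global_ids, so only a name built after both are learned keeps their debugfs entries distinct.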
     

03 Aug, 2012

1 commit


01 Aug, 2012

1 commit

  • Pull Ceph changes from Sage Weil:
    "Lots of stuff this time around:

    - lots of cleanup and refactoring in the libceph messenger code, and
    many hard to hit races and bugs closed as a result.
    - lots of cleanup and refactoring in the rbd code from Alex Elder,
    mostly in preparation for the layering functionality that will be
    coming in 3.7.
    - some misc rbd cleanups from Josh Durgin that are finally going
    upstream
    - support for CRUSH tunables (used by newer clusters to improve the
    data placement)
    - some cleanup in our use of d_parent that Al brought up a while back
    - a random collection of fixes across the tree

    There is another patch coming that fixes up our ->atomic_open()
    behavior, but I'm going to hammer on it a bit more before sending it."

    Fix up conflicts due to commits that were already committed earlier in
    drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits)
    rbd: create rbd_refresh_helper()
    rbd: return obj version in __rbd_refresh_header()
    rbd: fixes in rbd_header_from_disk()
    rbd: always pass ops array to rbd_req_sync_op()
    rbd: pass null version pointer in add_snap()
    rbd: make rbd_create_rw_ops() return a pointer
    rbd: have __rbd_add_snap_dev() return a pointer
    libceph: recheck con state after allocating incoming message
    libceph: change ceph_con_in_msg_alloc convention to be less weird
    libceph: avoid dropping con mutex before fault
    libceph: verify state after retaking con lock after dispatch
    libceph: revoke mon_client messages on session restart
    libceph: fix handling of immediate socket connect failure
    ceph: update MAINTAINERS file
    libceph: be less chatty about stray replies
    libceph: clear all flags on con_close
    libceph: clean up con flags
    libceph: replace connection state bits with states
    libceph: drop unnecessary CLOSED check in socket state change callback
    libceph: close socket directly from ceph_con_close()
    ...

    Linus Torvalds
     

31 Jul, 2012

3 commits

  • We drop the lock when calling the ->alloc_msg() con op, which means
    we need to (a) not clobber con->in_msg without the mutex held, and (b)
    we need to verify that we are still in the OPEN state when we retake
    it to avoid causing any mayhem. If the state does change, -EAGAIN
    will get us back to con_work() and loop.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
     
  • This function's calling convention is very limiting. In particular,
    we can't return any error other than ENOMEM (and only implicitly),
    which is a problem (see next patch).

    Instead, return a normal 0 or error code, and make the skip a pointer
    output parameter. Drop the useless in_hdr argument (we have the con
    pointer).

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
     
  • The ceph_fault() function takes the con mutex, so we should avoid
    dropping it before calling it. This fixes a potential race with
    another thread calling ceph_con_close(), or _open(), or similar (we
    don't reverify con->state after retaking the lock).

    Add annotation so that lockdep realizes we will drop the mutex before
    returning.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
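The calling-convention change in the second 31 Jul, 2012 commit, combined with the state recheck from the first, can be sketched as follows. With a pointer-returning allocator, NULL has to mean both "skip this message" and "out of memory"; returning an int and reporting the skip through an output parameter also lets the function return -EAGAIN when the state check after retaking the mutex fails. `struct sk_con`, its simulated `want_skip`/`closed_meanwhile` flags and `sk_con_in_msg_alloc()` are hypothetical, and locking is only indicated by comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

enum sk_state { SK_OPEN, SK_CLOSED };

struct sk_msg { size_t front_len; };

struct sk_con {
    enum sk_state state;
    size_t front_len;       /* length announced by the incoming header */
    bool want_skip;         /* simulated: upper layer has no use for it */
    bool closed_meanwhile;  /* simulated: a racing close while unlocked */
    struct sk_msg *in_msg;
};

/* Returns 0 or a negative errno; *skip asks the caller to discard data. */
static int sk_con_in_msg_alloc(struct sk_con *con, bool *skip)
{
    struct sk_msg *msg = NULL;

    *skip = false;
    /* con->mutex dropped: the upper layer's ->alloc_msg() runs here */
    if (con->want_skip) {
        *skip = true;
    } else {
        msg = malloc(sizeof(*msg));
        if (msg)
            msg->front_len = con->front_len;
    }
    if (con->closed_meanwhile)
        con->state = SK_CLOSED;
    /* con->mutex retaken: recheck before touching con->in_msg */
    if (con->state != SK_OPEN) {
        free(msg);
        return -EAGAIN;
    }
    if (*skip)
        return 0;
    if (!msg)
        return -ENOMEM;
    con->in_msg = msg;
    return 0;
}
```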