11 Jan, 2012

3 commits


14 Dec, 2011

1 commit


13 Dec, 2011

1 commit


22 Nov, 2011

1 commit


12 Nov, 2011

1 commit

  • ceph_osd_request struct allocates a 40-byte buffer for object names.
    RBD image names can be up to 96 chars long (100 with the .rbd suffix),
    which results in the object name for the image being truncated, and a
    subsequent map failure.

    Increase the oid buffer in request messages, in order to avoid the
    truncation.

    Signed-off-by: Stratos Psomadakis
    Signed-off-by: Sage Weil

    Stratos Psomadakis
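
The truncation described in this commit can be reproduced in user space. The following is a minimal sketch, not the kernel code: `format_header_oid`, `OLD_OID_BUF_LEN`, and `NEW_OID_BUF_LEN` are hypothetical names, and 128 is an assumed size chosen only because it holds a 100-character name plus a terminator. The commit message does not state the new buffer size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OLD_OID_BUF_LEN 40   /* size quoted in the commit message */
#define NEW_OID_BUF_LEN 128  /* assumption made for this sketch only */

/*
 * Build "<image_name>.rbd" into buf.
 * Returns 1 if the whole name fit, 0 if snprintf had to truncate it.
 */
static int format_header_oid(char *buf, size_t buf_len, const char *image_name)
{
    int n;

    assert(buf_len > 0);
    n = snprintf(buf, buf_len, "%s.rbd", image_name);
    return n >= 0 && (size_t)n < buf_len;
}
```

With a 96-character image name, the 40-byte buffer keeps only the first 39 characters, so a lookup would target an object name that does not exist.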
     

01 Nov, 2011

1 commit


26 Oct, 2011

7 commits


30 Sep, 2011

1 commit


29 Sep, 2011

2 commits

  • The incremental map updates have a record for each pg_temp mapping that is
    to be added/updated (len > 0) or removed (len == 0). The old code was
    written as if the updates were a complete enumeration; that was just wrong.
    Update the code to remove 0-length entries and drop the rbtree traversal.

    This avoids misdirected (and hung) requests that manifest as server
    errors like

    [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11

    Signed-off-by: Sage Weil

    Sage Weil
     
  • We need to apply the modulo pg_num calculation before looking up a pgid in
    the pg_temp mapping rbtree. This fixes pg_temp mappings, and fixes
    (some) misdirected requests that result in messages like

    [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11

    on the server and make the client block without getting a reply (at
    least until the pg_temp mapping goes away, but that can take a long, long
    time).

    Reorder calc_pg_raw() a bit to make more sense.

    Signed-off-by: Sage Weil

    Sage Weil
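
The two pg_temp fixes from this date can be sketched together in user space. This is an illustrative model only: it uses a small fixed array instead of the kernel's rbtree, all names (`pg_temp_table`, `pg_temp_apply`, `pg_temp_lookup`, `MAX_PG_TEMP`) are hypothetical, and a plain `%` stands in for the kernel's pg_num reduction, which may be computed differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PG_TEMP 16  /* hypothetical capacity for the sketch */

struct pg_temp_entry {
    uint32_t pgid;  /* placement seed, already reduced modulo pg_num */
    int len;        /* number of OSDs in the temporary mapping */
    int in_use;
};

struct pg_temp_table {
    struct pg_temp_entry slots[MAX_PG_TEMP];
};

/* Exact-match search on an already reduced pgid. */
static struct pg_temp_entry *pg_temp_find(struct pg_temp_table *t, uint32_t pgid)
{
    for (size_t i = 0; i < MAX_PG_TEMP; i++)
        if (t->slots[i].in_use && t->slots[i].pgid == pgid)
            return &t->slots[i];
    return NULL;
}

/*
 * Apply one incremental record: len > 0 adds or updates the mapping,
 * len == 0 removes it. Entries not named by a record are left alone.
 * Returns 0 on success, -1 if the table is full.
 */
static int pg_temp_apply(struct pg_temp_table *t, uint32_t pgid, int len)
{
    struct pg_temp_entry *e = pg_temp_find(t, pgid);

    assert(len >= 0);
    if (len == 0) {
        if (e)
            e->in_use = 0;
        return 0;
    }
    for (size_t i = 0; !e && i < MAX_PG_TEMP; i++)
        if (!t->slots[i].in_use)
            e = &t->slots[i];
    if (!e)
        return -1;
    e->pgid = pgid;
    e->len = len;
    e->in_use = 1;
    return 0;
}

/* Reduce the raw seed modulo pg_num before consulting the table. */
static struct pg_temp_entry *pg_temp_lookup(struct pg_temp_table *t,
                                            uint32_t raw_seed, uint32_t pg_num)
{
    assert(pg_num > 0);
    return pg_temp_find(t, raw_seed % pg_num);
}
```

In this model a raw seed of 9 with pg_num 8 resolves to the entry stored for pgid 1, whereas searching for the unreduced value 9 finds nothing.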
     

17 Sep, 2011

3 commits

  • The r_req_lru_item list node moves between several lists, and that cycle
    is not directly related to (and does not begin with) __register_request().
    Initialize it in the request constructor, not __register_request(). This
    fixes later badness (below) when OSDs restart underneath an rbd mount.

    Crashes we've seen due to this include:

    [ 213.974288] kernel BUG at net/ceph/messenger.c:2193!

    and

    [ 144.035274] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
    [ 144.035278] IP: [] con_work+0x1463/0x2ce0 [libceph]

    Signed-off-by: Sage Weil

    Sage Weil
     
  • ceph_destroy_options does not free opt->mon_addr that
    is allocated in ceph_parse_options.

    Signed-off-by: Noah Watkins
    Signed-off-by: Sage Weil

    Noah Watkins
     
  • Commit 4cf9d544631c recorded when an outgoing ceph message was ACKed,
    in order to avoid unnecessary connection resets when an OSD is busy.

    However, ack_stamp is uninitialized, so there is a window between
    when the message is sent and when it is ACKed in which handle_timeout()
    interprets the uninitialized value as an expired timeout, and resets
    the connection unnecessarily.

    Close the window by initializing ack_stamp.

    Signed-off-by: Jim Schutt
    Signed-off-by: Sage Weil

    Jim Schutt
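
The ack_stamp window described in the commit above can be modeled with a few lines of user-space C. This is a hedged sketch rather than the kernel change itself: `out_msg`, `msg_send`, `msg_acked`, and `msg_timed_out` are hypothetical names, and plain tick counts stand in for jiffies.

```c
#include <assert.h>

struct out_msg {
    unsigned long ack_stamp;  /* tick of the last ACK, or of the send */
};

/* Initialize the stamp at send time so the timeout check never reads garbage. */
static void msg_send(struct out_msg *m, unsigned long now)
{
    m->ack_stamp = now;
}

/* Record that the peer acknowledged the message. */
static void msg_acked(struct out_msg *m, unsigned long now)
{
    m->ack_stamp = now;
}

/* Nonzero once at least `timeout` ticks have passed since the stamp. */
static int msg_timed_out(const struct out_msg *m, unsigned long now,
                         unsigned long timeout)
{
    assert(timeout > 0);
    /* Unsigned subtraction stays correct across counter wraparound. */
    return now - m->ack_stamp >= timeout;
}
```

Without the assignment in `msg_send`, the value read by `msg_timed_out` before the first ACK would be indeterminate, which corresponds to the spurious resets the commit describes.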
     

10 Sep, 2011

1 commit


01 Sep, 2011

1 commit


10 Aug, 2011

1 commit

  • There were several problems here:

    1- we weren't tagging allocations with the pool, so they were never
    returned to the pool.
    2- msgpool_put didn't add back to the mempool, even if it were called.
    3- msgpool_release didn't clear the pool pointer, so it would have looped
    had #1 not been broken.

    These may or may not have been responsible for #1136 or #1381 (BUG due to
    non-empty mempool on umount). I can't seem to trigger the crash now using
    the method I was using before.

    Signed-off-by: Sage Weil

    Sage Weil
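
The three problems listed in this commit can be illustrated with a toy message pool. This is a minimal sketch under stated assumptions, not the libceph msgpool: the fixed-size array, `POOL_SIZE`, and every function name here are hypothetical, and the real code is built on the kernel mempool API.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_SIZE 4  /* hypothetical capacity */

struct msgpool;

struct msg {
    struct msgpool *pool;  /* owning pool, or NULL for a plain allocation */
};

struct msgpool {
    struct msg *free_slots[POOL_SIZE];
    int nfree;
};

/* Preallocate POOL_SIZE messages. Returns 0 on success, -1 on failure. */
static int msgpool_init(struct msgpool *p)
{
    p->nfree = 0;
    while (p->nfree < POOL_SIZE) {
        struct msg *m = malloc(sizeof(*m));

        if (!m)
            return -1;
        m->pool = NULL;
        p->free_slots[p->nfree++] = m;
    }
    return 0;
}

static struct msg *msgpool_get(struct msgpool *p)
{
    struct msg *m;

    if (p->nfree == 0)
        return NULL;
    m = p->free_slots[--p->nfree];
    m->pool = p;  /* problem 1: tag the message with its pool */
    return m;
}

/* Final release of a message: back to its pool if tagged, else freed. */
static void msg_release(struct msg *m)
{
    struct msgpool *p = m->pool;

    if (p) {
        assert(p->nfree < POOL_SIZE);
        m->pool = NULL;                 /* problem 3: clear the tag */
        p->free_slots[p->nfree++] = m;  /* problem 2: really give it back */
        return;
    }
    free(m);
}

/* Untagged pooled messages are freed here instead of looping back in. */
static void msgpool_destroy(struct msgpool *p)
{
    while (p->nfree > 0)
        msg_release(p->free_slots[--p->nfree]);
}
```

Because `msg_release` clears the tag before returning a message, `msgpool_destroy` can reuse the same release path without the message being handed back to the pool it is being freed from.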
     

27 Jul, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
    ceph: document unlocked d_parent accesses
    ceph: explicitly reference rename old_dentry parent dir in request
    ceph: document locking for ceph_set_dentry_offset
    ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug
    ceph: protect d_parent access in ceph_d_revalidate
    ceph: protect access to d_parent
    ceph: handle racing calls to ceph_init_dentry
    ceph: set dir complete frag after adding capability
    rbd: set blk_queue request sizes to object size
    ceph: set up readahead size when rsize is not passed
    rbd: cancel watch request when releasing the device
    ceph: ignore lease mask
    ceph: fix ceph_lookup_open intent usage
    ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC
    ceph: fix bad parent_inode calc in ceph_lookup_open
    ceph: avoid carrying Fw cap during write into page cache
    libceph: don't time out osd requests that haven't been received
    ceph: report f_bfree based on kb_avail rather than diffing.
    ceph: only queue capsnap if caps are dirty
    ceph: fix snap writeback when racing with writes
    ...

    Linus Torvalds
     
  • Keep track of when an outgoing message is ACKed (i.e., the server fully
    received it and, presumably, queued it for processing). Time out OSD
    requests only if it's been too long since they've been received.

    This prevents timeouts and connection thrashing when the OSDs are simply
    busy and are throttling the requests they read off the network.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     

22 Jul, 2011

1 commit


20 Jul, 2011

1 commit

  • open(2) must always include one of O_RDONLY, O_WRONLY, or O_RDWR. No need
    for any O_APPEND special case.

    Passing O_WRONLY|O_RDWR is undefined according to the man page, but the
    Linux VFS interprets this as O_RDWR, so we'll do the same.

    This fixes open(2) with flags O_RDWR|O_APPEND, which was incorrectly being
    translated to readonly.

    Reported-by: Fyodor Ustinov
    Signed-off-by: Sage Weil

    Sage Weil
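
The flag translation described in this commit can be sketched as a switch on the access-mode bits. The enum and function names below are hypothetical stand-ins, not the identifiers used in the ceph client; only `O_ACCMODE`, `O_RDONLY`, `O_WRONLY`, `O_RDWR`, and `O_APPEND` come from `<fcntl.h>`.

```c
#include <assert.h>
#include <fcntl.h>

/* Hypothetical mode values for this sketch. */
enum sketch_file_mode {
    SKETCH_MODE_RD = 1,
    SKETCH_MODE_WR = 2,
    SKETCH_MODE_RDWR = 3
};

/* Map open(2) flags to an access mode, ignoring O_APPEND entirely. */
static int sketch_flags_to_mode(int flags)
{
    int mode;

    switch (flags & O_ACCMODE) {
    case O_RDONLY:
        mode = SKETCH_MODE_RD;
        break;
    case O_WRONLY:
        mode = SKETCH_MODE_WR;
        break;
    case O_RDWR:
    default:
        /* O_WRONLY|O_RDWR lands here and is treated as read-write. */
        mode = SKETCH_MODE_RDWR;
        break;
    }
    assert(mode >= SKETCH_MODE_RD && mode <= SKETCH_MODE_RDWR);
    return mode;
}
```

Masking with `O_ACCMODE` first means `O_RDWR|O_APPEND` is classified by its access bits alone, so it can no longer be mistaken for a read-only open.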
     

14 Jul, 2011

1 commit


21 Jun, 2011

1 commit


17 Jun, 2011

1 commit

  • Unnecessary casts of void * clutter the code.

    These are the remaining casts after several specific
    patches to remove netdev_priv and dev_priv.

    Done via coccinelle script:

    $ cat cast_void_pointer.cocci
    @@
    type T;
    T *pt;
    void *pv;
    @@

    - pt = (T *)pv;
    + pt = pv;

    Signed-off-by: Joe Perches
    Acked-by: Paul Moore
    Signed-off-by: David S. Miller

    Joe Perches
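
The effect of the semantic patch in this commit can be shown on a small example. In C, a `void *` converts implicitly to any object pointer type, so the cast adds nothing. The struct and function below are hypothetical and exist only to show the pattern after the rewrite.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_priv {
    int id;
};

/* Callback-style function that receives its context as void *. */
static int sketch_read_id(void *pv)
{
    /* Before the patch this would read: pt = (struct sketch_priv *)pv; */
    struct sketch_priv *pt = pv;

    assert(pv != NULL);
    return pt->id;
}
```

This holds for C only; C++ does not allow the implicit conversion from `void *`.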
     

14 Jun, 2011

1 commit


08 Jun, 2011

1 commit


25 May, 2011

2 commits


20 May, 2011

5 commits