11 Jan, 2012

1 commit


14 Dec, 2011

1 commit


08 Dec, 2011

1 commit

  • We have been using i_lock to protect all kinds of data structures in the
    ceph_inode_info struct, including lists of inodes that we need to iterate
    over while avoiding races with inode destruction. That requires grabbing
    a reference to the inode with the list lock protected, but igrab() now
    takes i_lock to check the inode flags.

    Changing the list lock ordering would be a painful process.

    However, using a ceph-specific i_ceph_lock in the ceph inode instead of
    i_lock is a simple mechanical change and avoids the ordering constraints
    imposed by igrab().

    Reported-by: Amon Ott
    Signed-off-by: Sage Weil

    Sage Weil
     

06 Nov, 2011

2 commits

  • Quiet the following sparse noise:

    warning: symbol 'get_nonsnap_parent' was not declared. Should it be static?
    warning: symbol 'done_closing_sessions' was not declared. Should it be static?

    Local functions don't need external visibility. Make them static.

    Signed-off-by: H Hartley Sweeten
    Cc: Sage Weil
    Signed-off-by: Sage Weil

    H Hartley Sweeten
     
  • We used to use a flag on the directory inode to track whether the dcache
    contents for a directory were a complete cached copy. Switch to a dentry
    flag CEPH_D_COMPLETE that is safely updated by ->d_prune().

    Signed-off-by: Sage Weil

    Sage Weil
     

26 Oct, 2011

1 commit


16 Aug, 2011

1 commit


27 Jul, 2011

4 commits

  • For the most part we don't care about racing with rename when directing
    MDS requests; either the old or new parent is fine. Document that, and
    do some minor cleanup.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • We carry a pin on the parent directory for the rename source and dest
    dentries. For the source it's r_locked_dir; we need to explicitly
    reference the old_dentry parent as well, since the dentry's d_parent may
    change between when the request was created and pinned and when it is
    freed.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • Have caller pass in a safely-obtained reference to the parent directory
    for calculating a dentry's hash value.

    While we're here, simplify the flow through ceph_encode_fh() so that there
    is a single exit point and cleanup.

    Also fix a bug with the dentry hash calculation: calculate the hash for the
    dentry we were given, not its parent.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • The lease mask is no longer used (and it changed a while back). Instead,
    use a non-zero duration to indicate that there is a lease being issued.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     

17 Jul, 2011

1 commit


25 May, 2011

1 commit

  • In e9964c10 we changed cap flushing to do a delicate dance because some
    inodes on the cap_dirty list could be in a migrating state (got EXPORT but
    not IMPORT) in which we couldn't actually flush and move from
    dirty->flushing, breaking the while (!empty) { process first } loop
    structure. It worked for a single sync thread, but was not reentrant and
    triggered infinite loops when multiple syncers came along.

    Instead, move inodes with dirty caps to a separate cap_dirty_migrating list
    when in the limbo export-but-no-import state, allowing us to go back to
    the simple loop structure (which was reentrant). This is cleaner and more
    robust.

    Audited the cap_dirty users and this looks fine:
    list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we
    have dirty caps (which list we're on is irrelevant) and list_del_init()
    calls still do the right thing.

    Signed-off-by: Sage Weil

    Sage Weil
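
The two-list arrangement described above can be sketched in userspace C. This is an illustration only, not the kernel code: the list helpers are minimal stand-ins for the kernel's list_head API, and struct sketch_inode, struct sketch_caps, mark_dirty() and flush_dirty() are hypothetical names. Only cap_dirty, cap_dirty_migrating and i_dirty_item come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    list_init(e);
}

/* Hypothetical, trimmed-down inode: only what the sketch needs. */
struct sketch_inode {
    struct list_head i_dirty_item; /* on cap_dirty OR cap_dirty_migrating */
    bool migrating;                /* got EXPORT but not yet IMPORT */
    bool flushed;
};

struct sketch_caps {
    struct list_head cap_dirty;
    struct list_head cap_dirty_migrating;
};

static void caps_init(struct sketch_caps *c)
{
    list_init(&c->cap_dirty);
    list_init(&c->cap_dirty_migrating);
}

static void inode_init(struct sketch_inode *ci, bool migrating)
{
    list_init(&ci->i_dirty_item);
    ci->migrating = migrating;
    ci->flushed = false;
}

/* Park the inode on whichever list matches its migration state. */
static void mark_dirty(struct sketch_caps *c, struct sketch_inode *ci)
{
    list_del_init(&ci->i_dirty_item); /* harmless if not on a list */
    list_add_tail(&ci->i_dirty_item,
                  ci->migrating ? &c->cap_dirty_migrating : &c->cap_dirty);
}

/* The simple loop: every entry on cap_dirty is flushable, so it drains. */
static int flush_dirty(struct sketch_caps *c)
{
    int n = 0;

    while (!list_empty(&c->cap_dirty)) {
        struct sketch_inode *ci = (struct sketch_inode *)
            ((char *)c->cap_dirty.next -
             offsetof(struct sketch_inode, i_dirty_item));

        list_del_init(&ci->i_dirty_item); /* dirty -> flushing */
        ci->flushed = true;
        n++;
    }
    return n;
}
```

Because migrating inodes never sit on cap_dirty, the loop always makes progress, and list_empty(&ci->i_dirty_item) still answers "does this inode have dirty caps?" regardless of which list holds it.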
     

20 May, 2011

2 commits


12 May, 2011

1 commit


26 Mar, 2011

1 commit

  • The release method for mds connections uses a backpointer to the
    mds_client, so we need to flush the workqueue of any pending work (and
    ceph_connection references) prior to freeing the mds_client. This fixes
    an oops easily triggered under UML by

    while true ; do mount ... ; umount ... ; done

    Also fix an outdated comment: the flush in ceph_destroy_client only flushes
    OSD connections out. This bug is basically an artifact of the ceph ->
    ceph+libceph conversion.

    Signed-off-by: Sage Weil

    Sage Weil
     

28 Jan, 2011

1 commit


26 Jan, 2011

1 commit

  • Ignore replication or auth frag data if it indicates an MDS that is not
    active. This can happen if the MDS shuts down and the client has stale
    data about the namespace distribution across the MDS cluster. If that's
    the case, fall back to directing the request based on the auth cap (which
    should always be accurate).

    Signed-off-by: Sage Weil

    Sage Weil
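
A minimal sketch of the fallback described above, assuming a simplified MDS map that only records which ranks are active. The names sketch_mdsmap, mds_is_active() and choose_mds() are hypothetical and are not the kernel client's API.

```c
#include <stdbool.h>

#define SKETCH_MAX_MDS 8

/* Hypothetical, simplified MDS map: one "active" flag per rank. */
struct sketch_mdsmap {
    bool active[SKETCH_MAX_MDS];
};

static bool mds_is_active(const struct sketch_mdsmap *map, int mds)
{
    return mds >= 0 && mds < SKETCH_MAX_MDS && map->active[mds];
}

/*
 * frag_mds is the rank suggested by (possibly stale) replication or
 * auth frag data; auth_cap_mds is the rank holding the auth cap.
 * Use the frag hint only if that MDS is active; otherwise fall back.
 */
static int choose_mds(const struct sketch_mdsmap *map,
                      int frag_mds, int auth_cap_mds)
{
    if (mds_is_active(map, frag_mds))
        return frag_mds;
    return auth_cap_mds;
}
```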
     

14 Jan, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: fix cleanup when trying to mount inexistent image
    net/ceph: make ceph_msgr_wq non-reentrant
    ceph: fsc->*_wq's aren't used in memory reclaim path
    ceph: Always free allocated memory in osdmap_decode()
    ceph: Makefile: Remove unnessary code
    ceph: associate requests with opening sessions
    ceph: drop redundant r_mds field
    ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS
    ceph: add dir_layout to inode

    Linus Torvalds
     

13 Jan, 2011

3 commits

  • Associate requests with sessions that aren't yet open. This makes the
    debugfs mdsc request list more informative.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • The r_mds field is redundant, since we can find the same information at
    r_session->s_mds, and when r_session is NULL then r_mds is meaningless.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • This implements the DIRLAYOUTHASH protocol feature, which passes the dir
    layout over the wire from the MDS. This gives the client knowledge
    of the correct hash function to use for mapping dentries among dir
    fragments.

    Note that if this feature is _not_ present on the client but is on the
    MDS, the client may misdirect requests. This will result in a forward
    and degrade performance. It may also result in inaccurate NFS filehandle
    generation, which will prevent fh resolution when the inode is not present
    in the client cache and the parent directories have been fragmented.

    Signed-off-by: Sage Weil

    Sage Weil
     

07 Jan, 2011

1 commit

  • Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
    0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
    we start protecting many other dentry members with d_lock.

    Signed-off-by: Nick Piggin

    Nick Piggin
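
The idea can be illustrated with a userspace sketch, using a pthread mutex in place of the d_lock spinlock. The struct and function names here are hypothetical, and this is not the dcache implementation. The point shown is that a plain integer count, read and written only under the lock, cannot move away from zero while the lock is held.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dentry: a plain int guarded by a lock. */
struct sketch_dentry {
    pthread_mutex_t d_lock;
    int d_count;   /* non-atomic; only touched with d_lock held */
    bool killed;
};

static void sketch_dget(struct sketch_dentry *d)
{
    pthread_mutex_lock(&d->d_lock);
    d->d_count++;
    pthread_mutex_unlock(&d->d_lock);
}

/* Drop a reference; returns true if this call tore the dentry down. */
static bool sketch_dput(struct sketch_dentry *d)
{
    bool last;

    pthread_mutex_lock(&d->d_lock);
    last = (--d->d_count == 0);
    if (last) {
        /*
         * The count is 0 and stays 0 until we unlock, because any
         * sketch_dget() caller has to take d_lock first.
         */
        d->killed = true;
    }
    pthread_mutex_unlock(&d->d_lock);
    return last;
}
```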
     

02 Dec, 2010

1 commit


20 Nov, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: fix readdir EOVERFLOW on 32-bit archs
    ceph: fix frag offset for non-leftmost frags
    ceph: fix dangling pointer
    ceph: explicitly specify page alignment in network messages
    ceph: make page alignment explicit in osd interface
    ceph: fix comment, remove extraneous args
    ceph: fix update of ctime from MDS
    ceph: fix version check on racing inode updates
    ceph: fix uid/gid on resent mds requests
    ceph: fix rdcache_gen usage and invalidate
    ceph: re-request max_size if cap auth changes
    ceph: only let auth caps update max_size
    ceph: fix open for write on clustered mds
    ceph: fix bad pointer dereference in ceph_fill_trace
    ceph: fix small seq message skipping
    Revert "ceph: update issue_seq on cap grant"

    Linus Torvalds
     

18 Nov, 2010

1 commit


08 Nov, 2010

1 commit

  • MDS requests can be rebuilt and resent in non-process context, but were
    filling in uid/gid from current_fsuid/gid. Put that information in the
    request struct on request setup.

    This fixes incorrect (and root) uid/gid getting set for requests that
    are forwarded between MDSs, usually due to metadata migrations.

    Signed-off-by: Sage Weil

    Sage Weil
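
A rough userspace sketch of the fix: capture the caller's credentials once, when the request is set up, and have every (re)build of the message read them from the request instead of from the current context. getuid()/getgid() stand in for the kernel's current_fsuid()/current_fsgid(), and the struct and function names are assumptions made for illustration.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request: credentials are stored at setup time. */
struct sketch_request {
    uid_t r_uid;
    gid_t r_gid;
    int r_attempts;
};

/* Hypothetical wire header filled in on every send or resend. */
struct sketch_request_head {
    uid_t caller_uid;
    gid_t caller_gid;
};

/* Runs in the caller's (process) context, so the ids are the caller's. */
static void request_init(struct sketch_request *req)
{
    req->r_uid = getuid();
    req->r_gid = getgid();
    req->r_attempts = 0;
}

/*
 * May run later from a worker context (e.g. after a forward). It must
 * not consult the current credentials, only what the request recorded.
 */
static void build_request_head(struct sketch_request *req,
                               struct sketch_request_head *head)
{
    head->caller_uid = req->r_uid;
    head->caller_gid = req->r_gid;
    req->r_attempts++;
}
```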
     

21 Oct, 2010

3 commits

  • Switch from using the BKL explicitly to the new lock_flocks() interface.
    Eventually this will turn into a spinlock.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • When the lock_kernel() turns into lock_flocks() and a spinlock, we won't
    be able to do allocations with the lock held. Preallocate space without
    the lock, and retry if the lock state changes out from underneath us.

    Signed-off-by: Greg Farnum
    Signed-off-by: Sage Weil

    Greg Farnum
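
The preallocate-and-retry pattern can be sketched as follows. This is a userspace approximation with a pthread mutex standing in for the future spinlock. struct lock_table and snapshot_locks() are hypothetical names, not the functions touched by this commit.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table whose size can change whenever the lock is dropped. */
struct lock_table {
    pthread_mutex_t lock;
    int num_locks;
    int locks[16];
};

/*
 * Copy the table into a freshly allocated buffer without ever
 * allocating while the lock is held. Returns NULL on allocation
 * failure; otherwise the caller frees the result.
 */
static int *snapshot_locks(struct lock_table *t, int *out_count)
{
    int *buf = NULL;
    int want;

    pthread_mutex_lock(&t->lock);
    want = t->num_locks;              /* size estimate only */
    pthread_mutex_unlock(&t->lock);

    for (;;) {
        /* Allocate with the lock dropped. */
        int *tmp = realloc(buf, (size_t)(want > 0 ? want : 1) * sizeof(*buf));

        if (!tmp) {
            free(buf);
            return NULL;
        }
        buf = tmp;

        pthread_mutex_lock(&t->lock);
        if (t->num_locks <= want) {
            /* Still fits: copy under the lock and finish. */
            *out_count = t->num_locks;
            memcpy(buf, t->locks, (size_t)t->num_locks * sizeof(*buf));
            pthread_mutex_unlock(&t->lock);
            return buf;
        }
        /* The table grew underneath us: remember the new size, retry. */
        want = t->num_locks;
        pthread_mutex_unlock(&t->lock);
    }
}
```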
     
  • This factors out protocol and low-level storage parts of ceph into a
    separate libceph module living in net/ceph and include/linux/ceph. This
    is mostly a matter of moving files around. However, a few key pieces
    of the interface change as well:

    - ceph_client becomes ceph_fs_client and ceph_client, where the latter
      captures the mon and osd clients, and the fs_client gets the mds client
      and file system specific pieces.
    - Mount option parsing and debugfs setup are correspondingly broken into
      two pieces.
    - The mon client gets a generic handler callback for otherwise unknown
      messages (mds map, in this case).
    - The basic supported/required feature bits can be expanded (and are by
      ceph_fs_client).

    No functional change, aside from some subtle error handling cases that got
    cleaned up in the refactoring process.

    Signed-off-by: Sage Weil

    Yehuda Sadeh
     

12 Sep, 2010

1 commit


27 Aug, 2010

1 commit


23 Aug, 2010

2 commits

  • When making a request in the virtual snapdir or a snapped portion of the
    namespace, we should choose the MDS based on the first nonsnap parent (and
    its caps). If that is not the best place, we will get forward hints to
    find the right MDS in the cluster. This fixes ESTALE errors when using
    the .snap directory and namespace with multiple MDSs.

    Signed-off-by: Sage Weil

    Sage Weil
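
As a sketch of the selection rule above, assume each node simply records its parent, whether it lives in a snapshot, and which MDS holds its caps. The struct layout and both function names are illustrative assumptions, not the kernel's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical namespace node. */
struct sketch_node {
    const struct sketch_node *parent;
    bool is_snap;   /* snapdir or snapped inode */
    int cap_mds;    /* MDS rank holding caps for this node */
};

/* Walk up until we leave the snapped part of the namespace. */
static const struct sketch_node *
first_nonsnap_parent(const struct sketch_node *n)
{
    while (n && n->is_snap)
        n = n->parent;
    return n;
}

/* Pick the MDS from the first non-snap ancestor, or -1 if none exists. */
static int choose_mds_for(const struct sketch_node *n)
{
    const struct sketch_node *live = first_nonsnap_parent(n);

    return live ? live->cap_mds : -1;
}
```

If the chosen MDS turns out not to be the right one, the commit relies on forward hints from the MDS to redirect the request.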
     
  • The use of a completion when waiting for session shutdown during umount is
    inappropriate, given the complexity of the condition. With multiple MDSs,
    it left the umount thread spinning, which in some cases prevented the
    session close message from being processed.

    Switch to a waitqueue and define a condition helper. This cleans things
    up nicely.

    Signed-off-by: Sage Weil

    Sage Weil
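
A userspace analogue of the waitqueue-plus-condition-helper approach, using a pthread condition variable where the kernel would use a wait queue. done_closing_sessions is the helper name mentioned elsewhere in this log; the struct and the other function names are hypothetical, and the body of the helper is an assumption about what such a condition would check.

```c
#include <pthread.h>
#include <stdbool.h>

#define SKETCH_MAX_SESSIONS 4

/* Hypothetical umount state: one flag per MDS session. */
struct umount_ctx {
    pthread_mutex_t mutex;
    pthread_cond_t session_close_wq;
    int num_sessions;
    bool session_open[SKETCH_MAX_SESSIONS];
};

/* Condition helper: true once every session has closed. */
static bool done_closing_sessions(const struct umount_ctx *c)
{
    for (int i = 0; i < c->num_sessions; i++)
        if (c->session_open[i])
            return false;
    return true;
}

/* Called when a session close message has been processed. */
static void session_closed(struct umount_ctx *c, int i)
{
    pthread_mutex_lock(&c->mutex);
    c->session_open[i] = false;
    pthread_cond_broadcast(&c->session_close_wq);
    pthread_mutex_unlock(&c->mutex);
}

/* umount side: sleep until the whole condition holds, without spinning. */
static void wait_for_sessions(struct umount_ctx *c)
{
    pthread_mutex_lock(&c->mutex);
    while (!done_closing_sessions(c))
        pthread_cond_wait(&c->session_close_wq, &c->mutex);
    pthread_mutex_unlock(&c->mutex);
}
```

The waiter re-evaluates the full condition after each wakeup, which a single completion cannot express when several sessions close independently.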
     

04 Aug, 2010

1 commit


03 Aug, 2010

2 commits


02 Aug, 2010

2 commits