14 Dec, 2011

2 commits


08 Dec, 2011

1 commit

  • We have been using i_lock to protect all kinds of data structures in the
    ceph_inode_info struct, including lists of inodes that we need to iterate
    over while avoiding races with inode destruction. That requires grabbing
    a reference to the inode with the list lock held, but igrab() now
    takes i_lock to check the inode flags.

    Changing the list lock ordering would be a painful process.

    However, using a ceph-specific i_ceph_lock in the ceph inode instead of
    i_lock is a simple mechanical change and avoids the ordering constraints
    imposed by igrab().

    Reported-by: Amon Ott
    Signed-off-by: Sage Weil

    Sage Weil
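The lock-ordering point in the i_ceph_lock commit can be modelled in userspace. The following is only a sketch, not the kernel code: `fake_lock`, `fake_inode`, `fake_igrab()`, `add_to_list()` and `grab_from_list()` are hypothetical stand-ins, and the assumption that other paths take the inode lock before the list lock is a reading of the commit message rather than something it states. With a separate private lock, one path nests `i_ceph_lock -> list_lock` and the other nests `list_lock -> i_lock`, so the two orders no longer conflict.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical spinlock stand-in built on C11 atomic_flag. */
struct fake_lock { atomic_flag held; };

static void fake_lock_init(struct fake_lock *l) { atomic_flag_clear(&l->held); }

static void fake_spin_lock(struct fake_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void fake_spin_unlock(struct fake_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct fake_inode {
    struct fake_lock i_lock;      /* generic lock, taken inside fake_igrab() */
    struct fake_lock i_ceph_lock; /* filesystem-private lock */
    int refcount;
    bool on_list;
};

/* Stand-in for igrab(): takes i_lock internally to pin the inode. */
static struct fake_inode *fake_igrab(struct fake_inode *in)
{
    fake_spin_lock(&in->i_lock);
    in->refcount++;
    fake_spin_unlock(&in->i_lock);
    return in;
}

/* Path A (assumed ordering): private lock first, then the list lock. */
static void add_to_list(struct fake_inode *in, struct fake_lock *list_lock)
{
    fake_spin_lock(&in->i_ceph_lock);
    fake_spin_lock(list_lock);
    in->on_list = true;
    fake_spin_unlock(list_lock);
    fake_spin_unlock(&in->i_ceph_lock);
}

/* Path B: list lock first, then i_lock via fake_igrab(). */
static struct fake_inode *grab_from_list(struct fake_inode *in,
                                         struct fake_lock *list_lock)
{
    struct fake_inode *ret = NULL;

    fake_spin_lock(list_lock);
    if (in->on_list)
        ret = fake_igrab(in);
    fake_spin_unlock(list_lock);
    return ret;
}

/* Returns the refcount after pinning the inode from the list, or -1. */
int demo_lock_order(void)
{
    struct fake_lock list_lock;
    struct fake_inode in;

    fake_lock_init(&list_lock);
    fake_lock_init(&in.i_lock);
    fake_lock_init(&in.i_ceph_lock);
    in.refcount = 1;
    in.on_list = false;

    add_to_list(&in, &list_lock);
    return grab_from_list(&in, &list_lock) ? in.refcount : -1;
}
```

If path A took `i_lock` instead of `i_ceph_lock`, the two paths would acquire `i_lock` and the list lock in opposite orders, which is the inversion the commit avoids.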
     

03 Dec, 2011

1 commit


12 Nov, 2011

1 commit

  • Set up d_fsdata on the root dentry. This fixes a NULL pointer dereference
    in ceph_d_prune on umount. It also means we can eventually strip out all
    of the conditional checks on d_fsdata because it is now set unconditionally
    (prior to setting up the d_ops).

    Fix the ceph_d_prune debug print while we're here.

    Signed-off-by: Sage Weil

    Sage Weil
     

06 Nov, 2011

4 commits

  • If we queue a work item that calls iput(), make sure we ihold() before
    attempting to queue work. Otherwise our queued work might miraculously run
    before we notice the queue_work() succeeded and call ihold(), allowing the
    inode to be destroyed.

    That is, instead of

        if (queue_work(...))
                ihold();

    we need to do

        ihold();
        if (!queue_work(...))
                iput();

    Reported-by: Amon Ott
    Signed-off-by: Sage Weil

    Sage Weil
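The ordering described in the ihold()/queue_work() commit can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: `fake_inode`, `fake_ihold()`, `fake_iput()` and `fake_queue_work()` are hypothetical stand-ins, and the fake queue runs the work item immediately to model the worst case where the work finishes before the caller regains control.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode with a reference count. */
struct fake_inode {
    int refcount;
    bool destroyed;
    bool work_pending;
};

static void fake_ihold(struct fake_inode *in) { in->refcount++; }

static void fake_iput(struct fake_inode *in)
{
    if (--in->refcount == 0)
        in->destroyed = true; /* a real iput() would tear the inode down */
}

/* The work item drops the reference that was taken on its behalf. */
static void fake_work_fn(struct fake_inode *in)
{
    in->work_pending = false;
    fake_iput(in);
}

/*
 * Stand-in for queue_work(): returns false if the work is already pending.
 * To model the race, the work runs to completion before this returns.
 */
static bool fake_queue_work(struct fake_inode *in)
{
    if (in->work_pending)
        return false;
    in->work_pending = true;
    fake_work_fn(in);
    return true;
}

/* Racy ordering: the reference is taken only after queueing succeeded. */
int demo_queue_then_hold(void)
{
    struct fake_inode in = { .refcount = 1 };

    if (fake_queue_work(&in))
        fake_ihold(&in);
    return in.destroyed ? -1 : in.refcount;
}

/* Safe ordering: take the reference first, drop it if nothing was queued. */
int demo_hold_then_queue(void)
{
    struct fake_inode in = { .refcount = 1 };

    fake_ihold(&in);
    if (!fake_queue_work(&in))
        fake_iput(&in);
    return in.destroyed ? -1 : in.refcount;
}
```

In this model `demo_queue_then_hold()` reports the inode as destroyed, because the work item's put consumes the caller's only reference, while `demo_hold_then_queue()` leaves the caller's reference intact.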
     
  • Quiet the sparse noise:

    warning: symbol 'create_fs_client' was not declared. Should it be static?
    warning: symbol 'destroy_fs_client' was not declared. Should it be static?

    Signed-off-by: H Hartley Sweeten
    Cc: Sage Weil
    ceph-devel@vger.kernel.org
    Signed-off-by: Sage Weil

    H Hartley Sweeten
     
  • Quiet the following sparse noise:

    warning: symbol 'get_nonsnap_parent' was not declared. Should it be static?
    warning: symbol 'done_closing_sessions' was not declared. Should it be static?

    Local functions don't need external visibility. Make them static.

    Signed-off-by: H Hartley Sweeten
    Cc: Sage Weil
    Signed-off-by: Sage Weil

    H Hartley Sweeten
     
  • We used to use a flag on the directory inode to track whether the dcache
    contents for a directory were a complete cached copy. Switch to a dentry
    flag CEPH_D_COMPLETE that is safely updated by ->d_prune().

    Signed-off-by: Sage Weil

    Sage Weil
     

04 Nov, 2011

1 commit


02 Nov, 2011

1 commit


26 Oct, 2011

11 commits


10 Sep, 2011

1 commit


23 Aug, 2011

1 commit


16 Aug, 2011

1 commit


27 Jul, 2011

15 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
    ceph: document unlocked d_parent accesses
    ceph: explicitly reference rename old_dentry parent dir in request
    ceph: document locking for ceph_set_dentry_offset
    ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug
    ceph: protect d_parent access in ceph_d_revalidate
    ceph: protect access to d_parent
    ceph: handle racing calls to ceph_init_dentry
    ceph: set dir complete frag after adding capability
    rbd: set blk_queue request sizes to object size
    ceph: set up readahead size when rsize is not passed
    rbd: cancel watch request when releasing the device
    ceph: ignore lease mask
    ceph: fix ceph_lookup_open intent usage
    ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC
    ceph: fix bad parent_inode calc in ceph_lookup_open
    ceph: avoid carrying Fw cap during write into page cache
    libceph: don't time out osd requests that haven't been received
    ceph: report f_bfree based on kb_avail rather than diffing.
    ceph: only queue capsnap if caps are dirty
    ceph: fix snap writeback when racing with writes
    ...

    Linus Torvalds
     
  • For the most part we don't care about racing with rename when directing
    MDS requests; either the old or new parent is fine. Document that, and
    do some minor cleanup.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • We carry a pin on the parent directory for the rename source and dest
    dentries. For the source it's r_locked_dir; we need to explicitly
    reference the old_dentry parent as well, since the dentry's d_parent may
    change between when the request was created and pinned and when it is
    freed.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • Have caller pass in a safely-obtained reference to the parent directory
    for calculating a dentry's hash value.

    While we're here, simplify the flow through ceph_encode_fh() so that there
    is a single exit point and cleanup.

    Also fix a bug with the dentry hash calculation: calculate the hash for the
    dentry we were given, not its parent.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • Protect d_parent with d_lock. Carry a reference. Simplify the flow so
    that there is a single exit point and cleanup.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • d_parent is protected by d_lock: use it when looking up a dentry's parent
    directory inode. Also take a reference and drop it in the caller to avoid
    a use-after-free.

    Reported-by: Al Viro
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
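The pattern in the d_parent commits (read the parent pointer under d_lock, take a reference before dropping the lock, and let the caller release it) can be sketched in userspace as follows. All names here (`fake_dentry`, `fake_dir`, `get_parent_ref()`, `fake_rename()`) are hypothetical, and the atomic_flag spinlock is only a stand-in for the kernel's d_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical spinlock stand-in built on C11 atomic_flag. */
struct fake_lock { atomic_flag held; };

static void fake_lock_init(struct fake_lock *l) { atomic_flag_clear(&l->held); }

static void fake_spin_lock(struct fake_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void fake_spin_unlock(struct fake_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct fake_dir { atomic_int refcount; };

struct fake_dentry {
    struct fake_lock d_lock;
    struct fake_dir *d_parent; /* only read or written under d_lock */
};

static void fake_dir_put(struct fake_dir *dir)
{
    atomic_fetch_sub(&dir->refcount, 1);
}

/* Return the current parent with a reference held; the caller must put it. */
static struct fake_dir *get_parent_ref(struct fake_dentry *de)
{
    struct fake_dir *dir;

    fake_spin_lock(&de->d_lock);
    dir = de->d_parent;
    atomic_fetch_add(&dir->refcount, 1); /* pin before dropping the lock */
    fake_spin_unlock(&de->d_lock);
    return dir;
}

/* Models a rename moving the dentry to a different parent. */
static void fake_rename(struct fake_dentry *de, struct fake_dir *new_parent)
{
    fake_spin_lock(&de->d_lock);
    de->d_parent = new_parent;
    fake_spin_unlock(&de->d_lock);
}

/* Returns 1 if the pinned parent stays valid across a rename. */
int demo_parent_ref(void)
{
    struct fake_dir a, b;
    struct fake_dentry de;
    struct fake_dir *pinned;
    int ok;

    atomic_init(&a.refcount, 1);
    atomic_init(&b.refcount, 1);
    fake_lock_init(&de.d_lock);
    de.d_parent = &a;

    pinned = get_parent_ref(&de);
    fake_rename(&de, &b); /* d_parent changes, the pinned pointer does not */
    ok = (pinned == &a) && (atomic_load(&a.refcount) == 2);
    fake_dir_put(pinned); /* caller drops the reference when done */
    return ok && atomic_load(&a.refcount) == 1;
}
```

Because the reference is taken while the lock is still held, the old parent cannot be released between reading the pointer and using it, even if a rename changes d_parent immediately afterwards.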
     
  • The ->lookup() and prepopulate_readdir() callers are working with unhashed
    dentries, so we don't have to worry. The export.c callers, though, need
    to initialize something they got back from d_obtain_alias() and are
    potentially racing with other callers. Make sure we don't return unless
    the dentry is properly initialized (by us or someone else).

    Reported-by: Al Viro
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
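One way to handle racing initializers, as the ceph_init_dentry commit describes, is to allocate first, then check under the lock whether someone else already initialized the object and discard the spare allocation if so. The sketch below is a userspace illustration with hypothetical names (`fake_dentry`, `fake_dentry_info`, `fake_init_dentry()`); it is not the actual ceph code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical spinlock stand-in built on C11 atomic_flag. */
struct fake_lock { atomic_flag held; };

static void fake_lock_init(struct fake_lock *l) { atomic_flag_clear(&l->held); }

static void fake_spin_lock(struct fake_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void fake_spin_unlock(struct fake_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct fake_dentry_info { int placeholder; };

struct fake_dentry {
    struct fake_lock d_lock;
    struct fake_dentry_info *d_fsdata; /* set at most once, under d_lock */
};

/* Returns 0 once d_fsdata is initialized (by us or someone else), -1 on OOM. */
int fake_init_dentry(struct fake_dentry *de)
{
    struct fake_dentry_info *di = calloc(1, sizeof(*di));

    if (!di)
        return -1;

    fake_spin_lock(&de->d_lock);
    if (de->d_fsdata) {
        /* Lost the race: keep the existing info, discard ours. */
        fake_spin_unlock(&de->d_lock);
        free(di);
        return 0;
    }
    de->d_fsdata = di;
    fake_spin_unlock(&de->d_lock);
    return 0;
}

/* Returns 1 if a second init call leaves the first initialization in place. */
int demo_init_twice(void)
{
    struct fake_dentry de;
    struct fake_dentry_info *first;
    int same;

    fake_lock_init(&de.d_lock);
    de.d_fsdata = NULL;

    if (fake_init_dentry(&de) != 0)
        return -1;
    first = de.d_fsdata;
    if (fake_init_dentry(&de) != 0) {
        free(first);
        return -1;
    }
    same = (de.d_fsdata == first);
    free(de.d_fsdata);
    return same;
}
```

Either way the function returns only after `d_fsdata` is set, which matches the commit's requirement that the dentry is properly initialized "by us or someone else".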
     
  • Currently ceph_add_cap clears the complete bit if we are newly issued the
    FILE_SHARED cap, which is normally the case for a newly issued cap on a new
    directory. That means we clear the just-set bit. Move the check that sets
    the flag to after the cap is added/updated.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • This should improve the default read performance, as without it
    readahead is practically disabled.

    Signed-off-by: Yehuda Sadeh

    Yehuda Sadeh
     
  • The lease mask is no longer used (and it changed a while back). Instead,
    use a non-zero duration to indicate that there is a lease being issued.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • We weren't properly calling lookup_instantiate_filp when setting up the
    lookup intent, which could lead to file leakage on errors. So:

    - use separate helper for the hidden snapdir translation, immediately
    following the mds request
    - use ceph_finish_lookup for the final dentry/return value dance in the
    exit path
    - lookup_instantiate_filp on success

    Reported-by: Al Viro
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • We only need to put these on the directory unsafe list if they have
    side effects that fsync(2) should flush out.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • We were always getting NULL here because the intent file f_dentry is always
    NULL at this point, which means we were always passing NULL to
    ceph_mdsc_do_request. In reality, this was fine, since this isn't
    currently ever a write operation that needs to get strung on the dir's
    unsafe list.

    Use the dir explicitly, and only pass it if this open has side-effects that
    a dir fsync should flush.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • The generic_file_aio_write call may block on balance_dirty_pages while we
    flush data to the OSDs. If we hold a reference to the FILE_WR cap during
    that interval, revocation by the MDS (e.g., to do a stat(2)) may be very
    slow.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil