04 Jan, 2012

1 commit


08 Dec, 2011

1 commit

  • We have been using i_lock to protect all kinds of data structures in the
    ceph_inode_info struct, including lists of inodes that we need to iterate
    over while avoiding races with inode destruction. That requires grabbing
    a reference to the inode with the list lock held, but igrab() now
    takes i_lock to check the inode flags.

    Changing the list lock ordering would be a painful process.

    However, using a ceph-specific i_ceph_lock in the ceph inode instead of
    i_lock is a simple mechanical change and avoids the ordering constraints
    imposed by igrab().

    Reported-by: Amon Ott
    Signed-off-by: Sage Weil

    Sage Weil
     

06 Nov, 2011

1 commit

  • We used to use a flag on the directory inode to track whether the dcache
    contents for a directory were a complete cached copy. Switch to a dentry
    flag CEPH_D_COMPLETE that is safely updated by ->d_prune().

    Signed-off-by: Sage Weil

    Sage Weil
     

02 Nov, 2011

1 commit


26 Oct, 2011

1 commit


21 Jul, 2011

1 commit

  • Btrfs needs to be able to control how filemap_write_and_wait_range() is called
    in fsync to make it less of a painful operation, so push taking i_mutex and
    the call to filemap_write_and_wait() down into the ->fsync() handlers. Some
    file systems, like ext3 and ocfs2, can apparently drop taking i_mutex
    altogether. For correctness' sake I just pushed everything down in all cases
    to make sure that we keep the current behavior the same for everybody, and
    then each individual fs maintainer can decide what to do from there.

    Acked-by: Jan Kara
    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     

08 Jun, 2011

1 commit


25 May, 2011

1 commit

  • In e9964c10 we changed cap flushing to do a delicate dance because some
    inodes on the cap_dirty list could be in a migrating state (got EXPORT but
    not IMPORT) in which we couldn't actually flush and move from
    dirty->flushing, breaking the while (!empty) { process first } loop
    structure. It worked for a single sync thread, but was not reentrant and
    triggered infinite loops when multiple syncers came along.

    Instead, move inodes with dirty caps to a separate cap_dirty_migrating list
    when in the limbo export-but-no-import state, allowing us to go back to
    the simple loop structure (which was reentrant). This is cleaner and more
    robust.

    Audited the cap_dirty users and this looks fine:
    list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we
    have dirty caps (which list we're on is irrelevant) and list_del_init()
    calls still do the right thing.

    Signed-off-by: Sage Weil

    Sage Weil
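The two-list scheme described in this commit can be illustrated with a small userspace sketch. It is not the kernel code: `toy_inode`, `toy_mdsc`, `mark_dirty()` and `flush_dirty()` are hypothetical names, and the list helpers only imitate the kernel's `list_head` API. The point is that once migrating inodes live on their own list, the `while (!empty) { process first }` loop always makes progress.

```c
#include <stddef.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_del_and_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    list_init(e);
}

static void list_add_to_tail(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

/* Hypothetical stand-in for ceph_inode_info. */
struct toy_inode {
    struct list_head dirty_item; /* on cap_dirty or cap_dirty_migrating */
    bool migrating;              /* got EXPORT but not yet IMPORT */
    bool flushed;
};

/* Hypothetical stand-in for the MDS client state. */
struct toy_mdsc {
    struct list_head cap_dirty;
    struct list_head cap_dirty_migrating;
};

/* Put a newly dirtied inode on the list matching its migration state. */
static void mark_dirty(struct toy_mdsc *m, struct toy_inode *in)
{
    if (!list_is_empty(&in->dirty_item))
        return; /* already dirty; which list it is on does not matter */
    list_add_to_tail(&in->dirty_item,
                     in->migrating ? &m->cap_dirty_migrating : &m->cap_dirty);
}

/* Simple loop: every entry on cap_dirty is flushable, so it terminates. */
static int flush_dirty(struct toy_mdsc *m)
{
    int n = 0;
    while (!list_is_empty(&m->cap_dirty)) {
        struct list_head *first = m->cap_dirty.next;
        struct toy_inode *in = (struct toy_inode *)
            ((char *)first - offsetof(struct toy_inode, dirty_item));
        list_del_and_init(&in->dirty_item);
        in->flushed = true;
        n++;
    }
    return n;
}
```

In this sketch an inode in the limbo state stays on `cap_dirty_migrating`, so an emptiness check on its `dirty_item` still reports it as dirty, matching the audit note in the commit message.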
     

20 May, 2011

1 commit


12 May, 2011

1 commit

  • We increment i_wrbuffer_ref when taking the Fb cap. This breaks
    the dirty page accounting and causes looping in
    __ceph_do_pending_vmtruncate, and the ceph client hangs.

    This bug can be reproduced occasionally by running blogbench.

    Add a new field i_wb_ref to inode and dedicate it to Fb reference
    counting.

    Signed-off-by: Henry C Chang
    Signed-off-by: Sage Weil

    Henry C Chang
     

05 May, 2011

1 commit


04 May, 2011

1 commit


31 Mar, 2011

1 commit


20 Jan, 2011

3 commits


08 Nov, 2010

2 commits

  • We used to use rdcache_gen to indicate whether we "might" have cached
    pages. Now we just look at the mapping to determine that. However, some
    old behavior remains from that transition.

    First, rdcache_gen == 0 no longer means we have no pages. That can happen
    at any time (presumably when we carry FILE_CACHE). We should not reset it
    to zero, and we should not check that it is zero.

    That means that the only purpose for rdcache_revoking is to resolve races
    between new issues of FILE_CACHE and an async invalidate. If they are
    equal, we should invalidate. On success, we decrement rdcache_revoking,
    so that it is no longer equal to rdcache_gen. Similarly, if we succeed
    in doing a sync invalidate, set revoking = gen - 1. (This is a small
    optimization to avoid doing unnecessary invalidate work and does not
    affect correctness.)

    Signed-off-by: Sage Weil

    Sage Weil
     
  • If the auth cap migrates to another MDS, clear requested_max_size so that
    we resend any pending max_size increase requests. This fixes potential
    hangs on writes that extend a file and race with a cap migration between
    MDSs.

    Signed-off-by: Sage Weil

    Sage Weil
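The rdcache_gen / rdcache_revoking rules from the first commit in this group can be sketched as a tiny state machine. This is an illustration under assumptions, not the kernel implementation: `toy_cache_state` and the function names are hypothetical, the real code runs under the inode lock, and the actual page invalidation is replaced by a counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant ceph_inode_info fields. */
struct toy_cache_state {
    uint32_t rdcache_gen;      /* bumped on each new issue of FILE_CACHE */
    uint32_t rdcache_revoking; /* gen value at which a revoke was requested */
    int invalidations;         /* counts simulated invalidate passes */
};

/* A new issue of FILE_CACHE starts a new generation. */
static void issue_file_cache(struct toy_cache_state *s)
{
    s->rdcache_gen++;
}

/* A revoke records the generation it applies to. */
static void start_revoke(struct toy_cache_state *s)
{
    s->rdcache_revoking = s->rdcache_gen;
}

/* Async worker: only invalidate if no new issue raced with the revoke. */
static bool async_invalidate(struct toy_cache_state *s)
{
    if (s->rdcache_revoking != s->rdcache_gen)
        return false; /* FILE_CACHE was reissued since the revoke; skip */
    s->invalidations++;
    s->rdcache_revoking--; /* no longer equal to rdcache_gen */
    return true;
}

/* Sync path: after a successful invalidate, make a later async pass a no-op. */
static void sync_invalidate_succeeded(struct toy_cache_state *s)
{
    s->invalidations++;
    s->rdcache_revoking = s->rdcache_gen - 1;
}
```

Note that nothing here resets `rdcache_gen` to zero or tests it against zero, which is the old behavior the commit message says should go away.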
     

28 Oct, 2010

1 commit

  • This reverts commit d91f2438d881514e4a923fd786dbd94b764a9440.

    The intent of issue_seq is to distinguish between mds->client messages that
    (re)create the cap and those that do not, which means we should _only_ be
    updating that value in the create paths. By updating it in handle_cap_grant,
    we reset it to zero, which then breaks release.

    The larger question is what workload/problem made me think it should be
    updated here...

    Signed-off-by: Sage Weil

    Sage Weil
     

21 Oct, 2010

3 commits

  • This is simpler and faster.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • The i_rdcache_gen value only implies we MAY have cached pages; actually
    check the mapping to see if it's worth bothering with an invalidate.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • This factors out protocol and low-level storage parts of ceph into a
    separate libceph module living in net/ceph and include/linux/ceph. This
    is mostly a matter of moving files around. However, a few key pieces
    of the interface change as well:

    - ceph_client becomes ceph_fs_client and ceph_client, where the latter
    captures the mon and osd clients, and the fs_client gets the mds client
    and file system specific pieces.
    - Mount option parsing and debugfs setup is correspondingly broken into
    two pieces.
    - The mon client gets a generic handler callback for otherwise unknown
    messages (mds map, in this case).
    - The basic supported/required feature bits can be expanded (and are by
    ceph_fs_client).

    No functional change, aside from some subtle error handling cases that got
    cleaned up in the refactoring process.

    Signed-off-by: Sage Weil

    Yehuda Sadeh
     

07 Oct, 2010

2 commits

  • We need to update the issue_seq on any grant operation, be it via an MDS
    reply or a separate grant message. The update in the grant path was
    missing. This broke cap release for inodes for which the MDS sent an
    explicit grant message that was not soon followed by a successful
    MDS reply on the same inode.

    Also fix the signedness on seq locals.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • If an MDS tries to revoke caps that we don't have, we want to send
    releases early since they probably contain the caps message the MDS
    is looking for.

    Previously, we only sent the messages if we didn't have the inode either. But
    in a multi-mds system we can retain the inode after dropping all caps for
    a single MDS.

    Signed-off-by: Greg Farnum
    Signed-off-by: Sage Weil

    Greg Farnum
     

18 Sep, 2010

1 commit


17 Sep, 2010

1 commit

  • Sending multiple flushsnap messages is problematic because we ignore
    the response if the tid doesn't match, and the server may only respond to
    each one once. It's also a waste.

    So, skip cap_snaps that are already on the flushing list, unless the caller
    tells us to resend (because we are reconnecting).

    Signed-off-by: Sage Weil

    Sage Weil
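The skip-unless-resend rule can be sketched as follows. This is a simplified illustration, not the kernel's flush path: `toy_cap_snap` and `flush_snaps()` are hypothetical, and assigning a fresh tid on every send is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cap_snap entry. */
struct toy_cap_snap {
    bool flushing;      /* a flushsnap message is already in flight */
    uint64_t flush_tid; /* tid of the most recently sent message */
};

/*
 * Send flushsnap messages for the given cap_snaps and return how many were
 * sent. Entries that are already flushing are skipped unless the caller asks
 * for a resend (for example, because the session is reconnecting).
 */
static int flush_snaps(struct toy_cap_snap *snaps, size_t n, bool resend,
                       uint64_t *last_tid)
{
    int sent = 0;
    for (size_t i = 0; i < n; i++) {
        if (snaps[i].flushing && !resend)
            continue; /* already sent once; a duplicate would be ignored */
        snaps[i].flushing = true;
        snaps[i].flush_tid = ++*last_tid; /* assumed: new tid per send */
        sent++;
    }
    return sent;
}
```

Calling `flush_snaps()` twice with `resend` set to false sends each cap_snap only once, which avoids waiting on a reply whose tid no longer matches.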
     

15 Sep, 2010

1 commit


25 Aug, 2010

1 commit

  • We used to use i_head_snapc to keep track of which snapc the current epoch
    of dirty data was dirtied under. It is used by queue_cap_snap to set up
    the cap_snap. However, since we queue cap snaps for any dirty caps, not
    just for dirty file data, we need to keep a valid i_head_snapc anytime
    we have dirty|flushing caps. This fixes a NULL pointer deref in
    queue_cap_snap when writing back dirty caps without data (e.g.,
    snaptest-authwb.sh).

    Signed-off-by: Sage Weil

    Sage Weil
     

23 Aug, 2010

2 commits

  • When we snapshot dirty metadata that needs to be written back to the MDS,
    include dirty xattr metadata. Make the capsnap reference the encoded
    xattr blob so that it will be written back in the FLUSHSNAP op.

    Also fix the capsnap creation guard to include dirty auth or file bits,
    not just tests specific to dirty file data or file writes in progress
    (this fixes auth metadata writeback).

    Signed-off-by: Sage Weil

    Sage Weil
     
  • We should include the xattr metadata blob in the cap update message any
    time we are flushing dirty state, NOT just when we are also dropping the
    cap. This fixes async xattr writeback.

    Also, clean up the code slightly to avoid duplicating the bit test.

    Signed-off-by: Sage Weil

    Sage Weil
     

06 Aug, 2010

1 commit

  • Normally, if the Fb cap bit is being revoked, we queue an async writeback.
    If there is no dirty data but we still hold the cap, this leaves the
    client sitting around doing nothing until the cap timeouts expire and the
    cap is released on its own (as it would have been without the revocation).

    Instead, only queue writeback if the bit is actually used (i.e., we have
    dirty data). If not, we can reply to the revocation immediately.

    Signed-off-by: Sage Weil

    Sage Weil
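The decision described in this commit reduces to a single check, sketched below. The names are hypothetical, and using a count of dirty buffer references as the "bit is actually used" test is an assumption made for illustration.

```c
/* Possible responses to an Fb revocation in this sketch. */
enum revoke_action {
    REPLY_NOW,       /* nothing to write back; acknowledge immediately */
    QUEUE_WRITEBACK  /* dirty data exists; flush it before replying */
};

/* Decide how to respond when the MDS revokes the Fb (buffer) cap bit. */
static enum revoke_action on_fb_revoke(int dirty_buffer_refs)
{
    return dirty_buffer_refs > 0 ? QUEUE_WRITEBACK : REPLY_NOW;
}
```

With no dirty data the client answers the revocation right away instead of idling until the cap times out.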
     

03 Aug, 2010

1 commit


02 Aug, 2010

8 commits