26 Jan, 2017

1 commit

  • commit 1097680d759918ce4a8705381c0ab2ed7bd60cf1 upstream.

    sparse says:

    fs/ceph/dir.c:1248:50: warning: incorrect type in assignment (different base types)
    fs/ceph/dir.c:1248:50: expected restricted __le32 [usertype] mask
    fs/ceph/dir.c:1248:50: got int [signed] [assigned] mask

    Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry")
    Signed-off-by: Jeff Layton
    Reviewed-by: Sage Weil
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
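
The sparse warning above means that a CPU-order integer was stored in a field annotated as little-endian. In the kernel the usual remedy is to convert with cpu_to_le32() at the point of assignment. The following is a minimal userspace sketch of that idea, not the patch itself: wire_le32, to_le32() and from_le32() are hypothetical names, and the struct wrapper stands in for sparse's restricted type so that a plain int cannot be assigned by accident.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's __le32 wire type.
 * Wrapping the bytes in a struct makes "mask = some_int" a compile error,
 * which is roughly what sparse enforces for restricted types. */
typedef struct { uint8_t b[4]; } wire_le32;

/* Model of cpu_to_le32(): store a CPU-order value as little-endian bytes. */
static wire_le32 to_le32(uint32_t v)
{
    wire_le32 w = { { (uint8_t)v, (uint8_t)(v >> 8),
                      (uint8_t)(v >> 16), (uint8_t)(v >> 24) } };
    return w;
}

/* Model of le32_to_cpu(): read the little-endian bytes back. */
static uint32_t from_le32(wire_le32 w)
{
    return (uint32_t)w.b[0] | (uint32_t)w.b[1] << 8 |
           (uint32_t)w.b[2] << 16 | (uint32_t)w.b[3] << 24;
}
```

Because the conversion works byte by byte, the stored layout is the same on little-endian and big-endian hosts.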
     

08 Dec, 2016

1 commit

  • This function sets req->r_locked_dir which is supposed to indicate to
    ceph_fill_trace that the parent's i_rwsem is locked for write.
    Unfortunately, there is no guarantee that the dir will be locked when
    d_revalidate is called, so we really don't want ceph_fill_trace to do
    any dcache manipulation from this context. Clear req->r_locked_dir since
    it's clearly not safe to do that.

    What we really want to know with d_revalidate is whether the dentry
    still points to the same inode. ceph_fill_trace installs a pointer to
    the inode in req->r_target_inode, so we can just compare that to
    d_inode(dentry) to see if it's the same one after the lookup.

    Also, since we aren't generally interested in the parent here, we can
    switch to using a GETATTR to hint that to the MDS, which also means that
    we only need to reserve one cap.

    Finally, just remove the d_unhashed check. That's really outside the
    purview of a filesystem's d_revalidate. If the thing became unhashed
    while we're checking it, then that's up to the VFS to handle anyway.

    Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry")
    Link: http://tracker.ceph.com/issues/18041
    Reported-by: Donatas Abraitis
    Signed-off-by: Jeff Layton
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Jeff Layton
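
The check described above (compare the inode returned by the request with the inode the dentry already points to) can be sketched in userspace as follows. All sketch_* types and dentry_still_valid() are hypothetical simplifications. The handling of -ENOENT for a negative dentry is an assumption added for illustration and is not stated in the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel objects. */
struct sketch_inode  { unsigned long ino; };
struct sketch_dentry { const struct sketch_inode *inode; /* NULL if negative */ };

/* Returns 1 if the dentry may be kept, 0 if it should be invalidated.
 * err is the result of the request; target models req->r_target_inode. */
static int dentry_still_valid(const struct sketch_dentry *dentry, int err,
                              const struct sketch_inode *target)
{
    if (err == 0)
        /* Valid only if the name still resolves to the same inode. */
        return dentry->inode != NULL && dentry->inode == target;
    if (err == -ENOENT)
        /* Assumption: a negative dentry stays valid while the name is absent. */
        return dentry->inode == NULL;
    return 0; /* any other error: do not trust the dentry */
}
```

The comparison is by pointer identity, mirroring the idea of comparing d_inode(dentry) with req->r_target_inode rather than comparing inode attributes.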
     

11 Oct, 2016

1 commit

  • Pull more vfs updates from Al Viro:
    "->rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     

08 Oct, 2016

1 commit


27 Sep, 2016

2 commits

  • Generated patch:

    sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
    sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • This is trivial to do:

    - add flags argument to foo_rename()
    - check if flags is zero
    - assign foo_rename() to .rename2 instead of .rename

    This doesn't mean it's impossible to support RENAME_NOREPLACE for these
    filesystems, but it is not trivial, like for local filesystems.
    RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
    for a file to be created on one host while it is overwritten by rename on
    another host).

    Filesystems converted:

    9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.

    After this, we can get rid of the duplicate interfaces for rename.

    Signed-off-by: Miklos Szeredi
    Acked-by: Greg Kroah-Hartman
    Acked-by: David Howells [AFS]
    Acked-by: Mike Marshall
    Cc: Eric Van Hensbergen
    Cc: Ilya Dryomov
    Cc: Jan Harkes
    Cc: Tyler Hicks
    Cc: Oleg Drokin
    Cc: Trond Myklebust
    Cc: Mark Fasheh

    Miklos Szeredi
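
The three conversion steps listed above can be sketched in userspace like this. foo_rename() follows the placeholder name used in the commit message; struct sketch_inode_ops and the string-based arguments are hypothetical simplifications of the real inode operations, which take inode and dentry pointers.

```c
#include <errno.h>

/* Hypothetical rename for a filesystem that supports no rename flags.
 * Step 1: the function gains a flags argument. */
static int foo_rename(const char *old_name, const char *new_name,
                      unsigned int flags)
{
    /* Step 2: reject any flag (for example RENAME_NOREPLACE). */
    if (flags)
        return -EINVAL;

    (void)old_name;
    (void)new_name;
    return 0; /* the pre-existing rename logic would run here */
}

/* Simplified model of the operations table. */
struct sketch_inode_ops {
    int (*rename2)(const char *, const char *, unsigned int);
};

/* Step 3: hook the function up as .rename2 instead of .rename. */
static const struct sketch_inode_ops foo_ops = {
    .rename2 = foo_rename,
};
```

Returning -EINVAL for unknown flags keeps the filesystem from silently performing a plain rename when the caller asked for stronger semantics.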
     

05 Sep, 2016

1 commit

  • Commit f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
    modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |=
    fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a
    comparison operator with an assignment one.

    This looks like a typo which is reported by clang when building the
    kernel with some warning flags:

    fs/ceph/dir.c:600:22: error: using the result of an assignment as a
    condition without parentheses [-Werror,-Wparentheses]
    } else if (fi->frag |= fpos_frag(new_pos)) {
               ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
    fs/ceph/dir.c:600:22: note: place parentheses around the assignment
    to silence this warning
    } else if (fi->frag |= fpos_frag(new_pos)) {
                        ^
               (                             )
    fs/ceph/dir.c:600:22: note: use '!=' to turn this compound
    assignment into an inequality comparison
    } else if (fi->frag |= fpos_frag(new_pos)) {
                        ^~
                        !=

    Fixes: f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
    Signed-off-by: Nicolas Iooss
    Signed-off-by: Ilya Dryomov

    Nicolas Iooss
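
To make the difference between the two operators concrete, here is a small userspace sketch. struct dir_state and both function names are hypothetical. The point is that `|=` both modifies the stored fragment and yields a value that is true whenever any bit is set, while `!=` only compares.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-file readdir state. */
struct dir_state { uint32_t frag; };

/* Buggy form: ORs new_frag into st->frag, then tests the result. */
static bool frag_changed_buggy(struct dir_state *st, uint32_t new_frag)
{
    return (st->frag |= new_frag) != 0;
}

/* Intended form: a pure comparison with no side effect. */
static bool frag_changed(const struct dir_state *st, uint32_t new_frag)
{
    return st->frag != new_frag;
}
```

With a stored fragment of 0x4 and a new fragment of 0x4, the intended form reports no change, while the buggy form reports a change. With a new fragment of 0x3 the buggy form also corrupts the stored value to 0x7.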
     

28 Jul, 2016

4 commits

  • We can now handle the snapshot cases under RCU, as well as the
    non-snapshot case when we don't need to queue up a lease renewal.
    Allow LOOKUP_RCU walks to proceed under those conditions.

    Signed-off-by: Jeff Layton
    Reviewed-by: Yan, Zheng

    Jeff Layton
     
  • Under rcuwalk, we need to take extra care when dereferencing d_parent.
    We want to do that once and pass a pointer to dentry_lease_is_valid.

    Also, we must ensure that that function can handle the case where we're
    racing with d_release. Check whether "di" is NULL under the d_lock, and
    just return 0 if so.

    Finally, we still need to kick off a renewal job if the lease is getting
    close to expiration. If that's the case, then just drop out of rcuwalk
    mode since that could block.

    Signed-off-by: Jeff Layton
    Reviewed-by: Yan, Zheng

    Jeff Layton
     
  • To check for a valid dentry lease, we need to get at the
    ceph_dentry_info. Under rcuwalk though, we may end up with a dentry that
    is on its way to destruction. Since we need to take the d_lock in
    dentry_lease_is_valid already, we can just ensure that we clear the
    d_fsinfo pointer out under the same lock before destroying it.

    Signed-off-by: Jeff Layton
    Reviewed-by: Yan, Zheng

    Jeff Layton
     
  • Pretty simple: just use ceph_dentry_info.time instead (which was already
    there, unused).

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

27 May, 2016

1 commit

  • Pull Ceph updates from Sage Weil:
    "This changeset has a few main parts:

    - Ilya has finished a huge refactoring effort to sync up the
    client-side logic in libceph with the user-space client code, which
    has evolved significantly over the last couple years, with lots of
    additional behaviors (e.g., how requests are handled when cluster
    is full and transitions from full to non-full).

    This structure of the code is more closely aligned with userspace
    now such that it will be much easier to maintain going forward when
    behavior changes take place. There are some locking improvements
    bundled in as well.

    - Zheng adds multi-filesystem support (multiple namespaces within the
    same Ceph cluster)

    - Zheng has changed the readdir offsets and directory enumeration so
    that dentry offsets are hash-based and therefore stable across
    directory fragmentation events on the MDS.

    - Zheng has a smorgasbord of bug fixes across fs/ceph"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
    ceph: fix wake_up_session_cb()
    ceph: don't use truncate_pagecache() to invalidate read cache
    ceph: SetPageError() for writeback pages if writepages fails
    ceph: handle interrupted ceph_writepage()
    ceph: make ceph_update_writeable_page() uninterruptible
    libceph: make ceph_osdc_wait_request() uninterruptible
    ceph: handle -EAGAIN returned by ceph_update_writeable_page()
    ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEM
    ceph: block non-fatal signals for fault/page_mkwrite
    ceph: make logical calculation functions return bool
    ceph: tolerate bad i_size for symlink inode
    ceph: improve fragtree change detection
    ceph: keep leaf frag when updating fragtree
    ceph: fix dir_auth check in ceph_fill_dirfrag()
    ceph: don't assume frag tree splits in mds reply are sorted
    ceph: fix inode reference leak
    ceph: using hash value to compose dentry offset
    ceph: don't forbid marking directory complete after forward seek
    ceph: record 'offset' for each entry of readdir result
    ceph: define 'end/complete' in readdir reply as bit flags
    ...

    Linus Torvalds
     

26 May, 2016

9 commits


19 May, 2016

1 commit

  • Pull remaining vfs xattr work from Al Viro:
    "The rest of work.xattr (non-cifs conversions)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    btrfs: Switch to generic xattr handlers
    ubifs: Switch to generic xattr handlers
    jfs: Switch to generic xattr handlers
    jfs: Clean up xattr name mapping
    gfs2: Switch to generic xattr handlers
    ceph: kill __ceph_removexattr()
    ceph: Switch to generic xattr handlers
    ceph: Get rid of d_find_alias in ceph_set_acl

    Linus Torvalds
     

24 Apr, 2016

1 commit

  • Add a catch-all xattr handler at the end of ceph_xattr_handlers. Check
    for valid attribute names there, and remove those checks from
    __ceph_{get,set,remove}xattr instead. No "system.*" xattrs need to be
    handled by the catch-all handler anymore.

    The set xattr handler is called with a NULL value to indicate that the
    attribute should be removed; __ceph_setxattr already handles that case
    correctly (ceph_set_acl could already be calling __ceph_setxattr with a
    NULL value).

    Move the check for snapshots from ceph_{set,remove}xattr into
    __ceph_{set,remove}xattr. With that, ceph_{get,set,remove}xattr can be
    replaced with the generic iops.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: "Yan, Zheng"
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether a
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header
    files. I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

26 Mar, 2016

4 commits

  • Use kmem_cache_zalloc() instead of kmem_cache_alloc() with flag __GFP_ZERO.

    Signed-off-by: Geliang Tang
    Signed-off-by: Ilya Dryomov

    Geliang Tang
     
  • If a dentry has no lease, ceph_d_revalidate() previously returned 0.
    This causes the VFS to invalidate the dentry and create a new dentry
    for later lookups. Invalidating a dentry also detaches any mount
    points underneath it, so a mount point inside cephfs can disappear
    mysteriously (even if the mount point is not modified by other hosts).

    The fix is to use a lookup request to revalidate a dentry without a
    lease. This partly solves the disappearing mount point issue (as long
    as the mount point is not modified by other hosts).

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • use vfs helper dget_parent() instead

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • When security is enabled, the security module can call the filesystem's
    getxattr/setxattr callbacks during d_instantiate(). For cephfs,
    d_instantiate() is usually called by the MDS dispatch thread while
    handling an MDS reply. If the MDS reply does not include xattrs and the
    corresponding caps, getxattr/setxattr needs to send a new request
    to the MDS and wait for the reply. This makes the MDS dispatch thread
    sleep, so nobody handles later MDS replies.

    The fix is to make sure lookup/atomic_open replies include xattrs and
    the corresponding caps, so getxattr can be served from cached xattrs.
    This requires some modification to both the MDS and the request message.
    (The client tells the MDS what caps it wants; the MDS encodes the proper
    caps in the reply.)

    The Smack security module may call setxattr during d_instantiate().
    Unlike getxattr, we can't force the MDS to issue CEPH_CAP_XATTR_EXCL
    to us. So just make setxattr return an error when called by the MDS
    dispatch thread.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

25 Jun, 2015

5 commits

  • Previously our dcache readdir code relied on the child dentries in the
    directory dentry's d_subdir list being sorted by dentry offset in
    descending order. When adding dentries to the dcache, if a dentry
    already exists, our readdir code moves it to the head of the directory
    dentry's d_subdir list. This design relies on dcache internals.
    Al Viro suggests using ncpfs's approach: keeping an array of pointers
    to dentries in the page cache of the directory inode. The validity of
    those pointers is represented by the directory inode's complete and
    ordered flags. When a dentry gets pruned, we clear the directory inode's
    complete flag in the d_prune() callback. Before moving a dentry to
    another directory, we clear the ordered flag for both the old and new
    directories.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
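
As a rough illustration of the scheme described above, the sketch below keeps an array of cached entries that may only be trusted while both flags are set. It is a userspace model under stated assumptions: struct sketch_dir, cached_entry(), on_prune() and on_rename() are hypothetical names, plain strings stand in for dentry pointers, and a fixed array stands in for the directory inode's page cache.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CACHE_SLOTS 8

/* Hypothetical model of a directory with a readdir cache. */
struct sketch_dir {
    bool complete;                          /* cache holds every entry */
    bool ordered;                           /* cache order is still valid */
    const char *slots[SKETCH_CACHE_SLOTS];  /* stand-in for dentry pointers */
    size_t count;
};

/* Return the cached entry, or NULL if the cache cannot be trusted. */
static const char *cached_entry(const struct sketch_dir *dir, size_t idx)
{
    if (!dir->complete || !dir->ordered || idx >= dir->count)
        return NULL;
    return dir->slots[idx];
}

/* Models the d_prune() callback: an entry is going away. */
static void on_prune(struct sketch_dir *dir)
{
    dir->complete = false;
}

/* Models a move between directories: both orderings become suspect. */
static void on_rename(struct sketch_dir *from, struct sketch_dir *to)
{
    from->ordered = false;
    to->ordered = false;
}
```

Invalidating by flag avoids walking the array on every prune or rename; readers fall back to asking the server once either flag is cleared.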
     
  • GFP_NOFS memory allocation is required for the page writeback path,
    but there is no need to use GFP_NOFS in the syscall path and readpage
    path.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • fsync() on a directory should flush dirty caps and wait for any
    uncommitted directory operations to commit. But ceph_dir_fsync()
    only waits for uncommitted directory operations.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • No need to bifurcate wait now that we've got ceph_timeout_jiffies().

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder
    Reviewed-by: Yan, Zheng

    Ilya Dryomov
     
  • There are currently three libceph-level timeouts that the user can
    specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of
    these are in seconds and no checking is done on user input: negative
    values are accepted, we multiply them all by HZ which may or may not
    overflow, arbitrarily large jiffies then get added together, etc.

    There is also a bug in the way mount_timeout=0 is handled. It's
    supposed to mean "infinite timeout", but that's not how wait.h APIs
    treat it and so __ceph_open_session() for example will busy loop
    without much chance of being interrupted if none of ceph-mons are
    there.

    Fix all this by verifying user input, storing timeouts capped by
    msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
    helper for all user-specified waits to handle infinite timeouts
    correctly.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
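
The approach described above (validate the user's value, store it in ticks, and treat 0 as "wait forever") can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: SKETCH_HZ, SKETCH_WAIT_FOREVER, parse_timeout_secs() and timeout_ticks() are hypothetical names, and the overflow bound used here is a choice made for this sketch.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_HZ 250L                /* assumed tick rate, illustration only */
#define SKETCH_WAIT_FOREVER LONG_MAX  /* stand-in for an "infinite" wait value */

/* Validate a user-supplied timeout in seconds and convert it to ticks.
 * Rejects negative values and values whose conversion would overflow. */
static bool parse_timeout_secs(long secs, long *ticks)
{
    if (secs < 0 || secs > LONG_MAX / SKETCH_HZ)
        return false;
    *ticks = secs * SKETCH_HZ;
    return true;
}

/* Map a stored timeout of 0 to the wait-forever sentinel, so that callers
 * never pass a literal 0 (which would mean "do not wait") to a wait API. */
static long timeout_ticks(long ticks)
{
    return ticks ? ticks : SKETCH_WAIT_FOREVER;
}
```

Doing the validation once at parse time means every later wait can use the stored value through timeout_ticks() without repeating range checks.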
     

27 Apr, 2015

1 commit

  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     

22 Apr, 2015

1 commit


20 Apr, 2015

3 commits


16 Apr, 2015

1 commit