18 Oct, 2016

1 commit


28 Sep, 2016

1 commit

  • current_fs_time() uses struct super_block* as an argument.
    As per Linus's suggestion, this is changed to take struct
    inode* as a parameter instead. This is because the function
    is primarily meant for vfs inode timestamps.
    Also the function was renamed as per Arnd's suggestion.

    Change all calls to current_fs_time() to use the new
    current_time() function instead. current_fs_time() will be
    deleted.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Al Viro

    Deepa Dinamani
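
A minimal userspace sketch of the idea behind this change, not the kernel implementation: the timestamp helper takes an inode and reaches the per-filesystem granularity through it, rather than taking the superblock directly. The struct layouts and the names `truncate_to_gran()` and `sketch_current_time()` are hypothetical stand-ins; only `current_fs_time()`, `current_time()`, `struct inode` and `struct super_block` come from the commit message.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct super_block {
    long s_time_gran;           /* timestamp granularity in nanoseconds */
};

struct inode {
    struct super_block *i_sb;   /* filesystem this inode belongs to */
};

/* Round the nanosecond part down to a multiple of the granularity. */
static struct timespec truncate_to_gran(struct timespec t, long gran)
{
    assert(gran > 0 && gran <= 1000000000L);
    if (gran == 1000000000L)
        t.tv_nsec = 0;                  /* whole-second granularity */
    else
        t.tv_nsec -= t.tv_nsec % gran;
    return t;
}

/* Sketch of an inode-based helper: callers pass the inode they are
 * stamping, and the granularity is looked up through inode->i_sb. */
static struct timespec sketch_current_time(const struct inode *inode)
{
    struct timespec now;

    timespec_get(&now, TIME_UTC);       /* C11 wall-clock time */
    return truncate_to_gran(now, inode->i_sb->s_time_gran);
}
```

With this shape, a caller that previously passed `inode->i_sb` now passes `inode` itself.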
     

28 Jul, 2016

2 commits


28 May, 2016

2 commits

  • Pull vfs fixes from Al Viro:
    "Followups to the parallel lookup work:

    - update docs

    - restore killability of the places that used to take ->i_mutex
    killably now that we have down_write_killable() merged

    - Additionally, it turns out that I missed a prerequisite for
    security_d_instantiate() stuff - ->getxattr() wasn't the only thing
    that could be called before dentry is attached to inode; with smack
    we needed the same treatment applied to ->setxattr() as well"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    switch ->setxattr() to passing dentry and inode separately
    switch xattr_handler->set() to passing dentry and inode separately
    restore killability of old mutex_lock_killable(&inode->i_mutex) users
    add down_write_killable_nested()
    update D/f/directory-locking

    Linus Torvalds
     
  • preparation for similar switch in ->setxattr() (see the next commit for
    rationale).

    Signed-off-by: Al Viro

    Al Viro
     

27 May, 2016

1 commit

  • Pull Ceph updates from Sage Weil:
    "This changeset has a few main parts:

    - Ilya has finished a huge refactoring effort to sync up the
    client-side logic in libceph with the user-space client code, which
    has evolved significantly over the last couple years, with lots of
    additional behaviors (e.g., how requests are handled when cluster
    is full and transitions from full to non-full).

    This structure of the code is more closely aligned with userspace
    now such that it will be much easier to maintain going forward when
    behavior changes take place. There are some locking improvements
    bundled in as well.

    - Zheng adds multi-filesystem support (multiple namespaces within the
    same Ceph cluster)

    - Zheng has changed the readdir offsets and directory enumeration so
    that dentry offsets are hash-based and therefore stable across
    directory fragmentation events on the MDS.

    - Zheng has a smorgasbord of bug fixes across fs/ceph"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
    ceph: fix wake_up_session_cb()
    ceph: don't use truncate_pagecache() to invalidate read cache
    ceph: SetPageError() for writeback pages if writepages fails
    ceph: handle interrupted ceph_writepage()
    ceph: make ceph_update_writeable_page() uninterruptible
    libceph: make ceph_osdc_wait_request() uninterruptible
    ceph: handle -EAGAIN returned by ceph_update_writeable_page()
    ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEM
    ceph: block non-fatal signals for fault/page_mkwrite
    ceph: make logical calculation functions return bool
    ceph: tolerate bad i_size for symlink inode
    ceph: improve fragtree change detection
    ceph: keep leaf frag when updating fragtree
    ceph: fix dir_auth check in ceph_fill_dirfrag()
    ceph: don't assume frag tree splits in mds reply are sorted
    ceph: fix inode reference leak
    ceph: using hash value to compose dentry offset
    ceph: don't forbid marking directory complete after forward seek
    ceph: record 'offset' for each entry of readdir result
    ceph: define 'end/complete' in readdir reply as bit flags
    ...

    Linus Torvalds
     

26 May, 2016

2 commits

  • Setxattr with NULL value and XATTR_REPLACE flag should be equivalent
    to removexattr. But the current MDS does not support deleting vxattrs through
    an MDS_OP_SETXATTR request. The workaround is to send an MDS_OP_RMXATTR request
    if setxattr actually removes the xattr.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
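
The workaround described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the code from the commit: the enum, the flag constant and `pick_xattr_op()` are hypothetical names standing in for the MDS_OP_SETXATTR / MDS_OP_RMXATTR requests and the XATTR_REPLACE flag mentioned in the message.

```c
#include <stddef.h>

/* Hypothetical stand-in for the XATTR_REPLACE flag. */
#define SKETCH_XATTR_REPLACE 0x2

/* Hypothetical stand-ins for the two MDS request types. */
enum sketch_mds_op {
    SKETCH_OP_SETXATTR,
    SKETCH_OP_RMXATTR
};

/* A NULL value combined with the replace flag means "remove the
 * attribute", so pick the remove request instead of the set request. */
static enum sketch_mds_op pick_xattr_op(const void *value, int flags)
{
    if (value == NULL && (flags & SKETCH_XATTR_REPLACE))
        return SKETCH_OP_RMXATTR;
    return SKETCH_OP_SETXATTR;
}
```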
     
  • This is a major sync up, up to ~Jewel. The highlights are:

    - per-session request trees (vs a global per-client tree)
    - per-session locking (vs a global per-client rwlock)
    - homeless OSD session
    - no ad-hoc global per-client lists
    - support for pool quotas
    - foundation for watch/notify v2 support
    - foundation for map check (pool deletion detection) support

    The switchover is incomplete: lingering requests can be set up and
    torn down but aren't ever reestablished. This functionality is
    restored with the introduction of the new lingering infrastructure
    (ceph_osd_linger_request, linger_work, etc) in a later commit.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     

24 Apr, 2016

3 commits

  • When removing an xattr, generic_removexattr() calls __ceph_setxattr()
    with a NULL value and the XATTR_REPLACE flag. __ceph_removexattr() is not
    used any more.

    Signed-off-by: "Yan, Zheng"
    Signed-off-by: Al Viro

    Yan, Zheng
     
  • Add a catch-all xattr handler at the end of ceph_xattr_handlers. Check
    for valid attribute names there, and remove those checks from
    __ceph_{get,set,remove}xattr instead. No "system.*" xattrs need to be
    handled by the catch-all handler anymore.

    The set xattr handler is called with a NULL value to indicate that the
    attribute should be removed; __ceph_setxattr already handles that case
    correctly (ceph_set_acl could already be calling __ceph_setxattr with a NULL
    value).

    Move the check for snapshots from ceph_{set,remove}xattr into
    __ceph_{set,remove}xattr. With that, ceph_{get,set,remove}xattr can be
    replaced with the generic iops.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: "Yan, Zheng"
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • Create a variant of ceph_setattr that takes an inode instead of a
    dentry. Change __ceph_setxattr (and also __ceph_removexattr) to take an
    inode instead of a dentry. Use those in ceph_set_acl so that we no
    longer need a dentry there.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: "Yan, Zheng"
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

11 Apr, 2016

1 commit


26 Mar, 2016

3 commits

  • When security is enabled, security module can call filesystem's
    getxattr/setxattr callbacks during d_instantiate(). For cephfs,
    d_instantiate() is usually called by MDS' dispatch thread, while
    handling MDS reply. If the MDS reply does not include xattrs and
    corresponding caps, getxattr/setxattr needs to send a new request
    to the MDS and wait for the reply. This makes the MDS' dispatch thread
    sleep, so nobody handles later MDS replies.

    The fix is to make sure lookup/atomic_open replies include xattrs and
    corresponding caps, so that getxattr can be handled by cached xattrs.
    This requires some modification to both MDS and request message.
    (Client tells MDS what caps it wants; MDS encodes proper caps in
    the reply)

    Smack security module may call setxattr during d_instantiate().
    Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL
    to us. So just make setxattr return an error when called by the MDS'
    dispatch thread.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • It's useless because the MDS reply does not carry any vxattr.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_fs_time() instead.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Yan, Zheng

    Deepa Dinamani
     

25 Jun, 2015

2 commits


27 Apr, 2015

1 commit

  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     

20 Apr, 2015

1 commit

  • Currently, the result of kstrdup() for r_path2, r_path1 and
    snapdir_name is not checked at various locations, although the
    allocation can fail under memory pressure. Therefore, return
    ENOMEM where the checks have been missed.

    Signed-off-by: Sanidhya Kashyap
    Signed-off-by: Yan, Zheng

    Sanidhya Kashyap
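
A minimal userspace sketch of the pattern this commit adds: check every string duplication and return -ENOMEM on failure. `sketch_strdup()` stands in for kstrdup(), and `struct sketch_request` and `sketch_set_paths()` are hypothetical; only the field names r_path1 and r_path2 come from the commit message.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request with two duplicated path strings. */
struct sketch_request {
    char *r_path1;
    char *r_path2;
};

/* Userspace stand-in for kstrdup(): returns NULL if allocation fails. */
static char *sketch_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p)
        memcpy(p, s, n);
    return p;
}

/* Duplicate both paths; on any allocation failure, release whatever
 * was allocated and report -ENOMEM instead of storing a NULL pointer. */
static int sketch_set_paths(struct sketch_request *req,
                            const char *path1, const char *path2)
{
    char *a = sketch_strdup(path1);
    char *b = sketch_strdup(path2);

    if (!a || !b) {
        free(a);    /* free(NULL) is a no-op */
        free(b);
        return -ENOMEM;
    }
    req->r_path1 = a;
    req->r_path2 = b;
    return 0;
}
```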
     

16 Apr, 2015

1 commit


18 Dec, 2014

1 commit


15 Oct, 2014

3 commits

  • Current code uses page array to present MDS request data. Pages in the
    array are allocated/freed by caller of ceph_mdsc_do_request(). If request
    is interrupted, the pages can be freed while they are still being used by
    the request message.

    The fix is to use a pagelist to present MDS request data. A pagelist is
    reference counted.

    Signed-off-by: Yan, Zheng
    Reviewed-by: Sage Weil

    Yan, Zheng
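
Why reference counting helps here can be sketched as follows: the caller and the in-flight request message each hold a reference, so an interrupted caller dropping its reference does not free memory the message still uses. This is a simplified, non-atomic userspace illustration; `struct sketch_pagelist` and its helpers are hypothetical names, and kernel code would use an atomic reference count.

```c
#include <stdlib.h>

/* Hypothetical reference-counted buffer standing in for a pagelist. */
struct sketch_pagelist {
    int refcnt;     /* number of holders; NOT atomic in this sketch */
    size_t length;  /* bytes of request data held */
};

/* Allocate with one reference owned by the caller. */
static struct sketch_pagelist *sketch_pagelist_alloc(void)
{
    struct sketch_pagelist *pl = calloc(1, sizeof(*pl));

    if (pl)
        pl->refcnt = 1;
    return pl;
}

/* Take an extra reference, e.g. on behalf of the request message. */
static void sketch_pagelist_get(struct sketch_pagelist *pl)
{
    pl->refcnt++;
}

/* Drop a reference; free on the last one. Returns 1 if freed. */
static int sketch_pagelist_put(struct sketch_pagelist *pl)
{
    if (--pl->refcnt == 0) {
        free(pl);
        return 1;
    }
    return 0;
}
```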
     
  • Only regular files and directories have vxattrs.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • The following sequence of events can happen.
    - Client releases an inode, queues cap release message.
    - A 'lookup' reply brings the same inode back, but the reply
    doesn't contain xattrs because MDS didn't receive the cap release
    message and thought client already has up-to-date xattrs.

    The fix is to force sending a getattr request to MDS if xattrs_version
    is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so MDS knows client
    does not have xattrs.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     

28 Jul, 2014

2 commits


05 Apr, 2014

2 commits


03 Apr, 2014

1 commit


18 Feb, 2014

3 commits


29 Jan, 2014

1 commit

  • The previous ceph-client merge resulted in ceph not even building,
    because there was a merge conflict that wasn't visible as an actual data
    conflict: commit 7221fe4c2ed7 ("ceph: add acl for cephfs") added support
    for POSIX ACL's into Ceph, but unluckily we also had the VFS tree change
    a lot of the POSIX ACL helper functions to be much more helpful to
    filesystems (see for example commits 2aeccbe957d0 "fs: add generic
    xattr_acl handlers", 5bf3258fd2ac "fs: make posix_acl_chmod more useful"
    and 37bc15392a23 "fs: make posix_acl_create more useful")

    The reason this conflict wasn't obvious was many-fold: because it was a
    semantic conflict rather than a data conflict, it wasn't visible in the
    git merge as a conflict. And because the VFS tree hadn't been in
    linux-next, people hadn't become aware of it that way. And because I
    was at jury duty this morning, I was using my laptop and as a result not
    doing constant "allmodconfig" builds.

    Anyway, this fixes the build and generally removes a fair chunk of the
    Ceph POSIX ACL support code, since the improved helpers seem to match
    really well for Ceph too. But I don't actually have any way to *test*
    the end result, and I was really hoping for some ACK's for this. Oh,
    well.

    Not compiling certainly doesn't make things easier to test, so I'm
    committing this without the acks after having waited for four hours...
    Plus it's what I would have done for the merge had I noticed the
    semantic conflict..

    Reported-by: Dave Jones
    Cc: Sage Weil
    Cc: Guangliang Zhao
    Cc: Li Wang
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Jan, 2014

1 commit


04 Jul, 2013

1 commit

  • [ 1121.231883] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
    [ 1121.231935] in_atomic(): 1, irqs_disabled(): 0, pid: 9831, name: mv
    [ 1121.231971] 1 lock held by mv/9831:
    [ 1121.231973] #0: (&(&ci->i_ceph_lock)->rlock){+.+...},at:[] ceph_getxattr+0x58/0x1d0 [ceph]
    [ 1121.231998] CPU: 3 PID: 9831 Comm: mv Not tainted 3.10.0-rc6+ #215
    [ 1121.232000] Hardware name: To Be Filled By O.E.M. To Be Filled By
    O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011
    [ 1121.232027] ffff88006d355a80 ffff880092f69ce0 ffffffff8168348c ffff880092f69cf8
    [ 1121.232045] ffffffff81070435 ffff88006d355a20 ffff880092f69d20 ffffffff816899ba
    [ 1121.232052] 0000000300000004 ffff8800b76911d0 ffff88006d355a20 ffff880092f69d68
    [ 1121.232056] Call Trace:
    [ 1121.232062] [] dump_stack+0x19/0x1b
    [ 1121.232067] [] __might_sleep+0xe5/0x110
    [ 1121.232071] [] down_read+0x2a/0x98
    [ 1121.232080] [] ceph_vxattrcb_layout+0x60/0xf0 [ceph]
    [ 1121.232088] [] ceph_getxattr+0x9f/0x1d0 [ceph]
    [ 1121.232093] [] vfs_getxattr+0xa8/0xd0
    [ 1121.232097] [] getxattr+0xab/0x1c0
    [ 1121.232100] [] ? final_putname+0x22/0x50
    [ 1121.232104] [] ? kmem_cache_free+0xb0/0x260
    [ 1121.232107] [] ? final_putname+0x22/0x50
    [ 1121.232110] [] ? trace_hardirqs_on+0xd/0x10
    [ 1121.232114] [] ? sysret_check+0x1b/0x56
    [ 1121.232120] [] SyS_fgetxattr+0x6c/0xc0
    [ 1121.232125] [] system_call_fastpath+0x16/0x1b
    [ 1121.232129] BUG: scheduling while atomic: mv/9831/0x10000002
    [ 1121.232154] 1 lock held by mv/9831:
    [ 1121.232156] #0: (&(&ci->i_ceph_lock)->rlock){+.+...}, at:
    [] ceph_getxattr+0x58/0x1d0 [ceph]

    I think moving the ci->i_ceph_lock down is safe because
    ceph_inode_info cannot be freed at that point.

    CC: stable@vger.kernel.org # 3.8+
    Signed-off-by: Jianpeng Ma
    Reviewed-by: Sage Weil

    majianpeng
     

26 Feb, 2013

1 commit

  • Fix the causes for sparse warnings reported in the ceph file system
    code. Here there are only two (and they're sort of silly but
    they're easy to fix).

    This partially resolves:
    http://tracker.ceph.com/issues/4184

    Reported-by: Fengguang Wu
    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     

14 Feb, 2013

3 commits