12 Oct, 2020

1 commit


29 Aug, 2020

1 commit

  • Pull ceph fixes from Ilya Dryomov:
    "We have an inode number handling change, prompted by s390x which is a
    64-bit architecture with a 32-bit ino_t, a patch to disallow leases to
    avoid potential data integrity issues when CephFS is re-exported via
    NFS or CIFS and a fix for the bulk of W=1 compilation warnings"

    * tag 'ceph-for-5.9-rc3' of git://github.com/ceph/ceph-client:
    ceph: don't allow setlease on cephfs
    ceph: fix inode number handling on arches with 32-bit ino_t
    libceph: add __maybe_unused to DEFINE_CEPH_FEATURE

    Linus Torvalds
     

24 Aug, 2020

2 commits

  • Tuan and Ulrich mentioned that they were hitting a problem on s390x,
    which has a 32-bit ino_t value, even though it's a 64-bit arch (for
    historical reasons).

    I think the current handling of inode numbers in the ceph driver is
    wrong. It tries to use 32-bit inode numbers on 32-bit arches, but that's
    actually not a problem. 32-bit arches can deal with 64-bit inode numbers
    just fine when userland code is compiled with LFS support (the common
    case these days).

    What we really want to do is just use 64-bit numbers everywhere, unless
    someone has mounted with the ino32 mount option. In that case, we want
    to ensure that we hash the inode number down to something that will fit
    in 32 bits before presenting the value to userland.

    Add new helper functions that do this, and only do the conversion before
    presenting these values to userland in getattr and readdir.

    The inode table hashvalue is changed to just cast the inode number to
    unsigned long, as low-order bits are the most likely to vary anyway.

    While it's not strictly required, we do want to put something in
    inode->i_ino. Instead of basing it on BITS_PER_LONG, however, base it on
    the size of the ino_t type.

    NOTE: This is a user-visible change on 32-bit arches:

    1/ inode numbers will be seen to have changed between kernel versions.
    32-bit arches will see large inode numbers now instead of the hashed
    ones they saw before.

    2/ any really old software not built with LFS support may start failing
    stat() calls with -EOVERFLOW on inode numbers >2^32. Nothing much we
    can do about these, but hopefully the intersection of people running
    such code on ceph will be very small.

    The workaround for both problems is to mount with "-o ino32".

    [ idryomov: changelog tweak ]

    URL: https://tracker.ceph.com/issues/46828
    Reported-by: Ulrich Weigand
    Reported-and-Tested-by: Tuan Hoang1
    Signed-off-by: Jeff Layton
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Jeff Layton
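
    The ino32 conversion described above can be sketched in userspace C.
    This is a minimal illustration, not the driver's code: the function
    names and the XOR-folding scheme are assumptions chosen for the
    example, and the full 64-bit value is passed through untouched unless
    the (hypothetical) ino32 flag is set.

```c
#include <stdint.h>

/* Hypothetical helper: fold a 64-bit inode number into 32 bits by
 * XOR-ing its upper and lower halves. */
static uint32_t fold_ino64_to_ino32(uint64_t ino64)
{
    uint32_t ino = (uint32_t)(ino64 & 0xffffffffu);

    ino ^= (uint32_t)(ino64 >> 32);
    if (ino == 0)
        ino = 2; /* assumption: never present inode number 0 */
    return ino;
}

/* Hypothetical helper: value presented to userland (getattr/readdir).
 * Only hash the number down when the ino32 option was requested. */
static uint64_t present_ino(uint64_t ino64, int ino32_opt)
{
    return ino32_opt ? fold_ino64_to_ino32(ino64) : ino64;
}
```

    Doing the fold only at the presentation boundary keeps the 64-bit
    number available everywhere else.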
     
  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
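
    For readers unfamiliar with the pseudo-keyword, here is a small
    userspace sketch. The macro definition below is a stand-in written for
    this example (the kernel provides its own definition), and
    count_levels() is a hypothetical function.

```c
/* Userspace stand-in for the fallthrough pseudo-keyword (assumption:
 * GCC 7+ supports the statement attribute; other compilers get a no-op). */
#if defined(__GNUC__) && __GNUC__ >= 7
#define fallthrough __attribute__((__fallthrough__))
#else
#define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: each case deliberately continues into the next. */
static int count_levels(int level)
{
    int n = 0;

    switch (level) {
    case 2:
        n++;
        fallthrough;
    case 1:
        n++;
        fallthrough;
    case 0:
        n++;
        break;
    default:
        break;
    }
    return n;
}
```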
     

05 Aug, 2020

1 commit

  • Symlink inodes should have the security context set in their xattrs on
    creation. We already set the context on creation, but we don't attach
    the pagelist. The effect is that symlink inodes don't get an SELinux
    context set on them at creation, so they end up unlabeled instead of
    inheriting the proper context. Make it do so.

    Cc: stable@vger.kernel.org
    Signed-off-by: Jeff Layton
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     

01 Jun, 2020

3 commits

  • Returning -EXDEV when trying to 'mv' files/directories from different
    quota realms results in copy+unlink operations instead of the faster
    CEPH_MDS_OP_RENAME. This will occur even when there aren't any quotas
    set in the destination directory, or if there's enough space left for
    the new file(s).

    This patch adds a new helper function to be called on rename operations
    which will allow these operations if they can be executed. This patch
    mimics userland fuse client commit b8954e5734b3 ("client:
    optimize rename operation under different quota root").

    Since ceph_quota_is_same_realm() is now called only from this new
    helper, make it static.

    URL: https://tracker.ceph.com/issues/44791
    Signed-off-by: Luis Henriques
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Luis Henriques
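
    The idea of allowing a cross-realm rename when it can be executed can
    be sketched as follows. Everything here is hypothetical (the struct,
    its fields, and the function name), and only a byte quota is modelled;
    it is not the helper added by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quota realm: max_bytes == 0 means "no quota set". */
struct quota_realm {
    uint64_t max_bytes;
    uint64_t used_bytes;
};

/* Return true if the rename may proceed as a plain rename instead of
 * being rejected (and turned into copy+unlink by userland). */
static bool rename_allowed(const struct quota_realm *src,
                           const struct quota_realm *dst,
                           uint64_t moved_bytes)
{
    if (src == dst)
        return true;            /* same realm: nothing to check */
    if (dst->max_bytes == 0)
        return true;            /* destination has no quota */
    if (dst->used_bytes > dst->max_bytes)
        return false;           /* destination already over quota */
    /* enough space left in the destination for the moved data? */
    return moved_bytes <= dst->max_bytes - dst->used_bytes;
}
```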
     
  • Count hits and misses in the caps cache. If the client has all of
    the necessary caps when a task needs references, then it's counted
    as a hit. Any other situation is a miss.

    URL: https://tracker.ceph.com/issues/43215
    Signed-off-by: Xiubo Li
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Xiubo Li
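
    The hit/miss rule above can be illustrated with a short sketch. The
    struct and function names are hypothetical, and caps are modelled as a
    plain bitmask.

```c
/* Hypothetical counters for the caps cache. */
struct cap_metric {
    unsigned long hit;
    unsigned long mis;
};

/* Count a hit only if every needed cap bit is already issued;
 * any other situation counts as a miss. */
static void account_caps(struct cap_metric *m, unsigned int issued,
                         unsigned int need)
{
    if ((issued & need) == need)
        m->hit++;
    else
        m->mis++;
}
```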
     
  • For dentry leases, only count the hit/miss info triggered from the vfs
    calls. For the cases like request reply handling and ceph_trim_dentries,
    ignore them.

    For now, these are only viewable using debugfs. Future patches will
    allow the client to send the stats to the MDS.

    The output looks like:

    item          total           miss            hit
    -------------------------------------------------
    d_lease       11              7               141

    URL: https://tracker.ceph.com/issues/43215
    Signed-off-by: Xiubo Li
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Xiubo Li
     

14 Apr, 2020

1 commit

  • The new async dirops callback routines can pass ERR_PTR values to
    ceph_mdsc_free_path, which could cause an oops. Make ceph_mdsc_free_path
    ignore ERR_PTR values. Also, ensure that the pr_warn messages look sane
    even if ceph_mdsc_build_path fails.

    Reported-by: Dan Carpenter
    Signed-off-by: Jeff Layton
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Jeff Layton
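
    A userspace sketch of a free routine that tolerates error pointers.
    ERR_PTR() and IS_ERR_OR_NULL() are re-created here for illustration
    only, and free_path() is a hypothetical stand-in, not
    ceph_mdsc_free_path itself.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Illustrative re-creations of the kernel's error-pointer helpers:
 * small negative errno values are encoded in the top of the address range. */
static void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static int IS_ERR_OR_NULL(const void *ptr)
{
    return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical free routine: returns 1 if memory was released,
 * 0 if the argument was NULL or an encoded error and was ignored. */
static int free_path(char *path)
{
    if (IS_ERR_OR_NULL(path))
        return 0;
    free(path);
    return 1;
}
```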
     

30 Mar, 2020

4 commits

  • Add i_last_rd and i_last_wr to ceph_inode_info. These fields are
    used to track the last time the client acquired read/write caps for
    the inode.

    If there is no read/write on an inode for 'caps_wanted_delay_max'
    seconds, __ceph_caps_file_wanted() does not request caps for read/write
    even if there are open files.

    Call __ceph_touch_fmode() for dir operations. __ceph_caps_file_wanted()
    calculates a dir's wanted caps according to the last dir
    read/modification. If there was a recent dir read, the dir inode wants
    CEPH_CAP_ANY_SHARED caps. If there was a recent dir modification, it
    also wants CEPH_CAP_FILE_EXCL.

    Readdir is a special case. Dir inode wants CEPH_CAP_FILE_EXCL after
    readdir, as with that, modifications do not need to release
    CEPH_CAP_FILE_SHARED or invalidate all dentry leases issued by readdir.

    Signed-off-by: "Yan, Zheng"
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
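
    The time-based part of this logic can be sketched as below. The
    struct, the flag values, and the function name are hypothetical; only
    the "idle for caps_wanted_delay_max seconds means not wanted" rule is
    taken from the description above.

```c
/* Hypothetical flag values for the example. */
enum { WANT_RD = 1, WANT_WR = 2 };

/* Hypothetical per-inode timestamps, in seconds. */
struct inode_times {
    long last_rd;
    long last_wr;
};

/* Want read/write caps only if the inode was read/written more
 * recently than delay_max seconds ago, even if files are still open. */
static int file_wanted(const struct inode_times *t, long now, long delay_max)
{
    int wanted = 0;

    if (now - t->last_rd < delay_max)
        wanted |= WANT_RD;
    if (now - t->last_wr < delay_max)
        wanted |= WANT_WR;
    return wanted;
}
```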
     
  • The MDS is getting a new lock-caching facility that will allow it
    to cache the necessary locks to allow asynchronous directory operations.
    Since the CEPH_CAP_FILE_* caps are currently unused on directories,
    we can repurpose those bits for this purpose.

    When performing an unlink, if we have Fx on the parent directory,
    and CEPH_CAP_DIR_UNLINK (aka Fr), and we know that the dentry being
    removed is the primary link, then we can fire off an unlink
    request immediately and don't need to wait on reply before returning.

    In that situation, just fix up the dcache and link count and return
    immediately after issuing the call to the MDS. This does mean that we
    need to hold an extra reference to the inode being unlinked, and extra
    references to the caps to avoid races. Those references are put and
    error handling is done in the r_callback routine.

    If the operation ends up failing, then set a writeback error on the
    directory inode and on the inode itself, which can be fetched later by
    an fsync on the dir.

    The behavior of dir caps is slightly different from caps on normal
    files. Because these are just considered an optimization, if the
    session is reconnected, we will not automatically reclaim them. They
    are instead considered lost until we do another synchronous op in the
    parent directory.

    Async dirops are enabled via the "nowsync" mount option, which is
    patterned after the xfs "wsync" mount option. For now, the default
    is "wsync", but eventually we may flip that.

    Signed-off-by: Jeff Layton
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Jeff Layton
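
    The preconditions for an async unlink listed above can be summarised
    in a small predicate. The bit values, struct, and function name are
    hypothetical and chosen only for this sketch; the real cap bits have
    different values.

```c
#include <stdbool.h>

/* Hypothetical cap bits for the example. */
enum {
    CAP_FX         = 1 << 0, /* Fx on the parent directory */
    CAP_DIR_UNLINK = 1 << 1  /* CEPH_CAP_DIR_UNLINK (aka Fr) */
};

/* Hypothetical snapshot of what the client knows at unlink time. */
struct unlink_ctx {
    unsigned int dir_caps;  /* caps held on the parent directory */
    bool primary_link;      /* dentry is known to be the primary link */
    bool nowsync;           /* mounted with the "nowsync" option */
};

/* All conditions must hold; otherwise fall back to a synchronous unlink. */
static bool can_async_unlink(const struct unlink_ctx *c)
{
    unsigned int need = CAP_FX | CAP_DIR_UNLINK;

    return c->nowsync && c->primary_link && (c->dir_caps & need) == need;
}
```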
     
  • When we issue an async create, we must ensure that any later on-the-wire
    requests involving it wait for the create reply.

    Expand i_ceph_flags to be an unsigned long, and add a new bit that
    MDS requests can wait on. If the bit is set in the inode when sending
    caps, then don't send it and just return that it has been delayed.

    Signed-off-by: Jeff Layton
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     
  • Newer versions of the MDS will flag a dentry as "primary". In later
    patches, we'll need to consult this info, so track it in di->flags.

    Signed-off-by: Jeff Layton
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     

27 Jan, 2020

1 commit


02 Dec, 2019

1 commit

  • Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
    "As part of the cleanup of some remaining y2038 issues, I came to
    fs/compat_ioctl.c, which still has a couple of commands that need
    support for time64_t.

    In completely unrelated work, I spent time on cleaning up parts of
    this file in the past, moving things out into drivers instead.

    After Al Viro reviewed an earlier version of this series and did a lot
    more of that cleanup, I decided to try to completely eliminate the
    rest of it and move it all into drivers.

    This series incorporates some of Al's work and many patches of my own,
    but in the end stops short of actually removing the last part, which
    is the scsi ioctl handlers. I have patches for those as well, but they
    need more testing or possibly a rewrite"

    * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
    scsi: sd: enable compat ioctls for sed-opal
    pktcdvd: add compat_ioctl handler
    compat_ioctl: move SG_GET_REQUEST_TABLE handling
    compat_ioctl: ppp: move simple commands into ppp_generic.c
    compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
    compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
    compat_ioctl: unify copy-in of ppp filters
    tty: handle compat PPP ioctls
    compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
    compat_ioctl: handle SIOCOUTQNSD
    af_unix: add compat_ioctl support
    compat_ioctl: reimplement SG_IO handling
    compat_ioctl: move WDIOC handling into wdt drivers
    fs: compat_ioctl: move FITRIM emulation into file systems
    gfs2: add compat_ioctl support
    compat_ioctl: remove unused convert_in_user macro
    compat_ioctl: remove last RAID handling code
    compat_ioctl: remove /dev/raw ioctl translation
    compat_ioctl: remove PCI ioctl translation
    compat_ioctl: remove joystick ioctl translation
    ...

    Linus Torvalds
     

30 Oct, 2019

1 commit

  • For RCU case ->d_revalidate() is called with rcu_read_lock() and
    without pinning the dentry passed to it. Which means that it
    can't rely upon ->d_inode remaining stable; that's the reason
    for d_inode_rcu(), actually.

    Make sure we don't reload ->d_inode there.

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro
    Signed-off-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Al Viro
     

23 Oct, 2019

1 commit

  • The ceph_ioctl function is used both for files and directories, but only
    the files support doing that in 32-bit compat mode.

    On the s390 architecture, there is also a problem with invalid 31-bit
    pointers that need to be passed through compat_ptr().

    Use the new compat_ptr_ioctl() to address both issues.

    Note: When backporting this patch to stable kernels, "compat_ioctl:
    add compat_ptr_ioctl()" is needed as well.

    Reviewed-by: "Yan, Zheng"
    Cc: stable@vger.kernel.org
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
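
    To illustrate the 31-bit pointer issue: in the s390 compat ABI only
    the low 31 bits of a user pointer are significant, so the top bit has
    to be cleared before the value is used as an address. The function
    below is a userspace sketch with a hypothetical name, not the kernel's
    compat_ptr().

```c
#include <stdint.h>

/* Hypothetical sketch: convert a 32-bit compat pointer value into a
 * 64-bit address by discarding the most significant bit. */
static uint64_t s390_compat_ptr_sketch(uint32_t uptr)
{
    return (uint64_t)(uptr & 0x7fffffffu);
}
```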
     

21 Jul, 2019

1 commit

  • Pull dcache and mountpoint updates from Al Viro:
    "Saner handling of refcounts to mountpoints.

    Transfer the counting reference from struct mount ->mnt_mountpoint
    over to struct mountpoint ->m_dentry. That allows us to get rid of the
    convoluted games with ordering of mount shutdowns.

    The cost is in teaching shrink_dcache_{parent,for_umount} to cope with
    mixed-filesystem shrink lists, which we'll also need for the Slab
    Movable Objects patchset"

    * 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    switch the remnants of releasing the mountpoint away from fs_pin
    get rid of detach_mnt()
    make struct mountpoint bear the dentry reference to mountpoint, not struct mount
    Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
    fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
    __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
    nfs: dget_parent() never returns NULL
    ceph: don't open-code the check for dead lockref

    Linus Torvalds
     

08 Jul, 2019

4 commits

  • When creating a new file/directory, use security_dentry_init_security()
    to prepare the selinux context for the new inode, then send the
    openc/mkdir request to the MDS, together with the selinux xattr.

    security_dentry_init_security() only supports a single security module
    and only selinux has a dentry_init_security hook, so only selinux is
    supported for now. We can add support for other security modules once
    the kernel has a generic version of dentry_init_security().

    Signed-off-by: "Yan, Zheng"
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Also rename ceph_release_acls_info() to ceph_release_acl_sec_ctx().
    And move their definitions to different files. This is preparation
    for security label support.

    Signed-off-by: "Yan, Zheng"
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Signed-off-by: "Yan, Zheng"
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • It should call __ceph_dentry_dir_lease_touch() under dentry->d_lock.
    In addition, ceph_dentry(dentry) can be NULL when called by LOOKUP_RCU
    d_revalidate().

    Signed-off-by: "Yan, Zheng"
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     

05 Jul, 2019

1 commit


24 Apr, 2019

1 commit


06 Mar, 2019

3 commits

  • If the number of caps exceeds the limit, ceph_trim_dentries() also trims
    dentries with valid leases. Trimming a dentry releases its reference to
    the associated inode, which may evict the inode and release caps.

    By default, there is no limit for caps count.

    Signed-off-by: "Yan, Zheng"
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • The previous commit makes the VFS delete a stale dentry when its last
    reference is dropped. A lease can also become invalid while the
    corresponding dentry has no references. This patch makes cephfs
    periodically scan the lease list and delete the corresponding dentry if
    its lease is invalid.

    There are two types of lease: dentry lease and dir lease. A dentry lease
    has a lifetime and applies to a single dentry. A dentry lease is added to
    the tail of a list when it is updated, so leases at the front of the list
    expire first. A dir lease is CEPH_CAP_FILE_SHARED on the directory inode;
    it applies to all dentries in the directory. Dentries that have dir
    leases are added to another list. Dentries in that list are checked
    periodically in a round-robin manner.

    Signed-off-by: "Yan, Zheng"
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Introduce ceph_d_delete(), which checks if a dentry has a valid lease.

    Signed-off-by: "Yan, Zheng"
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     

03 Aug, 2018

3 commits


05 Jun, 2018

1 commit


11 Apr, 2018

1 commit

  • Pull ceph updates from Ilya Dryomov:
    "The big ticket items are:

    - support for rbd "fancy" striping (myself).

    The striping feature bit is now fully implemented, allowing mapping
    v2 images with non-default striping patterns. This completes
    support for --image-format 2.

    - CephFS quota support (Luis Henriques and Zheng Yan).

    This set is based on the new SnapRealm code in the upcoming v13.y.z
    ("Mimic") release. Quota handling will be rejected on older
    filesystems.

    - memory usage improvements in CephFS (Chengguang Xu).

    Directory specific bits have been split out of ceph_file_info and
    some effort went into improving cap reservation code to avoid OOM
    crashes.

    Also included a bunch of assorted fixes all over the place from
    Chengguang and others"

    * tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-client: (67 commits)
    ceph: quota: report root dir quota usage in statfs
    ceph: quota: add counter for snaprealms with quota
    ceph: quota: cache inode pointer in ceph_snap_realm
    ceph: fix root quota realm check
    ceph: don't check quota for snap inode
    ceph: quota: update MDS when max_bytes is approaching
    ceph: quota: support for ceph.quota.max_bytes
    ceph: quota: don't allow cross-quota renames
    ceph: quota: support for ceph.quota.max_files
    ceph: quota: add initial infrastructure to support cephfs quotas
    rbd: remove VLA usage
    rbd: fix spelling mistake: "reregisteration" -> "reregistration"
    ceph: rename function drop_leases() to a more descriptive name
    ceph: fix invalid point dereference for error case in mdsc destroy
    ceph: return proper bool type to caller instead of pointer
    ceph: optimize memory usage
    ceph: optimize mds session register
    libceph, ceph: add __init attribution to init funcitons
    ceph: filter out used flags when printing unused open flags
    ceph: don't wait on writeback when there is no more dirty pages
    ...

    Linus Torvalds
     

07 Apr, 2018

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted stuff, including Christoph's I_DIRTY patches"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: move I_DIRTY_INODE to fs.h
    ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
    ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
    gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls
    fs: fold open_check_o_direct into do_dentry_open
    vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents
    vfs: make sure struct filename->iname is word-aligned
    get rid of pointless includes of fs_struct.h
    [poll] annotate SAA6588_CMD_POLL users

    Linus Torvalds
     

02 Apr, 2018

6 commits

  • snap inode's i_snap_realm is not pointing to ceph_snap_realm.

    Signed-off-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • This patch changes ceph_rename so that -EXDEV is returned if an attempt is
    made to mv a file between two different dir trees with different quotas
    set up.

    Signed-off-by: Luis Henriques
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Luis Henriques
     
  • This patch adds support for the max_files quota. It hooks into all the
    ceph functions that add new filesystem objects that need to be checked
    against the quota limits. When these limits are hit, -EDQUOT is returned.

    Note that we're not checking quotas on ceph_link(). ceph_link doesn't
    really create a new inode, and since the MDS doesn't update the directory
    statistics when a new (hard) link is created (only with symlinks), they
    are not accounted as a new file.

    Signed-off-by: Luis Henriques
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Luis Henriques
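
    A minimal sketch of the kind of check described above. The function
    name and parameters are hypothetical, and treating max_files == 0 as
    "no quota" is an assumption of this example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical check run before creating a new filesystem object:
 * returns 0 if creation may proceed, -EDQUOT if the limit is hit. */
static int check_max_files(uint64_t max_files, uint64_t cur_files)
{
    if (max_files != 0 && cur_files >= max_files)
        return -EDQUOT;
    return 0;
}
```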
     
  • In the current code, regular files and directories use the same struct
    ceph_file_info to store fs-specific data, so the struct has to
    include some fields that are only used for directories
    (e.g., readdir-related info). With plenty of regular files open,
    this wastes memory.

    This patch introduces a dedicated ceph_dir_file_info cache for
    readdir-related things, so that regular files no longer carry those
    unused fields.

    Signed-off-by: Chengguang Xu
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Chengguang Xu
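
    The split can be pictured with two structs, where the directory
    variant embeds the common part as its first member. The field names
    below are hypothetical placeholders, not the real members of
    ceph_file_info or ceph_dir_file_info.

```c
#include <stddef.h>

/* Hypothetical common per-open-file data, used for regular files. */
struct file_info {
    short fmode;
    int flags;
};

/* Hypothetical directory variant: common part first, then the
 * readdir-related state that regular files do not need. */
struct dir_file_info {
    struct file_info file_info;
    unsigned int next_offset;
    char *last_name;
    long long dir_release_count;
};

/* Valid because file_info is the first member of dir_file_info. */
static struct dir_file_info *to_dir_info(struct file_info *fi)
{
    return (struct dir_file_info *)fi;
}
```

    With this layout, regular files only pay for the smaller struct, while
    directory code can recover the larger one from the common pointer.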
     
  • Variable name ci is mostly used for ceph_inode_info.
    Variable name fi is mostly used for ceph_file_info.
    Variable name cf is mostly used for ceph_cap_flush.

    Rename variables to follow the common conventions above
    to avoid confusion.

    Signed-off-by: Chengguang Xu
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Chengguang Xu
     
  • Some dout format strings do not end with a newline.
    Fix this for the files in the fs/ceph and net/ceph directories,
    and change printk to dout for printing debug info in super.c.

    Signed-off-by: Chengguang Xu
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Chengguang Xu
     

26 Feb, 2018

1 commit