06 Jan, 2021

1 commit


30 Dec, 2020

1 commit

  • commit e5cafce3ad0f8652d6849314d951459c2bff7233 upstream.

    A NULL pointer dereference may occur in __ceph_remove_cap with some of the
    callbacks used in ceph_iterate_session_caps, namely trim_caps_cb and
    remove_session_caps_cb. Those callers hold the session->s_mutex, so they
    are prevented from concurrent execution, but ceph_evict_inode does not.

    Since the callers of this function hold the i_ceph_lock, the fix is simply
    a matter of returning immediately if cap->ci is NULL.

    Cc: stable@vger.kernel.org
    URL: https://tracker.ceph.com/issues/43272
    Suggested-by: Jeff Layton
    Signed-off-by: Luis Henriques
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Greg Kroah-Hartman

    Luis Henriques
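
The early return described in the commit above can be illustrated with a small userspace model. This is only a sketch: `struct inode_info`, `struct cap`, `remove_cap()` and the integer return value are hypothetical stand-ins rather than the kernel's definitions, and the locking that makes the check safe in the kernel (i_ceph_lock) is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct inode_info {
	int nr_caps;            /* number of caps attached to this inode */
};

struct cap {
	struct inode_info *ci;  /* NULL once the cap has been detached */
};

/*
 * Model of the guarded removal path: returns 1 if this call detached the
 * cap, 0 if another path had already detached it.
 */
static int remove_cap(struct cap *cap)
{
	struct inode_info *ci;

	assert(cap != NULL);
	ci = cap->ci;
	if (ci == NULL)
		return 0;       /* already removed: nothing to dereference */

	ci->nr_caps--;
	cap->ci = NULL;         /* mark the cap as detached */
	return 1;
}
```

Calling `remove_cap()` twice on the same cap is harmless in this model: the second call sees the NULL pointer and returns without touching the inode.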
     

05 Nov, 2020

1 commit

  • Some messages sent by the MDS entail a session sequence number
    increment, and the MDS will drop certain types of requests on the floor
    when the sequence numbers don't match.

    In particular, a REQUEST_CLOSE message can cross with one of the
    sequence morphing messages from the MDS which can cause the client to
    stall, waiting for a response that will never come.

    Originally, this meant a delay of up to 5s before the recurring workqueue
    job kicked in and resent the request, but a recent change made it so
    that the client would never resend, causing a 60s stall when unmounting
    and sometimes a blocklisting event.

    Add a new helper for incrementing the session sequence and then testing
    to see whether a REQUEST_CLOSE needs to be resent, and move the handling
    of CEPH_MDS_SESSION_CLOSING into that function. Change all of the
    bare sequence counter increments to use the new helper.

    Reorganize check_session_state with a switch statement. It should no
    longer be called when the session is CLOSING, so throw a warning if it
    ever is (but still handle that case sanely).

    [ idryomov: whitespace, pr_err() call fixup ]

    URL: https://tracker.ceph.com/issues/47563
    Fixes: fa9967734227 ("ceph: fix potential mdsc use-after-free crash")
    Reported-by: Patrick Donnelly
    Signed-off-by: Jeff Layton
    Reviewed-by: Ilya Dryomov
    Reviewed-by: Xiubo Li
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     

25 Oct, 2020

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted stuff all over the place (the largest group here is
    Christoph's stat cleanups)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: remove KSTAT_QUERY_FLAGS
    fs: remove vfs_stat_set_lookup_flags
    fs: move vfs_fstatat out of line
    fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
    fs: remove vfs_statx_fd
    fs: omfs: use kmemdup() rather than kmalloc+memcpy
    [PATCH] reduce boilerplate in fsid handling
    fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS
    selftests: mount: add nosymfollow tests
    Add a "nosymfollow" mount option.

    Linus Torvalds
     

12 Oct, 2020

21 commits


19 Sep, 2020

1 commit


29 Aug, 2020

1 commit

  • Pull ceph fixes from Ilya Dryomov:
    "We have an inode number handling change, prompted by s390x which is a
    64-bit architecture with a 32-bit ino_t, a patch to disallow leases to
    avoid potential data integrity issues when CephFS is re-exported via
    NFS or CIFS and a fix for the bulk of W=1 compilation warnings"

    * tag 'ceph-for-5.9-rc3' of git://github.com/ceph/ceph-client:
    ceph: don't allow setlease on cephfs
    ceph: fix inode number handling on arches with 32-bit ino_t
    libceph: add __maybe_unused to DEFINE_CEPH_FEATURE

    Linus Torvalds
     

25 Aug, 2020

1 commit

  • Leases don't currently work correctly on kcephfs, as they are not broken
    when caps are revoked. They could eventually be implemented similarly to
    how we did them in libcephfs, but for now don't allow them.

    [ idryomov: no need for simple_nosetlease() in ceph_dir_fops and
    ceph_snapdir_fops ]

    Signed-off-by: Jeff Layton
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Jeff Layton
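
As a rough illustration of what disallowing leases means, the sketch below models a file-operations table whose setlease hook always refuses. `struct file_ops`, `nosetlease()` and `try_setlease()` are hypothetical names for this model only. The kernel's simple_nosetlease(), mentioned in the commit above, is assumed here to behave like the refusing hook and return -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct file;  /* opaque in this model */

/* Hypothetical, minimal operations table with only the lease hook. */
struct file_ops {
	int (*setlease)(struct file *f, long arg);
};

/* Refuse every lease request, whatever the arguments. */
static int nosetlease(struct file *f, long arg)
{
	(void)f;
	(void)arg;
	return -EINVAL;
}

static const struct file_ops regular_file_ops = { .setlease = nosetlease };

/*
 * Model of a caller: use the filesystem's hook if present, otherwise
 * fall back to a default that grants the lease (returns 0 here).
 */
static int try_setlease(const struct file_ops *ops, struct file *f, long arg)
{
	assert(ops != NULL);
	if (ops->setlease)
		return ops->setlease(f, arg);
	return 0;
}
```

The editor's note in the commit says directories and snapdirs do not need the hook, so only the regular-file table carries it in this model.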
     

24 Aug, 2020

2 commits

  • Tuan and Ulrich mentioned that they were hitting a problem on s390x,
    which has a 32-bit ino_t value, even though it's a 64-bit arch (for
    historical reasons).

    I think the current handling of inode numbers in the ceph driver is
    wrong. It tries to use 32-bit inode numbers on 32-bit arches, but that's
    actually not a problem. 32-bit arches can deal with 64-bit inode numbers
    just fine when userland code is compiled with LFS support (the common
    case these days).

    What we really want to do is just use 64-bit numbers everywhere, unless
    someone has mounted with the ino32 mount option. In that case, we want
    to ensure that we hash the inode number down to something that will fit
    in 32 bits before presenting the value to userland.

    Add new helper functions that do this, and only do the conversion before
    presenting these values to userland in getattr and readdir.

    The inode table hashvalue is changed to just cast the inode number to
    unsigned long, as low-order bits are the most likely to vary anyway.

    While it's not strictly required, we do want to put something in
    inode->i_ino. Instead of basing it on BITS_PER_LONG, however, base it on
    the size of the ino_t type.

    NOTE: This is a user-visible change on 32-bit arches:

    1/ inode numbers will be seen to have changed between kernel versions.
    32-bit arches will see large inode numbers now instead of the hashed
    ones they saw before.

    2/ any really old software not built with LFS support may start failing
    stat() calls with -EOVERFLOW on inode numbers >2^32. Nothing much we
    can do about these, but hopefully the intersection of people running
    such code on ceph will be very small.

    The workaround for both problems is to mount with "-o ino32".

    [ idryomov: changelog tweak ]

    URL: https://tracker.ceph.com/issues/46828
    Reported-by: Ulrich Weigand
    Reported-and-Tested-by: Tuan Hoang1
    Signed-off-by: Jeff Layton
    Reviewed-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     
  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove fall-through
    markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
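
For the inode-number commit above (the first of the two 24 Aug commits), the idea of keeping 64-bit numbers internally and folding them to 32 bits only when presenting them under ino32 can be sketched as follows. The function names and the exact folding scheme (XOR of the two halves, with 0 remapped to 2) are assumptions for illustration, not a statement of what the driver does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fold a 64-bit inode number into 32 bits by XORing its two halves. */
static uint32_t fold_ino32(uint64_t ino)
{
	uint32_t folded = (uint32_t)(ino & 0xffffffffu) ^ (uint32_t)(ino >> 32);

	if (folded == 0)
		folded = 2;     /* assumed remap so that 0 is never presented */
	assert(folded != 0);
	return folded;
}

/*
 * Value to show to userland: the full 64-bit number by default, the
 * folded one only when the (modeled) ino32 option is set.
 */
static uint64_t present_ino(uint64_t ino, bool ino32)
{
	return ino32 ? fold_ino32(ino) : ino;
}
```

Only the presentation path (getattr and readdir in the commit's description) would call something like `present_ino()`. Lookups keep using the full 64-bit value.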
     

05 Aug, 2020

4 commits

  • Most session messages contain a feature mask, but the MDS will
    routinely send a REJECT message with one that is zero-length.

    Commit 0fa8263367db ("ceph: fix endianness bug when handling MDS
    session feature bits") fixed the decoding of the feature mask,
    but failed to account for the MDS sending a zero-length feature
    mask. This causes REJECT message decoding to fail.

    Skip trying to decode a feature mask if the word count is zero.

    Cc: stable@vger.kernel.org
    URL: https://tracker.ceph.com/issues/46823
    Fixes: 0fa8263367db ("ceph: fix endianness bug when handling MDS session feature bits")
    Signed-off-by: Jeff Layton
    Reviewed-by: Ilya Dryomov
    Tested-by: Patrick Donnelly
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     
  • When doing some tests with multiple mds, we were seeing many mds
    forwarding requests between them, causing clients to resend.

    If the request is a modification operation and the mode is set to
    USE_AUTH_MDS, then the auth mds should be selected to handle the
    request. If auth mds for frag is already set, then it should be returned
    directly without further processing.

    The current logic is wrong because it only returns directly if
    mode is USE_AUTH_MDS, but we want to do that for all modes. If we don't,
    then when the frag's mds is not equal to cap session's mds, the request
    will get sent to the wrong MDS needlessly.

    Drop the mode check in this condition.

    Signed-off-by: Yanhu Cao
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Yanhu Cao
     
  • When doing some testing recently, I hit some page allocation failures
    on mount, when creating the wb_pagevec_pool for the mount. That
    requires 128k (32 contiguous pages), and after thrashing the memory
    during an xfstests run, sometimes that would fail.

    128k for each mount seems like a lot to hold in reserve for a rainy
    day, so let's change this to a global mempool that gets allocated
    when the module is plugged in.

    Signed-off-by: Jeff Layton
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     
  • Symlink inodes should have the security context set in their xattrs on
    creation. We already set the context on creation, but we don't attach
    the pagelist. The effect is that symlink inodes don't get an SELinux
    context set on them at creation, so they end up unlabeled instead of
    inheriting the proper context. Make it do so.

    Cc: stable@vger.kernel.org
    Signed-off-by: Jeff Layton
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Jeff Layton
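
For the zero-length feature mask fix (the first of the 05 Aug commits above), the guard can be sketched with a toy decoder. The wire layout used here (a little-endian u32 byte length followed by the mask) and all function names are assumptions for illustration and do not reproduce the actual message format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian integer of n bytes (n <= 8). */
static uint64_t get_le(const uint8_t *p, size_t n)
{
	uint64_t v = 0;

	for (size_t i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/*
 * Hypothetical layout: u32 length in bytes, then that many mask bytes.
 * Returns 0 on success, -1 if the buffer is truncated or malformed.
 */
static int decode_features(const uint8_t *buf, size_t size, uint64_t *features)
{
	uint32_t len;

	assert(buf != NULL && features != NULL);
	*features = 0;
	if (size < 4)
		return -1;
	len = (uint32_t)get_le(buf, 4);
	if (len == 0)
		return 0;       /* empty mask: skip decoding, not an error */
	if (len < 8 || size - 4 < len)
		return -1;
	*features = get_le(buf + 4, 8);
	return 0;
}
```

Without the `len == 0` branch, this decoder would treat an empty mask as a truncated message, which mirrors the REJECT decoding failure the commit describes.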
     

03 Aug, 2020

6 commits

  • The variable mds is being initialized with a value that is never read
    and it is being updated later with a new value. The initialization is
    redundant and can be removed.

    Addresses-Coverity: ("Unused value")
    Signed-off-by: Colin Ian King
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Colin Ian King
     
  • If ceph_mdsc_init() fails, it will already have freed the mdsc.

    Reported-by: syzbot+b57f46d8d6ea51960b8c@syzkaller.appspotmail.com
    Signed-off-by: Xiubo Li
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Xiubo Li
     
  • Fix build warnings:

    fs/ceph/mdsmap.c: In function ‘ceph_mdsmap_decode’:
    fs/ceph/mdsmap.c:192:7: warning: variable ‘info_cv’ set but not used [-Wunused-but-set-variable]
    fs/ceph/mdsmap.c:177:7: warning: variable ‘state_seq’ set but not used [-Wunused-but-set-variable]
    fs/ceph/mdsmap.c:123:15: warning: variable ‘mdsmap_cv’ set but not used [-Wunused-but-set-variable]

    Note that p is increased in ceph_decode_*.

    Signed-off-by: Jia Yang
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Jia Yang
     
  • Drop duplicated words "down" and "the" in fs/ceph/.

    [ idryomov: merge into a single patch ]

    Signed-off-by: Randy Dunlap
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Randy Dunlap
     
  • Send metric flags to the MDS, indicating what metrics the client
    supports. Currently that consists of cap statistics, and read, write and
    metadata latencies.

    URL: https://tracker.ceph.com/issues/43435
    Signed-off-by: Xiubo Li
    Reviewed-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Xiubo Li
     
  • This will send the caps/read/write/metadata metrics to any available MDS
    once per second, matching the userland client. It will skip MDS sessions
    that don't support metric collection, as the MDSs will close socket
    connections when they get a message of an unknown type.

    We can disable the metric sending via the disable_send_metrics module
    parameter.

    [ jlayton: fix up endianness bug in ceph_mdsc_send_metrics() ]

    URL: https://tracker.ceph.com/issues/43215
    Signed-off-by: Xiubo Li
    Signed-off-by: Jeff Layton
    Signed-off-by: Ilya Dryomov

    Xiubo Li
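
The note in the ceph_mdsmap_decode warning fix above, that p is increased in ceph_decode_*, matters because dropping an unused variable must not also drop the cursor advance. The sketch below models that with hypothetical helpers (`decode_8`, `skip_8`) and an invented three-byte layout. It does not reproduce the real mdsmap format or the kernel's decode macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one byte and advance the cursor, like a decode helper would. */
static uint8_t decode_8(const uint8_t **p)
{
	uint8_t v = **p;

	*p += 1;
	return v;
}

/* Advance the cursor past one byte without keeping its value. */
static void skip_8(const uint8_t **p)
{
	*p += 1;
}

/* Hypothetical layout: version byte, compat byte (unused), payload byte. */
static uint8_t decode_payload(const uint8_t *buf, uint8_t *version)
{
	const uint8_t *p = buf;

	assert(buf != NULL && version != NULL);
	*version = decode_8(&p);
	skip_8(&p);     /* value unused, but the cursor must still move */
	return decode_8(&p);
}
```

If the `skip_8()` call were deleted along with the unused variable, the payload read would land on the compat byte instead.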