25 May, 2022

1 commit

  • commit 75dbb685f4e8786c33ddef8279bab0eadfb0731f upstream.

    request_reinit() is not only ugly as the comment rightfully suggests,
    but also unsafe. Even though it is called with osdc->lock held for
    write in all cases, resetting the OSD request refcount can still race
    with handle_reply() and result in use-after-free. Taking linger ping
    as an example:

    handle_timeout thread                     handle_reply thread

                                              down_read(&osdc->lock)
                                              req = lookup_request(...)
                                              ...
                                              finish_request(req)  # unregisters
                                              up_read(&osdc->lock)
                                              __complete_request(req)
                                                linger_ping_cb(req)

      # req->r_kref == 2 because handle_reply still holds its ref

    down_write(&osdc->lock)
    send_linger_ping(lreq)
      req = lreq->ping_req  # same req
      # cancel_linger_request is NOT
      # called - handle_reply already
      # unregistered
      request_reinit(req)
        WARN_ON(req->r_kref != 1)  # fires
        request_init(req)
          kref_init(req->r_kref)

                   # req->r_kref == 1 after kref_init

                                              ceph_osdc_put_request(req)
                                                kref_put(req->r_kref)

                     # req->r_kref == 0 after kref_put, req is freed

    !!!

    This happens because send_linger_ping() always (re)uses the same OSD
    request for watch ping requests, relying on cancel_linger_request() to
    unregister it from the OSD client and rip its messages out from the
    messenger. send_linger() does the same for watch/notify registration
    and watch reconnect requests. Unfortunately cancel_request() doesn't
    guarantee that after it returns the OSD client would be completely done
    with the OSD request -- a ref could still be held and the callback (if
    specified) could still be invoked too.

    The original motivation for request_reinit() was the inability to deal with
    allocation failures in send_linger() and send_linger_ping(). Switching
    to using osdc->req_mempool (currently only used by CephFS) respects that
    and allows us to get rid of request_reinit().

    Cc: stable@vger.kernel.org
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Xiubo Li
    Acked-by: Jeff Layton
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
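
The refcount hazard described in this commit can be replayed deterministically in userspace. The following is only a minimal sketch: `struct toy_req` and the `toy_*` helpers are hypothetical stand-ins for the OSD request and its kref, not kernel code, and the "free" is recorded in a flag so the end state can be inspected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an OSD request; not the kernel's struct. */
struct toy_req {
    int refs;    /* plays the role of r_kref */
    bool freed;  /* set instead of really freeing, so state stays inspectable */
};

static void toy_get(struct toy_req *req) { req->refs++; }

/* Drop one reference; mark the request freed when the count hits zero. */
static void toy_put(struct toy_req *req)
{
    assert(req->refs > 0);
    if (--req->refs == 0)
        req->freed = true;
}

/* The unsafe pattern: reset the count regardless of outstanding holders. */
static void toy_reinit(struct toy_req *req) { req->refs = 1; }

/*
 * Replays the interleaving: returns true if the request ends up freed
 * while the linger code still believes it owns a reference.
 */
static bool replay_reinit_race(void)
{
    struct toy_req req = { .refs = 1, .freed = false }; /* owner's ref */

    toy_get(&req);     /* reply path takes its ref: refs == 2 */
    toy_reinit(&req);  /* timeout path reinitializes: refs == 1 */
    toy_put(&req);     /* reply path drops its ref: refs == 0, freed */

    return req.freed;  /* owner never dropped its ref, yet req is gone */
}

/* The safe pattern: use a fresh request instead of reinitializing. */
static bool replay_with_fresh_request(void)
{
    struct toy_req old = { .refs = 1, .freed = false };
    struct toy_req fresh = { .refs = 1, .freed = false };

    toy_get(&old);  /* reply path still holds the old request */
    toy_put(&old);  /* owner drops the old request: refs == 1 */
    toy_put(&old);  /* reply path drops it: refs == 0, old is freed */

    return fresh.freed; /* the new request is untouched */
}
```

With the reset, the reply path's final put frees a request its owner still uses; with a separately allocated request, each object's count only ever reflects its real holders.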
     

29 Jun, 2021

4 commits

  • Add description to fix the following W=1 kernel build warnings:

    net/ceph/cls_lock_client.c:28: warning: Function parameter or
    member 'osdc' not described in 'ceph_cls_lock'
    net/ceph/cls_lock_client.c:28: warning: Function parameter or
    member 'oid' not described in 'ceph_cls_lock'
    net/ceph/cls_lock_client.c:28: warning: Function parameter or
    member 'oloc' not described in 'ceph_cls_lock'

    [ idryomov: tweak osdc description ]

    Signed-off-by: Baokun Li
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Baokun Li
     
  • There is no need to assign the result to a variable first; just
    return it directly to simplify the code.

    Signed-off-by: zuoqilin
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    zuoqilin
     
  • Fix some spelling mistakes in comments:

    - enconding ==> encoding
    - ambigous ==> ambiguous
    - orignal ==> original
    - encyption ==> encryption

    Signed-off-by: Zheng Yongjun
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Zheng Yongjun
     
  • We never receive authorizer replies with cephx disabled, so it is
    bogus. Also, it still uses the old zero-length array style.

    Reported-by: Gustavo A. R. Silva
    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     

25 Jun, 2021

2 commits

  • Commit 61ca49a9105f ("libceph: don't set global_id until we get an
    auth ticket") delayed the setting of global_id too much. It is set
    only after all tickets are received, but in pre-nautilus clusters an
    auth ticket and the service tickets are obtained in separate steps
    (for a total of three MAuth replies). When the service tickets are
    requested, global_id is used to build an authorizer; if global_id is
    still 0 we never get them and fail to establish the session.

    Move the setting of global_id into protocol implementations. This
    way global_id can be set exactly when an auth ticket is received,
    neither sooner nor later.

    Fixes: 61ca49a9105f ("libceph: don't set global_id until we get an auth ticket")
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jeff Layton

    Ilya Dryomov
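
The ordering problem in this commit can be illustrated with a small model of the three-reply exchange. This is a sketch under stated assumptions: the `toy_*` names, the two policies and the value 4242 are invented for illustration, and the model only captures the one fact taken from the message above, that requesting service tickets needs a non-zero global_id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_policy {
    TOY_SET_AFTER_ALL_TICKETS, /* behaviour being fixed */
    TOY_SET_ON_AUTH_TICKET,    /* behaviour after the fix */
};

struct toy_auth {
    enum toy_policy policy;
    uint64_t offered_global_id; /* seen in the first reply, not yet adopted */
    uint64_t global_id;         /* 0 means "not set" */
    bool have_auth_ticket;
};

/* First reply: the monitor offers a global_id; only remember it. */
static void toy_handle_first_reply(struct toy_auth *a, uint64_t gid)
{
    a->offered_global_id = gid;
}

/* Second reply: the auth ticket arrives. */
static void toy_handle_auth_ticket(struct toy_auth *a)
{
    a->have_auth_ticket = true;
    if (a->policy == TOY_SET_ON_AUTH_TICKET)
        a->global_id = a->offered_global_id;
}

/* Building the authorizer for the service ticket request needs global_id. */
static bool toy_request_service_tickets(const struct toy_auth *a)
{
    return a->have_auth_ticket && a->global_id != 0;
}

/* Returns true if the session can be established under the given policy. */
static bool toy_establish_session(enum toy_policy policy)
{
    struct toy_auth a = { .policy = policy };

    toy_handle_first_reply(&a, 4242);
    toy_handle_auth_ticket(&a);
    if (!toy_request_service_tickets(&a))
        return false;                  /* global_id still 0: stuck */
    a.global_id = a.offered_global_id; /* third reply: all tickets in */
    return true;
}
```

Under the "after all tickets" policy the model never gets past the service ticket request, which matches the failure described for pre-nautilus clusters.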
     
  • There is no result to pass in the msgr2 case because authentication
    failures are reported through the auth_bad_method frame, and in the
    MAuth case an error is returned immediately.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jeff Layton

    Ilya Dryomov
     

07 May, 2021

1 commit

  • Pull ceph updates from Ilya Dryomov:
    "Notable items here are

    - a series to take advantage of David Howells' netfs helper library
    from Jeff

    - three new filesystem client metrics from Xiubo

    - ceph.dir.rsnaps vxattr from Yanhu

    - two auth-related fixes from myself, marked for stable.

    Interspersed is a smattering of assorted fixes and cleanups across the
    filesystem"

    * tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client: (24 commits)
    libceph: allow addrvecs with a single NONE/blank address
    libceph: don't set global_id until we get an auth ticket
    libceph: bump CephXAuthenticate encoding version
    ceph: don't allow access to MDS-private inodes
    ceph: fix up some bare fetches of i_size
    ceph: convert some PAGE_SIZE invocations to thp_size()
    ceph: support getting ceph.dir.rsnaps vxattr
    ceph: drop pinned_page parameter from ceph_get_caps
    ceph: fix inode leak on getattr error in __fh_to_dentry
    ceph: only check pool permissions for regular files
    ceph: send opened files/pinned caps/opened inodes metrics to MDS daemon
    ceph: avoid counting the same request twice or more
    ceph: rename the metric helpers
    ceph: fix kerneldoc copypasta over ceph_start_io_direct
    ceph: use attach/detach_page_private for tracking snap context
    ceph: don't use d_add in ceph_handle_snapdir
    ceph: don't clobber i_snap_caps on non-I_NEW inode
    ceph: fix fall-through warnings for Clang
    ceph: convert ceph_readpages to ceph_readahead
    ceph: convert ceph_write_begin to netfs_write_begin
    ...

    Linus Torvalds
     

04 May, 2021

1 commit

  • Normally, an unused OSD id/slot is represented by an empty addrvec.
    However, it also appears to be possible to generate an osdmap where
    an unused OSD id/slot has an addrvec with a single blank address of
    type NONE. Allow such addrvecs and make the end result be exactly
    the same as for the empty addrvec case -- leave addr intact.

    Cc: stable@vger.kernel.org # 5.11+
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jeff Layton

    Ilya Dryomov
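
The rule stated in this commit (treat a single blank NONE address like an empty addrvec and leave addr intact) might look roughly like the sketch below. The types, the `toy_*` names and the "blank means ip == 0" test are assumptions made for illustration; the real decoder works on the wire encoding and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_addr_type { TOY_ADDR_NONE = 0, TOY_ADDR_LEGACY, TOY_ADDR_MSGR2 };

/* Hypothetical, heavily simplified entity address. */
struct toy_addr {
    enum toy_addr_type type;
    unsigned int ip; /* 0 stands for "blank" in this sketch */
};

static bool toy_addr_is_blank(const struct toy_addr *a)
{
    return a->ip == 0;
}

/*
 * Pick the address of the wanted type out of an addrvec.
 * Returns 0 on success (including the "unused slot" cases, where *addr
 * is left untouched) and -1 if no suitable address is present.
 */
static int toy_pick_addr(const struct toy_addr *vec, size_t n,
                         enum toy_addr_type want, struct toy_addr *addr)
{
    size_t i;

    if (n == 0)
        return 0; /* empty addrvec: unused OSD id/slot */

    if (n == 1 && vec[0].type == TOY_ADDR_NONE && toy_addr_is_blank(&vec[0]))
        return 0; /* single blank NONE address: same as empty */

    for (i = 0; i < n; i++) {
        if (vec[i].type == want) {
            *addr = vec[i];
            return 0;
        }
    }
    return -1;
}
```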
     

28 Apr, 2021

2 commits

  • With the introduction of enforcing mode, setting global_id as soon
    as we get it in the first MAuth reply will result in EACCES if the
    connection is reset before we get the second MAuth reply containing
    an auth ticket -- because on retry we would attempt to reclaim that
    global_id with no auth ticket at hand.

    Neither ceph_auth_client nor ceph_mon_client depends on global_id
    being set early, so just delay the setting until we get and process
    the second MAuth reply. While at it, complain if the monitor sends
    a zero global_id or changes our global_id as the session is likely
    to fail after that.

    Cc: stable@vger.kernel.org # needs backporting for < 5.11
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • A dummy v3 encoding (exactly the same as v2) was introduced so that
    the monitors can distinguish broken clients that may not include their
    auth ticket in CEPHX_GET_AUTH_SESSION_KEY request on reconnects, thus
    failing to prove previous possession of their global_id (one part of
    CVE-2021-20288).

    The kernel client has always included its auth ticket, so it is
    compatible with enforcing mode as is. However we want to bump the
    encoding version to avoid having to authenticate twice on the initial
    connect -- all legacy clients (CephXAuthenticate < v3) are now forced to do so in
    order to expose insecure global_id reclaim.

    Marking for stable since at least for 5.11 and 5.12 it is trivial
    (v2 -> v3).

    Cc: stable@vger.kernel.org # 5.11+
    URL: https://tracker.ceph.com/issues/50452
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     

26 Mar, 2021

1 commit


16 Feb, 2021

2 commits

  • Commit 83aff95eb9d6 ("libceph: remove 'osdtimeout' option") deprecated
    osdtimeout over 8 years ago, but it is still recognized. Let's remove
    it entirely.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jeff Layton

    Ilya Dryomov
     
  • These options were introduced in 3.19 with support for message signing
    and are rather useless, as explained in commit a51983e4dd2d ("libceph:
    add nocephx_sign_messages option"). Deprecate them.

    In case there is someone out there with a cluster that lacks support
    for MSG_AUTH feature (very unlikely but has to be considered since we
    haven't formally raised the bar from argonaut to bobtail yet), make
    nocephx_sign_messages also waive MSG_AUTH requirement. This is probably
    how it should have been done in the first place -- if we aren't going
    to sign, requiring the signing feature makes no sense.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jeff Layton

    Ilya Dryomov
     

21 Jan, 2021

1 commit


05 Jan, 2021

2 commits

  • For a few years now, kernel addresses have not been included in oops
    dumps, at least on x86. All we get is a symbol name with offset and
    size.

    This is a problem for ceph_connection_operations handlers, especially
    con->ops->dispatch(). All three handlers have the same name and there
    is little context to disambiguate between e.g. monitor and OSD clients
    because almost everything is inlined. gdb sneakily stops at the first
    matching symbol, so one has to resort to nm and addr2line.

    Some of these are already prefixed with mon_, osd_ or mds_. Let's do
    the same for all others.

    Signed-off-by: Ilya Dryomov
    Acked-by: Jeff Layton

    Ilya Dryomov
     
  • Try and avoid leaving bits and pieces of session key and connection
    secret (gets split into GCM key and a pair of GCM IVs) around.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jeff Layton

    Ilya Dryomov
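
Wiping key material so that the compiler cannot drop the stores is usually done in the kernel with a helper such as memzero_explicit(). A rough userspace equivalent is sketched below; `toy_wipe` and `toy_is_zero` are hypothetical names, and the volatile pointer is one common way to keep the writes from being optimized away, not necessarily what this commit does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Zero a buffer through a volatile pointer so the stores are not
 * elided even if the buffer is never read again.
 */
static void toy_wipe(void *buf, size_t len)
{
    volatile uint8_t *p = buf;

    while (len--)
        *p++ = 0;
}

/* Returns 1 if every byte of the buffer is zero, 0 otherwise. */
static int toy_is_zero(const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```

A caller would wipe both the secret itself and any temporary copies derived from it (for example the key and IV pieces) before the memory is released or reused.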
     

29 Dec, 2020

2 commits

  • crypto_shash_setkey() and crypto_aead_setkey() will do a (small)
    GFP_ATOMIC allocation to align the key if it isn't suitably aligned.
    It's not a big deal, but at the same time easy to avoid.

    The actual alignment requirement is dynamic, queryable with
    crypto_shash_alignmask() and crypto_aead_alignmask(), but shouldn't
    be stricter than 16 bytes for our algorithms.

    Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
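
An alignmask of 15 corresponds to the 16-byte bound mentioned in this commit. The sketch below shows, in portable C11, how a key buffer can be declared with that alignment and how a pointer can be checked against a mask. The struct and its field names are hypothetical; kernel code would use its own alignment attribute rather than `_Alignas`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_KEY_LEN 16

/* An alignmask of 15 means "the address must be a multiple of 16". */
static bool toy_is_aligned(const void *p, uintptr_t alignmask)
{
    return ((uintptr_t)p & alignmask) == 0;
}

/* Hypothetical container whose key is always 16-byte aligned. */
struct toy_con_secret {
    uint8_t flags;                         /* would otherwise skew the key */
    _Alignas(16) uint8_t key[TOY_KEY_LEN]; /* forced onto a 16-byte boundary */
};
```

With the key already aligned, a setkey-style routine has no reason to allocate a temporary aligned copy.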
     
  • auth_signature frame is 68 bytes in plain mode and 96 bytes in
    secure mode but we are requesting 68 bytes in both modes. By luck,
    this doesn't actually result in any invalid memory accesses because
    the allocation is satisfied out of kmalloc-96 slab and so exactly
    96 bytes are allocated, but KASAN rightfully complains.

    Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
    Reported-by: Luis Henriques
    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
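
The two sizes quoted in this commit can be reproduced with simple arithmetic. The breakdown below is an assumption chosen to match the 68 and 96 byte figures (a 32-byte preamble, a 32-byte signature payload, a 4-byte CRC in plain mode, and a 48-byte inline area plus a 16-byte tag in secure mode); the `TOY_*` constants are hypothetical and not taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes, in bytes; see the note above. */
enum {
    TOY_PREAMBLE_LEN = 32,
    TOY_INLINE_LEN = 48,    /* secure mode: payload area inside the preamble block */
    TOY_TAG_LEN = 16,       /* secure mode: authentication tag */
    TOY_CRC_LEN = 4,        /* plain mode: trailing checksum */
    TOY_SIGNATURE_LEN = 32, /* auth_signature payload */
};

/* On-wire length of the auth_signature frame for the given mode. */
static size_t toy_auth_signature_frame_len(bool secure)
{
    if (secure) {
        /* the payload fits in the inline area, so the block size is fixed */
        assert(TOY_SIGNATURE_LEN <= TOY_INLINE_LEN);
        return TOY_PREAMBLE_LEN + TOY_INLINE_LEN + TOY_TAG_LEN;
    }
    return TOY_PREAMBLE_LEN + TOY_SIGNATURE_LEN + TOY_CRC_LEN;
}
```

Sizing the receive buffer from a mode-aware helper like this, rather than from a single constant, avoids requesting 68 bytes when 96 are about to be read.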
     

18 Dec, 2020

1 commit

  • Pull ceph updates from Ilya Dryomov:
    "The big ticket item here is support for msgr2 on-wire protocol, which
    adds the option of full in-transit encryption using AES-GCM algorithm
    (myself).

    On top of that we have a series to avoid intermittent errors during
    recovery with recover_session=clean and some MDS request encoding work
    from Jeff, a cap handling fix and assorted observability improvements
    from Luis and Xiubo and a good number of cleanups.

    Luis also ran into a corner case with quotas which sadly means that we
    are back to denying cross-quota-realm renames"

    * tag 'ceph-for-5.11-rc1' of git://github.com/ceph/ceph-client: (59 commits)
    libceph: drop ceph_auth_{create,update}_authorizer()
    libceph, ceph: make use of __ceph_auth_get_authorizer() in msgr1
    libceph, ceph: implement msgr2.1 protocol (crc and secure modes)
    libceph: introduce connection modes and ms_mode option
    libceph, rbd: ignore addr->type while comparing in some cases
    libceph, ceph: get and handle cluster maps with addrvecs
    libceph: factor out finish_auth()
    libceph: drop ac->ops->name field
    libceph: amend cephx init_protocol() and build_request()
    libceph, ceph: incorporate nautilus cephx changes
    libceph: safer en/decoding of cephx requests and replies
    libceph: more insight into ticket expiry and invalidation
    libceph: move msgr1 protocol specific fields to its own struct
    libceph: move msgr1 protocol implementation to its own file
    libceph: separate msgr1 protocol implementation
    libceph: export remaining protocol independent infrastructure
    libceph: export zero_page
    libceph: rename and export con->flags bits
    libceph: rename and export con->state states
    libceph: make con->state an int
    ...

    Linus Torvalds
     

15 Dec, 2020

20 commits