Eric Lee / smarc-fsl-linux-kernel

22 Jul, 2020

1 commit

07253d24c libceph: don't omit recovery_deletes in target_copy() ... Browse Code »

commit 2f3fead62144002557f322c2a7c15e1255df0653 upstream.

Currently target_copy() is used only for sending linger pings, so
this doesn't come up, but generally omitting recovery_deletes can
result in unneeded resends (force_resend in calc_target()).

Fixes: ae78dd8139ce ("libceph: make RECOVERY_DELETES feature create a new interval")
Signed-off-by: Ilya Dryomov
Reviewed-by: Jeff Layton
Signed-off-by: Greg Kroah-Hartman

Ilya Dryomov
2020-07-22 15:33:17 +0800

03 Jun, 2020

1 commit

4d145e482 libceph: ignore pool overlay and cache logic on redirects ... Browse Code »

[ Upstream commit 890bd0f8997ae6ac0a367dd5146154a3963306dd ]

OSD client should ignore cache/overlay flag if got redirect reply.
Otherwise, the client hangs when the cache tier is in forward mode.

[ idryomov: Redirects are effectively deprecated and no longer
used or tested. The original tiering modes based on redirects
are inherently flawed because redirects can race and reorder,
potentially resulting in data corruption. The new proxy and
readproxy tiering modes should be used instead of forward and
readforward. Still marking for stable as obviously correct,
though. ]

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/23296
URL: https://tracker.ceph.com/issues/36406
Signed-off-by: Jerry Lee
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

Jerry Lee
2020-06-03 14:21:25 +0800

02 Apr, 2020

1 commit

693860e79 libceph: fix alloc_msg_with_page_vector() memory leaks ... Browse Code »

commit e886274031200bb60965c1b9c49b7acda56a93bd upstream.

Make it so that CEPH_MSG_DATA_PAGES data item can own pages,
fixing a bunch of memory leaks for a page vector allocated in
alloc_msg_with_page_vector(). Currently, only watch-notify
messages trigger this allocation, and normally the page vector
is freed either in handle_watch_notify() or by the caller of
ceph_osdc_notify(). But if the message is freed before that
(e.g. if the session faults while reading in the message or
if the notify is stale), we leak the page vector.

This was supposed to be fixed by switching to a message-owned
pagelist, but that never happened.

Fixes: 1907920324f1 ("libceph: support for sending notifies")
Reported-by: Roman Penyaev
Signed-off-by: Ilya Dryomov
Reviewed-by: Roman Penyaev
Signed-off-by: Greg Kroah-Hartman

Ilya Dryomov
2020-04-02 21:11:02 +0800

01 Apr, 2020

1 commit

ed24820d1 ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL ... Browse Code »

commit 7614209736fbc4927584d4387faade4f31444fce upstream.

CEPH_OSDMAP_FULL/NEARFULL aren't set since mimic, so we need to consult
per-pool flags as well. Unfortunately the backwards compatibility here
is lacking:

- the change that deprecated OSDMAP_FULL/NEARFULL went into mimic, but
was guarded by require_osd_release >= RELEASE_LUMINOUS
- it was subsequently backported to luminous in v12.2.2, but that makes
no difference to clients that only check OSDMAP_FULL/NEARFULL because
require_osd_release is not client-facing -- it is for OSDs

Since all kernels are affected, the best we can do here is just start
checking both map flags and pool flags and send that to stable.

These checks are best effort, so take osdc->lock and look up pool flags
just once. Remove the FIXME, since filesystem quotas are checked above
and RADOS quotas are reflected in POOL_FLAG_FULL: when the pool reaches
its quota, both POOL_FLAG_FULL and POOL_FLAG_FULL_QUOTA are set.

Cc: stable@vger.kernel.org
Reported-by: Yanhu Cao
Signed-off-by: Ilya Dryomov
Reviewed-by: Jeff Layton
Acked-by: Sage Weil
Signed-off-by: Greg Kroah-Hartman

Ilya Dryomov
2020-04-01 17:01:58 +0800

16 Sep, 2019

6 commits

cf73d882c libceph: use ceph_kvmalloc() for osdmap arrays ... Browse Code »

osdmap has a bunch of arrays that grow linearly with the number of
OSDs. osd_state, osd_weight and osd_primary_affinity take 4 bytes per
OSD. osd_addr takes 136 bytes per OSD because of sockaddr_storage.
The CRUSH workspace area also grows linearly with the number of OSDs.

Normally these arrays are allocated at client startup. The osdmap is
usually updated in small incrementals, but once in a while a full map
may need to be processed. For a cluster with 10000 OSDs, this means
a bunch of 40K allocations followed by a 1.3M allocation, all of which
are currently required to be physically contiguous. This results in
sporadic ENOMEM errors, hanging the client.

Go back to manually (re)allocating arrays and use ceph_kvmalloc() to
fall back to non-contiguous allocation when necessary.

Link: https://tracker.ceph.com/issues/40481
Signed-off-by: Ilya Dryomov
Reviewed-by: Jeff Layton

Ilya Dryomov
2019-09-16 18:06:25 +0800
10c12851a libceph: avoid a __vmalloc() deadlock in ceph_kvmalloc() ... Browse Code »

The vmalloc allocator doesn't fully respect the specified gfp mask:
while the actual pages are allocated as requested, the page table pages
are always allocated with GFP_KERNEL. ceph_kvmalloc() may be called
with GFP_NOFS and GFP_NOIO (for ceph and rbd respectively), so this may
result in a deadlock.

There is no real reason for the current PAGE_ALLOC_COSTLY_ORDER logic,
it's just something that seemed sensible at the time (ceph_kvmalloc()
predates kvmalloc()). kvmalloc() is smarter: in an attempt to reduce
long term fragmentation, it first tries to kmalloc non-disruptively.

Switch to kvmalloc() and set the respective PF_MEMALLOC_* flag using
the scope API to avoid the deadlock. Note that kvmalloc() needs to be
passed GFP_KERNEL to enable the fallback.

Signed-off-by: Ilya Dryomov
Reviewed-by: Jeff Layton

Ilya Dryomov
2019-09-16 18:06:25 +0800
8edf84ba4 libceph: drop unused con parameter of calc_target() ... Browse Code »

This bit was omitted from a561372405cf ("libceph: fix PG split vs OSD
(re)connect race") to avoid backport conflicts.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2019-09-16 18:06:25 +0800
4766815b1 libceph: handle OSD op ceph_pagelist_append() errors ... Browse Code »

osd_req_op_cls_init() and osd_req_op_xattr_init() currently propagate
ceph_pagelist_alloc() ENOMEM errors but ignore ceph_pagelist_append()
memory allocation failures. Add these checks and cleanup on error.

Signed-off-by: David Disseldorp
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

David Disseldorp
2019-09-16 18:06:25 +0800
2cef0ba80 libceph: add function that clears osd client's abort_err ... Browse Code »

Signed-off-by: "Yan, Zheng"
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Yan, Zheng
2019-09-16 18:06:23 +0800
120a75ea9 libceph: add function that reset client's entity addr ... Browse Code »

This function also re-open connections to OSD/MON, and re-send in-flight
OSD requests after re-opening connections to OSD.

Signed-off-by: "Yan, Zheng"
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov

Yan, Zheng
2019-09-16 18:06:23 +0800

28 Aug, 2019

1 commit

e8c99200b libceph: don't call crypto_free_sync_skcipher() on a NULL tfm ... Browse Code »

In set_secret(), key->tfm is assigned to NULL on line 55, and then
ceph_crypto_key_destroy(key) is executed.

ceph_crypto_key_destroy(key)
crypto_free_sync_skcipher(key->tfm)
crypto_free_skcipher(&tfm->base);

This happens to work because crypto_sync_skcipher is a trivial wrapper
around crypto_skcipher: &tfm->base is still 0 and crypto_free_skcipher()
handles that. Let's not rely on the layout of crypto_sync_skcipher.

This bug is found by a static analysis tool STCheck written by us.

Fixes: 69d6302b65a8 ("libceph: Remove VLA usage of skcipher").
Signed-off-by: Jia-Ju Bai
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov

Jia-Ju Bai
2019-08-28 18:33:46 +0800

22 Aug, 2019

1 commit

a56137240 libceph: fix PG split vs OSD (re)connect race ... Browse Code »

We can't rely on ->peer_features in calc_target() because it may be
called both when the OSD session is established and open and when it's
not. ->peer_features is not valid unless the OSD session is open. If
this happens on a PG split (pg_num increase), that could mean we don't
resend a request that should have been resent, hanging the client
indefinitely.

In userspace this was fixed by looking at require_osd_release and
get_xinfo[osd].features fields of the osdmap. However these fields
belong to the OSD section of the osdmap, which the kernel doesn't
decode (only the client section is decoded).

Instead, let's drop this feature check. It effectively checks for
luminous, so only pre-luminous OSDs would be affected in that on a PG
split the kernel might resend a request that should not have been
resent. Duplicates can occur in other scenarios, so both sides should
already be prepared for them: see dup/replay logic on the OSD side and
retry_attempt check on the client side.

Cc: stable@vger.kernel.org
Fixes: 7de030d6b10a ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT")
Link: https://tracker.ceph.com/issues/41162
Reported-by: Jerry Lee
Signed-off-by: Ilya Dryomov
Tested-by: Jerry Lee
Reviewed-by: Jeff Layton

Ilya Dryomov
2019-08-22 16:47:41 +0800

19 Jul, 2019

1 commit

d9b9c8930 Merge tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client ... Browse Code »

Pull ceph updates from Ilya Dryomov:
"Lots of exciting things this time!

- support for rbd object-map and fast-diff features (myself). This
will speed up reads, discards and things like snap diffs on sparse
images.

- ceph.snap.btime vxattr to expose snapshot creation time (David
Disseldorp). This will be used to integrate with "Restore Previous
Versions" feature added in Windows 7 for folks who reexport ceph
through SMB.

- security xattrs for ceph (Zheng Yan). Only selinux is supported for
now due to the limitations of ->dentry_init_security().

- support for MSG_ADDR2, FS_BTIME and FS_CHANGE_ATTR features (Jeff
Layton). This is actually a single feature bit which was missing
because of the filesystem pieces. With this in, the kernel client
will finally be reported as "luminous" by "ceph features" -- it is
still being reported as "jewel" even though all required Luminous
features were implemented in 4.13.

- stop NULL-terminating ceph vxattrs (Jeff Layton). The convention
with xattrs is to not terminate and this was causing
inconsistencies with ceph-fuse.

- change filesystem time granularity from 1 us to 1 ns, again fixing
an inconsistency with ceph-fuse (Luis Henriques).

On top of this there are some additional dentry name handling and cap
flushing fixes from Zheng. Finally, Jeff is formally taking over for
Zheng as the filesystem maintainer"

* tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client: (71 commits)
ceph: fix end offset in truncate_inode_pages_range call
ceph: use generic_delete_inode() for ->drop_inode
ceph: use ceph_evict_inode to cleanup inode's resource
ceph: initialize superblock s_time_gran to 1
MAINTAINERS: take over for Zheng as CephFS kernel client maintainer
rbd: setallochint only if object doesn't exist
rbd: support for object-map and fast-diff
rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe()
libceph: export osd_req_op_data() macro
libceph: change ceph_osdc_call() to take page vector for response
libceph: bump CEPH_MSG_MAX_DATA_LEN (again)
rbd: new exclusive lock wait/wake code
rbd: quiescing lock should wait for image requests
rbd: lock should be quiesced on reacquire
rbd: introduce copyup state machine
rbd: rename rbd_obj_setup_*() to rbd_obj_init_*()
rbd: move OSD request allocation into object request state machines
rbd: factor out __rbd_osd_setup_discard_ops()
rbd: factor out rbd_osd_setup_copyup()
rbd: introduce obj_req->osd_reqs list
...

Linus Torvalds
2019-07-19 02:05:25 +0800

13 Jul, 2019

1 commit

f632a8170 Merge tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core ... Browse Code »

Pull driver core and debugfs updates from Greg KH:
"Here is the "big" driver core and debugfs changes for 5.3-rc1

It's a lot of different patches, all across the tree due to some api
changes and lots of debugfs cleanups.

Other than the debugfs cleanups, in this set of changes we have:

- bus iteration function cleanups

- scripts/get_abi.pl tool to display and parse Documentation/ABI
entries in a simple way

- cleanups to Documenatation/ABI/ entries to make them parse easier
due to typos and other minor things

- default_attrs use for some ktype users

- driver model documentation file conversions to .rst

- compressed firmware file loading

- deferred probe fixes

All of these have been in linux-next for a while, with a bunch of
merge issues that Stephen has been patient with me for"

* tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (102 commits)
debugfs: make error message a bit more verbose
orangefs: fix build warning from debugfs cleanup patch
ubifs: fix build warning after debugfs cleanup patch
driver: core: Allow subsystems to continue deferring probe
drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT
arch_topology: Remove error messages on out-of-memory conditions
lib: notifier-error-inject: no need to check return value of debugfs_create functions
swiotlb: no need to check return value of debugfs_create functions
ceph: no need to check return value of debugfs_create functions
sunrpc: no need to check return value of debugfs_create functions
ubifs: no need to check return value of debugfs_create functions
orangefs: no need to check return value of debugfs_create functions
nfsd: no need to check return value of debugfs_create functions
lib: 842: no need to check return value of debugfs_create functions
debugfs: provide pr_fmt() macro
debugfs: log errors when something goes wrong
drivers: s390/cio: Fix compilation warning about const qualifiers
drivers: Add generic helper to match by of_node
driver_find_device: Unify the match function with class_find_device()
bus_find_device: Unify the match callback with class_find_device
...

Linus Torvalds
2019-07-13 03:24:03 +0800

11 Jul, 2019

1 commit

028db3e29 Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kern… ... Browse Code »

…el/git/dhowells/linux-fs"

This reverts merge 0f75ef6a9cff49ff612f7ce0578bced9d0b38325 (and thus
effectively commits

7a1ade847596 ("keys: Provide KEYCTL_GRANT_PERMISSION")
2e12256b9a76 ("keys: Replace uid/gid/perm permissions checking with an ACL")

that the merge brought in).

It turns out that it breaks booting with an encrypted volume, and Eric
biggers reports that it also breaks the fscrypt tests [1] and loading of
in-kernel X.509 certificates [2].

The root cause of all the breakage is likely the same, but David Howells
is off email so rather than try to work it out it's getting reverted in
order to not impact the rest of the merge window.

[1] https://lore.kernel.org/lkml/20190710011559.GA7973@sol.localdomain/
[2] https://lore.kernel.org/lkml/20190710013225.GB7973@sol.localdomain/

Link: https://lore.kernel.org/lkml/CAHk-=wjxoeMJfeBahnWH=9zShKp2bsVy527vo3_y8HfOdhwAAw@mail.gmail.com/
Reported-by: Eric Biggers <ebiggers@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Linus Torvalds
2019-07-11 09:43:43 +0800

09 Jul, 2019

2 commits

0f75ef6a9 Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs ... Browse Code »

Pull keyring ACL support from David Howells:
"This changes the permissions model used by keys and keyrings to be
based on an internal ACL by the following means:

- Replace the permissions mask internally with an ACL that contains a
list of ACEs, each with a specific subject with a permissions mask.
Potted default ACLs are available for new keys and keyrings.

ACE subjects can be macroised to indicate the UID and GID specified
on the key (which remain). Future commits will be able to add
additional subject types, such as specific UIDs or domain
tags/namespaces.

Also split a number of permissions to give finer control. Examples
include splitting the revocation permit from the change-attributes
permit, thereby allowing someone to be granted permission to revoke
a key without allowing them to change the owner; also the ability
to join a keyring is split from the ability to link to it, thereby
stopping a process accessing a keyring by joining it and thus
acquiring use of possessor permits.

- Provide a keyctl to allow the granting or denial of one or more
permits to a specific subject. Direct access to the ACL is not
granted, and the ACL cannot be viewed"

* tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
keys: Provide KEYCTL_GRANT_PERMISSION
keys: Replace uid/gid/perm permissions checking with an ACL

Linus Torvalds
2019-07-09 10:56:57 +0800
c84ca912b Merge tag 'keys-namespace-20190627' of git://git.kernel.org/pub/scm/linux/kernel… ... Browse Code »

…/git/dhowells/linux-fs

Pull keyring namespacing from David Howells:
"These patches help make keys and keyrings more namespace aware.

Firstly some miscellaneous patches to make the process easier:

- Simplify key index_key handling so that the word-sized chunks
assoc_array requires don't have to be shifted about, making it
easier to add more bits into the key.

- Cache the hash value in the key so that we don't have to calculate
on every key we examine during a search (it involves a bunch of
multiplications).

- Allow keying_search() to search non-recursively.

Then the main patches:

- Make it so that keyring names are per-user_namespace from the point
of view of KEYCTL_JOIN_SESSION_KEYRING so that they're not
accessible cross-user_namespace.

keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEYRING_NAME for this.

- Move the user and user-session keyrings to the user_namespace
rather than the user_struct. This prevents them propagating
directly across user_namespaces boundaries (ie. the KEY_SPEC_*
flags will only pick from the current user_namespace).

- Make it possible to include the target namespace in which the key
shall operate in the index_key. This will allow the possibility of
multiple keys with the same description, but different target
domains to be held in the same keyring.

keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEY_TAG for this.

- Make it so that keys are implicitly invalidated by removal of a
domain tag, causing them to be garbage collected.

- Institute a network namespace domain tag that allows keys to be
differentiated by the network namespace in which they operate. New
keys that are of a type marked 'KEY_TYPE_NET_DOMAIN' are assigned
the network domain in force when they are created.

- Make it so that the desired network namespace can be handed down
into the request_key() mechanism. This allows AFS, NFS, etc. to
request keys specific to the network namespace of the superblock.

This also means that the keys in the DNS record cache are
thenceforth namespaced, provided network filesystems pass the
appropriate network namespace down into dns_query().

For DNS, AFS and NFS are good, whilst CIFS and Ceph are not. Other
cache keyrings, such as idmapper keyrings, also need to set the
domain tag - for which they need access to the network namespace of
the superblock"

* tag 'keys-namespace-20190627' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
keys: Pass the network namespace into request_key mechanism
keys: Network namespace domain tag
keys: Garbage collect keys for which the domain has been removed
keys: Include target namespace in match criteria
keys: Move the user and user-session keyrings to the user_namespace
keys: Namespace keyring names
keys: Add a 'recurse' flag for keyring searches
keys: Cache the hash value to avoid lots of recalculation
keys: Simplify key description management

Linus Torvalds
2019-07-09 10:36:47 +0800

08 Jul, 2019

14 commits

22e8bd51b rbd: support for object-map and fast-diff ... Browse Code »

Speed up reads, discards and zeroouts through RBD_OBJ_FLAG_MAY_EXIST
and RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT based on object map.

Invalid object maps are not trusted, but still updated. Note that we
never iterate, resize or invalidate object maps. If object-map feature
is enabled but object map fails to load, we just fail the requester
(either "rbd map" or I/O, by way of post-acquire action).

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2019-07-08 20:01:45 +0800
4cf3e6dff libceph: export osd_req_op_data() macro ... Browse Code »

We already have one exported wrapper around it for extent.osd_data and
rbd_object_map_update_finish() needs another one for cls.request_data.

Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
Reviewed-by: Jeff Layton

Ilya Dryomov
2019-07-08 20:01:45 +0800
68ada915e libceph: change ceph_osdc_call() to take page vector for response ... Browse Code »

This will be used for loading object map. rbd_obj_read_sync() isn't
suitable because object map must be accessed through class methods.

Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
Reviewed-by: Jeff Layton

Ilya Dryomov
2019-07-08 20:01:45 +0800
94e857718 libceph: rename r_unsafe_item to r_private_item ... Browse Code »

This list item remained from when we had safe and unsafe replies
(commit vs ack). It has since become a private list item for use by
clients.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2019-07-08 20:01:44 +0800
2c66de560 libceph: rename ceph_encode_addr to ceph_encode_banner_addr ... Browse Code »

...ditto for the decode function. We only use these functions to fix
up banner addresses now, so let's name them more appropriately.

Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-07-08 20:01:43 +0800
d3c3c0a84 libceph: use TYPE_LEGACY for entity addrs instead of TYPE_NONE ... Browse Code »

Going forward, we'll have different address types so let's use
the addr2 TYPE_LEGACY for internal tracking rather than TYPE_NONE.

Also, make ceph_pr_addr print the address type value as well.

Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-07-08 20:01:43 +0800
2f9800c89 ceph: fix decode_locker to use ceph_decode_entity_addr ... Browse Code »

Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-07-08 20:01:43 +0800
8cb5f2b4f libceph: correctly decode ADDR2 addresses in incremental OSD maps ... Browse Code »

Given the new format, we have to decode the addresses twice. Once to
skip past the new_up_client field, and a second time to collect the
addresses.

Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-07-08 20:01:43 +0800
51fc7ab44 libceph: fix watch_item_t decoding to use ceph_decode_entity_addr ... Browse Code »

While we're in there, let's also fix up the decoder to do proper
bounds checking.

Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-07-08 20:01:43 +0800
dcbc919a5 libceph: switch osdmap decoding to use ceph_decode_entity_addr ... Browse Code »

Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-07-08 20:01:43 +0800
0bfb0f288 libceph: ADDR2 support for monmap ... Browse Code »

Switch the MonMap decoder to use the new decoding routine for
entity_addr_t's.

Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-07-08 20:01:43 +0800
6c37f0e64 libceph: add ceph_decode_entity_addr ... Browse Code »

Add a function for decoding an entity_addr_t. Once
CEPH_FEATURE_MSG_ADDR2 is enabled, the server daemons will start
encoding entity_addr_t differently.

Add a new helper function that can handle either format.

Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-07-08 20:01:43 +0800
bc07532cc libceph: fix sa_family just after reading address ... Browse Code »

It doesn't make sense to leave it undecoded until later.

Signed-off-by: Jeff Layton
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Jeff Layton
2019-07-08 20:01:43 +0800
97a385e55 libceph: remove ceph_get_direct_page_vector() ... Browse Code »

This function is entirely unused.

Signed-off-by: Christoph Hellwig
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov

Christoph Hellwig
2019-07-08 20:01:40 +0800

03 Jul, 2019

1 commit

1a829ff2a ceph: no need to check return value of debugfs_create functions ... Browse Code »

When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.

This cleanup allows the return value of the functions to be made void,
as no logic should care if these files succeed or not.

Cc: "Yan, Zheng"
Cc: Sage Weil
Cc: Ilya Dryomov
Cc: "David S. Miller"
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman
Link: https://lore.kernel.org/r/20190612145538.GA18772@kroah.com
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2019-07-03 22:57:18 +0800

28 Jun, 2019

2 commits

2e12256b9 keys: Replace uid/gid/perm permissions checking with an ACL ... Browse Code »

Replace the uid/gid/perm permissions checking on a key with an ACL to allow
the SETATTR and SEARCH permissions to be split. This will also allow a
greater range of subjects to represented.

============
WHY DO THIS?
============

The problem is that SETATTR and SEARCH cover a slew of actions, not all of
which should be grouped together.

For SETATTR, this includes actions that are about controlling access to a
key:

(1) Changing a key's ownership.

(2) Changing a key's security information.

(3) Setting a keyring's restriction.

And actions that are about managing a key's lifetime:

(4) Setting an expiry time.

(5) Revoking a key.

and (proposed) managing a key as part of a cache:

(6) Invalidating a key.

Managing a key's lifetime doesn't really have anything to do with
controlling access to that key.

Expiry time is awkward since it's more about the lifetime of the content
and so, in some ways goes better with WRITE permission. It can, however,
be set unconditionally by a process with an appropriate authorisation token
for instantiating a key, and can also be set by the key type driver when a
key is instantiated, so lumping it with the access-controlling actions is
probably okay.

As for SEARCH permission, that currently covers:

(1) Finding keys in a keyring tree during a search.

(2) Permitting keyrings to be joined.

(3) Invalidation.

But these don't really belong together either, since these actions really
need to be controlled separately.

Finally, there are number of special cases to do with granting the
administrator special rights to invalidate or clear keys that I would like
to handle with the ACL rather than key flags and special checks.

===============
WHAT IS CHANGED
===============

The SETATTR permission is split to create two new permissions:

(1) SET_SECURITY - which allows the key's owner, group and ACL to be
changed and a restriction to be placed on a keyring.

(2) REVOKE - which allows a key to be revoked.

The SEARCH permission is split to create:

(1) SEARCH - which allows a keyring to be search and a key to be found.

(2) JOIN - which allows a keyring to be joined as a session keyring.

(3) INVAL - which allows a key to be invalidated.

The WRITE permission is also split to create:

(1) WRITE - which allows a key's content to be altered and links to be
added, removed and replaced in a keyring.

(2) CLEAR - which allows a keyring to be cleared completely. This is
split out to make it possible to give just this to an administrator.

(3) REVOKE - see above.

Keys acquire ACLs which consist of a series of ACEs, and all that apply are
unioned together. An ACE specifies a subject, such as:

(*) Possessor - permitted to anyone who 'possesses' a key
(*) Owner - permitted to the key owner
(*) Group - permitted to the key group
(*) Everyone - permitted to everyone

Note that 'Other' has been replaced with 'Everyone' on the assumption that
you wouldn't grant a permit to 'Other' that you wouldn't also grant to
everyone else.

Further subjects may be made available by later patches.

The ACE also specifies a permissions mask. The set of permissions is now:

VIEW Can view the key metadata
READ Can read the key content
WRITE Can update/modify the key content
SEARCH Can find the key by searching/requesting
LINK Can make a link to the key
SET_SECURITY Can change owner, ACL, expiry
INVAL Can invalidate
REVOKE Can revoke
JOIN Can join this keyring
CLEAR Can clear this keyring

The KEYCTL_SETPERM function is then deprecated.

The KEYCTL_SET_TIMEOUT function then is permitted if SET_SECURITY is set,
or if the caller has a valid instantiation auth token.

The KEYCTL_INVALIDATE function then requires INVAL.

The KEYCTL_REVOKE function then requires REVOKE.

The KEYCTL_JOIN_SESSION_KEYRING function then requires JOIN to join an
existing keyring.

The JOIN permission is enabled by default for session keyrings and manually
created keyrings only.

======================
BACKWARD COMPATIBILITY
======================

To maintain backward compatibility, KEYCTL_SETPERM will translate the
permissions mask it is given into a new ACL for a key - unless
KEYCTL_SET_ACL has been called on that key, in which case an error will be
returned.

It will convert possessor, owner, group and other permissions into separate
ACEs, if each portion of the mask is non-zero.

SETATTR permission turns on all of INVAL, REVOKE and SET_SECURITY. WRITE
permission turns on WRITE, REVOKE and, if a keyring, CLEAR. JOIN is turned
on if a keyring is being altered.

The KEYCTL_DESCRIBE function translates the ACL back into a permissions
mask to return depending on possessor, owner, group and everyone ACEs.

It will make the following mappings:

(1) INVAL, JOIN -> SEARCH

(2) SET_SECURITY -> SETATTR

(3) REVOKE -> WRITE if SETATTR isn't already set

(4) CLEAR -> WRITE

Note that the value subsequently returned by KEYCTL_DESCRIBE may not match
the value set with KEYCTL_SETATTR.

=======
TESTING
=======

This passes the keyutils testsuite for all but a couple of tests:

(1) tests/keyctl/dh_compute/badargs: The first wrong-key-type test now
returns EOPNOTSUPP rather than ENOKEY as READ permission isn't removed
if the type doesn't have ->read(). You still can't actually read the
key.

(2) tests/keyctl/permitting/valid: The view-other-permissions test doesn't
work as Other has been replaced with Everyone in the ACL.

Signed-off-by: David Howells

David Howells
2019-06-28 06:03:07 +0800
a58946c15 keys: Pass the network namespace into request_key mechanism ... Browse Code »

Create a request_key_net() function and use it to pass the network
namespace domain tag into DNS revolver keys and rxrpc/AFS keys so that keys
for different domains can coexist in the same keyring.

Signed-off-by: David Howells
cc: netdev@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-afs@lists.infradead.org

David Howells
2019-06-28 06:02:12 +0800

05 Jun, 2019

1 commit

04672fe6d treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 268 ... Browse Code »

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin street fifth floor boston ma
02110 1301 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 46 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Allison Randal
Reviewed-by: Alexios Zavras
Reviewed-by: Richard Fontana
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141334.135501091@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-06-05 23:30:29 +0800

21 May, 2019

2 commits

ec8f24b7f treewide: Add SPDX license identifier - Makefile/Kconfig ... Browse Code »

Add SPDX license identifiers to all Make/Kconfig files which:

- Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only

Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-21 16:50:46 +0800
09c434b8a treewide: Add SPDX license identifier for more missed files ... Browse Code »

Add SPDX license identifiers to all files which:

- Have no license information of any form

- Have MODULE_LICENCE("GPL*") inside which was used in the initial
scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only

Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-21 16:50:45 +0800

17 May, 2019

2 commits

227747fb9 Merge tag 'afs-fixes-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs ... Browse Code »

Pull misc AFS fixes from David Howells:
"This fixes a set of miscellaneous issues in the afs filesystem,
including:

- leak of keys on file close.

- broken error handling in xattr functions.

- missing locking when updating VL server list.

- volume location server DNS lookup whereby preloaded cells may not
ever get a lookup and regular DNS lookups to maintain server lists
consume power unnecessarily.

- incorrect error propagation and handling in the fileserver
iteration code causes operations to sometimes apparently succeed.

- interruption of server record check/update side op during
fileserver iteration causes uninterruptible main operations to fail
unexpectedly.

- callback promise expiry time miscalculation.

- over invalidation of the callback promise on directories.

- double locking on callback break waking up file locking waiters.

- double increment of the vnode callback break counter.

Note that it makes some changes outside of the afs code, including:

- an extra parameter to dns_query() to allow the dns_resolver key
just accessed to be immediately invalidated. AFS is caching the
results itself, so the key can be discarded.

- an interruptible version of wait_var_event().

- an rxrpc function to allow the maximum lifespan to be set on a
call.

- a way for an rxrpc call to be marked as non-interruptible"

* tag 'afs-fixes-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix double inc of vnode->cb_break
afs: Fix lock-wait/callback-break double locking
afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set
afs: Fix calculation of callback expiry time
afs: Make dynamic root population wait uninterruptibly for proc_cells_lock
afs: Make some RPC operations non-interruptible
rxrpc: Allow the kernel to mark a call as being non-interruptible
afs: Fix error propagation from server record check/update
afs: Fix the maximum lifespan of VL and probe calls
rxrpc: Provide kernel interface to set max lifespan on a call
afs: Fix "kAFS: AFS vnode with undefined type 0"
afs: Fix cell DNS lookup
Add wait_var_event_interruptible()
dns_resolver: Allow used keys to be invalidated
afs: Fix afs_cell records to always have a VL server list record
afs: Fix missing lock when replacing VL server list
afs: Fix afs_xattr_get_yfs() to not try freeing an error value
afs: Fix incorrect error handling in afs_xattr_get_acl()
afs: Fix key leak in afs_release() and afs_evict_inode()

Linus Torvalds
2019-05-17 08:00:13 +0800
1d9d7cbf2 Merge tag 'ceph-for-5.2-rc1' of git://github.com/ceph/ceph-client ... Browse Code »

Pull ceph updates from Ilya Dryomov:
"On the filesystem side we have:

- a fix to enforce quotas set above the mount point (Luis Henriques)

- support for exporting snapshots through NFS (Zheng Yan)

- proper statx implementation (Jeff Layton). statx flags are mapped
to MDS caps, with AT_STATX_{DONT,FORCE}_SYNC taken into account.

- some follow-up dentry name handling fixes, in particular
elimination of our hand-rolled helper and the switch to __getname()
as suggested by Al (Jeff Layton)

- a set of MDS client cleanups in preparation for async MDS requests
in the future (Jeff Layton)

- a fix to sync the filesystem before remounting (Jeff Layton)

On the rbd side, work is on-going on object-map and fast-diff image
features"

* tag 'ceph-for-5.2-rc1' of git://github.com/ceph/ceph-client: (29 commits)
ceph: flush dirty inodes before proceeding with remount
ceph: fix unaligned access in ceph_send_cap_releases
libceph: make ceph_pr_addr take an struct ceph_entity_addr pointer
libceph: fix unaligned accesses in ceph_entity_addr handling
rbd: don't assert on writes to snapshots
rbd: client_mutex is never nested
ceph: print inode number in __caps_issued_mask debugging messages
ceph: just call get_session in __ceph_lookup_mds_session
ceph: simplify arguments and return semantics of try_get_cap_refs
ceph: fix comment over ceph_drop_caps_for_unlink
ceph: move wait for mds request into helper function
ceph: have ceph_mdsc_do_request call ceph_mdsc_submit_request
ceph: after an MDS request, do callback and completions
ceph: use pathlen values returned by set_request_path_attr
ceph: use __getname/__putname in ceph_mdsc_build_path
ceph: use ceph_mdsc_build_path instead of clone_dentry_name
ceph: fix potential use-after-free in ceph_mdsc_build_path
ceph: dump granular cap info in "caps" debugfs file
ceph: make iterate_session_caps a public symbol
ceph: fix NULL pointer deref when debugging is enabled
...

Linus Torvalds
2019-05-17 07:24:01 +0800

16 May, 2019

1 commit

d0660f0b3 dns_resolver: Allow used keys to be invalidated ... Browse Code »

Allow used DNS resolver keys to be invalidated after use if the caller is
doing its own caching of the results. This reduces the amount of resources
required.

Fix AFS to invalidate DNS results to kill off permanent failure records
that get lodged in the resolver keyring and prevent future lookups from
happening.

Fixes: 0a5143f2f89c ("afs: Implement VL server rotation")
Signed-off-by: David Howells

David Howells
2019-05-16 00:35:54 +0800