Eric Lee / smarc-fsl-linux-kernel

30 May, 2018

1 commit

e080e814d libceph, ceph: avoid memory leak when specifying same option several times

[ Upstream commit 937441f3a3158d5510ca8cc78a82453f57a96365 ]

When parsing a string option, we need to free any previously stored
value first, so that specifying the same option several times does not
leak memory.

Signed-off-by: Chengguang Xu
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Chengguang Xu
2018-05-30 13:52:04 +0800
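The fix in e080e814d boils down to "free the old value before storing the new one". Below is a minimal userspace sketch of that pattern, assuming a hypothetical `struct opts` with a single string option; `malloc()`/`free()` stand in for the kernel's allocation helpers (the real code uses `kstrndup()`/`kfree()`).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical options struct with a single string option. */
struct opts {
	char *name;
};

/* Portable strdup() replacement (the kernel would use kstrndup()). */
static char *dup_string(const char *s)
{
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);

	if (copy)
		memcpy(copy, s, n);
	return copy;
}

/*
 * Store a string option. If the option was already given earlier on the
 * same command line, free the previous copy so it is not leaked.
 * Returns 0 on success, -1 if the allocation fails (old value kept).
 */
static int set_string_opt(char **slot, const char *value)
{
	char *copy = dup_string(value);

	if (!copy)
		return -1;
	free(*slot);	/* free(NULL) is a no-op the first time */
	*slot = copy;
	return 0;
}
```

Duplicating before freeing keeps the old value intact if the allocation fails; the commit itself only requires that the previous value be freed.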

02 May, 2018

3 commits

7563d6f2b libceph: validate con->state at the top of try_write()

commit 9c55ad1c214d9f8c4594ac2c3fa392c1c32431a7 upstream.

ceph_con_workfn() validates con->state before calling try_read() and
then try_write(). However, try_read() temporarily releases con->mutex,
notably in process_message() and ceph_con_in_msg_alloc(), opening the
window for ceph_con_close() to sneak in, close the connection and
release con->sock. When try_write() is called on the assumption that
con->state is still valid (i.e. not STANDBY or CLOSED), a NULL sock
gets passed to the networking stack:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: selinux_socket_sendmsg+0x5/0x20

Make sure con->state is valid at the top of try_write() and add an
explicit BUG_ON for this, similar to try_read().

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/23706
Signed-off-by: Ilya Dryomov
Reviewed-by: Jason Dillaman
Signed-off-by: Greg Kroah-Hartman

Ilya Dryomov
2018-05-02 03:58:23 +0800
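To illustrate 7563d6f2b, here is a simplified userspace model of "re-check the state on entry". It is not the libceph code: the state names mirror those in the commit message, the socket is faked with a static variable, and `assert()` plays the role of the BUG_ON() the commit adds.

```c
#include <assert.h>
#include <stddef.h>

enum con_state {
	CON_STATE_CLOSED,
	CON_STATE_PREOPEN,
	CON_STATE_CONNECTING,
	CON_STATE_NEGOTIATING,
	CON_STATE_OPEN,
	CON_STATE_STANDBY,
};

struct connection {
	enum con_state state;
	void *sock;	/* NULL once the connection has been closed */
	int writes;	/* counts calls that actually reached the socket */
};

static int fake_socket;	/* stands in for a real struct socket */

/*
 * Validate the state on entry: the mutex may have been dropped earlier in
 * the same work item, letting a close slip in and release the socket.
 * Returns 0 without touching sock when the state is not writable.
 */
static int try_write(struct connection *con)
{
	if (con->state != CON_STATE_PREOPEN &&
	    con->state != CON_STATE_CONNECTING &&
	    con->state != CON_STATE_NEGOTIATING &&
	    con->state != CON_STATE_OPEN)
		return 0;

	if (con->state == CON_STATE_PREOPEN) {
		con->sock = &fake_socket;	/* "open" the socket */
		con->state = CON_STATE_CONNECTING;
	}

	/* Counterpart of the explicit BUG_ON(): a writable state has a socket. */
	assert(con->sock != NULL);
	con->writes++;
	return 1;
}
```

A connection in CLOSED or STANDBY now returns early instead of handing a NULL socket to the networking stack.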
c2bc3eb55 libceph: reschedule a tick in finish_hunting()

commit 7b4c443d139f1d2b5570da475f7a9cbcef86740c upstream.

If we go without an established session for a while, backoff delay will
climb to 30 seconds. The keepalive timeout is also 30 seconds, so it is
easily hit after prolonged hunting for a monitor: we don't get
a chance to send out a keepalive in time, which means we never get back
a keepalive ack in time, cutting an established session and attempting
to connect to a different monitor every 30 seconds:

[Sun Apr 1 23:37:05 2018] libceph: mon0 10.80.20.99:6789 session established
[Sun Apr 1 23:37:36 2018] libceph: mon0 10.80.20.99:6789 session lost, hunting for new mon
[Sun Apr 1 23:37:36 2018] libceph: mon2 10.80.20.103:6789 session established
[Sun Apr 1 23:38:07 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
[Sun Apr 1 23:38:07 2018] libceph: mon1 10.80.20.100:6789 session established
[Sun Apr 1 23:38:37 2018] libceph: mon1 10.80.20.100:6789 session lost, hunting for new mon
[Sun Apr 1 23:38:37 2018] libceph: mon2 10.80.20.103:6789 session established
[Sun Apr 1 23:39:08 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon

The regular keepalive interval is 10 seconds. After ->hunting is
cleared in finish_hunting(), call __schedule_delayed() to ensure we
send out a keepalive after 10 seconds.

Cc: stable@vger.kernel.org # 4.7+
Link: http://tracker.ceph.com/issues/23537
Signed-off-by: Ilya Dryomov
Reviewed-by: Jason Dillaman
Signed-off-by: Greg Kroah-Hartman

Ilya Dryomov
2018-05-02 03:58:23 +0800
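The timing problem in c2bc3eb55 can be modeled with a few integers. This is a sketch, not the mon_client code: the 3-second base interval and the multiplier cap of 10 are assumptions chosen so the cap matches the 30 seconds quoted above, while the 10-second keepalive interval comes from the commit.

```c
#include <assert.h>
#include <stdbool.h>

#define HUNT_INTERVAL_SEC	3	/* assumed base hunting interval */
#define HUNT_MAX_MULT		10	/* assumed cap: 3 s * 10 = 30 s */
#define KEEPALIVE_SEC		10	/* regular tick, per the commit */

struct mon_client {
	bool hunting;
	int hunt_mult;		/* backoff multiplier while hunting */
	int next_tick_sec;	/* delay of the pending tick */
};

/* Pick the delay for the next tick and "arm" it. */
static void schedule_delayed(struct mon_client *m)
{
	if (m->hunting)
		m->next_tick_sec = HUNT_INTERVAL_SEC * m->hunt_mult;
	else
		m->next_tick_sec = KEEPALIVE_SEC;
}

/* Called each time a hunting attempt fails: double the backoff. */
static void backoff(struct mon_client *m)
{
	m->hunt_mult *= 2;
	if (m->hunt_mult > HUNT_MAX_MULT)
		m->hunt_mult = HUNT_MAX_MULT;
	schedule_delayed(m);
}

/*
 * Session established. Without the schedule_delayed() call the pending
 * tick keeps its hunting delay (up to 30 s), so the first keepalive can
 * arrive too late for the 30 s keepalive timeout.
 */
static void finish_hunting(struct mon_client *m)
{
	m->hunting = false;
	schedule_delayed(m);
}
```

After enough failed attempts the pending tick sits at 30 seconds; rescheduling in finish_hunting() pulls it back to 10.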
76f7b52b5 libceph: un-backoff on tick when we have an authenticated session

commit facb9f6eba3df4e8027301cc0e514dc582a1b366 upstream.

This means that if we do some backoff, then authenticate, and are
healthy for an extended period of time, a subsequent failure won't
leave us starting our hunting sequence with a large backoff.

Mirrors ceph.git commit d466bc6e66abba9b464b0b69687cf45c9dccf383.

Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Ilya Dryomov
Reviewed-by: Jason Dillaman
Signed-off-by: Greg Kroah-Hartman

Ilya Dryomov
2018-05-02 03:58:23 +0800
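A sketch of the un-backoff idea in 76f7b52b5, under stated assumptions: the commit only says backoff is undone on tick while the session is authenticated, so halving the multiplier per tick with a floor of 1 is an assumed policy, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct mon_client {
	bool hunting;
	bool authenticated;
	int hunt_mult;	/* backoff multiplier, always >= 1 */
};

/* Undo backoff gradually: halve the multiplier, never going below 1. */
static void un_backoff(struct mon_client *m)
{
	m->hunt_mult /= 2;
	if (m->hunt_mult < 1)
		m->hunt_mult = 1;
}

/* Periodic tick: only an established, authenticated session un-backs-off. */
static void tick(struct mon_client *m)
{
	if (!m->hunting && m->authenticated)
		un_backoff(m);
}
```

After a few healthy ticks the multiplier is back at 1, so a later failure starts hunting with a short delay.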

30 Nov, 2017

1 commit

bcae2363e libceph: don't WARN() if user tries to add invalid key

commit b11270853fa3654f08d4a6a03b23ddb220512d8d upstream.

The WARN_ON(!key->len) in set_secret() in net/ceph/crypto.c is hit if a
user tries to add a key of type "ceph" with an invalid payload as
follows (assuming CONFIG_CEPH_LIB=y):

echo -e -n '\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' \
| keyctl padd ceph desc @s

This can be hit by fuzzers. As this is merely bad input and not a
kernel bug, replace the WARN_ON() with return -EINVAL.

Fixes: 7af3ea189a9a ("libceph: stop allocating a new cipher on every crypto request")
Signed-off-by: Eric Biggers
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman

Eric Biggers
2017-11-30 16:40:45 +0800
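The change in bcae2363e turns a warning into ordinary input validation. A minimal sketch, assuming a hypothetical, simplified `struct crypto_key` (the real structure has more fields, and the cipher setup is omitted):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a decoded "ceph" key. */
struct crypto_key {
	size_t len;
	const unsigned char *key;
};

/*
 * An empty secret is bad user input, not a kernel bug: report it with
 * -EINVAL instead of a WARN_ON() stack trace.
 */
static int set_secret(struct crypto_key *k)
{
	if (!k->len)
		return -EINVAL;
	/* ... cipher setup would follow here ... */
	return 0;
}
```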

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side-by-side results of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
should be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-11-02 18:10:55 +0800

20 Sep, 2017

1 commit

29a0cfbf9 libceph: don't allow bidirectional swap of pg-upmap-items

This reverts most of commit f53b7665c8ce ("libceph: upmap semantic
changes").

We need to prevent duplicates in the final result. For example, we
can currently take

[1,2,3] and apply [(1,2)] and get [2,2,3]

or

[1,2,3] and apply [(3,2)] and get [1,2,2]

The rest of the system is not prepared to handle duplicates in the
result set like this.

The reverted piece was intended to allow

[1,2,3] and [(1,2),(2,1)] to get [2,1,3]

to reorder primaries. First, this bidirectional swap is hard to
implement in a way that also prevents dups. For example, [1,2,3] and
[(1,4),(2,3),(3,4)] would give [4,3,4], but if we simply dropped the
last step we'd have [4,3,3], which is also invalid, etc. It is simpler
to just not handle bidirectional swaps. In practice, they are not
needed: if you just want to choose a different primary, use
primary_affinity or pg_upmap (not pg_upmap_items).

Cc: stable@vger.kernel.org # 4.13
Link: http://tracker.ceph.com/issues/21410
Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2017-09-20 02:34:29 +0800
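The duplicate-free rule that 29a0cfbf9 restores can be sketched as "apply each (from, to) pair in order, but skip it if `to` is already in the set". This is a simplified model with hypothetical names; it leaves out the additional check for targets that are marked out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct upmap_item {
	int from;
	int to;
};

static bool contains(const int *osds, size_t n, int osd)
{
	for (size_t i = 0; i < n; i++)
		if (osds[i] == osd)
			return true;
	return false;
}

/*
 * Apply pg_upmap_items pairs in order. A pair is skipped when its
 * target is already in the set, so the result never has duplicates
 * (and a bidirectional swap such as [(1,2),(2,1)] becomes a no-op).
 */
static void apply_upmap_items(int *osds, size_t n,
			      const struct upmap_item *items, size_t nitems)
{
	for (size_t k = 0; k < nitems; k++) {
		if (contains(osds, n, items[k].to))
			continue;
		for (size_t i = 0; i < n; i++) {
			if (osds[i] == items[k].from) {
				osds[i] = items[k].to;
				break;
			}
		}
	}
}
```

With this rule, [1,2,3] and [(1,2)] stays [1,2,3], and [1,2,3] and [(1,4),(2,3),(3,4)] yields [4,2,3]: only the first pair applies, because 3 and 4 are already present when the later pairs are considered.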

07 Sep, 2017

2 commits

06d74376c ceph: more accurate statfs

Improve accuracy of statfs reporting for Ceph filesystems comprising
exactly one data pool. In this case, the Ceph monitor can now report
the space usage for the single data pool instead of the global data
for the entire Ceph cluster. Include support for this message in
mon_client and leverage it in ceph/super.

Signed-off-by: Douglas Fuller
Reviewed-by: Yan, Zheng
Reviewed-by: Ilya Dryomov
Signed-off-by: Ilya Dryomov

Douglas Fuller
2017-09-07 01:56:49 +0800
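The pool-selection rule described in 06d74376c, as a small sketch with hypothetical names (`statfs_pool`, `NO_POOL`). The request sent to the monitor, which carries the chosen pool, is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_POOL (-1)	/* hypothetical "cluster-wide" sentinel */

/*
 * With exactly one data pool, per-pool usage describes the filesystem
 * accurately; otherwise fall back to global cluster usage (NO_POOL).
 */
static int64_t statfs_pool(const int64_t *data_pools, size_t num_data_pools)
{
	return num_data_pools == 1 ? data_pools[0] : NO_POOL;
}
```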
3fb99d483 ceph: nuke startsync op

startsync is a no-op and has been for years. Remove it.

Link: http://tracker.ceph.com/issues/20604
Signed-off-by: Yanhu Cao
Reviewed-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov

Yanhu Cao
2017-09-07 01:56:43 +0800

01 Aug, 2017

6 commits

ae78dd813 libceph: make RECOVERY_DELETES feature create a new interval

This is needed so that the OSDs can regenerate the missing set at the
start of a new interval where support for recovery deletes changed.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2017-08-01 22:46:45 +0800
f53b7665c libceph: upmap semantic changes

- apply both pg_upmap and pg_upmap_items
- allow bidirectional swap of pg-upmap-items

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2017-08-01 22:46:45 +0800
c7ed1a4bf crush: assume weight_set != null implies weight_set_size > 0

Reflects ceph.git commit 5e8fa3e06b68fae1582c9230a3a8d1abc6146286.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2017-08-01 22:46:44 +0800
e17e8969f libceph: fallback for when there isn't a pool-specific choose_arg

There is now a fallback to a choose_arg index of -1 if there isn't
a pool-specific choose_arg set. If you create a per-pool weight-set,
that works for that pool. Otherwise we try the compat/default one. If
that doesn't exist either, then we use the normal CRUSH weights.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2017-08-01 22:46:44 +0800
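The three-level fallback in e17e8969f (pool-specific weight-set, then the compat/default one at index -1, then plain CRUSH weights) can be sketched as two lookups. The table type and names are hypothetical; a NULL result stands for "use the normal CRUSH weights".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_CHOOSE_ARGS (-1)	/* compat/default weight-set index */

/* Hypothetical table entry: a choose_arg set keyed by pool id (or -1). */
struct choose_arg_map {
	int64_t choose_args_index;
	const char *name;	/* stands in for the real per-bucket args */
};

static const struct choose_arg_map *
lookup_choose_arg_map(const struct choose_arg_map *maps, size_t n, int64_t idx)
{
	for (size_t i = 0; i < n; i++)
		if (maps[i].choose_args_index == idx)
			return &maps[i];
	return NULL;
}

/* Pool-specific set first, then compat/default; NULL means no override. */
static const struct choose_arg_map *
choose_args_for_pool(const struct choose_arg_map *maps, size_t n, int64_t pool)
{
	const struct choose_arg_map *m = lookup_choose_arg_map(maps, n, pool);

	if (!m)
		m = lookup_choose_arg_map(maps, n, DEFAULT_CHOOSE_ARGS);
	return m;
}
```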
4690faf00 libceph: don't call ->reencode_message() more than once per message

Reencoding an already reencoded message is a bad idea. This could
happen on Policy::stateful_server connections (!CEPH_MSG_CONNECT_LOSSY),
such as MDS sessions.

This didn't pop up in testing because currently only OSD requests are
reencoded and OSD sessions are always lossy.

Fixes: 98ad5ebd1505 ("libceph: ceph_connection_operations::reencode_message() method")
Signed-off-by: Ilya Dryomov
Reviewed-by: "Yan, Zheng"

Ilya Dryomov
2017-08-01 22:46:43 +0800
986e89898 libceph: make encode_request_*() work with r_mempool requests

Messages allocated out of ceph_msgpool have a fixed front length
(pool->front_len). Asserting that the entire front has been filled
while encoding is thus wrong.

Fixes: 8cb441c0545d ("libceph: MOSDOp v8 encoding (actual spgid + full hash)")
Reported-by: "Yan, Zheng"
Signed-off-by: Ilya Dryomov
Reviewed-by: "Yan, Zheng"

Ilya Dryomov
2017-08-01 22:46:31 +0800
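For 986e89898, a minimal sketch of why "filled exactly" is the wrong check when the front buffer comes from a fixed-size pool. The `struct msg` layout and the encoder are hypothetical, and `assert()` stands in for the kernel's BUG_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message whose front buffer came from a fixed-size pool. */
struct msg {
	unsigned char front[64];	/* capacity == pool front_len */
	size_t front_len;		/* bytes actually encoded */
};

static unsigned char *put_u32_le(unsigned char *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		*p++ = (unsigned char)(v >> (8 * i));
	return p;
}

/*
 * Encode two fields. The buffer may be larger than needed, so check
 * only that we stayed within it (p <= end) rather than that we filled
 * it exactly (p == end), then record the real length.
 */
static void encode_request(struct msg *m, uint32_t a, uint32_t b)
{
	unsigned char *p = m->front;
	unsigned char *const end = m->front + sizeof(m->front);

	p = put_u32_le(p, a);
	p = put_u32_le(p, b);
	assert(p <= end);
	m->front_len = (size_t)(p - m->front);
}
```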

17 Jul, 2017

5 commits

7c40b22f6 libceph: potential NULL dereference in ceph_msg_data_create()

If kmem_cache_zalloc() returns NULL, then INIT_LIST_HEAD(&data->links)
will oops. The callers aren't really prepared for NULL returns, so it
doesn't make a lot of difference in real life.

Fixes: 5240d9f95dfe ("libceph: replace message data pointer with list")
Signed-off-by: Dan Carpenter
Signed-off-by: Ilya Dryomov

Dan Carpenter
2017-07-17 20:54:59 +0800
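The fix in 7c40b22f6 is the usual "check the allocation before using it" pattern. A userspace sketch, where `calloc()` stands in for kmem_cache_zalloc() and a minimal hand-rolled list head mimics the kernel's; the struct is a hypothetical stand-in for struct ceph_msg_data.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal doubly linked list head, mimicking the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Hypothetical stand-in for struct ceph_msg_data. */
struct msg_data {
	struct list_head links;
	int type;
};

/*
 * Check the allocation before initializing the list head; otherwise a
 * failed allocation turns into a NULL pointer dereference right here.
 */
static struct msg_data *msg_data_create(int type)
{
	struct msg_data *data = calloc(1, sizeof(*data));

	if (!data)
		return NULL;
	data->type = type;
	init_list_head(&data->links);
	return data;
}
```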
914902af4 libceph: don't call encode_request_finish() on MOSDBackoff messages

encode_request_finish() is for MOSDOp messages. Calling it on
MOSDBackoff ack-block messages corrupts them.

Fixes: a02a946dfe96 ("libceph: respect RADOS_BACKOFF backoffs")
Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-17 20:54:59 +0800
f5cc68986 libceph: use alloc_pg_mapping() in __decode_pg_upmap_items()

... otherwise we die in insert_pg_mapping(), which wants pg->node to be
empty, i.e. initialized with RB_CLEAR_NODE.

Fixes: 6f428df47dae ("libceph: pg_upmap[_items] infrastructure")
Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-17 20:54:58 +0800
c2acfd95d libceph: set -EINVAL in one place in crush_decode()

No sooner had Dan fixed this issue in commit 293dffaad8d5
("libceph: NULL deref on crush_decode() error path") than I brought it
back. Add a new label and set -EINVAL once, right before failing.

Fixes: 278b1d709c6a ("libceph: ceph_decode_skip_* helpers")
Reported-by: Dan Carpenter
Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-17 20:54:58 +0800
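The "one label that sets -EINVAL" pattern from c2acfd95d, sketched with a hypothetical `DECODE_NEED()` macro modeled on the hidden gotos in the ceph_decode_* helpers. The decoder, its fields, and the -ERANGE path are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Modeled on ceph_decode_need(): jump to 'bad' if fewer than n bytes remain. */
#define DECODE_NEED(p, end, n, bad)				\
	do {							\
		if ((size_t)((end) - (p)) < (size_t)(n))	\
			goto bad;				\
	} while (0)

static uint32_t get_u32_le(const unsigned char **p)
{
	const unsigned char *q = *p;
	uint32_t v = (uint32_t)q[0] | (uint32_t)q[1] << 8 |
		     (uint32_t)q[2] << 16 | (uint32_t)q[3] << 24;

	*p += 4;
	return v;
}

/* Hypothetical decoder for two little-endian u32 fields. */
static int decode_pair(const unsigned char *p, const unsigned char *end,
		       uint32_t *a, uint32_t *b)
{
	int err;

	DECODE_NEED(p, end, 4, e_inval);
	*a = get_u32_le(&p);
	if (*a > 100) {
		err = -ERANGE;	/* a specific error, set before jumping */
		goto bad;
	}
	DECODE_NEED(p, end, 4, e_inval);
	*b = get_u32_le(&p);
	return 0;

e_inval:
	/* Every hidden goto lands here, so err can never be left unset. */
	err = -EINVAL;
bad:
	return err;
}
```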
00c8ebb36 libceph: NULL deref on osdmap_apply_incremental() error path

There are hidden gotos in the ceph_decode_* macros. We need to set the
"err" variable on these error paths otherwise we end up returning
ERR_PTR(0) which is NULL. It causes NULL dereferences in the callers.

Fixes: 6f428df47dae ("libceph: pg_upmap[_items] infrastructure")
Signed-off-by: Dan Carpenter
[idryomov@gmail.com: similar bug in osdmap_decode(), changelog tweak]
Signed-off-by: Ilya Dryomov

Dan Carpenter
2017-07-17 20:54:58 +0800
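Why an unset `err` is dangerous in 00c8ebb36: ERR_PTR(0) is simply NULL, which IS_ERR() does not treat as an error. The sketch below re-creates simplified userspace versions of ERR_PTR()/IS_ERR() and a hypothetical decoder; the `set_err` flag models the code before and after the fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace re-creations of the kernel's ERR_PTR() and IS_ERR(). */
#define MAX_ERRNO 4095

static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct osdmap {
	unsigned int epoch;
};

static struct osdmap the_map;

/*
 * Hypothetical decoder. A short buffer takes the error path, as a hidden
 * goto inside a decode macro would. If err is not set on that path,
 * err_ptr(0) turns the "error" into a plain NULL pointer.
 */
static struct osdmap *apply_incremental(size_t avail, bool set_err)
{
	int err = 0;

	if (avail < 4) {
		if (set_err)
			err = -EINVAL;	/* what the fix does */
		goto bad;
	}
	the_map.epoch++;
	return &the_map;

bad:
	return err_ptr(err);
}
```

A caller that only checks IS_ERR() on the result would then go on to dereference NULL.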

16 Jul, 2017

1 commit

52f6c588c Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull random updates from Ted Ts'o:
"Add wait_for_random_bytes() and get_random_*_wait() functions so that
callers can more safely get random bytes if they can block until the
CRNG is initialized.

Also print a warning if get_random_*() is called before the CRNG is
initialized. By default, only one single-line warning will be printed
per boot. If CONFIG_WARN_ALL_UNSEEDED_RANDOM is defined, then a
warning will be printed for each function which tries to get random
bytes before the CRNG is initialized. This can get spammy for certain
architecture types, so it is not enabled by default"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: reorder READ_ONCE() in get_random_uXX
random: suppress spammy warnings about unseeded randomness
random: warn when kernel uses unseeded randomness
net/route: use get_random_int for random counter
net/neighbor: use get_random_u32 for 32-bit hash random
rhashtable: use get_random_u32 for hash_rnd
ceph: ensure RNG is seeded before using
iscsi: ensure RNG is seeded before use
cifs: use get_random_u32 for 32-bit lock random
random: add get_random_{bytes,u32,u64,int,long,once}_wait family
random: add wait_for_random_bytes() API

Linus Torvalds
2017-07-16 03:44:02 +0800

07 Jul, 2017

19 commits

0bb05da2e libceph: osd_state is 32 bits wide in luminous

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:19 +0800
9eebe45c0 crush: remove an obsolete comment

Reflects ceph.git commit dca1ae1e0a6b02029c3a7f9dec4114972be26d50.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:19 +0800
b88ed8d84 crush: crush_init_workspace starts with struct crush_work

It is not just a pointer to crush_work, it is the whole structure.
That is not a problem since it only contains a pointer. But it will
be a problem if new data members are added to crush_work.

Reflects ceph.git commit ee957dd431bfbeb6dadaf77764db8e0757417328.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:19 +0800
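The layout issue in b88ed8d84, sketched under assumptions: `struct crush_work` is given a hypothetical extra member to show what breaks, the per-bucket state is opaque, and only the offset computation is modeled (the real workspace setup does more).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct crush_work_bucket;	/* opaque in this sketch */

/* Hypothetical crush_work after a new data member has been added. */
struct crush_work {
	struct crush_work_bucket **work;	/* per-bucket scratch pointers */
	int extra;				/* the newly added member */
};

/*
 * The workspace starts with a whole struct crush_work, so the per-bucket
 * pointer array begins sizeof(struct crush_work) bytes in. Offsetting by
 * only sizeof(struct crush_work *) would make work[] overlap 'extra'.
 */
static struct crush_work *init_workspace(void *v, size_t max_buckets)
{
	struct crush_work *w = v;
	char *p = v;

	p += sizeof(struct crush_work);
	w->work = (struct crush_work_bucket **)(void *)p;
	w->extra = 0;
	for (size_t i = 0; i < max_buckets; i++)
		w->work[i] = NULL;
	return w;
}
```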
5cf9c4a99 libceph, crush: per-pool crush_choose_arg_map for crush_do_rule()

If there is no crush_choose_arg_map for a given pool, a NULL pointer is
passed to preserve existing crush_do_rule() behavior.

Reflects ceph.git commits 55fb91d64071552ea1bc65ab4ea84d3c8b73ab4b,
dbe36e08be00c6519a8c89718dd47b0219c20516.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:19 +0800
069f3222c crush: implement weight and id overrides for straw2

bucket_straw2_choose needs to use weights that may differ from
item_weights, for instance to compensate for an uneven distribution
caused by a low number of values, or to fix the probability bias
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).

We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
its weight when picking the second replica (second position). For
instance, the weight matrix for a given bucket containing items 3, 7
and 13 could be as follows:

             position 0   position 1
    item 3   0x10000      0x100000
    item 7   0x40000      0x10000
    item 13  0x40000      0x10000

When crush_do_rule picks the first of two replicas (position 0), items 7
and 13 are each four times more likely to be chosen by
bucket_straw2_choose than item 3. When choosing the second replica
(position 1), item 3 is sixteen times (0x100000 / 0x10000) more likely
to be chosen than items 7 and 13.

By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.

bucket_straw2_choose compares items by using their ids. The same ids are
also used to index buckets, so they must be unique. For each item in a
bucket, an array of replacement ids can be provided for placement
purposes; these are used instead of the item ids. If no replacement ids
are provided, the legacy behavior is preserved.

Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:19 +0800
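How the weight_set and id overrides of 069f3222c might be selected, as a simplified sketch. The struct shapes are hypothetical, and letting positions past the end of the weight_set reuse its last row is an assumption of this sketch, not something the commit message states. The test reuses the weight matrix from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified shapes of the straw2 override data. */
struct weight_set {
	const uint32_t *weights;	/* one weight per bucket item */
};

struct choose_arg {
	const int32_t *ids;			/* optional replacement ids */
	const struct weight_set *weight_set;	/* one row per position */
	size_t weight_set_size;			/* > 0 when weight_set != NULL */
};

struct straw2_bucket {
	const int32_t *items;
	const uint32_t *item_weights;
	size_t size;
};

/*
 * Weights for a given replica position: the matching weight_set row if
 * one is provided (assumed: positions past the end reuse the last row),
 * otherwise the bucket's own item_weights, i.e. legacy behavior.
 */
static const uint32_t *choose_weights(const struct straw2_bucket *b,
				      const struct choose_arg *arg,
				      size_t position)
{
	if (!arg || !arg->weight_set)
		return b->item_weights;
	if (position >= arg->weight_set_size)
		position = arg->weight_set_size - 1;
	return arg->weight_set[position].weights;
}

/* Ids used for hashing: the replacement ids if given, else the item ids. */
static const int32_t *choose_ids(const struct straw2_bucket *b,
				 const struct choose_arg *arg)
{
	return (arg && arg->ids) ? arg->ids : b->items;
}
```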
1c2e7b451 libceph: apply_upmap()

Previously, pg_to_raw_osds() didn't filter for existent OSDs because
raw_to_up_osds() would filter for "up" ("up" is predicated on "exists")
and raw_to_up_osds() was called directly after pg_to_raw_osds(). Now,
with apply_upmap() call in there, nonexistent OSDs in pg_to_raw_osds()
output can affect apply_upmap(). Introduce remove_nonexistent_osds()
to deal with that.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:18 +0800
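A sketch of what a remove_nonexistent_osds() step could look like for 1c2e7b451. The commit only names the function; the two modes here (compact the set when positions can shift, as in replicated pools, or mark the slot empty when position matters, as in erasure-coded pools) and the `ITEM_NONE` value are assumptions of this sketch, and all other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ITEM_NONE 0x7fffffff	/* assumed "empty slot" value, like CRUSH_ITEM_NONE */

struct osd_set {
	int osds[16];
	size_t size;
};

static bool osd_exists(const bool *exists, int max_osd, int osd)
{
	return osd >= 0 && osd < max_osd && exists[osd];
}

/* Drop OSDs that do not exist, either by compacting or by blanking slots. */
static void remove_nonexistent_osds(struct osd_set *set, const bool *exists,
				    int max_osd, bool can_shift)
{
	if (can_shift) {
		size_t removed = 0;

		for (size_t i = 0; i < set->size; i++) {
			if (!osd_exists(exists, max_osd, set->osds[i])) {
				removed++;
				continue;
			}
			set->osds[i - removed] = set->osds[i];
		}
		set->size -= removed;
	} else {
		for (size_t i = 0; i < set->size; i++)
			if (!osd_exists(exists, max_osd, set->osds[i]))
				set->osds[i] = ITEM_NONE;
	}
}
```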
463bb8da5 libceph: compute actual pgid in ceph_pg_to_up_acting_osds()

Move raw_pg_to_pg() call out of get_temp_osds() and into
ceph_pg_to_up_acting_osds(), for upcoming apply_upmap().

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:18 +0800
6f428df47 libceph: pg_upmap[_items] infrastructure

pg_temp and pg_upmap encodings are the same (PG -> array of osds),
except for the incremental remove: it's an empty mapping in new_pg_temp
for pg_temp and a separate old_pg_upmap set for pg_upmap. (This isn't
to allow for empty pg_upmap mappings -- apparently, pg_temp just wasn't
looked at as an example for pg_upmap encoding.)

Reuse __decode_pg_temp() for decoding pg_upmap and new_pg_upmap.
__decode_pg_temp() stores into pg_temp union member, but since pg_upmap
union member is identical, reading through pg_upmap later is OK.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:18 +0800
278b1d709 libceph: ceph_decode_skip_* helpers

Some of these won't be as efficient as they could be (e.g.
ceph_decode_skip_set(... 32 ...) could advance by len * sizeof(u32)
once instead of advancing by sizeof(u32) len times), but that's fine
and not worth a bunch of extra macro code.

Replace skip_name_map() with ceph_decode_skip_map as an example.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:18 +0800
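The commit message for 278b1d709 mentions that a set of u32s could be skipped with a single len * sizeof(u32) advance. A function-style sketch of that single-advance variant follows; the real helpers are macros with hidden gotos, and `skip_u32_set()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Skip a length-prefixed set of u32s: read the u32 count, then advance
 * past len elements with one bounds check and one pointer bump instead
 * of len separate 4-byte skips. Returns 0, or -EINVAL if too short.
 */
static int skip_u32_set(const unsigned char **p, const unsigned char *end)
{
	uint32_t len;

	if (end - *p < 4)
		return -EINVAL;
	len = get_le32(*p);
	*p += 4;
	if ((size_t)(end - *p) / 4 < len)	/* division avoids overflow */
		return -EINVAL;
	*p += (size_t)len * 4;
	return 0;
}
```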
ab75144be libceph: kill __{insert,lookup,remove}_pg_mapping()

Switch to DEFINE_RB_FUNCS2-generated {insert,lookup,erase}_pg_mapping().

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:18 +0800
a303bb0e5 libceph: introduce and switch to decode_pg_mapping()

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:18 +0800
33333d107 libceph: don't pass pgid by value

Make __{lookup,remove}_pg_mapping() look like their ceph_spg_mapping
counterparts: take const struct ceph_pg *.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:17 +0800
a02a946df libceph: respect RADOS_BACKOFF backoffs

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:17 +0800
df28152d5 libceph: avoid unnecessary pi lookups in calc_target()

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:17 +0800
6d637a540 libceph: use target pi for calc_target() calculations

For luminous and beyond we are encoding the actual spgid, which
requires operating with the correct pg_num, i.e. that of the target
pool.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:17 +0800
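Why the correct pg_num matters in 6d637a540: the raw object hash is folded onto a PG with a "stable mod" rule, so the same hash lands in a different PG when pg_num differs. The folding below follows the rule used by libceph's ceph_stable_mod(); the helper names are otherwise hypothetical, and pg_num is assumed to be at least 1.

```c
#include <assert.h>
#include <stdint.h>

/* Stable-mod folding rule, as in libceph's ceph_stable_mod(). */
static uint32_t stable_mod(uint32_t x, uint32_t b, uint32_t bmask)
{
	if ((x & bmask) < b)
		return x & bmask;
	return x & (bmask >> 1);
}

/* Smallest 2^n - 1 mask covering pg_num - 1 (pg_num >= 1). */
static uint32_t pg_num_mask(uint32_t pg_num)
{
	uint32_t mask = 0;

	while (mask < pg_num - 1)
		mask = mask << 1 | 1;
	return mask;
}

/* Map a raw object hash to a PG seed using a given pool's pg_num. */
static uint32_t hash_to_pg(uint32_t hash, uint32_t pg_num)
{
	return stable_mod(hash, pg_num, pg_num_mask(pg_num));
}
```

For example, hash 13 maps to PG 13 with pg_num 16 but to PG 5 with pg_num 12 (13 & 15 = 13 is not below 12, so it folds to 13 & 7), which is why the target pool's pg_num has to be used.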
db098ec4e libceph: always populate t->target_{oid,oloc} in calc_target()

need_check_tiering logic doesn't make a whole lot of sense. Drop it
and apply tiering unconditionally on every calc_target() call instead.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:16 +0800
04c7d789e libceph: make sure need_resend targets reflect latest map

Otherwise we may miss events like PG splits, pool deletions, etc. when
we get multiple incremental maps at once. Because check_pool_dne() can
now be fed an unlinked request, finish_request() needed to be taught to
handle unlinked requests.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:16 +0800
a10bcb19a libceph: delete from need_resend_linger before check_linger_pool_dne()

When processing a map update consisting of multiple incrementals, we
may end up running check_linger_pool_dne() on a lingering request that
was previously added to need_resend_linger list. If it is concluded
that the target pool doesn't exist, the request is killed off while
still on need_resend_linger list, which leads to a crash on a NULL
lreq->osd in kick_requests():

libceph: linger_id 18446462598732840961 pool does not exist
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: ceph_osdc_handle_map+0x4ae/0x870

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:16 +0800
7de030d6b libceph: resend on PG splits if OSD has RESEND_ON_SPLIT

Note that ceph_osd_request_target fields are updated regardless of
RESEND_ON_SPLIT.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2017-07-07 23:25:16 +0800