Eric Lee / smarc-fsl-linux-kernel

22 Sep, 2015

1 commit

d3b428f03 fs: create and use seq_show_option for escaping

commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream.

Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g. new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else. This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
of "sudo" is something more sneaky:

$ BASE="ovl"
$ MNT="$BASE/mnt"
$ LOW="$BASE/lower"
$ UP="$BASE/upper"
$ WORK="$BASE/work/ 0 0
none /proc fuse.pwn user_id=1000"
$ mkdir -p "$LOW" "$UP" "$WORK"
$ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
$ cat /proc/mounts
none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
none /proc fuse.pwn user_id=1000 0 0
$ fusermount -u /proc
$ cat /proc/mounts
cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed. Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.
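
The effect of the escaping can be modelled in user space. The following is a minimal Python sketch, not the kernel code: it assumes the helper emits ",name=value" and replaces every character from a fixed set (comma, space, tab, newline, backslash, plus "=" inside the name) with a backslash and a three-digit octal code. The names `escape` and `show_option` are made up for illustration.

```python
def escape(text, special):
    """Replace each character found in `special` with a \\ooo octal escape."""
    return "".join(
        "\\%03o" % ord(ch) if ch in special else ch
        for ch in text
    )


def show_option(name, value=None):
    """Model of emitting one ',name=value' mount option with escaping."""
    out = "," + escape(name, ",= \t\n\\")  # assumed escape set for names
    if value is not None:
        out += "=" + escape(value, ", \t\n\\")  # assumed escape set for values
    return out


# The malicious workdir from the overlay example: it embeds a newline
# followed by a forged /proc/mounts record.
workdir = "ovl/work/ 0 0\nnone /proc fuse.pwn user_id=1000"
line = show_option("workdir", workdir)
print(line)
```

With the newline and spaces encoded, the forged record stays inside the workdir= value instead of starting a line of its own.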

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook
Acked-by: Serge Hallyn
Acked-by: Jan Kara
Acked-by: Paul Moore
Cc: J. R. Okajima
Signed-off-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Kees Cook
2015-09-22 01:05:45 +0800

04 Aug, 2015

1 commit

94fc30841 crush: fix a bug in tree bucket decode

commit 82cd003a77173c91b9acad8033fb7931dac8d751 upstream.

struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used. -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews. The actual problem (at least for
common crushmaps) isn't the u32 -> u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.
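
The cursor problem can be illustrated with a toy buffer. This is a sketch under assumptions: the layout (a u8 node count followed by little-endian u32 node weights) is simplified and hypothetical, and `decode` is an illustrative helper, not a libceph function.

```python
import struct

# Hypothetical, simplified encoding: a u8 node count followed by that many
# little-endian u32 node weights.
weights = [0x10000, 0x20000, 0x30000]
buf = struct.pack("<B3I", len(weights), *weights)


def decode(buf, count_fmt):
    """Read the node count with `count_fmt`, then the first weight after it."""
    (num_nodes,) = struct.unpack_from(count_fmt, buf, 0)
    offset = struct.calcsize(count_fmt)  # cursor advances by the size read
    (first_weight,) = struct.unpack_from("<I", buf, offset)
    return num_nodes, first_weight


good = decode(buf, "<B")  # 1-byte read: cursor lands on the first weight
bad = decode(buf, "<I")   # 4-byte read: cursor lands 3 bytes too far
```

In this toy case truncating the over-read count back to 8 bits still yields 3, yet the weight that follows is read from the wrong offset, which mirrors the point made in the message.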

Fixes: http://tracker.ceph.com/issues/2759

Signed-off-by: Ilya Dryomov
Reviewed-by: Josh Durgin
Signed-off-by: Greg Kroah-Hartman

Ilya Dryomov
2015-08-04 00:29:14 +0800

21 May, 2015

2 commits

521a04d06 Revert "libceph: clear r_req_lru_item in __unregister_linger_request()"

This reverts commit ba9d114ec5578e6e99a4dfa37ff8ae688040fd64.

.. which introduced a regression that prevented all lingering requests
requeued in kick_requests() from ever being sent to the OSDs, resulting
in a lot of missed notifies. In retrospect it's pretty obvious that
r_req_lru_item in the case of lingering requests can be used not
only for notarget, but also for unsent linkage due to how tightly
actual map and enqueue operations are coupled in __map_request().

The assertion that was being silenced is taken care of in the previous
("libceph: request a new osdmap if lingering request maps to no osd")
commit: by always kicking homeless lingering requests we ensure that
none of them ends up on the notarget list outside of the critical
section guarded by request_mutex.

Cc: stable@vger.kernel.org # 3.18+, needs b0494532214b "libceph: request a new osdmap if lingering request maps to no osd"
Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2015-05-21 02:02:46 +0800

b04945322 libceph: request a new osdmap if lingering request maps to no osd

This commit does two things. First, if there are any homeless
lingering requests, we now request a new osdmap even if the osdmap that
is being processed brought no changes, i.e. if a given lingering
request turned homeless in one of the previous epochs and remained
homeless in the current epoch. Not doing so leaves us with a stale
osdmap and as a result we may miss our window for reestablishing the
watch and lose notifies.

MON=1 OSD=1:

# cat linger-needmap.sh
#!/bin/bash
rbd create --size 1 test
DEV=$(rbd map test)
ceph osd out 0
rbd map dne/dne # obtain a new osdmap as a side effect (!)
sleep 1
ceph osd in 0
rbd resize --size 2 test
# rbd info test | grep size -> 2M
# blockdev --getsize $DEV -> 1M

N.B.: Not obtaining a new osdmap in between "osd out" and "osd in"
above is enough to make it miss that resize notify, but that is a
bug^Wlimitation of ceph watch/notify v1.

Second, homeless lingering requests are now kicked just like those
lingering requests whose mapping has changed. This is mainly to
recognize that a homeless lingering request makes no sense and to
preserve the invariant that a registered lingering request is not
sitting on any of r_req_lru_item lists. This spares us a WARN_ON,
which commit ba9d114ec557 ("libceph: clear r_req_lru_item in
__unregister_linger_request()") tried to fix the _wrong_ way.

Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2015-05-21 02:02:14 +0800

22 Apr, 2015

3 commits

958a27658 crush: straw2 bucket type with an efficient 64-bit crush_ln()

This is an improved straw bucket that correctly avoids any data movement
between items A and B when neither A nor B's weights are changed. Said
differently, if we adjust the weight of item C (including adding it anew
or removing it completely), we will only see inputs move to or from C,
never between other items in the bucket.

Notably, there is no intermediate scaling factor that needs to be
calculated. The mapping function is a simple function of the item weights.
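
The property can be checked with a small floating-point model. This is only a sketch: the kernel uses a fixed-point crush_ln() and its own hash, whereas this model assumes SHA-256 as the hash and math.log for the logarithm, and the names `draw` and `choose` are illustrative.

```python
import hashlib
import math


def draw(x, item, weight):
    """Straw length for (input, item): ln(u) / weight, with u in (0, 1]."""
    digest = hashlib.sha256(f"{x}:{item}".encode()).digest()
    u = (int.from_bytes(digest[:8], "big") + 1) / 2.0**64
    return math.log(u) / weight


def choose(x, weights):
    """Pick the item with the longest straw; zero-weight items never win."""
    candidates = [(draw(x, item, w), item) for item, w in weights.items() if w > 0]
    return max(candidates)[1]


before = {"A": 1.0, "B": 2.0, "C": 1.0}
after = {"A": 1.0, "B": 2.0, "C": 3.0}  # only C's weight changes

moved = [(choose(x, before), choose(x, after)) for x in range(1000)]
changed = [(old, new) for old, new in moved if old != new]
```

Each item's draw depends only on its own weight and the hash of the input and item, so raising C's weight can only pull inputs towards C; no input moves between A and B.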

The below commits were squashed together into this one (mostly to avoid
adding and then yanking ~6000 lines' worth of crush_ln_table):

- crush: add a straw2 bucket type
- crush: add crush_ln to calculate nature log efficently
- crush: improve straw2 adjustment slightly
- crush: change crush_ln to provide 32 more digits
- crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
- crush/mapper: fix divide-by-0 in straw2
  (with div64_s64() for draw = ln / w and INT64_MIN -> S64_MIN - need
  to create a proper compat.h in ceph.git)

Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
32a1ead92efcd351822d22a5fc37d159c65c1338,
6289912418c4a3597a11778bcf29ed5415117ad9,
35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
b5921d55d16796e12d66ad2c4add7305f9ce2353.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-22 23:33:43 +0800

45002267e crush: ensuring at most num-rep osds are selected

Crush temporary buffers are allocated according to the replica size
configured by the user. When a rule selects more final osds than there
are replicas, the buffers overlap, which causes a crash. Ensure that at
most num-rep osds are selected even if the rule allows more.
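
The idea of the cap can be sketched as follows. This is an illustrative model only, assuming the result buffer holds `result_max` entries; `emit_step` is a hypothetical helper, not a function from mapper.c.

```python
def emit_step(result, chosen, result_max):
    """Append chosen OSDs to result without exceeding result_max entries."""
    room = max(result_max - len(result), 0)  # space left in the result buffer
    result.extend(chosen[:room])             # drop anything that would not fit
    return result


# The rule yields four OSDs, but only three replicas were configured.
result = emit_step([], [3, 7, 9, 12], result_max=3)
```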

Reflects ceph.git commits 6b4d1aa99718e3b367496326c1e64551330fabc0,
234b066ba04976783d15ff2abc3e81b6cc06fb10.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-22 23:33:42 +0800

9be6df215 crush: drop unnecessary include from mapper.c

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-22 23:33:42 +0800

20 Apr, 2015

3 commits

5cf7bd301 libceph: expose client options through debugfs

Add a client_options attribute for showing libceph options.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-20 23:55:39 +0800

ff40f9ae9 libceph, ceph: split ceph_show_options()

Split ceph_show_options() into two pieces and move the piece
responsible for printing client (libceph) options into net/ceph. This
way people adding a libceph option wouldn't have to remember to update
code in fs/ceph.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-20 23:55:38 +0800

67c64eb74 libceph: don't overwrite specific con error msgs

- specific con->error_msg messages (e.g. "protocol version mismatch")
  end up getting overwritten by a catch-all "socket error on read
  / write", introduced in commit 3a140a0d5c4b ("libceph: report socket
  read/write error message")
- "bad message sequence # for incoming message" loses to "bad crc" due
  to the fact that -EBADMSG is used for both

Fix it, and tidy up con->error_msg assignments and pr_errs while at it.
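
One way to keep a specific message from being clobbered is to use the catch-all only as a fallback. The sketch below illustrates that idea and is not taken from the patch; `Connection`, `try_read` and `handle_read` are toy stand-ins.

```python
class Connection:
    """Toy stand-in for a messenger connection; only tracks error_msg."""

    def __init__(self):
        self.error_msg = None


def try_read(con, specific_error=None):
    """Simulate a failed read; a protocol-level failure records its own message."""
    if specific_error is not None:
        con.error_msg = specific_error
    return -1  # every call in this sketch fails


def handle_read(con, specific_error=None):
    """Apply the catch-all message only if nothing more specific was recorded."""
    if try_read(con, specific_error) < 0 and con.error_msg is None:
        con.error_msg = "socket error on read"
    return con.error_msg
```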

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-20 23:55:37 +0800

08 Apr, 2015

1 commit

6d7fdb0ab Revert "libceph: use memalloc flags for net IO"

This reverts commit 89baaa570ab0b476db09408d209578cfed700e9f.

Dirty page throttling should be sufficient for us in the general case
so there is no need to use __GFP_MEMALLOC - it would be needed only in
the swap-over-rbd case, which we currently don't support. (It would
probably take approximately the commit that is being reverted to add
that support, but we would also need the "swap" option to distinguish
from the general case and make sure swap ceph_client-s aren't shared
with anything else.) See ceph-devel threads [1] and [2] for the
details of why enabling pfmemalloc reserves for all cases is a bad
thing.

On top of potential system lockups related to drained emergency
reserves, this turned out to cause ceph lockups in case peers are on
the same host and communicating via loopback due to sk_filter()
dropping pfmemalloc skbs on the receiving side because the receiving
loopback socket is not tagged with SOCK_MEMALLOC.

[1] "SOCK_MEMALLOC vs loopback"
http://www.spinics.net/lists/ceph-devel/msg22998.html
[2] "[PATCH] libceph: don't set memalloc flags in loopback case"
http://www.spinics.net/lists/ceph-devel/msg23392.html

Conflicts:
net/ceph/messenger.c [ context: tcp_nodelay option ]

Cc: Mike Christie
Cc: Mel Gorman
Cc: Sage Weil
Cc: stable@vger.kernel.org # 3.18+, needs backporting
Signed-off-by: Ilya Dryomov
Acked-by: Mike Christie
Acked-by: Mel Gorman

Ilya Dryomov
2015-04-08 00:08:35 +0800

20 Feb, 2015

1 commit

4533f6e27 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

Pull Ceph changes from Sage Weil:
"On the RBD side, there is a conversion to blk-mq from Christoph,
several long-standing bug fixes from Ilya, and some cleanup from
Rickard Strandqvist.

On the CephFS side there is a long list of fixes from Zheng, including
improved session handling, a few IO path fixes, some dcache management
correctness fixes, and several blocking while !TASK_RUNNING fixes.

The core code gets a few cleanups and Chaitanya has added support for
TCP_NODELAY (which has been used on the server side for ages but we
somehow missed on the kernel client).

There is also an update to MAINTAINERS to fix up some email addresses
and reflect that Ilya and Zheng are doing most of the maintenance for
RBD and CephFS these days. Do not be surprised to see a pull request
come from one of them in the future if I am unavailable for some
reason"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
MAINTAINERS: update Ceph and RBD maintainers
libceph: kfree() in put_osd() shouldn't depend on authorizer
libceph: fix double __remove_osd() problem
rbd: convert to blk-mq
ceph: return error for traceless reply race
ceph: fix dentry leaks
ceph: re-send requests when MDS enters reconnecting stage
ceph: show nocephx_require_signatures and notcp_nodelay options
libceph: tcp_nodelay support
rbd: do not treat standalone as flatten
ceph: fix atomic_open snapdir
ceph: properly mark empty directory as complete
client: include kernel version in client metadata
ceph: provide seperate {inode,file}_operations for snapdir
ceph: fix request time stamp encoding
ceph: fix reading inline data when i_size > PAGE_SIZE
ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)
ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps)
ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)
rbd: fix error paths in rbd_dev_refresh()
...

Linus Torvalds
2015-02-20 06:14:42 +0800

19 Feb, 2015

5 commits

b28ec2f37 libceph: kfree() in put_osd() shouldn't depend on authorizer

a255651d4cad ("ceph: ensure auth ops are defined before use") made
kfree() in put_osd() conditional on the authorizer. A mechanical
mistake most likely - fix it.

Cc: Alex Elder
Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil
Reviewed-by: Alex Elder

Ilya Dryomov
2015-02-19 19:27:51 +0800

7eb71e035 libceph: fix double __remove_osd() problem

It turns out it's possible to get __remove_osd() called twice on the
same OSD. That doesn't sit well with rb_erase() - depending on the
shape of the tree we can get a NULL dereference, a soft lockup or
a random crash at some point in the future as we end up touching freed
memory. One scenario that I was able to reproduce is as follows:

con_fault_finish()
  osd_reset()

                              ceph_osdc_handle_map()

                                kick_requests()

                                  reset_changed_osds()
                                    __reset_osd()
                                      __remove_osd()

  __kick_osd_requests()
    __reset_osd()
      __remove_osd()

Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc53e73: libceph: assert both regular and lingering lists in __remove_osd()
Cc: stable@vger.kernel.org # 3.9+: cc9f1f518cec: libceph: change from BUG to WARN for __remove_osd() asserts
Cc: stable@vger.kernel.org # 3.9+
Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil
Reviewed-by: Alex Elder

Ilya Dryomov
2015-02-19 19:27:50 +0800

ba988f87f libceph: tcp_nodelay support

Set the TCP_NODELAY socket option on connection sockets. This disables
Nagle’s algorithm and improves latency characteristics. The
tcp_nodelay (default) and notcp_nodelay option flags enable and
disable setting the socket option.
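
For reference, the same socket option can be set from user space. This Python sketch only shows the standard socket API; `open_socket` and its `tcp_nodelay` parameter are illustrative names that mirror the mount option and are not part of libceph.

```python
import socket


def open_socket(tcp_nodelay=True):
    """Create a TCP socket, optionally disabling Nagle's algorithm."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if tcp_nodelay:
        # With TCP_NODELAY set, small writes are sent immediately rather
        # than being coalesced while earlier data is unacknowledged.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = open_socket()
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```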

Signed-off-by: Chaitanya Huilgol
[idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments]
Signed-off-by: Ilya Dryomov

Chaitanya Huilgol
2015-02-19 18:31:40 +0800

f646912d1 libceph: use mon_client.c/put_generic_request() more

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-02-19 18:31:37 +0800

7a6fdeb2b libceph: nuke pool op infrastructure

On Mon, Dec 22, 2014 at 5:35 PM, Sage Weil wrote:
> On Mon, 22 Dec 2014, Ilya Dryomov wrote:
>> Actually, pool op stuff has been unused for over two years - looks like
>> it was added for rbd create_snap and that got ripped out in 2012. It's
>> unlikely we'd ever need to manage pools or snaps from the kernel client
>> so I think it makes sense to nuke it. Sage?
>
> Yep!

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-02-19 18:31:37 +0800

12 Feb, 2015

1 commit

7e3391284 mm: gup: use get_user_pages_unlocked

This allows those get_user_pages calls to pass FAULT_FLAG_ALLOW_RETRY to
the page fault in order to release the mmap_sem during the I/O.

Signed-off-by: Andrea Arcangeli
Reviewed-by: Kirill A. Shutemov
Cc: Andres Lagar-Cavilla
Cc: Peter Feiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2015-02-12 09:06:05 +0800

09 Jan, 2015

1 commit

d7d5a007b libceph: fix sparse endianness warnings

The only real issue is the one in auth_x.c and it came with
3.19-rc1 merge.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-01-09 01:36:57 +0800

18 Dec, 2014

7 commits

715e4cd40 libceph: specify position of extent operation

Allow specifying the position of an extent operation in a
multi-operation osd request. This is required for cephfs to convert
inline data to normal data (compare xattr, then write object).

Signed-off-by: Yan, Zheng
Reviewed-by: Ilya Dryomov

Yan, Zheng
2014-12-18 01:09:52 +0800

864e9197f libceph: add CREATE osd operation support

Add CEPH_OSD_OP_CREATE support. Also change libceph to not treat
CEPH_OSD_OP_DELETE as an extent op and add an assert to that end.

Signed-off-by: Yan, Zheng
Reviewed-by: Ilya Dryomov

Yan, Zheng
2014-12-18 01:09:51 +0800

d74b50bed libceph: add SETXATTR/CMPXATTR osd operations support

Signed-off-by: Yan, Zheng
Reviewed-by: Ilya Dryomov

Yan, Zheng
2014-12-18 01:09:51 +0800

a3fc98005 libceph: require cephx message signature by default

Signed-off-by: Yan, Zheng
Reviewed-by: Ilya Dryomov

Yan, Zheng
2014-12-18 01:09:51 +0800

33d073379 libceph: message signature support

Signed-off-by: Yan, Zheng

Yan, Zheng
2014-12-18 01:09:50 +0800

ae385eaf2 libceph: store session key in cephx authorizer

The session key is required when calculating the message signature. Save
the session key in the authorizer; this avoids looking up the ticket
handler for each message.

Signed-off-by: Yan, Zheng

Yan, Zheng
2014-12-18 01:09:50 +0800

4965fc38c libceph: nuke ceph_kvfree()

Use kvfree() from linux/mm.h instead, which is identical. Also fix the
ceph_buffer comment: we will allocate with kmalloc() up to 32k - the
value of PAGE_ALLOC_COSTLY_ORDER, but that really is just an
implementation detail so don't mention it at all.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2014-12-18 01:09:50 +0800

14 Nov, 2014

4 commits

cc9f1f518 libceph: change from BUG to WARN for __remove_osd() asserts

No reason to use BUG_ON for osd request list assertions.

Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder

Ilya Dryomov
2014-11-14 03:26:34 +0800

ba9d114ec libceph: clear r_req_lru_item in __unregister_linger_request()

kick_requests() can put linger requests on the notarget list. This
means we need to clear the much-overloaded req->r_req_lru_item in
__unregister_linger_request() as well, or we get an assertion failure
in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).

AFAICT the assumption was that registered linger requests cannot be on
any of req->r_req_lru_item lists, but that's clearly not the case.

Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder

Ilya Dryomov
2014-11-14 03:21:14 +0800

a390de020 libceph: unlink from o_linger_requests when clearing r_osd

Requests have to be unlinked from both osd->o_requests (normal
requests) and osd->o_linger_requests (linger requests) lists when
clearing req->r_osd. Otherwise __unregister_linger_request() gets
confused and we trip over a !list_empty(&osd->o_linger_requests)
assert in __remove_osd().

MON=1 OSD=1:

# cat remove-osd.sh
#!/bin/bash
rbd create --size 1 test
DEV=$(rbd map test)
ceph osd out 0
sleep 3
rbd map dne/dne # obtain a new osdmap as a side effect
rbd unmap $DEV & # will block
sleep 3
ceph osd in 0

Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder

Ilya Dryomov
2014-11-14 03:21:13 +0800

aaef31703 libceph: do not crash on large auth tickets

Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
tickets will have their buffers vmalloc'ed, which leads to the
following crash in crypto:

[ 28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0
[ 28.686032] IP: [] scatterwalk_pagedone+0x22/0x80
[ 28.686032] PGD 0
[ 28.688088] Oops: 0000 [#1] PREEMPT SMP
[ 28.688088] Modules linked in:
[ 28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305
[ 28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 28.688088] Workqueue: ceph-msgr con_work
[ 28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000
[ 28.688088] RIP: 0010:[] [] scatterwalk_pagedone+0x22/0x80
[ 28.688088] RSP: 0018:ffff8800d903f688 EFLAGS: 00010286
[ 28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0
[ 28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750
[ 28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880
[ 28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010
[ 28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000
[ 28.688088] FS: 00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[ 28.688088] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0
[ 28.688088] Stack:
[ 28.688088] ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32
[ 28.688088] ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020
[ 28.688088] ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010
[ 28.688088] Call Trace:
[ 28.688088] [] scatterwalk_done+0x38/0x40
[ 28.688088] [] scatterwalk_done+0x38/0x40
[ 28.688088] [] blkcipher_walk_done+0x182/0x220
[ 28.688088] [] crypto_cbc_encrypt+0x15f/0x180
[ 28.688088] [] ? crypto_aes_set_key+0x30/0x30
[ 28.688088] [] ceph_aes_encrypt2+0x29c/0x2e0
[ 28.688088] [] ceph_encrypt2+0x93/0xb0
[ 28.688088] [] ceph_x_encrypt+0x4a/0x60
[ 28.688088] [] ? ceph_buffer_new+0x5d/0xf0
[ 28.688088] [] ceph_x_build_authorizer.isra.6+0x297/0x360
[ 28.688088] [] ? kmem_cache_alloc_trace+0x11b/0x1c0
[ 28.688088] [] ? ceph_auth_create_authorizer+0x36/0x80
[ 28.688088] [] ceph_x_create_authorizer+0x63/0xd0
[ 28.688088] [] ceph_auth_create_authorizer+0x54/0x80
[ 28.688088] [] get_authorizer+0x80/0xd0
[ 28.688088] [] prepare_write_connect+0x18b/0x2b0
[ 28.688088] [] try_read+0x1e59/0x1f10

This is because we set up crypto scatterlists as if all buffers were
kmalloc'ed. Fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2014-11-14 03:21:12 +0800

01 Nov, 2014

1 commit

e9226d7c9 libceph: eliminate unnecessary allocation in process_one_ticket()

Commit c27a3e4d667f ("libceph: do not hard code max auth ticket len"),
while fixing a buffer overflow, tried to keep as much of the
surrounding code the same as possible and introduced an unnecessary
kmalloc() in the unencrypted ticket path. It is likely to fail on huge
tickets, so get rid of it.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2014-11-01 04:43:08 +0800

30 Oct, 2014

1 commit

89baaa570 libceph: use memalloc flags for net IO

This patch has ceph's lib code use the memalloc flags.

If the VM layer needs to write data out to free up memory to handle new
allocation requests, the block layer must be able to make forward progress.
To handle that requirement we use structs like mempools to reserve memory for
objects like bios and requests.

The problem is that when we send/receive block layer requests over the
network layer, net skb allocations can fail and the system can lock up.
To solve this, the memalloc related flags were added. NBD, iSCSI
and NFS use these flags to tell the network/vm layer that it should
use memory reserves to fulfill allocation requests for structs like
skbs.

I am running ceph in a bunch of VMs on my laptop, so this patch was
not tested very harshly.

Signed-off-by: Mike Christie
Reviewed-by: Ilya Dryomov

Mike Christie
2014-10-30 18:11:50 +0800

15 Oct, 2014

8 commits

6b0490816 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

Pull Ceph updates from Sage Weil:
"There is the long-awaited discard support for RBD (Guangliang Zhao,
Josh Durgin), a pile of RBD bug fixes that didn't belong in late -rc's
(Ilya Dryomov, Li RongQing), a pile of fs/ceph bug fixes and
performance and debugging improvements (Yan, Zheng, John Spray), and a
smattering of cleanups (Chao Yu, Fabian Frederick, Joe Perches)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
ceph: fix divide-by-zero in __validate_layout()
rbd: rbd workqueues need a resque worker
libceph: ceph-msgr workqueue needs a resque worker
ceph: fix bool assignments
libceph: separate multiple ops with commas in debugfs output
libceph: sync osd op definitions in rados.h
libceph: remove redundant declaration
ceph: additional debugfs output
ceph: export ceph_session_state_name function
ceph: include the initial ACL in create/mkdir/mknod MDS requests
ceph: use pagelist to present MDS request data
libceph: reference counting pagelist
ceph: fix llistxattr on symlink
ceph: send client metadata to MDS
ceph: remove redundant code for max file size verification
ceph: remove redundant io_iter_advance()
ceph: move ceph_find_inode() outside the s_mutex
ceph: request xattrs if xattr_version is zero
rbd: set the remaining discard properties to enable support
rbd: use helpers to handle discard for layered images correctly
...

Linus Torvalds
2014-10-15 12:46:01 +0800

f9865f06f libceph: ceph-msgr workqueue needs a resque worker

Commit f363e45fd118 ("net/ceph: make ceph_msgr_wq non-reentrant")
effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq. This is
wrong - libceph is very much a memory reclaim path, so restore it.

Cc: stable@vger.kernel.org # needs backporting for < 3.12
Signed-off-by: Ilya Dryomov
Tested-by: Micha Krause
Reviewed-by: Sage Weil

Ilya Dryomov
2014-10-15 03:57:04 +0800

25f897773 libceph: separate multiple ops with commas in debugfs output

For requests with multiple ops, separate ops with commas instead of \t,
which is a field separator here.
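
The reason for the change is easy to see when the line is split back into fields. The sketch below uses a hypothetical three-field layout (tid, osd, ops); the real debugfs line has more fields, and `format_request` is an illustrative name.

```python
def format_request(tid, osd, ops):
    """Format one request line: tab-separated fields, comma-separated ops."""
    return "\t".join([str(tid), "osd%d" % osd, ",".join(ops)])


line = format_request(42, 3, ["read", "getxattr"])
fields = line.split("\t")  # a parser splitting on tabs still sees 3 fields
```

Had the ops been joined with tabs as well, the same split would return four fields and the per-request op list could not be told apart from the fixed columns.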

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2014-10-15 03:57:03 +0800

70b5bfa36 libceph: sync osd op definitions in rados.h

Bring in missing osd ops and strings, use macros to eliminate multiple
points of maintenance.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2014-10-15 03:57:02 +0800

e4339d28f libceph: reference counting pagelist

This allows a pagelist to represent data that may be sent multiple times.

Signed-off-by: Yan, Zheng
Reviewed-by: Sage Weil

Yan, Zheng
2014-10-15 03:56:48 +0800

91883cd27 libceph: don't try checking queue_work() return value

queue_work() doesn't "fail to queue", it returns false if work was
already on a queue, which can't happen here since we allocate
event_work right before we queue it. So don't bother at all.

Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder

Ilya Dryomov
2014-10-15 01:03:25 +0800

b9a678994 libceph: Convert pr_warning to pr_warn

Use the more common pr_warn.

Other miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches
Signed-off-by: Ilya Dryomov

Joe Perches
2014-10-15 01:03:23 +0800

589506f1e libceph: fix a use after free issue in osdmap_set_max_osd

If the state variable is krealloc'ed successfully, the old
map->osd_state is freed. If one of the following two reallocations
then fails, the function exits without updating map->osd_state, and
map->osd_state becomes a dangling pointer.

Fix it by updating each pointer as soon as its krealloc succeeds.

Signed-off-by: Li RongQing
Signed-off-by: Ilya Dryomov

Li RongQing
2014-10-15 01:03:21 +0800