10 May, 2017

1 commit

  • Pull ceph updates from Ilya Dryomov:
    "The two main items are support for disabling automatic rbd exclusive
    lock transfers from myself and the long awaited -ENOSPC handling
    series from Jeff.

    The former will allow rbd users to take advantage of exclusive lock's
    built-in blacklist/break-lock functionality while staying in control
    of who owns the lock. With the latter in place, we will abort
    filesystem writes on -ENOSPC instead of having them block
    indefinitely.

    Beyond that we've got the usual pile of filesystem fixes from Zheng,
    some refcount_t conversion patches from Elena and a patch for an
    ancient open() flags handling bug from Alexander"

    * tag 'ceph-for-4.12-rc1' of git://github.com/ceph/ceph-client: (31 commits)
    ceph: fix memory leak in __ceph_setxattr()
    ceph: fix file open flags on ppc64
    ceph: choose readdir frag based on previous readdir reply
    rbd: exclusive map option
    rbd: return ResponseMessage result from rbd_handle_request_lock()
    rbd: kill rbd_is_lock_supported()
    rbd: support updating the lock cookie without releasing the lock
    rbd: store lock cookie
    rbd: ignore unlock errors
    rbd: fix error handling around rbd_init_disk()
    rbd: move rbd_unregister_watch() call into rbd_dev_image_release()
    rbd: move rbd_dev_destroy() call out of rbd_dev_image_release()
    ceph: when seeing write errors on an inode, switch to sync writes
    Revert "ceph: SetPageError() for writeback pages if writepages fails"
    ceph: handle epoch barriers in cap messages
    libceph: add an epoch_barrier field to struct ceph_osd_client
    libceph: abort already submitted but abortable requests when map or pool goes full
    libceph: allow requests to return immediately on full conditions if caller wishes
    libceph: remove req->r_replay_version
    ceph: make seeky readdir more efficient
    ...

    Linus Torvalds
     

09 May, 2017

2 commits

  • CURRENT_TIME is not y2038-safe. The macro will be deleted and all
    references to it replaced by the ktime_get_* APIs.

    struct timespec is also not y2038-safe. Retain timespec for timestamp
    representation here, since ceph uses it internally everywhere; these
    references will be changed to use struct timespec64 in a separate patch.

    The current_fs_time() API is being changed to take a vfs struct inode *
    argument instead of a struct super_block *.

    Set the new mds client request r_stamp field using ktime_get_real_ts()
    instead of using current_fs_time().

    Also, since r_stamp is used as mtime on the server, use timespec_trunc()
    to truncate the timestamp, using the right granularity from the
    superblock.

    This API will be transitioned to be y2038-safe along with the vfs.
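
    The truncation step can be sketched in userspace Python (the helper name
    is illustrative; it mirrors what timespec_trunc() does with the
    superblock's s_time_gran, which is what the patch applies to r_stamp):

```python
def truncate_timestamp(sec, nsec, granularity_ns):
    """Truncate a (seconds, nanoseconds) timestamp to a filesystem's
    granularity, mirroring timespec_trunc(): granularity 1 keeps full
    precision, a whole-second granularity zeroes nsec, and anything in
    between rounds nsec down to a multiple of the granularity."""
    if granularity_ns <= 1:
        return sec, nsec
    if granularity_ns >= 1_000_000_000:
        return sec, 0
    return sec, nsec - (nsec % granularity_ns)
```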

    Link: http://lkml.kernel.org/r/1491613030-11599-5-git-send-email-deepa.kernel@gmail.com
    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    M: Ilya Dryomov
    M: "Yan, Zheng"
    M: Sage Weil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Deepa Dinamani
     
  • __vmalloc* allows users to provide gfp flags for the underlying
    allocation. This API is quite popular:

    $ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
    77

    The only problem is that many users are not aware that they really want
    to pass __GFP_HIGHMEM along with the other flags, because there is no
    reason to consume precious low memory on CONFIG_HIGHMEM systems for pages
    that are mapped into the kernel vmalloc space. About half of the users
    don't use this flag, which suggests that the API is unnecessarily
    complex.

    This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
    be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM
    are simplified and drop the flag.
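
    The shape of the change can be sketched like this (the flag bit values
    are illustrative, not the kernel's actual GFP constants):

```python
# Illustrative flag bits, not the kernel's real GFP values.
GFP_KERNEL = 0x01
GFP_HIGHMEM = 0x02

def vmalloc_gfp(caller_mask):
    """After the change the allocator ORs in __GFP_HIGHMEM itself, so
    callers that forgot the flag get it anyway, and callers that passed
    it explicitly can simply drop it."""
    return caller_mask | GFP_HIGHMEM
```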

    Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Matthew Wilcox
    Cc: Al Viro
    Cc: Vlastimil Babka
    Cc: David Rientjes
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

04 May, 2017

10 commits

  • As we no longer release the lock before potentially raising BLACKLISTED
    in rbd_reregister_watch(), the "either locked or blacklisted" assert in
    rbd_queue_workfn() needs to go: we can be both locked and blacklisted
    at that point now.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jason Dillaman

    Ilya Dryomov
     
  • Cephfs can get cap update requests that contain a new epoch barrier in
    them. When that happens we want to pause all OSD traffic until the right
    map epoch arrives.

    Add an epoch_barrier field to ceph_osd_client that is protected by the
    osdc->lock rwsem. When the barrier is set, and the current OSD map
    epoch is below that, pause the request target when submitting the
    request or when revisiting it. Add a way for upper layers (cephfs)
    to update the epoch_barrier as well.

    If we get a new map, compare the new epoch against the barrier before
    kicking requests and request another map if the map epoch is still lower
    than the one we want.

    If we get a map with a full pool, or at quota condition, then set the
    barrier to the current epoch value.
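
    A minimal userspace sketch of that barrier logic (class and method names
    are illustrative, not the kernel structures):

```python
class OsdClient:
    """Sketch of the epoch_barrier bookkeeping described above."""
    def __init__(self, epoch=0):
        self.epoch = epoch        # current OSD map epoch
        self.epoch_barrier = 0

    def update_epoch_barrier(self, barrier):
        # The barrier only ever moves forward.
        if barrier > self.epoch_barrier:
            self.epoch_barrier = barrier

    def handle_full_map(self):
        # Full pool / at-quota map: raise the barrier to the current epoch.
        self.update_epoch_barrier(self.epoch)

    def should_pause(self):
        # Pause request targeting until the map catches up to the barrier.
        return self.epoch < self.epoch_barrier
```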

    Signed-off-by: Jeff Layton
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     
  • When a Ceph volume hits capacity, a flag is set in the OSD map to
    indicate that, and a new map is sprayed around the cluster. With cephfs
    we want it to shut down any abortable requests that are in progress with
    an -ENOSPC error as they'd just hang otherwise.

    Add a new ceph_osdc_abort_on_full helper function to handle this. It
    will first check whether there is an out-of-space condition in the
    cluster and then walk the tree and abort any request that has
    r_abort_on_full set with a -ENOSPC error. Call this new function
    directly whenever we get a new OSD map.
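
    The walk-and-abort step can be sketched as follows (request dicts stand
    in for struct ceph_osd_request; field names are illustrative):

```python
ENOSPC = 28

def abort_on_full(requests, cluster_full):
    """Walk in-flight requests and complete those that opted in with
    -ENOSPC when the cluster (or their pool) is full."""
    aborted = []
    if not cluster_full:
        return aborted
    for req in requests:
        if req["abort_on_full"] and req["result"] is None:
            req["result"] = -ENOSPC
            aborted.append(req["tid"])
    return aborted
```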

    Signed-off-by: Jeff Layton
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     
  • Usually, when the osd map is flagged as full or the pool is at quota,
    write requests just hang. This is not what we want for cephfs, where
    it would be better to simply report -ENOSPC back to userland instead
    of stalling.

    If the caller knows that it will want an immediate error return instead
    of blocking on a full or at-quota error condition then allow it to set a
    flag to request that behavior.

    Set that flag in ceph_osdc_new_request (since ceph.ko is the only caller),
    and on any other write request from ceph.ko.

    A later patch will deal with requests that were submitted before the new
    map showing the full condition came in.
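
    The submission-time decision reduces to something like this sketch
    (names are illustrative):

```python
ENOSPC = 28

def submit_write(map_full, pool_full, abort_on_full):
    """With the new flag set, a full condition fails fast with -ENOSPC;
    without it, the request stays queued until a map clears the full
    condition."""
    if (map_full or pool_full) and abort_on_full:
        return -ENOSPC
    return 0  # accepted (may still wait on a full condition later)
```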

    Signed-off-by: Jeff Layton
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     
  • Nothing uses this anymore with the removal of the ack vs. commit code.
    Remove the field and just encode zeroes into place in the request
    encoding.

    Signed-off-by: Jeff Layton
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     
  • The refcount_t type and its corresponding API should be used instead
    of atomic_t when a variable serves as a reference counter. This helps
    avoid accidental refcounter overflows that can lead to use-after-free
    situations.
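
    The protection comes from saturating arithmetic, sketched here in
    userspace Python (the saturation value is illustrative):

```python
REFCOUNT_MAX = 2**32 - 1   # illustrative saturation point

def refcount_inc(count):
    """Unlike atomic_t, an overflowing refcount_t saturates instead of
    wrapping around to zero; a wrapped counter is what makes premature
    frees (and therefore use-after-free) possible."""
    if count >= REFCOUNT_MAX:
        return REFCOUNT_MAX   # saturated: leak the object instead of UAF
    return count + 1
```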

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: Ilya Dryomov

    Elena Reshetova
     
  • The refcount_t type and its corresponding API should be used instead
    of atomic_t when a variable serves as a reference counter. This helps
    avoid accidental refcounter overflows that can lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: Ilya Dryomov

    Elena Reshetova
     
  • The refcount_t type and its corresponding API should be used instead
    of atomic_t when a variable serves as a reference counter. This helps
    avoid accidental refcounter overflows that can lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: Ilya Dryomov

    Elena Reshetova
     
  • Add a read-only module parameter, exported via sysfs, so that userspace
    can generate meaningful error messages. It's a bit funky, but there is
    no other libceph-specific place for it.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • No reason to hide CephFS-specific features in the rbd case. Recent
    feature bits mix RADOS and CephFS-specific stuff together anyway.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     

23 Mar, 2017

1 commit

  • sock_alloc_inode() allocates socket+inode and socket_wq with
    GFP_KERNEL, which is not allowed on the writeback path:

    Workqueue: ceph-msgr con_work [libceph]
    ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
    0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
    ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
    Call Trace:
    [] schedule+0x29/0x70
    [] schedule_timeout+0x1bd/0x200
    [] ? ttwu_do_wakeup+0x2c/0x120
    [] ? ttwu_do_activate.constprop.135+0x66/0x70
    [] wait_for_completion+0xbf/0x180
    [] ? try_to_wake_up+0x390/0x390
    [] flush_work+0x165/0x250
    [] ? worker_detach_from_pool+0xd0/0xd0
    [] xlog_cil_force_lsn+0x81/0x200 [xfs]
    [] ? __slab_free+0xee/0x234
    [] _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
    [] ? lookup_page_cgroup_used+0xe/0x30
    [] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [] xfs_log_force_lsn+0x3f/0xf0 [xfs]
    [] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [] xfs_iunpin_wait+0xc6/0x1a0 [xfs]
    [] ? wake_atomic_t_function+0x40/0x40
    [] xfs_reclaim_inode+0xa3/0x330 [xfs]
    [] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
    [] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
    [] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
    [] super_cache_scan+0x178/0x180
    [] shrink_slab_node+0x14e/0x340
    [] ? mem_cgroup_iter+0x16b/0x450
    [] shrink_slab+0x100/0x140
    [] do_try_to_free_pages+0x335/0x490
    [] try_to_free_pages+0xb9/0x1f0
    [] ? __alloc_pages_direct_compact+0x69/0x1be
    [] __alloc_pages_nodemask+0x69a/0xb40
    [] alloc_pages_current+0x9e/0x110
    [] new_slab+0x2c5/0x390
    [] __slab_alloc+0x33b/0x459
    [] ? sock_alloc_inode+0x2d/0xd0
    [] ? inet_sendmsg+0x71/0xc0
    [] ? sock_alloc_inode+0x2d/0xd0
    [] kmem_cache_alloc+0x1a2/0x1b0
    [] sock_alloc_inode+0x2d/0xd0
    [] alloc_inode+0x26/0xa0
    [] new_inode_pseudo+0x1a/0x70
    [] sock_alloc+0x1e/0x80
    [] __sock_create+0x95/0x220
    [] sock_create_kern+0x24/0x30
    [] con_work+0xef9/0x2050 [libceph]
    [] ? rbd_img_request_submit+0x4c/0x60 [rbd]
    [] process_one_work+0x159/0x4f0
    [] worker_thread+0x11b/0x530
    [] ? create_worker+0x1d0/0x1d0
    [] kthread+0xc9/0xe0
    [] ? flush_kthread_worker+0x90/0x90
    [] ret_from_fork+0x58/0x90
    [] ? flush_kthread_worker+0x90/0x90

    Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.
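
    The save/restore pattern can be sketched in userspace like this (the
    flag bits and globals are illustrative; the kernel keeps the flag
    per-task and masks __GFP_IO/__GFP_FS from nested allocations):

```python
GFP_IO = 0x40   # illustrative bits
GFP_FS = 0x80

_noio_flag = False   # stands in for the per-task PF_MEMALLOC_NOIO flag

def memalloc_noio_save():
    global _noio_flag
    saved = _noio_flag
    _noio_flag = True
    return saved

def memalloc_noio_restore(saved):
    global _noio_flag
    _noio_flag = saved

def effective_gfp(requested):
    """Inside the save/restore window, nested GFP_KERNEL allocations
    implicitly lose __GFP_IO and __GFP_FS, i.e. behave as GFP_NOIO."""
    return requested & ~(GFP_IO | GFP_FS) if _noio_flag else requested
```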

    Cc: stable@vger.kernel.org # 3.10+, needs backporting
    Link: http://tracker.ceph.com/issues/19309
    Reported-by: Sergey Jerusalimov
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jeff Layton

    Ilya Dryomov
     

07 Mar, 2017

3 commits

  • osd_request_timeout specifies how many seconds to wait for a response
    from OSDs before returning -ETIMEDOUT from an OSD request. 0 (default)
    means no limit.

    osd_request_timeout is osdkeepalive-precise -- in-flight requests are
    swept through every osdkeepalive seconds. With ack vs commit behaviour
    gone, abort_request() is really simple.
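
    The periodic sweep reduces to something like this sketch (request dicts
    and field names are illustrative):

```python
ETIMEDOUT = 110

def sweep_requests(requests, now, osd_request_timeout):
    """Each osdkeepalive tick, abort in-flight requests older than
    osd_request_timeout seconds; 0 means no limit."""
    timed_out = []
    if osd_request_timeout == 0:
        return timed_out
    for req in requests:
        if req["result"] is None and now - req["start"] >= osd_request_timeout:
            req["result"] = -ETIMEDOUT
            timed_out.append(req["tid"])
    return timed_out
```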

    This is based on a patch from Artur Molchanov.

    Tested-by: Artur Molchanov
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
    osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
    This changes the result of applying an incremental for clients, not
    just OSDs. Because CRUSH computations are obviously affected,
    pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
    object placement, resulting in misdirected requests.

    Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.

    Fixes: 930c53286977 ("libceph: apply new_state before new_up_client on incrementals")
    Link: http://tracker.ceph.com/issues/19122
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • Older (shorter) CRUSH maps need to be finalized too.

    Fixes: 66a0e2d579db ("crush: remove mutable part of CRUSH map")
    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     

04 Mar, 2017

1 commit

  • Pull sched.h split-up from Ingo Molnar:
    "The point of these changes is to significantly reduce the
    header footprint, to speed up the kernel build and to
    have a cleaner header structure.

    After these changes the new <linux/sched.h>'s typical preprocessed
    size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
    lines), which is around 40% faster to build on typical configs.

    Not much changed from the last version (-v2) posted three weeks ago: I
    eliminated quirks, backmerged fixes plus I rebased it to an upstream
    SHA1 from yesterday that includes most changes queued up in -next plus
    all sched.h changes that were pending from Andrew.

    I've re-tested the series both on x86 and on cross-arch defconfigs,
    and did a bisectability test at a number of random points.

    I tried to test as many build configurations as possible, but some
    build breakage is probably still left - but it should be mostly
    limited to architectures that have no cross-compiler binaries
    available on kernel.org, and non-default configurations"

    * 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
    sched/headers: Clean up
    sched/headers: Remove #ifdefs from
    sched/headers: Remove the include from
    sched/headers, hrtimer: Remove the include from
    sched/headers, x86/apic: Remove the header inclusion from
    sched/headers, timers: Remove the include from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/core: Remove unused prefetch_stack()
    sched/headers: Remove from
    sched/headers: Remove the 'init_pid_ns' prototype from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove the runqueue_is_locked() prototype
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove the include from
    sched/headers: Remove from
    ...

    Linus Torvalds
     

03 Mar, 2017

1 commit

  • Pull vfs sendmsg updates from Al Viro:
    "More sendmsg work.

    This is fairly separate, isolated stuff (there's a continuation
    around lustre, but that one was too late to soak in -next), hence the
    separate pull request"

    * 'work.sendmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    ncpfs: switch to sock_sendmsg()
    ncpfs: don't mess with manually advancing iovec on send
    ncpfs: sendmsg does *not* bugger iovec these days
    ceph_tcp_sendpage(): use ITER_BVEC sendmsg
    afs_send_pages(): use ITER_BVEC
    rds: remove dead code
    ceph: switch to sock_recvmsg()
    usbip_recv(): switch to sock_recvmsg()
    iscsi_target: deal with short writes on the tx side
    [nbd] pass iov_iter to nbd_xmit()
    [nbd] switch sock_xmit() to sock_{send,recv}msg()
    [drbd] use sock_sendmsg()

    Linus Torvalds
     

01 Mar, 2017

1 commit

  • Pull ceph updates from Ilya Dryomov:
    "This time around we have:

    - support for rbd data-pool feature, which enables rbd images on
    erasure-coded pools (myself). CEPH_PG_MAX_SIZE has been bumped to
    allow erasure-coded profiles with k+m up to 32.

    - a patch for ceph_d_revalidate() performance regression introduced
    in 4.9, along with some cleanups in the area (Jeff Layton)

    - a set of fixes for unsafe ->d_parent accesses in CephFS (Jeff
    Layton)

    - buffered reads are now processed in rsize windows instead of rasize
    windows (Andreas Gerstmayr). The new default for rsize mount option
    is 64M.

    - ack vs commit distinction is gone, greatly simplifying ->fsync()
    and MOSDOpReply handling code (myself)

    ... also a few filesystem bug fixes from Zheng, a CRUSH sync up (CRUSH
    computations are still serialized though) and several minor fixes and
    cleanups all over"

    * tag 'ceph-for-4.11-rc1' of git://github.com/ceph/ceph-client: (52 commits)
    libceph, rbd, ceph: WRITE | ONDISK -> WRITE
    libceph: get rid of ack vs commit
    ceph: remove special ack vs commit behavior
    ceph: tidy some white space in get_nonsnap_parent()
    crush: fix dprintk compilation
    crush: do is_out test only if we do not collide
    ceph: remove req from unsafe list when unregistering it
    rbd: constify device_type structure
    rbd: kill obj_request->object_name and rbd_segment_name_cache
    rbd: store and use obj_request->object_no
    rbd: RBD_V{1,2}_DATA_FORMAT macros
    rbd: factor out __rbd_osd_req_create()
    rbd: set offset and length outside of rbd_obj_request_create()
    rbd: support for data-pool feature
    rbd: introduce rbd_init_layout()
    rbd: use rbd_obj_bytes() more
    rbd: remove now unused rbd_obj_request_wait() and helpers
    rbd: switch rbd_obj_method_sync() to ceph_osdc_call()
    libceph: pass reply buffer length through ceph_osdc_call()
    rbd: do away with obj_request in rbd_obj_read_sync()
    ...

    Linus Torvalds
     

25 Feb, 2017

2 commits

  • CEPH_OSD_FLAG_ONDISK is set in account_request().

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jeff Layton
    Reviewed-by: Sage Weil

    Ilya Dryomov
     
  • - CEPH_OSD_FLAG_ACK shouldn't be set anymore, so assert on it
    - remove support for handling ack replies (OSDs will send ack replies
    only if clients request them)
    - drop the "do lingering callbacks under osd->lock" logic from
    handle_reply() -- lreq->lock is sufficient in all three cases

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Jeff Layton
    Reviewed-by: Sage Weil

    Ilya Dryomov
     

21 Feb, 2017

1 commit

  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Implement wraparound-safe refcount_t and kref_t types based on
    generic atomic primitives (Peter Zijlstra)

    - Improve and fix the ww_mutex code (Nicolai Hähnle)

    - Add self-tests to the ww_mutex code (Chris Wilson)

    - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
    Bueso)

    - Micro-optimize the current-task logic all around the core kernel
    (Davidlohr Bueso)

    - Tidy up after recent optimizations: remove stale code and APIs,
    clean up the code (Waiman Long)

    - ... plus misc fixes, updates and cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
    fork: Fix task_struct alignment
    locking/spinlock/debug: Remove spinlock lockup detection code
    lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
    lkdtm: Convert to refcount_t testing
    kref: Implement 'struct kref' using refcount_t
    refcount_t: Introduce a special purpose refcount type
    sched/wake_q: Clarify queue reinit comment
    sched/wait, rcuwait: Fix typo in comment
    locking/mutex: Fix lockdep_assert_held() fail
    locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
    locking/rwsem: Reinit wake_q after use
    locking/rwsem: Remove unnecessary atomic_long_t casts
    jump_labels: Move header guard #endif down where it belongs
    locking/atomic, kref: Implement kref_put_lock()
    locking/ww_mutex: Turn off __must_check for now
    locking/atomic, kref: Avoid more abuse
    locking/atomic, kref: Use kref_get_unless_zero() more
    locking/atomic, kref: Kill kref_sub()
    locking/atomic, kref: Add kref_read()
    locking/atomic, kref: Add KREF_INIT()
    ...

    Linus Torvalds
     

14 Jan, 2017

1 commit

  • Since we need to change the implementation, stop exposing internals.

    Provide kref_read() to read the current reference count; typically
    used for debug messages.

    Kills two anti-patterns:

    atomic_read(&kref->refcount)
    kref->refcount.counter
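
    The resulting API surface can be sketched as (a userspace illustration,
    not the kernel implementation):

```python
class Kref:
    """Sketch of the kref API once kref_read() hides the counter: callers
    read the count through an accessor instead of poking
    kref->refcount.counter directly."""
    def __init__(self):
        self._refcount = 1
    def get(self):
        self._refcount += 1
    def put(self):
        self._refcount -= 1
        return self._refcount == 0   # True: last reference dropped
    def read(self):
        # Debug/diagnostic accessor, replacing atomic_read(&kref->refcount).
        return self._refcount
```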

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Dec, 2016

2 commits

  • Kill the wrapper and rename __finish_request() to finish_request().

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • r_safe_completion is currently, and has always been, signaled only if
    on-disk ack was requested. It's there for fsync and syncfs, which wait
    for in-flight writes to flush - all data write requests set ONDISK.

    However, the pool perm check code introduced in 4.2 sends a write
    request with only ACK set. An unfortunately timed syncfs can then hang
    forever: r_safe_completion won't be signaled because only an unsafe
    reply was requested.

    We could patch ceph_osdc_sync() to skip !ONDISK write requests, but
    that is somewhat incomplete and yet another special case. Instead,
    rename this completion to r_done_completion and always signal it when
    the OSD client is done with the request, whether unsafe, safe, or
    error. This is a bit cleaner and helps with the cancellation code.

    Reported-by: Yan, Zheng
    Signed-off-by: Ilya Dryomov

    Ilya Dryomov