03 Oct, 2016
16 commits
-
Signed-off-by: Yan, Zheng
-
Accessing / causes failure if the client has caps that restrict path
Signed-off-by: Yan, Zheng
-
Signed-off-by: Yan, Zheng
-
If O_DIRECT writes are racing with buffered writes, then
the call to invalidate_inode_pages2_range() can call ceph_releasepage()
on dirty pages.

Most filesystems hold inode_lock() across O_DIRECT writes so they do not
suffer this race, but cephfs deliberately drops the lock, and opens a window
for the race.

This race can be triggered with the generic/036 test from the xfstests
test suite. It doesn't happen every time, but it does happen often.

As the possibility is expected, remove the warning, and instead include
the PageDirty() status in the debug message.
Signed-off-by: NeilBrown
Reviewed-by: Jeff Layton
Reviewed-by: Yan, Zheng
-
This call can fail if there are dirty pages. The preceding call to
filemap_write_and_wait_range() will normally remove dirty pages, but
as inode_lock() is not held over calls to ceph_direct_read_write(), it
could race with non-direct writes and pages could be dirtied
immediately after filemap_write_and_wait_range() returns.

If there are dirty pages, they will be removed by the subsequent call
to truncate_inode_pages_range(), so having them here is not a problem.

If the 'ret' value is left holding an error, then in the async IO case
(aio_req is not NULL) the loop that would normally call
ceph_osdc_start_request() will see the error in 'ret' and abort all
requests. This doesn't seem like correct behaviour.

So use separate 'ret2' instead of overloading 'ret'.
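To illustrate why a separate 'ret2' matters, here is a minimal userspace sketch. Every name in it (invalidate_pages(), submit_requests() and the two direct_write_*() variants) is hypothetical and only models the shape of the control flow described above; it is not the cephfs code.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a best-effort step that may fail harmlessly
 * (here it always reports -EBUSY, as if it raced with a buffered write). */
static int invalidate_pages(void)
{
    return -EBUSY;
}

/* Hypothetical stand-in for the submit loop: it aborts when the status
 * it is handed holds an error, otherwise it starts all n requests.
 * Returns the number of requests started. */
static int submit_requests(int status, int n)
{
    if (status < 0)
        return 0;
    return n;
}

/* Overloaded shape: the best-effort result overwrites 'ret', so the
 * submit loop sees an error that was never meant to be fatal. */
static int direct_write_overloaded(int n)
{
    int ret = 0;

    ret = invalidate_pages();
    return submit_requests(ret, n);
}

/* Separate shape: the best-effort result lives in 'ret2' and is only
 * reported, so 'ret' still reflects the state of the I/O itself. */
static int direct_write_separate(int n)
{
    int ret = 0;
    int ret2;

    ret2 = invalidate_pages();
    if (ret2 < 0)
        fprintf(stderr, "invalidate failed: %d (ignored)\n", ret2);
    return submit_requests(ret, n);
}
```

With four requests queued, direct_write_overloaded(4) starts none of them, while direct_write_separate(4) starts all four.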
Signed-off-by: NeilBrown
Reviewed-by: Jeff Layton
Reviewed-by: Yan, Zheng
-
If start_page() fails to add a page to the page cache or fails to send
the OSD request, it should call put_page() (instead of free_page()) for
the relevant pages.

Besides, start_page() needs to cancel the fscache readpage if it fails
to send the OSD request.
Signed-off-by: Yan, Zheng
Reported-by: Zhi Zhang
-
Pull setting an error and marking a request done code into a new
helper. obj_request_img_data_test() check isn't strictly needed right
now, but makes it applicable to !img_data requests and a bit safer.
Signed-off-by: Ilya Dryomov
-
Move the check into rbd_obj_request_destroy() to avoid use-after-free
on errors in rbd_img_request_fill(..., OBJ_REQUEST_PAGES, ...), where
pages, owned by the caller, get freed in rbd_img_request_fill().
Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder
Reviewed-by: David Disseldorp
-
Accessing obj_request->img_request union field is only valid for object
requests associated with an image (i.e. if obj_request_img_data_test()
returns true). rbd_osd_req_format_read() used to do more, but now it
just sets osd_req->snap_id. Standalone and stat object requests always
go to the HEAD revision and are fine with CEPH_NOSNAP set by libceph,
so get around the invalid union field use by simply not calling
rbd_osd_req_format_read() in those places.
Reported-by: David Disseldorp
Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder
Reviewed-by: David Disseldorp
-
- don't put obj_request before rbd_obj_request_get() if
rbd_obj_request_create() fails
- don't leak pages if rbd_obj_request_create() fails
- don't leak stat_request if rbd_osd_req_create() fails
Reported-by: David Disseldorp
Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder
Reviewed-by: David Disseldorp
-
- fix parent_length == img_request->xferred assert to not fire on
copyup read failures
- don't leak pages if copyup read fails or we can't allocate a new osd
request
Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder
Reviewed-by: David Disseldorp
-
Commit 0f2d5be792b0 ("rbd: use reference counts for image requests")
added rbd_img_request_get(), which rbd_img_request_fill() calls for
each obj_request added to img_request. It was an urgent band-aid for
the ugliness that is rbd_img_obj_callback() and none of the error paths
were updated.

Given that this img_request reference is meant to represent an
obj_request that hasn't passed through rbd_img_obj_callback() yet,
proper cleanup in appropriate destructors is a challenge. However,
noting that if we don't get a chance to call rbd_obj_request_complete(),
there is not going to be a call to rbd_img_obj_callback(), we can move
rbd_img_request_get() into rbd_obj_request_submit() and fix up the two
places that call rbd_obj_request_complete() directly and not through
rbd_obj_request_submit() to temporarily bump img_request, so that
rbd_img_obj_callback() can put as usual.

This takes care of img_request leaks on errors on the submit side.
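The idea of taking the reference at submit time, so that every put in the callback has a matching get, can be sketched with a toy model. The types and helpers below (struct img_req, struct obj_req, img_get(), img_put(), obj_submit(), fill_and_submit()) are hypothetical and are not the rbd data structures; the request also completes synchronously here, which the real driver does not do.

```c
/* Hypothetical image request carrying a plain reference count. */
struct img_req {
    int refs;
};

/* Hypothetical object request that points back at its image request. */
struct obj_req {
    struct img_req *img;
};

static void img_get(struct img_req *img) { img->refs++; }
static void img_put(struct img_req *img) { img->refs--; }

/* Completion callback: always drops the reference taken at submit. */
static void obj_callback(struct obj_req *obj)
{
    img_put(obj->img);
}

/* The reference is taken only when the request is actually submitted,
 * so a request that never gets this far has nothing to undo. */
static void obj_submit(struct obj_req *obj)
{
    img_get(obj->img);
    obj_callback(obj);  /* completes immediately in this toy model */
}

/* Build and submit n object requests, failing before submission at
 * index fail_at (pass -1 for no failure). Returns the image request
 * refcount afterwards; only the caller's own reference should remain. */
static int fill_and_submit(int n, int fail_at)
{
    struct img_req img = { .refs = 1 };
    struct obj_req obj;
    int i;

    for (i = 0; i < n; i++) {
        obj.img = &img;
        if (i == fail_at)
            break;      /* error before submit: no reference was taken */
        obj_submit(&obj);
    }
    return img.refs;
}
```

In this model the count returns to 1 whether or not the fill loop fails part way through, which is the property the move into the submit path is meant to provide.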
Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder
-
If stat request fails with something other than -ENOENT (which just
means that we need to copyup), the original object request is never
marked as done and therefore never completed. Fix this by moving the
mark done + complete snippet from rbd_img_obj_parent_read_full() into
rbd_img_obj_exists_callback(). The former remains covered, as the
latter is its only caller (through rbd_img_obj_request_submit()).
Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder
Reviewed-by: David Disseldorp
-
Assert once in rbd_img_obj_request_submit().
Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder
Reviewed-by: David Disseldorp
-
- osdc parameter is useless
- starting with commit 5aea3dcd5021 ("libceph: a major OSD client
update"), ceph_osdc_start_request() always returns success
Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder
Reviewed-by: David Disseldorp
-
Add a per-device option to acquire exclusive lock on reads (in addition
to writes and discards). The use case is iSCSI, where it will be used
to prevent execution of stale writes after the implicit failover.
Signed-off-by: Ilya Dryomov
Tested-by: Mike Christie
25 Aug, 2016
16 commits
-
This adds a force close option, so we can force the unmapping
of an rbd device that is open. If a path/device is blacklisted, apps
like multipathd can map a new device and then unmap the old one.
The unmapping cleanup would then be handled by the generic hotunplug
code paths in multipathd, as is done for iSCSI, FC/FCOE, SAS, etc.
Signed-off-by: Mike Christie
Signed-off-by: Ilya Dryomov
-
Export the info used to setup the rbd image, so it can be used to remap
the image.
Signed-off-by: Mike Christie
[idryomov@gmail.com: do_rbd_add() EH]
Signed-off-by: Ilya Dryomov
-
Export snap id in sysfs, so tools like multipathd can use it in a uuid.
Signed-off-by: Mike Christie
Signed-off-by: Ilya Dryomov
-
Export the cluster fsid, so tools like udev and multipath-tools can use
it for part of the uuid.
Signed-off-by: Mike Christie
Signed-off-by: Ilya Dryomov
-
Export client addr/nonce, so userspace can check if an image is being
blacklisted.
Signed-off-by: Mike Christie
[idryomov@gmail.com: ceph_client_addr(), endianness fix]
Signed-off-by: Ilya Dryomov
-
With exclusive-lock added and more to come, print features into dmesg.
Change capacity to decimal while at it.
Signed-off-by: Ilya Dryomov
Reviewed-by: Mike Christie
-
Add basic support for RBD_FEATURE_EXCLUSIVE_LOCK feature. Maintenance
operations (resize, snapshot create, etc) are offloaded to librbd via
returning -EOPNOTSUPP - librbd should request the lock and execute the
operation.
Signed-off-by: Ilya Dryomov
Reviewed-by: Mike Christie
Tested-by: Mike Christie
-
Revamp watch code to support retrying watch re-registration:
- add rbd_dev->watch_state for more robust errcb handling
- store watch cookie separately to avoid dereferencing watch_handle
which is set to NULL on unwatch
- move re-register code into a delayed work and retry re-registration
every second, unless the client is blacklisted
Signed-off-by: Ilya Dryomov
Reviewed-by: Mike Christie
Tested-by: Mike Christie
-
This is going to be used for re-registering watch requests and
exclusive-lock tasks: acquire/request lock, notify-acquired, release
lock, notify-released. Some refactoring in the map/unmap paths was
necessary to give this workqueue a meaningful name: "rbdX-tasks".
Signed-off-by: Ilya Dryomov
Reviewed-by: Mike Christie
-
It's gid / global_id in other places.
Signed-off-by: Ilya Dryomov
Reviewed-by: Mike Christie
Reviewed-by: Alex Elder
-
Reuse ceph_mon_generic_request infrastructure for sending monitor
commands. In particular, add support for 'blacklist add' to prevent
other, non-responsive clients from making further updates.
Signed-off-by: Douglas Fuller
[idryomov@gmail.com: refactor, misc fixes throughout]
Signed-off-by: Ilya Dryomov
Reviewed-by: Mike Christie
Reviewed-by: Alex Elder
-
Add an interface for the Ceph OSD lock.lock_info method and associated
data structures.

Based heavily on code by Mike Christie.
Signed-off-by: Douglas Fuller
[idryomov@gmail.com: refactor, misc fixes throughout]
Signed-off-by: Ilya Dryomov
Reviewed-by: Mike Christie
Reviewed-by: Alex Elder
-
This patch adds support for rados lock, unlock and break lock.

Based heavily on code by Mike Christie.
Signed-off-by: Douglas Fuller
Signed-off-by: Ilya Dryomov
Reviewed-by: Mike Christie
Reviewed-by: Alex Elder
-
Add a convenience function to osd_client to send Ceph OSD
'class' ops. The interface assumes that the request and
reply data each consist of single pages.
Signed-off-by: Douglas Fuller
Signed-off-by: Ilya Dryomov
Reviewed-by: Mike Christie
Reviewed-by: Alex Elder
-
Add support for this Ceph OSD op, needed to support the RBD exclusive
lock feature.
Signed-off-by: Douglas Fuller
[idryomov@gmail.com: refactor, misc fixes throughout]
Signed-off-by: Ilya Dryomov
Reviewed-by: Mike Christie
Reviewed-by: Alex Elder
-
Clear up EntityName vs entity_name_t confusion.
Signed-off-by: Ilya Dryomov
Reviewed-by: Mike Christie
Reviewed-by: Alex Elder
15 Aug, 2016
3 commits
-
Pull thermal updates from Zhang Rui:
- Fix a race condition when updating cooling device, which may lead to
a situation where a thermal governor never updates the cooling
device. From Michele Di Giorgio.
- Fix a zero division error when disabling the forced idle injection
from the intel powerclamp. From Petr Mladek.
- Add suspend/resume callback for intel_pch_thermal thermal driver.
From Srinivas Pandruvada.
- Another two fixes for clocking cooling driver and hwmon sysfs I/F.
From Michele Di Giorgio and Kuninori Morimoto.

[ Hmm. That suspend/resume callback for intel_pch_thermal doesn't look
like a fix, but I'm letting it slide.. - Linus ]

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: clock_cooling: Fix missing mutex_init()
thermal: hwmon: EXPORT_SYMBOL_GPL for thermal hwmon sysfs
thermal: fix race condition when updating cooling device
thermal/powerclamp: Prevent division by zero when counting interval
thermal: intel_pch_thermal: Add suspend/resume callback
-
Pull m68knommu fix from Greg Ungerer:
"This contains only a single fix for a register corruption problem on
certain types of m68k flat format binaries"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: fix user a5 register being overwritten
14 Aug, 2016
4 commits
-
…/groeck/linux-staging
Pull h8300 and unicore32 architecture fixes from Guenter Roeck:
"Two patches to fix h8300 and unicore32 builds.

unicore32 builds have been broken since v4.6. The fix has been
available in -next since March of this year.

h8300 builds have been broken since the last commit window. The fix
has been available in -next since June of this year"

* tag 'fixes-for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
h8300: Add missing include file to asm/io.h
unicore32: mm: Add missing parameter to arch_vma_access_permitted
-
Pull arm64 fixes from Catalin Marinas:
- support for nr_cpus= command line argument (maxcpus was previously
changed to allow secondary CPUs to be hot-plugged)
- ARM PMU interrupt handling fix
- fix potential TLB conflict in the hibernate code
- improved handling of EL1 instruction aborts (better error reporting)
- removal of useless jprobes code for stack saving/restoring
- defconfig updates

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: defconfig: enable CONFIG_LOCALVERSION_AUTO
arm64: defconfig: add options for virtualization and containers
arm64: hibernate: handle allocation failures
arm64: hibernate: avoid potential TLB conflict
arm64: Handle el1 synchronous instruction aborts cleanly
arm64: Remove stack duplicating code from jprobes
drivers/perf: arm-pmu: Fix handling of SPI lacking "interrupt-affinity" property
drivers/perf: arm-pmu: convert arm_pmu_mutex to spinlock
arm64: Support hard limit of cpu count by nr_cpus
-
Pull KVM fixes from Radim Krčmář:
"KVM:
- lock kvm_device list to prevent corruption on device creation.

PPC:
- split debugfs initialization from creation of the xics device to
unlock the newly taken kvm lock earlier.

s390:
- prevent userspace from triggering two WARN_ON_ONCE.

MIPS:
- fix several issues in the management of TLB faults (Cc: stable)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
MIPS: KVM: Propagate kseg0/mapped tlb fault errors
MIPS: KVM: Fix gfn range check in kseg0 tlb faults
MIPS: KVM: Add missing gfn range check
MIPS: KVM: Fix mapped fault broken commpage handling
KVM: Protect device ops->create and list_add with kvm->lock
KVM: PPC: Move xics_debugfs_init out of create
KVM: s390: reset KVM_REQ_MMU_RELOAD if mapping the prefix failed
KVM: s390: set the prefix initially properly
-
Pull block fixes from Jens Axboe:
- an NVMe fix from Gabriel, fixing a suspend/resume issue on some
setups
- addition of a few missing entries in the block queue sysfs
documentation, from Joe
- a fix for a sparse shadow warning for the bvec iterator, from
Johannes
- a writeback deadlock involving raid issuing barriers, and not
flushing the plug when we wakeup the flusher threads. From
Konstantin
- a set of patches for the NVMe target/loop/rdma code, from Roland and
Sagi

* 'for-linus' of git://git.kernel.dk/linux-block:
bvec: avoid variable shadowing warning
doc: update block/queue-sysfs.txt entries
nvme: Suspend all queues before deletion
mm, writeback: flush plugged IO in wakeup_flusher_threads()
nvme-rdma: Remove unused includes
nvme-rdma: start async event handler after reconnecting to a controller
nvmet: Fix controller serial number inconsistency
nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
nvmet-rdma: Correctly handle RDMA device hot removal
nvme-rdma: Make sure to shutdown the controller if we can
nvme-loop: Remove duplicate call to nvme_remove_namespaces
nvme-rdma: Free the I/O tags when we delete the controller
nvme-rdma: Remove duplicate call to nvme_remove_namespaces
nvme-rdma: Fix device removal handling
nvme-rdma: Queue ns scanning after a sucessful reconnection
nvme-rdma: Don't leak uninitialized memory in connect request private data
13 Aug, 2016
1 commit
-
h8300 builds fail with
arch/h8300/include/asm/io.h:9:15: error: unknown type name ‘u8’
arch/h8300/include/asm/io.h:15:15: error: unknown type name ‘u16’
arch/h8300/include/asm/io.h:21:15: error: unknown type name ‘u32’
and many related errors.
Fixes: 23c82d41bdf4 ("kexec-allow-architectures-to-override-boot-mapping-fix")
Cc: Andrew Morton
Signed-off-by: Guenter Roeck