03 Oct, 2019
1 commit
-
In case a user process using xenbus has open transactions and is killed
e.g. via ctrl-C the following cleanup of the allocated resources might
result in a deadlock due to trying to end a transaction in the xenbus
worker thread:[ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
[ 2551.492215] Tainted: P OE 5.0.0-29-generic #5
[ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2551.528585] xenbus D 0 37 2 0x80000080
[ 2551.528590] Call Trace:
[ 2551.528603] __schedule+0x2c0/0x870
[ 2551.528606] ? _cond_resched+0x19/0x40
[ 2551.528632] schedule+0x2c/0x70
[ 2551.528637] xs_talkv+0x1ec/0x2b0
[ 2551.528642] ? wait_woken+0x80/0x80
[ 2551.528645] xs_single+0x53/0x80
[ 2551.528648] xenbus_transaction_end+0x3b/0x70
[ 2551.528651] xenbus_file_free+0x5a/0x160
[ 2551.528654] xenbus_dev_queue_reply+0xc4/0x220
[ 2551.528657] xenbus_thread+0x7de/0x880
[ 2551.528660] ? wait_woken+0x80/0x80
[ 2551.528665] kthread+0x121/0x140
[ 2551.528667] ? xb_read+0x1d0/0x1d0
[ 2551.528670] ? kthread_park+0x90/0x90
[ 2551.528673] ret_from_fork+0x35/0x40Fix this by doing the cleanup via a workqueue instead.
Reported-by: James Dingwall
Fixes: fd8aa9095a95c ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
Cc: # 4.11
Signed-off-by: Juergen Gross
Reviewed-by: Boris Ostrovsky
Signed-off-by: Boris Ostrovsky
01 Jun, 2019
1 commit
-
Pull xen fixes from Juergen Gross:
"One minor cleanup patch and a fix for handling of live migration when
running as Xen guest"* tag 'for-linus-5.2b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xenbus: Avoid deadlock during suspend due to open transactions
xen/pvcalls: Remove set but not used variable
29 May, 2019
1 commit
-
During a suspend/resume, the xenwatch thread waits for all outstanding
xenstore requests and transactions to complete. This does not work
correctly for transactions started by userspace because it waits for
them to complete after freezing userspace threads which means the
transactions have no way of completing, resulting in a deadlock. This is
trivial to reproduce by running this script and then suspending the VM:import pyxs, time
c = pyxs.client.Client(xen_bus_path="/dev/xen/xenbus")
c.connect()
c.transaction()
time.sleep(3600)Even if this deadlock were resolved, misbehaving userspace should not
prevent a VM from being migrated. So, instead of waiting for these
transactions to complete before suspending, store the current generation
id for each transaction when it is started. The global generation id is
incremented during resume. If the caller commits the transaction and the
generation id does not match the current generation id, return EAGAIN so
that they try again. If the transaction was instead discarded, return OK
since no changes were made anyway.This only affects users of the xenbus file interface. In-kernel users of
xenbus are assumed to be well-behaved and complete all transactions
before freezing.Signed-off-by: Ross Lagerwall
Reviewed-by: Juergen Gross
Signed-off-by: Boris Ostrovsky
21 May, 2019
1 commit
-
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have MODULE_LICENCE("GPL*") inside which was used in the initial
scan/conversion to ignore the fileThese files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:GPL-2.0-only
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
25 Apr, 2019
1 commit
-
Drop LIST_HEAD where the variable it declares is never used.
The declarations were introduced with the file, but the declared
variables were not used.Fixes: 1107ba885e469 ("xen: add xenfs to allow usermode Xen interaction")
Signed-off-by: Mao Wenan
Reviewed-by: Juergen Gross
Signed-off-by: Boris Ostrovsky
07 Apr, 2019
1 commit
-
…multaneously without deadlock
Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added
locking for file.f_pos access and in particular made concurrent read and
write not possible - now both those functions take f_pos lock for the
whole run, and so if e.g. a read is blocked waiting for data, write will
deadlock waiting for that read to complete.This caused regression for stream-like files where previously read and
write could run simultaneously, but after that patch could not do so
anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes
to /proc/xen/xenbus") which fixes such regression for particular case of
/proc/xen/xenbus.The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
safety for read/write/lseek and added the locking to file descriptors of
all regular files. In 2014 that thread-safety problem was not new as it
was already discussed earlier in 2006.However even though 2006'th version of Linus's patch was adding f_pos
locking "only for files that are marked seekable with FMODE_LSEEK (thus
avoiding the stream-like objects like pipes and sockets)", the 2014
version - the one that actually made it into the tree as 9c225f2655e3 -
is doing so irregardless of whether a file is seekable or not.See
https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
https://lwn.net/Articles/180387
https://lwn.net/Articles/180396for historic context.
The reason that it did so is, probably, that there are many files that
are marked non-seekable, but e.g. their read implementation actually
depends on knowing current position to correctly handle the read. Some
examples:kernel/power/user.c snapshot_read
fs/debugfs/file.c u32_array_read
fs/fuse/control.c fuse_conn_waiting_read + ...
drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read
arch/s390/hypfs/inode.c hypfs_read_iter
...Despite that, many nonseekable_open users implement read and write with
pure stream semantics - they don't depend on passed ppos at all. And for
those cases where read could wait for something inside, it creates a
situation similar to xenbus - the write could be never made to go until
read is done, and read is waiting for some, potentially external, event,
for potentially unbounded time -> deadlock.Besides xenbus, there are 14 such places in the kernel that I've found
with semantic patch (see below):drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()In addition to the cases above another regression caused by f_pos
locking is that now FUSE filesystems that implement open with
FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
stream-like files - for the same reason as above e.g. read can deadlock
write locking on file.f_pos in the kernel.FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse:
implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
write routines not depending on current position at all, and with both
read and write being potentially blocking operations:See
https://github.com/libfuse/osspd
https://lwn.net/Articles/308445https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
"somewhat pipe-like files ..." with read handler not using offset.
However that test implements only read without write and cannot exercise
the deadlock scenario:https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216I've actually hit the read vs write deadlock for real while implementing
my FUSE filesystem where there is /head/watch file, for which open
creates separate bidirectional socket-like stream in between filesystem
and its user with both read and write being later performed
simultaneously. And there it is semantically not easy to split the
stream into two separate read-only and write-only channels:https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169
Let's fix this regression. The plan is:
1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
doing so would break many in-kernel nonseekable_open users which
actually use ppos in read/write handlers.2. Add stream_open() to kernel to open stream-like non-seekable file
descriptors. Read and write on such file descriptors would never use
nor change ppos. And with that property on stream-like files read and
write will be running without taking f_pos lock - i.e. read and write
could be running simultaneously.3. With semantic patch search and convert to stream_open all in-kernel
nonseekable_open users for which read and write actually do not
depend on ppos and where there is no other methods in file_operations
which assume @offset access.4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
steam_open if that bit is present in filesystem open reply.It was tempting to change fs/fuse/ open handler to use stream_open
instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and
write handlershttps://codesearch.debian.net/search?q=-%3Enonseekable+%3D
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481so if we would do such a change it will break a real user.
5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
from v3.14+ (the kernel where 9c225f2655 first appeared).This will allow to patch OSSPD and other FUSE filesystems that
provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
in their open handler and this way avoid the deadlock on all kernel
versions. This should work because fs/fuse/ ignores unknown open
flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel
that is not aware of FOPEN_STREAM will be < v3.14 where just
FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
write deadlock.This patch adds stream_open, converts /proc/xen/xenbus to it and adds
semantic patch to automatically locate in-kernel places that are either
required to be converted due to read vs write deadlock, or that are just
safe to be converted because read and write do not use ppos and there
are no other funky methods in file_operations.Regarding semantic patch I've verified each generated change manually -
that it is correct to convert - and each other nonseekable_open instance
left - that it is either not correct to convert there, or that it is not
converted due to current stream_open.cocci limitations.The script also does not convert files that should be valid to convert,
but that currently have .llseek = noop_llseek or generic_file_llseek for
unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yongzhi Pan <panyongzhi@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Nikolaus Rath <Nikolaus@rath.org>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
26 Oct, 2018
1 commit
-
xenbus_va_dev_error() will try to write error messages to Xenstore
under the error//error node (with something like
"device/vbd/51872"). This will fail normally and another message
about this failure is added to dmesg.I believe this is a remnant from very ancient times, as it was added
in the first pvops rush of commits in 2007.So remove the additional message when writing to Xenstore failed as
a minimum step.Signed-off-by: Juergen Gross
Reviewed-by: Boris Ostrovsky
Signed-off-by: Juergen Gross
29 Aug, 2018
1 commit
-
Export device state to sysfs to allow for easier get device state.
Signed-off-by: Joe Jin
Reviewed-by: Boris Ostrovsky
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Konrad Rzeszutek Wilk
Signed-off-by: Boris Ostrovsky
17 May, 2018
1 commit
-
There's no need to store the xenstore page or event channel in
xen_start_info if they are locally initialized.This also fixes PVH local xenstore initialization due to the lack of
xen_start_info in that case.Signed-off-by: Boris Ostrovsky
Signed-off-by: Roger Pau Monné
Reviewed-by: Juergen Gross
Signed-off-by: Juergen Gross
17 Apr, 2018
1 commit
-
xenbus_command_reply() did not actually copy the response string and
leaked stack content instead.Fixes: 9a6161fe73bd ("xen: return xenstore command failures via response instead of rc")
Signed-off-by: Simon Gaiser
Reviewed-by: Juergen Gross
Signed-off-by: Boris Ostrovsky
21 Mar, 2018
3 commits
-
By guaranteeing that the argument of XS_TRANSACTION_END is valid we can
assume that the transaction has been closed when we get an XS_ERROR
response from xenstore (Note that we already verify that it's a valid
transaction id).Signed-off-by: Simon Gaiser
Reviewed-by: Juergen Gross
Signed-off-by: Boris Ostrovsky -
Users of the xenbus functions should never close a non existent
transaction (for example by trying to closing the same transaction
twice) but better catch it in xs_request_exit() than to corrupt the
reference counter.Signed-off-by: Simon Gaiser
Reviewed-by: Juergen Gross
Signed-off-by: Boris Ostrovsky -
Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
concurrent xenstore accesses") made a subtle change to the semantic of
xenbus_dev_request_and_reply() and xenbus_transaction_end().Before on an error response to XS_TRANSACTION_END
xenbus_dev_request_and_reply() would not decrement the active
transaction counter. But xenbus_transaction_end() has always counted the
transaction as finished regardless of the response.The new behavior is that xenbus_dev_request_and_reply() and
xenbus_transaction_end() will always count the transaction as finished
regardless the response code (handled in xs_request_exit()).But xenbus_dev_frontend tries to end a transaction on closing of the
device if the XS_TRANSACTION_END failed before. Trying to close the
transaction twice corrupts the reference count. So fix this by also
considering a transaction closed if we have sent XS_TRANSACTION_END once
regardless of the return code.Cc: # 4.11
Fixes: fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
Signed-off-by: Simon Gaiser
Reviewed-by: Juergen Gross
Signed-off-by: Boris Ostrovsky
08 Mar, 2018
1 commit
-
Never directly free @dev after calling device_register(), even
if it returned an error! Always use put_device() to give up the
reference initialized.Signed-off-by: Arvind Yadav
Reviewed-by: Juergen Gross
Signed-off-by: Juergen Gross
17 Feb, 2018
1 commit
-
Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent
xenstore accesses") optimized xenbus concurrent accesses but in doing so
broke UABI of /dev/xen/xenbus. Through /dev/xen/xenbus applications are in
charge of xenbus message exchange with the correct header and body. Now,
after the mentioned commit the replies received by application will no
longer have the header req_id echoed back as it was on request (see
specification below for reference), because that particular field is being
overwritten by kernel.struct xsd_sockmsg
{
uint32_t type; /* XS_??? */
uint32_t req_id;/* Request identifier, echoed in daemon's response. */
uint32_t tx_id; /* Transaction id (0 if not related to a transaction). */
uint32_t len; /* Length of data following this. *//* Generally followed by nul-terminated string(s). */
};Before there was only one request at a time so req_id could simply be
forwarded back and forth. To allow simultaneous requests we need a
different req_id for each message thus kernel keeps a monotonic increasing
counter for this field and is written on every request irrespective of
userspace value.Forwarding again the req_id on userspace requests is not a solution because
we would open the possibility of userspace-generated req_id colliding with
kernel ones. So this patch instead takes another route which is to
artificially keep user req_id while keeping the xenbus logic as is. We do
that by saving the original req_id before xs_send(), use the private kernel
counter as req_id and then once reply comes and was validated, we restore
back the original req_id.Cc: # 4.11
Fixes: fd8aa9095a ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
Reported-by: Bhavesh Davda
Signed-off-by: Joao Martins
Reviewed-by: Juergen Gross
Signed-off-by: Juergen Gross
12 Feb, 2018
1 commit
-
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\\)/\\1E\\2/" $f; done
donewith de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.The next patch from Al will sort out the final differences, and we
should be all done.Scripted-by: Al Viro
Signed-off-by: Linus Torvalds
29 Nov, 2017
1 commit
-
Signed-off-by: Al Viro
17 Nov, 2017
1 commit
-
Pull xen updates from Juergen Gross:
"Xen features and fixes for v4.15-rc1Apart from several small fixes it contains the following features:
- a series by Joao Martins to add vdso support of the pv clock
interface- a series by Juergen Gross to add support for Xen pv guests to be
able to run on 5 level paging hosts- a series by Stefano Stabellini adding the Xen pvcalls frontend
driver using a paravirtualized socket interface"* tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (34 commits)
xen/pvcalls: fix potential endless loop in pvcalls-front.c
xen/pvcalls: Add MODULE_LICENSE()
MAINTAINERS: xen, kvm: track pvclock-abi.h changes
x86/xen/time: setup vcpu 0 time info page
x86/xen/time: set pvclock flags on xen_time_init()
x86/pvclock: add setter for pvclock_pvti_cpu0_va
ptp_kvm: probe for kvm guest availability
xen/privcmd: remove unused variable pageidx
xen: select grant interface version
xen: update arch/x86/include/asm/xen/cpuid.h
xen: add grant interface version dependent constants to gnttab_ops
xen: limit grant v2 interface to the v1 functionality
xen: re-introduce support for grant v2 interface
xen: support priv-mapping in an HVM tools domain
xen/pvcalls: remove redundant check for irq >= 0
xen/pvcalls: fix unsigned less than zero error check
xen/time: Return -ENODEV from xen_get_wallclock()
xen/pvcalls-front: mark expected switch fall-through
xen: xenbus_probe_frontend: mark expected switch fall-throughs
xen/time: do not decrease steal time after live migration on xen
...
03 Nov, 2017
1 commit
-
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.Addresses-Coverity-ID: 146562
Addresses-Coverity-ID: 146563
Signed-off-by: Gustavo A. R. Silva
Reviewed-by: Juergen Gross
Signed-off-by: Boris Ostrovsky
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
18 Sep, 2017
1 commit
-
xenbus_client.c contains some functions specific for pv guests.
Enclose them with #ifdef CONFIG_XEN_PV to avoid compiling them when
they are not needed (e.g. on ARM).Signed-off-by: Juergen Gross
Signed-off-by: Boris Ostrovsky
11 Aug, 2017
1 commit
-
When starting the xenwatch thread a theoretical deadlock situation is
possible:xs_init() contains:
task = kthread_run(xenwatch_thread, NULL, "xenwatch");
if (IS_ERR(task))
return PTR_ERR(task);
xenwatch_pid = task->pid;And xenwatch_thread() does:
mutex_lock(&xenwatch_mutex);
...
event->handle->callback();
...
mutex_unlock(&xenwatch_mutex);The callback could call unregister_xenbus_watch() which does:
...
if (current->pid != xenwatch_pid)
mutex_lock(&xenwatch_mutex);
...In case a watch is firing before xenwatch_pid could be set and the
callback of that watch unregisters a watch, then a self-deadlock would
occur.Avoid this by setting xenwatch_pid in xenwatch_thread().
Signed-off-by: Juergen Gross
Reviewed-by: Boris Ostrovsky
Signed-off-by: Juergen Gross
25 Jun, 2017
1 commit
-
There has been a report about a deadlock in the xenbus driver:
[ 247.979498] ======================================================
[ 247.985688] WARNING: possible circular locking dependency detected
[ 247.991882] 4.12.0-rc4-00022-gc4b25c0 #575 Not tainted
[ 247.997040] ------------------------------------------------------
[ 248.003232] xenbus/91 is trying to acquire lock:
[ 248.007875] (&u->msgbuffer_mutex){+.+.+.}, at: []
xenbus_dev_queue_reply+0x3c/0x230
[ 248.017163]
[ 248.017163] but task is already holding lock:
[ 248.023096] (xb_write_mutex){+.+...}, at: []
xenbus_thread+0x5f0/0x798
[ 248.031267]
[ 248.031267] which lock already depends on the new lock.
[ 248.031267]
[ 248.039615]
[ 248.039615] the existing dependency chain (in reverse order) is:
[ 248.047176]
[ 248.047176] -> #1 (xb_write_mutex){+.+...}:
[ 248.052943] __lock_acquire+0x1728/0x1778
[ 248.057498] lock_acquire+0xc4/0x288
[ 248.061630] __mutex_lock+0x84/0x868
[ 248.065755] mutex_lock_nested+0x3c/0x50
[ 248.070227] xs_send+0x164/0x1f8
[ 248.074015] xenbus_dev_request_and_reply+0x6c/0x88
[ 248.079427] xenbus_file_write+0x260/0x420
[ 248.084073] __vfs_write+0x48/0x138
[ 248.088113] vfs_write+0xa8/0x1b8
[ 248.091983] SyS_write+0x54/0xb0
[ 248.095768] el0_svc_naked+0x24/0x28
[ 248.099897]
[ 248.099897] -> #0 (&u->msgbuffer_mutex){+.+.+.}:
[ 248.106088] print_circular_bug+0x80/0x2e0
[ 248.110730] __lock_acquire+0x1768/0x1778
[ 248.115288] lock_acquire+0xc4/0x288
[ 248.119417] __mutex_lock+0x84/0x868
[ 248.123545] mutex_lock_nested+0x3c/0x50
[ 248.128016] xenbus_dev_queue_reply+0x3c/0x230
[ 248.133005] xenbus_thread+0x788/0x798
[ 248.137306] kthread+0x110/0x140
[ 248.141087] ret_from_fork+0x10/0x40It is rather easy to avoid by dropping xb_write_mutex before calling
xenbus_dev_queue_reply().Fixes: fd8aa9095a95c02dcc35540a263267c29b8fda9d ("xen: optimize xenbus
driver for multiple concurrent xenstore accesses").Cc: # 4.11
Reported-by: Andre Przywara
Signed-off-by: Juergen Gross
Tested-by: Andre Przywara
Signed-off-by: Juergen Gross
04 Apr, 2017
1 commit
-
After allocation the item is being placed on the list right away.
Consequently it needs to be taken off the list before freeing in the
case xenbus_dev_request_and_reply() failed, as in that case the
callback (xenbus_dev_queue_reply()) is not being called (and if it
was called, it should do both).Fixes: 5584ea250ae44f929feb4c7bd3877d1c5edbf813
Signed-off-by: Jan Beulich
Reviewed-by: Juergen Gross
Signed-off-by: Boris Ostrovsky
27 Feb, 2017
1 commit
-
This patch remove duplicate inclusion of linux/init.h in
xenbus_dev_frontend.c.
Confirm successfully compile after remove the line.Signed-off-by: Masanari Iida
Reviewed-by: Juergen Gross
10 Feb, 2017
3 commits
-
Handling of multiple concurrent Xenstore accesses through xenbus driver
either from the kernel or user land is rather lame today: xenbus is
capable to have one access active only at one point of time.Rewrite xenbus to handle multiple requests concurrently by making use
of the request id of the Xenstore protocol. This requires to:- Instead of blocking inside xb_read() when trying to read data from
the xenstore ring buffer do so only in the main loop of
xenbus_thread().- Instead of doing writes to the xenstore ring buffer in the context of
the caller just queue the request and do the write in the dedicated
xenbus thread.- Instead of just forwarding the request id specified by the caller of
xenbus to xenstore use a xenbus internal unique request id. This will
allow multiple outstanding requests.- Modify the locking scheme in order to allow multiple requests being
active in parallel.- Instead of waiting for the reply of a user's xenstore request after
writing the request to the xenstore ring buffer return directly to
the caller and do the waiting in the read path.Additionally signal handling was optimized by avoiding waking up the
xenbus thread or sending an event to Xenstore in case the addressed
entity is known to be running already.As a result communication with Xenstore is sped up by a factor of up
to 5: depending on the request type (read or write) and the amount of
data transferred the gain was at least 20% (small reads) and went up to
a factor of 5 for large writes.In the end some more rough edges of xenbus have been smoothed:
- Handling of memory shortage when reading from xenstore ring buffer in
the xenbus driver was not optimal: it was busy looping and issuing a
warning in each loop.- In case of xenstore not running in dom0 but in a stubdom we end up
with two xenbus threads running as the initialization of xenbus in
dom0 expecting a local xenstored will be redone later when connecting
to the xenstore domain. Up to now this was no problem as locking
would prevent the two xenbus threads interfering with each other, but
this was just a waste of kernel resources.- An out of memory situation while writing to or reading from the
xenstore ring buffer no longer will lead to a possible loss of
synchronization with xenstore.- The user read and write part are now interruptible by signals.
Signed-off-by: Juergen Gross
Signed-off-by: Boris Ostrovsky -
Today a Xenstore watch event is delivered via a callback function
declared as:void (*callback)(struct xenbus_watch *,
const char **vec, unsigned int len);As all watch events only ever come with two parameters (path and token)
changing the prototype to:void (*callback)(struct xenbus_watch *,
const char *path, const char *token);is the natural thing to do.
Apply this change and adapt all users.
Cc: konrad.wilk@oracle.com
Cc: roger.pau@citrix.com
Cc: wei.liu2@citrix.com
Cc: paul.durrant@citrix.com
Cc: netdev@vger.kernel.orgSigned-off-by: Juergen Gross
Reviewed-by: Paul Durrant
Reviewed-by: Wei Liu
Reviewed-by: Roger Pau Monné
Reviewed-by: Boris Ostrovsky
Signed-off-by: Boris Ostrovsky -
The xenbus driver has an awful mixture of internally and globally
visible headers: some of the internally used only stuff is defined in
the global header include/xen/xenbus.h while some stuff defined in
internal headers is used by other drivers, too.Clean this up by moving the externally used symbols to
include/xen/xenbus.h and the symbols used internally only to a new
header drivers/xen/xenbus/xenbus.h replacing xenbus_comms.h and
xenbus_probe.hSigned-off-by: Juergen Gross
Reviewed-by: Boris Ostrovsky
Signed-off-by: Boris Ostrovsky
08 Feb, 2017
1 commit
-
This function error patch can be simplified, so do so.
Remove fail: label and somewhat obfuscating, used once "error_path"
function.Signed-off-by: Joe Perches
Signed-off-by: Boris Ostrovsky
24 Dec, 2016
3 commits
-
In drivers/xen/xenbus/xenbus_comms.h there is a stale declaration of
xs_input_avail(). Remove it.Signed-off-by: Juergen Gross
Reviewed-by: Boris Ostrovsky
Signed-off-by: Juergen Gross -
When the xenbus driver does some special handling for a Xenstore
command any error condition related to the command should be returned
via an error response instead of letting the related write operation
fail. Otherwise the user land handler might take wrong decisions
assuming the connection to Xenstore is broken.While at it try to return the same error values xenstored would
return for those cases.Signed-off-by: Juergen Gross
Reviewed-by: Boris Ostrovsky
Signed-off-by: Juergen Gross -
When accessing Xenstore in a transaction the user is specifying a
transaction id which he normally obtained from Xenstore when starting
the transaction. Xenstore is validating a transaction id against all
known transaction ids of the connection the request came in. As all
requests of a domain not being the one where Xenstore lives share
one connection, validation of transaction ids of different users of
Xenstore in that domain should be done by the kernel of that domain
being the multiplexer between the Xenstore users in that domain and
Xenstore.In order to prohibit one Xenstore user "hijacking" a transaction from
another user the xenbus driver has to verify a given transaction id
against all known transaction ids of the user before forwarding it to
Xenstore.Signed-off-by: Juergen Gross
Reviewed-by: Boris Ostrovsky
Signed-off-by: Juergen Gross
12 Dec, 2016
1 commit
-
/proc/xen/xenbus does not work correctly. A read blocked waiting for
a xenstore message holds the mutex needed for atomic file position
updates. This blocks any writes on the same file handle, which can
deadlock if the write is needed to unblock the read.Clear FMODE_ATOMIC_POS when opening this device to always get
character device like sematics.Signed-off-by: David Vrabel
Reviewed-by: Juergen Gross
Signed-off-by: Juergen Gross
08 Dec, 2016
1 commit
-
Variable err is initialized with 0. As a result, the return value may
be 0 even if get_zeroed_page() fails to allocate memory. This patch fixes
the bug, initializing err with "-ENOMEM".Signed-off-by: Pan Bian
Reviewed-by: Juergen Gross
Signed-off-by: Juergen Gross
17 Nov, 2016
1 commit
-
Mounting proc in user namespace containers fails if the xenbus
filesystem is mounted on /proc/xen because this directory fails
the "permanently empty" test. proc_create_mount_point() exists
specifically to create such mountpoints in proc but is currently
proc-internal. Export this interface to modules, then use it in
xenbus when creating /proc/xen.Signed-off-by: Seth Forshee
Signed-off-by: David Vrabel
Signed-off-by: Juergen Gross
07 Nov, 2016
2 commits
-
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of the reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.Signed-off-by: Juergen Gross
Acked-by: David Vrabel -
There are multiple instances of code reading an optional unsigned
parameter from Xenstore via xenbus_scanf(). Instead of repeating the
same code over and over add a service function doing the job.Signed-off-by: Juergen Gross
Reviewed-by: David Vrabel
25 Oct, 2016
1 commit
-
Pull xen fixes from David Vrabel:
- advertise control feature flags in xenstore
- fix x86 build when XEN_PVHVM is disabled
* tag 'for-linus-4.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xenbus: check return value of xenbus_scanf()
xenbus: prefer list_for_each()
x86: xen: move cpu_up functions out of ifdef
xenbus: advertise control feature flags
24 Oct, 2016
2 commits
-
Don't ignore errors here: Set backend state to unknown when
unsuccessful.Signed-off-by: Jan Beulich
Signed-off-by: David Vrabel -
This is more efficient than list_for_each_safe() when list modification
is accompanied by breaking out of the loop.Signed-off-by: Jan Beulich
Reviewed-by: Juergen Gross
Signed-off-by: David Vrabel