03 Oct, 2019

1 commit

  • In case a user process using xenbus has open transactions and is killed
    e.g. via ctrl-C the following cleanup of the allocated resources might
    result in a deadlock due to trying to end a transaction in the xenbus
    worker thread:

    [ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
    [ 2551.492215] Tainted: P OE 5.0.0-29-generic #5
    [ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 2551.528585] xenbus D 0 37 2 0x80000080
    [ 2551.528590] Call Trace:
    [ 2551.528603] __schedule+0x2c0/0x870
    [ 2551.528606] ? _cond_resched+0x19/0x40
    [ 2551.528632] schedule+0x2c/0x70
    [ 2551.528637] xs_talkv+0x1ec/0x2b0
    [ 2551.528642] ? wait_woken+0x80/0x80
    [ 2551.528645] xs_single+0x53/0x80
    [ 2551.528648] xenbus_transaction_end+0x3b/0x70
    [ 2551.528651] xenbus_file_free+0x5a/0x160
    [ 2551.528654] xenbus_dev_queue_reply+0xc4/0x220
    [ 2551.528657] xenbus_thread+0x7de/0x880
    [ 2551.528660] ? wait_woken+0x80/0x80
    [ 2551.528665] kthread+0x121/0x140
    [ 2551.528667] ? xb_read+0x1d0/0x1d0
    [ 2551.528670] ? kthread_park+0x90/0x90
    [ 2551.528673] ret_from_fork+0x35/0x40

    Fix this by doing the cleanup via a workqueue instead.

    Reported-by: James Dingwall
    Fixes: fd8aa9095a95c ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
    Cc: # 4.11
    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Juergen Gross
     

01 Jun, 2019

1 commit


29 May, 2019

1 commit

  • During a suspend/resume, the xenwatch thread waits for all outstanding
    xenstore requests and transactions to complete. This does not work
    correctly for transactions started by userspace because it waits for
    them to complete after freezing userspace threads which means the
    transactions have no way of completing, resulting in a deadlock. This is
    trivial to reproduce by running this script and then suspending the VM:

    import pyxs, time
    c = pyxs.client.Client(xen_bus_path="/dev/xen/xenbus")
    c.connect()
    c.transaction()
    time.sleep(3600)

    Even if this deadlock were resolved, misbehaving userspace should not
    prevent a VM from being migrated. So, instead of waiting for these
    transactions to complete before suspending, store the current generation
    id for each transaction when it is started. The global generation id is
    incremented during resume. If the caller commits the transaction and the
    generation id does not match the current generation id, return EAGAIN so
    that they try again. If the transaction was instead discarded, return OK
    since no changes were made anyway.

    This only affects users of the xenbus file interface. In-kernel users of
    xenbus are assumed to be well-behaved and complete all transactions
    before freezing.

    Signed-off-by: Ross Lagerwall
    Reviewed-by: Juergen Gross
    Signed-off-by: Boris Ostrovsky

    Ross Lagerwall
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have MODULE_LICENCE("GPL*") inside which was used in the initial
    scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

25 Apr, 2019

1 commit


07 Apr, 2019

1 commit

  • …multaneously without deadlock

    Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added
    locking for file.f_pos access and in particular made concurrent read and
    write not possible - now both those functions take f_pos lock for the
    whole run, and so if e.g. a read is blocked waiting for data, write will
    deadlock waiting for that read to complete.

    This caused regression for stream-like files where previously read and
    write could run simultaneously, but after that patch could not do so
    anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes
    to /proc/xen/xenbus") which fixes such regression for particular case of
    /proc/xen/xenbus.

    The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
    safety for read/write/lseek and added the locking to file descriptors of
    all regular files. In 2014 that thread-safety problem was not new as it
    was already discussed earlier in 2006.

    However even though 2006'th version of Linus's patch was adding f_pos
    locking "only for files that are marked seekable with FMODE_LSEEK (thus
    avoiding the stream-like objects like pipes and sockets)", the 2014
    version - the one that actually made it into the tree as 9c225f2655e3 -
    is doing so irregardless of whether a file is seekable or not.

    See

    https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
    https://lwn.net/Articles/180387
    https://lwn.net/Articles/180396

    for historic context.

    The reason that it did so is, probably, that there are many files that
    are marked non-seekable, but e.g. their read implementation actually
    depends on knowing current position to correctly handle the read. Some
    examples:

    kernel/power/user.c snapshot_read
    fs/debugfs/file.c u32_array_read
    fs/fuse/control.c fuse_conn_waiting_read + ...
    drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read
    arch/s390/hypfs/inode.c hypfs_read_iter
    ...

    Despite that, many nonseekable_open users implement read and write with
    pure stream semantics - they don't depend on passed ppos at all. And for
    those cases where read could wait for something inside, it creates a
    situation similar to xenbus - the write could be never made to go until
    read is done, and read is waiting for some, potentially external, event,
    for potentially unbounded time -> deadlock.

    Besides xenbus, there are 14 such places in the kernel that I've found
    with semantic patch (see below):

    drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
    drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
    drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
    drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
    net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
    drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
    drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
    drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
    net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
    drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
    drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
    drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
    drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
    drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()

    In addition to the cases above another regression caused by f_pos
    locking is that now FUSE filesystems that implement open with
    FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
    stream-like files - for the same reason as above e.g. read can deadlock
    write locking on file.f_pos in the kernel.

    FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse:
    implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
    in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
    write routines not depending on current position at all, and with both
    read and write being potentially blocking operations:

    See

    https://github.com/libfuse/osspd
    https://lwn.net/Articles/308445

    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510

    Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
    "somewhat pipe-like files ..." with read handler not using offset.
    However that test implements only read without write and cannot exercise
    the deadlock scenario:

    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216

    I've actually hit the read vs write deadlock for real while implementing
    my FUSE filesystem where there is /head/watch file, for which open
    creates separate bidirectional socket-like stream in between filesystem
    and its user with both read and write being later performed
    simultaneously. And there it is semantically not easy to split the
    stream into two separate read-only and write-only channels:

    https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169

    Let's fix this regression. The plan is:

    1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
    doing so would break many in-kernel nonseekable_open users which
    actually use ppos in read/write handlers.

    2. Add stream_open() to kernel to open stream-like non-seekable file
    descriptors. Read and write on such file descriptors would never use
    nor change ppos. And with that property on stream-like files read and
    write will be running without taking f_pos lock - i.e. read and write
    could be running simultaneously.

    3. With semantic patch search and convert to stream_open all in-kernel
    nonseekable_open users for which read and write actually do not
    depend on ppos and where there is no other methods in file_operations
    which assume @offset access.

    4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
    steam_open if that bit is present in filesystem open reply.

    It was tempting to change fs/fuse/ open handler to use stream_open
    instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
    grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
    and in particular GVFS which actually uses offset in its read and
    write handlers

    https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
    https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
    https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
    https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

    so if we would do such a change it will break a real user.

    5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
    from v3.14+ (the kernel where 9c225f2655 first appeared).

    This will allow to patch OSSPD and other FUSE filesystems that
    provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
    in their open handler and this way avoid the deadlock on all kernel
    versions. This should work because fs/fuse/ ignores unknown open
    flags returned from a filesystem and so passing FOPEN_STREAM to a
    kernel that is not aware of this flag cannot hurt. In turn the kernel
    that is not aware of FOPEN_STREAM will be < v3.14 where just
    FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
    write deadlock.

    This patch adds stream_open, converts /proc/xen/xenbus to it and adds
    semantic patch to automatically locate in-kernel places that are either
    required to be converted due to read vs write deadlock, or that are just
    safe to be converted because read and write do not use ppos and there
    are no other funky methods in file_operations.

    Regarding semantic patch I've verified each generated change manually -
    that it is correct to convert - and each other nonseekable_open instance
    left - that it is either not correct to convert there, or that it is not
    converted due to current stream_open.cocci limitations.

    The script also does not convert files that should be valid to convert,
    but that currently have .llseek = noop_llseek or generic_file_llseek for
    unknown reason despite file being opened with nonseekable_open (e.g.
    drivers/input/mousedev.c)

    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Yongzhi Pan <panyongzhi@gmail.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: David Vrabel <david.vrabel@citrix.com>
    Cc: Juergen Gross <jgross@suse.com>
    Cc: Miklos Szeredi <miklos@szeredi.hu>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Julia Lawall <Julia.Lawall@lip6.fr>
    Cc: Nikolaus Rath <Nikolaus@rath.org>
    Cc: Han-Wen Nienhuys <hanwen@google.com>
    Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Kirill Smelkov
     

26 Oct, 2018

1 commit

  • xenbus_va_dev_error() will try to write error messages to Xenstore
    under the error//error node (with something like
    "device/vbd/51872"). This will fail normally and another message
    about this failure is added to dmesg.

    I believe this is a remnant from very ancient times, as it was added
    in the first pvops rush of commits in 2007.

    So remove the additional message when writing to Xenstore failed as
    a minimum step.

    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross

    Juergen Gross
     

29 Aug, 2018

1 commit

  • Export device state to sysfs to allow for easier get device state.

    Signed-off-by: Joe Jin
    Reviewed-by: Boris Ostrovsky
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: Konrad Rzeszutek Wilk
    Signed-off-by: Boris Ostrovsky

    Joe Jin
     

17 May, 2018

1 commit

  • There's no need to store the xenstore page or event channel in
    xen_start_info if they are locally initialized.

    This also fixes PVH local xenstore initialization due to the lack of
    xen_start_info in that case.

    Signed-off-by: Boris Ostrovsky
    Signed-off-by: Roger Pau Monné
    Reviewed-by: Juergen Gross
    Signed-off-by: Juergen Gross

    Roger Pau Monne
     

17 Apr, 2018

1 commit


21 Mar, 2018

3 commits

  • By guaranteeing that the argument of XS_TRANSACTION_END is valid we can
    assume that the transaction has been closed when we get an XS_ERROR
    response from xenstore (Note that we already verify that it's a valid
    transaction id).

    Signed-off-by: Simon Gaiser
    Reviewed-by: Juergen Gross
    Signed-off-by: Boris Ostrovsky

    Simon Gaiser
     
  • Users of the xenbus functions should never close a non existent
    transaction (for example by trying to closing the same transaction
    twice) but better catch it in xs_request_exit() than to corrupt the
    reference counter.

    Signed-off-by: Simon Gaiser
    Reviewed-by: Juergen Gross
    Signed-off-by: Boris Ostrovsky

    Simon Gaiser
     
  • Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
    concurrent xenstore accesses") made a subtle change to the semantic of
    xenbus_dev_request_and_reply() and xenbus_transaction_end().

    Before on an error response to XS_TRANSACTION_END
    xenbus_dev_request_and_reply() would not decrement the active
    transaction counter. But xenbus_transaction_end() has always counted the
    transaction as finished regardless of the response.

    The new behavior is that xenbus_dev_request_and_reply() and
    xenbus_transaction_end() will always count the transaction as finished
    regardless the response code (handled in xs_request_exit()).

    But xenbus_dev_frontend tries to end a transaction on closing of the
    device if the XS_TRANSACTION_END failed before. Trying to close the
    transaction twice corrupts the reference count. So fix this by also
    considering a transaction closed if we have sent XS_TRANSACTION_END once
    regardless of the return code.

    Cc: # 4.11
    Fixes: fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
    Signed-off-by: Simon Gaiser
    Reviewed-by: Juergen Gross
    Signed-off-by: Boris Ostrovsky

    Simon Gaiser
     

08 Mar, 2018

1 commit


17 Feb, 2018

1 commit

  • Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent
    xenstore accesses") optimized xenbus concurrent accesses but in doing so
    broke UABI of /dev/xen/xenbus. Through /dev/xen/xenbus applications are in
    charge of xenbus message exchange with the correct header and body. Now,
    after the mentioned commit the replies received by application will no
    longer have the header req_id echoed back as it was on request (see
    specification below for reference), because that particular field is being
    overwritten by kernel.

    struct xsd_sockmsg
    {
    uint32_t type; /* XS_??? */
    uint32_t req_id;/* Request identifier, echoed in daemon's response. */
    uint32_t tx_id; /* Transaction id (0 if not related to a transaction). */
    uint32_t len; /* Length of data following this. */

    /* Generally followed by nul-terminated string(s). */
    };

    Before there was only one request at a time so req_id could simply be
    forwarded back and forth. To allow simultaneous requests we need a
    different req_id for each message thus kernel keeps a monotonic increasing
    counter for this field and is written on every request irrespective of
    userspace value.

    Forwarding again the req_id on userspace requests is not a solution because
    we would open the possibility of userspace-generated req_id colliding with
    kernel ones. So this patch instead takes another route which is to
    artificially keep user req_id while keeping the xenbus logic as is. We do
    that by saving the original req_id before xs_send(), use the private kernel
    counter as req_id and then once reply comes and was validated, we restore
    back the original req_id.

    Cc: # 4.11
    Fixes: fd8aa9095a ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
    Reported-by: Bhavesh Davda
    Signed-off-by: Joao Martins
    Reviewed-by: Juergen Gross
    Signed-off-by: Juergen Gross

    Joao Martins
     

12 Feb, 2018

1 commit

  • This is the mindless scripted replacement of kernel use of POLL*
    variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
    L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
    for f in $L; do sed -i "-es/^\([^\"]*\)\(\\)/\\1E\\2/" $f; done
    done

    with de-mangling cleanups yet to come.

    NOTE! On almost all architectures, the EPOLL* constants have the same
    values as the POLL* constants do. But they keyword here is "almost".
    For various bad reasons they aren't the same, and epoll() doesn't
    actually work quite correctly in some cases due to this on Sparc et al.

    The next patch from Al will sort out the final differences, and we
    should be all done.

    Scripted-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

29 Nov, 2017

1 commit


17 Nov, 2017

1 commit

  • Pull xen updates from Juergen Gross:
    "Xen features and fixes for v4.15-rc1

    Apart from several small fixes it contains the following features:

    - a series by Joao Martins to add vdso support of the pv clock
    interface

    - a series by Juergen Gross to add support for Xen pv guests to be
    able to run on 5 level paging hosts

    - a series by Stefano Stabellini adding the Xen pvcalls frontend
    driver using a paravirtualized socket interface"

    * tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (34 commits)
    xen/pvcalls: fix potential endless loop in pvcalls-front.c
    xen/pvcalls: Add MODULE_LICENSE()
    MAINTAINERS: xen, kvm: track pvclock-abi.h changes
    x86/xen/time: setup vcpu 0 time info page
    x86/xen/time: set pvclock flags on xen_time_init()
    x86/pvclock: add setter for pvclock_pvti_cpu0_va
    ptp_kvm: probe for kvm guest availability
    xen/privcmd: remove unused variable pageidx
    xen: select grant interface version
    xen: update arch/x86/include/asm/xen/cpuid.h
    xen: add grant interface version dependent constants to gnttab_ops
    xen: limit grant v2 interface to the v1 functionality
    xen: re-introduce support for grant v2 interface
    xen: support priv-mapping in an HVM tools domain
    xen/pvcalls: remove redundant check for irq >= 0
    xen/pvcalls: fix unsigned less than zero error check
    xen/time: Return -ENODEV from xen_get_wallclock()
    xen/pvcalls-front: mark expected switch fall-through
    xen: xenbus_probe_frontend: mark expected switch fall-throughs
    xen/time: do not decrease steal time after live migration on xen
    ...

    Linus Torvalds
     

03 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

18 Sep, 2017

1 commit


11 Aug, 2017

1 commit

  • When starting the xenwatch thread a theoretical deadlock situation is
    possible:

    xs_init() contains:

    task = kthread_run(xenwatch_thread, NULL, "xenwatch");
    if (IS_ERR(task))
    return PTR_ERR(task);
    xenwatch_pid = task->pid;

    And xenwatch_thread() does:

    mutex_lock(&xenwatch_mutex);
    ...
    event->handle->callback();
    ...
    mutex_unlock(&xenwatch_mutex);

    The callback could call unregister_xenbus_watch() which does:

    ...
    if (current->pid != xenwatch_pid)
    mutex_lock(&xenwatch_mutex);
    ...

    In case a watch is firing before xenwatch_pid could be set and the
    callback of that watch unregisters a watch, then a self-deadlock would
    occur.

    Avoid this by setting xenwatch_pid in xenwatch_thread().

    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross

    Juergen Gross
     

25 Jun, 2017

1 commit

  • There has been a report about a deadlock in the xenbus driver:

    [ 247.979498] ======================================================
    [ 247.985688] WARNING: possible circular locking dependency detected
    [ 247.991882] 4.12.0-rc4-00022-gc4b25c0 #575 Not tainted
    [ 247.997040] ------------------------------------------------------
    [ 248.003232] xenbus/91 is trying to acquire lock:
    [ 248.007875] (&u->msgbuffer_mutex){+.+.+.}, at: []
    xenbus_dev_queue_reply+0x3c/0x230
    [ 248.017163]
    [ 248.017163] but task is already holding lock:
    [ 248.023096] (xb_write_mutex){+.+...}, at: []
    xenbus_thread+0x5f0/0x798
    [ 248.031267]
    [ 248.031267] which lock already depends on the new lock.
    [ 248.031267]
    [ 248.039615]
    [ 248.039615] the existing dependency chain (in reverse order) is:
    [ 248.047176]
    [ 248.047176] -> #1 (xb_write_mutex){+.+...}:
    [ 248.052943] __lock_acquire+0x1728/0x1778
    [ 248.057498] lock_acquire+0xc4/0x288
    [ 248.061630] __mutex_lock+0x84/0x868
    [ 248.065755] mutex_lock_nested+0x3c/0x50
    [ 248.070227] xs_send+0x164/0x1f8
    [ 248.074015] xenbus_dev_request_and_reply+0x6c/0x88
    [ 248.079427] xenbus_file_write+0x260/0x420
    [ 248.084073] __vfs_write+0x48/0x138
    [ 248.088113] vfs_write+0xa8/0x1b8
    [ 248.091983] SyS_write+0x54/0xb0
    [ 248.095768] el0_svc_naked+0x24/0x28
    [ 248.099897]
    [ 248.099897] -> #0 (&u->msgbuffer_mutex){+.+.+.}:
    [ 248.106088] print_circular_bug+0x80/0x2e0
    [ 248.110730] __lock_acquire+0x1768/0x1778
    [ 248.115288] lock_acquire+0xc4/0x288
    [ 248.119417] __mutex_lock+0x84/0x868
    [ 248.123545] mutex_lock_nested+0x3c/0x50
    [ 248.128016] xenbus_dev_queue_reply+0x3c/0x230
    [ 248.133005] xenbus_thread+0x788/0x798
    [ 248.137306] kthread+0x110/0x140
    [ 248.141087] ret_from_fork+0x10/0x40

    It is rather easy to avoid by dropping xb_write_mutex before calling
    xenbus_dev_queue_reply().

    Fixes: fd8aa9095a95c02dcc35540a263267c29b8fda9d ("xen: optimize xenbus
    driver for multiple concurrent xenstore accesses").

    Cc: # 4.11
    Reported-by: Andre Przywara
    Signed-off-by: Juergen Gross
    Tested-by: Andre Przywara
    Signed-off-by: Juergen Gross

    Juergen Gross
     

04 Apr, 2017

1 commit

  • After allocation the item is being placed on the list right away.
    Consequently it needs to be taken off the list before freeing in the
    case xenbus_dev_request_and_reply() failed, as in that case the
    callback (xenbus_dev_queue_reply()) is not being called (and if it
    was called, it should do both).

    Fixes: 5584ea250ae44f929feb4c7bd3877d1c5edbf813
    Signed-off-by: Jan Beulich
    Reviewed-by: Juergen Gross
    Signed-off-by: Boris Ostrovsky

    Jan Beulich
     

27 Feb, 2017

1 commit


10 Feb, 2017

3 commits

  • Handling of multiple concurrent Xenstore accesses through xenbus driver
    either from the kernel or user land is rather lame today: xenbus is
    capable to have one access active only at one point of time.

    Rewrite xenbus to handle multiple requests concurrently by making use
    of the request id of the Xenstore protocol. This requires to:

    - Instead of blocking inside xb_read() when trying to read data from
    the xenstore ring buffer do so only in the main loop of
    xenbus_thread().

    - Instead of doing writes to the xenstore ring buffer in the context of
    the caller just queue the request and do the write in the dedicated
    xenbus thread.

    - Instead of just forwarding the request id specified by the caller of
    xenbus to xenstore use a xenbus internal unique request id. This will
    allow multiple outstanding requests.

    - Modify the locking scheme in order to allow multiple requests being
    active in parallel.

    - Instead of waiting for the reply of a user's xenstore request after
    writing the request to the xenstore ring buffer return directly to
    the caller and do the waiting in the read path.

    Additionally signal handling was optimized by avoiding waking up the
    xenbus thread or sending an event to Xenstore in case the addressed
    entity is known to be running already.

    As a result communication with Xenstore is sped up by a factor of up
    to 5: depending on the request type (read or write) and the amount of
    data transferred the gain was at least 20% (small reads) and went up to
    a factor of 5 for large writes.

    In the end some more rough edges of xenbus have been smoothed:

    - Handling of memory shortage when reading from xenstore ring buffer in
    the xenbus driver was not optimal: it was busy looping and issuing a
    warning in each loop.

    - In case of xenstore not running in dom0 but in a stubdom we end up
    with two xenbus threads running as the initialization of xenbus in
    dom0 expecting a local xenstored will be redone later when connecting
    to the xenstore domain. Up to now this was no problem as locking
    would prevent the two xenbus threads interfering with each other, but
    this was just a waste of kernel resources.

    - An out of memory situation while writing to or reading from the
    xenstore ring buffer no longer will lead to a possible loss of
    synchronization with xenstore.

    - The user read and write part are now interruptible by signals.

    Signed-off-by: Juergen Gross
    Signed-off-by: Boris Ostrovsky

    Juergen Gross
     
  • Today a Xenstore watch event is delivered via a callback function
    declared as:

    void (*callback)(struct xenbus_watch *,
    const char **vec, unsigned int len);

    As all watch events only ever come with two parameters (path and token)
    changing the prototype to:

    void (*callback)(struct xenbus_watch *,
    const char *path, const char *token);

    is the natural thing to do.

    Apply this change and adapt all users.

    Cc: konrad.wilk@oracle.com
    Cc: roger.pau@citrix.com
    Cc: wei.liu2@citrix.com
    Cc: paul.durrant@citrix.com
    Cc: netdev@vger.kernel.org

    Signed-off-by: Juergen Gross
    Reviewed-by: Paul Durrant
    Reviewed-by: Wei Liu
    Reviewed-by: Roger Pau Monné
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Juergen Gross
     
  • The xenbus driver has an awful mixture of internally and globally
    visible headers: some of the internally used only stuff is defined in
    the global header include/xen/xenbus.h while some stuff defined in
    internal headers is used by other drivers, too.

    Clean this up by moving the externally used symbols to
    include/xen/xenbus.h and the symbols used internally only to a new
    header drivers/xen/xenbus/xenbus.h replacing xenbus_comms.h and
    xenbus_probe.h

    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Juergen Gross
     

08 Feb, 2017

1 commit


24 Dec, 2016

3 commits

  • In drivers/xen/xenbus/xenbus_comms.h there is a stale declaration of
    xs_input_avail(). Remove it.

    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross

    Juergen Gross
     
  • When the xenbus driver does some special handling for a Xenstore
    command any error condition related to the command should be returned
    via an error response instead of letting the related write operation
    fail. Otherwise the user land handler might take wrong decisions
    assuming the connection to Xenstore is broken.

    While at it try to return the same error values xenstored would
    return for those cases.

    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross

    Juergen Gross
     
  • When accessing Xenstore in a transaction the user is specifying a
    transaction id which he normally obtained from Xenstore when starting
    the transaction. Xenstore is validating a transaction id against all
    known transaction ids of the connection the request came in. As all
    requests of a domain not being the one where Xenstore lives share
    one connection, validation of transaction ids of different users of
    Xenstore in that domain should be done by the kernel of that domain
    being the multiplexer between the Xenstore users in that domain and
    Xenstore.

    In order to prohibit one Xenstore user "hijacking" a transaction from
    another user the xenbus driver has to verify a given transaction id
    against all known transaction ids of the user before forwarding it to
    Xenstore.

    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross

    Juergen Gross
     

12 Dec, 2016

1 commit

  • /proc/xen/xenbus does not work correctly. A read blocked waiting for
    a xenstore message holds the mutex needed for atomic file position
    updates. This blocks any writes on the same file handle, which can
    deadlock if the write is needed to unblock the read.

    Clear FMODE_ATOMIC_POS when opening this device to always get
    character device like sematics.

    Signed-off-by: David Vrabel
    Reviewed-by: Juergen Gross
    Signed-off-by: Juergen Gross

    David Vrabel
     

08 Dec, 2016

1 commit

  • Variable err is initialized with 0. As a result, the return value may
    be 0 even if get_zeroed_page() fails to allocate memory. This patch fixes
    the bug, initializing err with "-ENOMEM".

    Signed-off-by: Pan Bian
    Reviewed-by: Juergen Gross
    Signed-off-by: Juergen Gross

    Pan Bian
     

17 Nov, 2016

1 commit

  • Mounting proc in user namespace containers fails if the xenbus
    filesystem is mounted on /proc/xen because this directory fails
    the "permanently empty" test. proc_create_mount_point() exists
    specifically to create such mountpoints in proc but is currently
    proc-internal. Export this interface to modules, then use it in
    xenbus when creating /proc/xen.

    Signed-off-by: Seth Forshee
    Signed-off-by: David Vrabel
    Signed-off-by: Juergen Gross

    Seth Forshee
     

07 Nov, 2016

2 commits

  • Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
    This requires to change the type of the reads from int to unsigned,
    but these cases have been wrong before: negative values are not allowed
    for the modified cases.

    Signed-off-by: Juergen Gross
    Acked-by: David Vrabel

    Juergen Gross
     
  • There are multiple instances of code reading an optional unsigned
    parameter from Xenstore via xenbus_scanf(). Instead of repeating the
    same code over and over add a service function doing the job.

    Signed-off-by: Juergen Gross
    Reviewed-by: David Vrabel

    Juergen Gross
     

25 Oct, 2016

1 commit


24 Oct, 2016

2 commits