29 Apr, 2020

1 commit

  • This provides a virtio transport driver over the Inter-VM shared memory
    device as found in QEMU and the Jailhouse hypervisor.

    ...

    Note: Specification work for both ivshmem and the virtio transport is
    ongoing, so details may still change.

    Acked-by: Ye Li
    Signed-off-by: Jan Kiszka

    Jan Kiszka
     

24 Feb, 2020

1 commit

  • [ Upstream commit 6e9826e77249355c09db6ba41cd3f84e89f4b614 ]

    Make sure, at build time, that the pfn array is big enough to hold a single
    page. It happens to be true since the PAGE_SHIFT value at the moment is
    20, which is 1M - exactly 256 4K balloon pages.
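
    The check amounts to a one-line compile-time assertion; a minimal sketch
    using the driver's existing constants (placement is illustrative):

        /* one guest page, split into 4K balloon pages, must fit the pfn array */
        BUILD_BUG_ON((PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT) >
                     VIRTIO_BALLOON_ARRAY_PFNS_MAX);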

    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: David Hildenbrand
    Signed-off-by: Sasha Levin

    Michael S. Tsirkin
     

11 Feb, 2020

4 commits

  • commit 1ad6f58ea9364b0a5d8ae06249653ac9304a8578 upstream.

    We forget to put the inode and unmount the kernfs used for compaction.
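
    A minimal sketch of the missing cleanup (symbol names follow
    drivers/virtio/virtio_balloon.c; treat this as illustrative rather than
    the exact upstream diff):

        #ifdef CONFIG_BALLOON_COMPACTION
                /* drop the inode and the kernfs mount used for page migration */
                if (vb->vb_dev_info.inode)
                        iput(vb->vb_dev_info.inode);
                kern_unmount(balloon_mnt);
        #endif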

    Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Wei Wang
    Cc: Liang Li
    Signed-off-by: David Hildenbrand
    Link: https://lore.kernel.org/r/20200205163402.42627-3-david@redhat.com
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Greg Kroah-Hartman

    David Hildenbrand
     
  • commit 6c22dc61c76b7e7d355f1697ba0ecf26d1334ba6 upstream.

    When unloading the driver while hinting is in progress, we will not
    release the free page blocks back to MM, resulting in a memory leak.
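
    Conceptually, the unload path has to hand any queued hint pages back to MM
    before tearing the device down; a hedged sketch along the lines of the
    driver's helpers:

        if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
                /* give every queued free-page block back to the page allocator */
                return_free_pages_to_mm(vb, ULONG_MAX);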

    Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Wei Wang
    Cc: Liang Li
    Signed-off-by: David Hildenbrand
    Link: https://lore.kernel.org/r/20200205163402.42627-2-david@redhat.com
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Greg Kroah-Hartman

    David Hildenbrand
     
  • commit 303090b513fd1ee45aa1536b71a3838dc054bc05 upstream.

    VQs without a name specified are not valid; they are skipped in the
    later loop that assigns MSI-X vectors to queues, but the per_vq_vectors
    loop above that counts the required number of vectors previously still
    counted any queue with a non-NULL callback as needing a vector.

    Add a check to the per_vq_vectors loop so that vectors with no name are
    not counted to make the two loops consistent. This prevents
    over-counting unnecessary vectors (e.g. for features which were not
    negotiated with the device).
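
    The consistency fix boils down to mirroring the name check in the counting
    loop; a sketch of the adjusted loop (illustrative, not the verbatim diff):

        for (i = 0; i < nvqs; i++)
                /* only queues that actually exist (have a name) need a vector */
                if (names[i] && callbacks[i])
                        ++nvectors;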

    Cc: stable@vger.kernel.org
    Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
    Reviewed-by: Cornelia Huck
    Signed-off-by: Daniel Verkamp
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Wang, Wei W
    Signed-off-by: Greg Kroah-Hartman

    Daniel Verkamp
     
  • commit 5790b53390e18fdd21e70776e46d058c05eda2f2 upstream.

    Ensure that elements of the callbacks array that correspond to
    unavailable features are set to NULL; previously, they would be left
    uninitialized.

    Since the corresponding names array elements were explicitly set to
    NULL, the uninitialized callback pointers would not actually be
    dereferenced; however, the uninitialized callbacks elements would still
    be read in vp_find_vqs_msix() and used to calculate the number of MSI-X
    vectors required.
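
    In virtio_balloon's init_vqs() this means explicitly NULL-ing the optional
    entries before the feature checks; a hedged sketch:

        callbacks[VIRTIO_BALLOON_VQ_STATS] = NULL;
        names[VIRTIO_BALLOON_VQ_STATS] = NULL;
        callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
        names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;

        if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
                names[VIRTIO_BALLOON_VQ_STATS] = "stats";
                callbacks[VIRTIO_BALLOON_VQ_STATS] = stats_request;
        }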

    Cc: stable@vger.kernel.org
    Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
    Reviewed-by: Cornelia Huck
    Signed-off-by: Daniel Verkamp
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Greg Kroah-Hartman

    Daniel Verkamp
     

18 Dec, 2019

1 commit

  • commit 63341ab03706e11a31e3dd8ccc0fbc9beaf723f0 upstream.

    In case we have to migrate a balloon page to a newpage of another zone, the
    managed page count of both zones is wrong. Paired with memory offlining
    (which will adjust the managed page count), we can trigger kernel crashes
    and all kinds of different symptoms.

    One way to reproduce:
    1. Start a QEMU guest with 4GB, no NUMA
    2. Hotplug a 1GB DIMM and online the memory to ZONE_NORMAL
    3. Inflate the balloon to 1GB
    4. Unplug the DIMM (be quick, otherwise unmovable data ends up on it)
    5. Observe /proc/zoneinfo
    Node 0, zone Normal
    pages free 16810
    min 24848885473806
    low 18471592959183339
    high 36918337032892872
    spanned 262144
    present 262144
    managed 18446744073709533486
    6. Do anything that requires some memory (e.g., inflate the balloon some
    more). The OOM killer goes crazy and the system crashes:
    [ 238.324946] Out of memory: Killed process 537 (login) total-vm:27584kB, anon-rss:860kB, file-rss:0kB, shmem-rss:00
    [ 238.338585] systemd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
    [ 238.339420] CPU: 0 PID: 1 Comm: systemd Tainted: G D W 5.4.0-next-20191204+ #75
    [ 238.340139] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
    [ 238.341121] Call Trace:
    [ 238.341337] dump_stack+0x8f/0xd0
    [ 238.341630] dump_header+0x61/0x5ea
    [ 238.341942] oom_kill_process.cold+0xb/0x10
    [ 238.342299] out_of_memory+0x24d/0x5a0
    [ 238.342625] __alloc_pages_slowpath+0xd12/0x1020
    [ 238.343024] __alloc_pages_nodemask+0x391/0x410
    [ 238.343407] pagecache_get_page+0xc3/0x3a0
    [ 238.343757] filemap_fault+0x804/0xc30
    [ 238.344083] ? ext4_filemap_fault+0x28/0x42
    [ 238.344444] ext4_filemap_fault+0x30/0x42
    [ 238.344789] __do_fault+0x37/0x1a0
    [ 238.345087] __handle_mm_fault+0x104d/0x1ab0
    [ 238.345450] handle_mm_fault+0x169/0x360
    [ 238.345790] do_user_addr_fault+0x20d/0x490
    [ 238.346154] do_page_fault+0x31/0x210
    [ 238.346468] async_page_fault+0x43/0x50
    [ 238.346797] RIP: 0033:0x7f47eba4197e
    [ 238.347110] Code: Bad RIP value.
    [ 238.347387] RSP: 002b:00007ffd7c0c1890 EFLAGS: 00010293
    [ 238.347834] RAX: 0000000000000002 RBX: 000055d196a20a20 RCX: 00007f47eba4197e
    [ 238.348437] RDX: 0000000000000033 RSI: 00007ffd7c0c18c0 RDI: 0000000000000004
    [ 238.349047] RBP: 00007ffd7c0c1c20 R08: 0000000000000000 R09: 0000000000000033
    [ 238.349660] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000001
    [ 238.350261] R13: ffffffffffffffff R14: 0000000000000000 R15: 00007ffd7c0c18c0
    [ 238.350878] Mem-Info:
    [ 238.351085] active_anon:3121 inactive_anon:51 isolated_anon:0
    [ 238.351085] active_file:12 inactive_file:7 isolated_file:0
    [ 238.351085] unevictable:0 dirty:0 writeback:0 unstable:0
    [ 238.351085] slab_reclaimable:5565 slab_unreclaimable:10170
    [ 238.351085] mapped:3 shmem:111 pagetables:155 bounce:0
    [ 238.351085] free:720717 free_pcp:2 free_cma:0
    [ 238.353757] Node 0 active_anon:12484kB inactive_anon:204kB active_file:48kB inactive_file:28kB unevictable:0kB iss
    [ 238.355979] Node 0 DMA free:11556kB min:36kB low:48kB high:60kB reserved_highatomic:0KB active_anon:152kB inactivB
    [ 238.358345] lowmem_reserve[]: 0 2955 2884 2884 2884
    [ 238.358761] Node 0 DMA32 free:2677864kB min:7004kB low:10028kB high:13052kB reserved_highatomic:0KB active_anon:0B
    [ 238.361202] lowmem_reserve[]: 0 0 72057594037927865 72057594037927865 72057594037927865
    [ 238.361888] Node 0 Normal free:193448kB min:99395541895224kB low:73886371836733356kB high:147673348131571488kB reB
    [ 238.364765] lowmem_reserve[]: 0 0 0 0 0
    [ 238.365101] Node 0 DMA: 7*4kB (U) 5*8kB (UE) 6*16kB (UME) 2*32kB (UM) 1*64kB (U) 2*128kB (UE) 3*256kB (UME) 2*512B
    [ 238.366379] Node 0 DMA32: 0*4kB 1*8kB (U) 2*16kB (UM) 2*32kB (UM) 2*64kB (UM) 1*128kB (U) 1*256kB (U) 1*512kB (U)B
    [ 238.367654] Node 0 Normal: 1985*4kB (UME) 1321*8kB (UME) 844*16kB (UME) 524*32kB (UME) 300*64kB (UME) 138*128kB (B
    [ 238.369184] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    [ 238.369915] 130 total pagecache pages
    [ 238.370241] 0 pages in swap cache
    [ 238.370533] Swap cache stats: add 0, delete 0, find 0/0
    [ 238.370981] Free swap = 0kB
    [ 238.371239] Total swap = 0kB
    [ 238.371488] 1048445 pages RAM
    [ 238.371756] 0 pages HighMem/MovableOnly
    [ 238.372090] 306992 pages reserved
    [ 238.372376] 0 pages cma reserved
    [ 238.372661] 0 pages hwpoisoned

    In another instance (older kernel), I was able to observe this
    (negative page count :/):
    [ 180.896971] Offlined Pages 32768
    [ 182.667462] Offlined Pages 32768
    [ 184.408117] Offlined Pages 32768
    [ 186.026321] Offlined Pages 32768
    [ 187.684861] Offlined Pages 32768
    [ 189.227013] Offlined Pages 32768
    [ 190.830303] Offlined Pages 32768
    [ 190.833071] Built 1 zonelists, mobility grouping on. Total pages: -36920272750453009

    In another instance (older kernel), I was no longer able to start any
    process:
    [root@vm ~]# [ 214.348068] Offlined Pages 32768
    [ 215.973009] Offlined Pages 32768
    cat /proc/meminfo
    -bash: fork: Cannot allocate memory
    [root@vm ~]# cat /proc/meminfo
    -bash: fork: Cannot allocate memory

    Fix it by properly adjusting the managed page count when migrating if
    the zone changed. With the fix, after unplugging the DIMM (and deflating
    the balloon), the managed page count of the zones looks just like it did
    before inflating the balloon (and plugging+onlining the DIMM).

    We'll temporarily modify the totalram page count. If this ever becomes a
    problem, we can fine tune by providing helpers that don't touch
    the totalram pages (e.g., adjust_zone_managed_page_count()).

    Please note that fixing up the managed page count is only necessary when
    we adjusted the managed page count when inflating - only if we
    don't have VIRTIO_BALLOON_F_DEFLATE_ON_OOM. With that feature, the
    managed page count is not touched when inflating/deflating.
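
    A sketch of the resulting logic in the balloon's migratepage callback
    (names as in the driver, but illustrative):

        /* only fix things up if inflation adjusted the managed page count */
        if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM) &&
            page_zone(page) != page_zone(newpage)) {
                adjust_managed_page_count(page, 1);     /* old zone gets a page back */
                adjust_managed_page_count(newpage, -1); /* new zone gives one up     */
        }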

    Reported-by: Yumei Huang
    Fixes: 3dcc0571cd64 ("mm: correctly update zone->managed_pages")
    Cc: # v3.11+
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Jiang Liu
    Cc: Andrew Morton
    Cc: Igor Mammedov
    Cc: virtualization@lists.linux-foundation.org
    Signed-off-by: David Hildenbrand
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Greg Kroah-Hartman

    David Hildenbrand
     

20 Nov, 2019

2 commits

  • Instead of multiplying by page order, virtio balloon divided by page
    order. The result is that it can return 0 if there are a bit less
    than MAX_ORDER - 1 pages in use, and then shrinker scan won't be called.

    Cc: stable@vger.kernel.org
    Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
    Signed-off-by: Wei Wang
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: David Hildenbrand

    Wei Wang
     
  • virtio_balloon_shrinker_scan should return the number of system pages freed,
    but because it's calling functions that deal with balloon pages, it gets
    confused and sometimes returns the number of balloon pages.

    It does not matter practically as the exact number isn't
    used, but it seems better to be consistent in case someone
    starts using this API.

    Further, if we ever tried to iteratively leak pages as
    virtio_balloon_shrinker_scan tries to do, we'd run into issues - this is
    because freed_pages was accumulating total freed pages, but was also
    subtracted on each iteration from pages_to_free, which can result in
    either leaking less memory than we were supposed to free, or more if
    pages_to_free underruns.

    On a system with 4K pages we are lucky that we are never asked to leak
    more than 128 pages while we can leak up to 256 at a time,
    but it looks like a real issue for systems with page size != 4K.
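
    The unit mismatch is easiest to see at the boundary between the shrinker
    API (system pages) and the balloon helpers (4K balloon pages); a hedged
    sketch of the conversion, with hypothetical local names:

        /* leak_balloon() counts 4K balloon pages; the shrinker wants system pages */
        pages_freed = leak_balloon(vb, balloon_pages_to_free);
        return pages_freed / VIRTIO_BALLOON_PAGES_PER_PAGE;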

    Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
    Reported-by: Khazhismel Kumykov
    Reviewed-by: Wei Wang
    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     

19 Nov, 2019

1 commit

  • Commit 780bc7903a32 ("virtio_ring: Support DMA APIs") makes
    virtqueue_add() return -EIO when we fail to map our I/O buffers. This is
    a very realistic scenario for guests with encrypted memory, as swiotlb
    may run out of space, depending on its size and the I/O load.

    The virtio-blk driver interprets -EIO from virtqueue_add() as an I/O
    error, despite the fact that a full swiotlb is, in the absence of bugs,
    a recoverable condition.

    Let us change the return code to -ENOMEM, and make the block layer
    recover from these failures when virtio-blk encounters the condition
    described above.
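
    The change itself is confined to the error path of virtqueue_add();
    schematically:

        unmap_release:
                /* ... undo the mappings done so far ... */

                /* a failed dma_map is a resource shortage the block layer can
                 * retry, not a device I/O error */
                return -ENOMEM;         /* previously: return -EIO; */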

    Cc: stable@vger.kernel.org
    Fixes: 780bc7903a32 ("virtio_ring: Support DMA APIs")
    Signed-off-by: Halil Pasic
    Tested-by: Michael Mueller
    Signed-off-by: Michael S. Tsirkin

    Halil Pasic
     

28 Oct, 2019

1 commit

  • When VIRTIO_F_RING_EVENT_IDX is negotiated, virtio devices can
    use virtqueue_enable_cb_delayed_packed to reduce the number of device
    interrupts. At the moment, this is the case for virtio-net when the
    napi_tx module parameter is set to false.

    In this case, the virtio driver selects an event offset and expects that
    the device will send a notification when rolling over the event offset
    in the ring. However, if this roll-over happens before the event
    suppression structure update, the notification won't be sent. To address
    this race condition, the driver needs to check whether the device rolled
    over the offset after updating the event suppression structure.

    With VIRTIO_F_RING_PACKED, the virtio driver did this by reading the
    flags field of the descriptor at the specified offset.

    Unfortunately, checking at the event offset isn't reliable: if
    descriptors are chained (e.g. when INDIRECT is off) not all descriptors
    are overwritten by the device, so it's possible that the device skipped
    the specific descriptor the driver is checking when writing out used
    descriptors. If this happens, the driver won't detect the race condition
    and will incorrectly expect the device to send a notification.

    For virtio-net, the result will be a TX queue stall, with the
    transmission getting blocked forever.

    With the packed ring, it isn't easy to find a location which is
    guaranteed to change upon the roll-over, except the next device
    descriptor, as described in the spec:

    Writes of device and driver descriptors can generally be
    reordered, but each side (driver and device) are only required to
    poll (or test) a single location in memory: the next device descriptor after
    the one they processed previously, in circular order.

    While this might be sub-optimal, let's do exactly this for now.
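
    In code terms, the re-check in virtqueue_enable_cb_delayed_packed() moves
    from the programmed event offset to the next device descriptor the driver
    would process; a hedged sketch (field names approximate the driver's):

        /* after updating the event suppression structure, poll the location the
         * spec guarantees to change on roll-over: our next used descriptor */
        return !is_used_desc_packed(vq, vq->last_used_idx,
                                    vq->packed.used_wrap_counter);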

    Cc: stable@vger.kernel.org
    Cc: Jason Wang
    Fixes: f51f982682e2a ("virtio_ring: leverage event idx in packed ring")
    Signed-off-by: Marvin Liu
    Signed-off-by: Michael S. Tsirkin

    Marvin Liu
     

09 Sep, 2019

1 commit

  • The function virtqueue_add_split() DMA-maps the scatterlist buffers. In
    case a mapping error occurs the already mapped buffers must be unmapped.
    This happens by jumping to the 'unmap_release' label.

    In case of indirect descriptors the release is wrong and may leak kernel
    memory: because the implementation assumes that the head descriptor is
    already mapped, it starts iterating over the descriptor list from the
    head descriptor. However, for indirect descriptors the head descriptor is
    never mapped in case of an error.

    The fix is to initialize the start index with zero in case of indirect
    descriptors and use the 'desc' pointer directly for iterating over the
    descriptor chain.
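
    A condensed sketch of the corrected unwind loop (following the description
    above rather than the exact upstream diff):

        unmap_release:
                err_idx = i;
                /* with indirect descriptors the head was never mapped, so start
                 * at index 0 and walk the indirect table via 'desc' itself */
                i = indirect ? 0 : head;
                for (n = 0; n < total_sg; n++) {
                        if (i == err_idx)
                                break;
                        vring_unmap_one_split(vq, &desc[i]);
                        i = virtio16_to_cpu(_vq->vdev, desc[i].next);
                }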

    Signed-off-by: Matthias Lange
    Signed-off-by: Michael S. Tsirkin

    Matthias Lange
     

20 Jul, 2019

1 commit

  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     

19 Jul, 2019

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "Primarily just the virtio_pmem driver:

    - virtio_pmem

    The new virtio_pmem facility introduces a paravirtualized
    persistent memory device that allows a guest VM to use DAX
    mechanisms to access a host-file with host-page-cache. It arranges
    for MAP_SYNC to be disabled and instead triggers a host fsync()
    when a 'write-cache flush' command is sent to the virtual disk
    device.

    - Miscellaneous small fixups"

    * tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    virtio_pmem: fix sparse warning
    xfs: disable map_sync for async flush
    ext4: disable map_sync for async flush
    dax: check synchronous mapping is supported
    dm: enable synchronous dax
    libnvdimm: add dax_dev sync flag
    virtio-pmem: Add virtio pmem driver
    libnvdimm: nd_region flush callback support
    libnvdimm, namespace: Drop uuid_t implementation detail

    Linus Torvalds
     

06 Jul, 2019

1 commit

  • This patch adds a virtio-pmem driver for KVM guests.

    The guest reads the persistent memory range information from
    QEMU over VIRTIO and registers it on the nvdimm_bus. It also
    creates an nd_region object with the persistent memory range
    information so that the existing 'nvdimm/pmem' driver can
    reserve this in the system memory map. This way the
    'virtio-pmem' driver uses the existing functionality of the
    pmem driver to register persistent memory compatible with
    DAX capable filesystems.

    This also provides a function to perform a guest flush over
    VIRTIO from the 'pmem' driver when userspace performs a flush
    on a DAX memory range.
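
    The flush itself is a small request/response exchange on the driver's
    request virtqueue; a condensed, hedged sketch (structure and field names
    are illustrative):

        sg_init_one(&sg, &req->req, sizeof(req->req));    /* flush command     */
        sg_init_one(&ret, &req->resp, sizeof(req->resp)); /* host status reply */
        sgs[0] = &sg;
        sgs[1] = &ret;
        virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
        virtqueue_kick(vpmem->req_vq);
        /* the vq callback completes this once the host fsync() has finished */
        wait_for_completion(&req->host_acked);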

    Signed-off-by: Pankaj Gupta
    Reviewed-by: Yuval Shaia
    Acked-by: Michael S. Tsirkin
    Acked-by: Jakub Staron
    Tested-by: Jakub Staron
    Reviewed-by: Cornelia Huck
    Signed-off-by: Dan Williams

    Pankaj Gupta
     

26 May, 2019

2 commits

  • Convert the virtio_balloon filesystem to the new internal mount API as the old
    one will be obsoleted and removed. This allows greater flexibility in
    communication of mount parameters between userspace, the VFS and the
    filesystem.

    See Documentation/filesystems/mount_api.txt for more information.
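
    For a kernel-internal pseudo filesystem like balloon-kvm the conversion
    ends up very small; roughly what the converted code looks like in later
    kernels (hedged sketch):

        static int balloon_init_fs_context(struct fs_context *fc)
        {
                return init_pseudo(fc, BALLOON_KVM_MAGIC) ? 0 : -ENOMEM;
        }

        static struct file_system_type balloon_fs = {
                .name            = "balloon-kvm",
                .init_fs_context = balloon_init_fs_context,
                .kill_sb         = kill_anon_super,
        };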

    Signed-off-by: David Howells
    cc: "Michael S. Tsirkin"
    cc: Jason Wang
    cc: virtualization@lists.linux-foundation.org
    Signed-off-by: Al Viro

    David Howells
     
  • Once upon a time we used to set ->d_name of e.g. pipefs root
    so that d_path() on pipes would work. These days it's
    completely pointless - dentries of pipes are not even connected
    to pipefs root. However, mount_pseudo() had set the root
    dentry name (passed as the second argument) and callers
    kept inventing names to pass to it. Including those that
    didn't *have* any non-root dentries to start with...

    All of that had been pointless for about 8 years now; it's
    time to get rid of that cargo-culting...

    Signed-off-by: Al Viro

    Al Viro
     

24 May, 2019

2 commits

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version this program is distributed in the
    hope that it will be useful but without any warranty without even
    the implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details you
    should have received a copy of the gnu general public license along
    with this program if not write to the free software foundation inc
    51 franklin st fifth floor boston ma 02110 1301 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 50 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190523091649.499889647@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 1 normalized pattern(s):

    this work is licensed under the terms of the gnu gpl version 2 or
    later see the copying file in the top level directory

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 6 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520075210.858783702@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

09 Apr, 2019

1 commit

  • vring_create_virtqueue() allows the caller to specify via the
    may_reduce_num parameter whether the vring code is allowed to
    allocate a smaller ring than specified.

    However, the split ring allocation code tries to allocate a
    smaller ring on allocation failure regardless of what the
    caller specified. This may cause trouble for e.g. virtio-pci
    in legacy mode, which does not support ring resizing. (The
    packed ring code does not resize in any case.)

    Let's fix this by bailing out immediately in the split ring code
    if the requested size cannot be allocated and may_reduce_num has
    not been specified.

    While at it, fix a typo in the usage instructions.
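
    The fix is a single early bail-out in the split ring's allocation loop;
    schematically:

        for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
                queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
                                          &dma_addr,
                                          GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO);
                if (queue)
                        break;
                if (!may_reduce_num)
                        return NULL;    /* caller forbade shrinking the ring */
        }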

    Fixes: 2a2d1382fe9d ("virtio: Add improved queue allocation API")
    Cc: stable@vger.kernel.org # v4.6+
    Signed-off-by: Cornelia Huck
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Halil Pasic
    Reviewed-by: Jens Freimann

    Cornelia Huck
     

08 Apr, 2019

1 commit

  • If allocation of the msix_affinity_masks array fails, we'll then try to
    free some resources in vp_free_vectors() that may access it directly; a
    sketch of the fix follows the call trace below.

    We hit the following call trace in our production environment:
    [ 29.296767] BUG: unable to handle kernel NULL pointer dereference at (null)
    [ 29.311151] IP: [] vp_free_vectors+0x6a/0x150 [virtio_pci]
    [ 29.324787] PGD 0
    [ 29.333224] Oops: 0000 [#1] SMP
    [...]
    [ 29.425175] RIP: 0010:[] [] vp_free_vectors+0x6a/0x150 [virtio_pci]
    [ 29.441405] RSP: 0018:ffff9a55c2dcfa10 EFLAGS: 00010206
    [ 29.453491] RAX: 0000000000000000 RBX: ffff9a55c322c400 RCX: 0000000000000000
    [ 29.467488] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9a55c322c400
    [ 29.481461] RBP: ffff9a55c2dcfa20 R08: 0000000000000000 R09: ffffc1b6806ff020
    [ 29.495427] R10: 0000000000000e95 R11: 0000000000aaaaaa R12: 0000000000000000
    [ 29.509414] R13: 0000000000010000 R14: ffff9a55bd2d9e98 R15: ffff9a55c322c400
    [ 29.523407] FS: 00007fdcba69f8c0(0000) GS:ffff9a55c2840000(0000) knlGS:0000000000000000
    [ 29.538472] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 29.551621] CR2: 0000000000000000 CR3: 000000003ce52000 CR4: 00000000003607a0
    [ 29.565886] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 29.580055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 29.594122] Call Trace:
    [ 29.603446] [] vp_request_msix_vectors+0xe2/0x260 [virtio_pci]
    [ 29.618017] [] vp_try_to_find_vqs+0x95/0x3b0 [virtio_pci]
    [ 29.632152] [] vp_find_vqs+0x37/0xb0 [virtio_pci]
    [ 29.645582] [] init_vq+0x153/0x260 [virtio_blk]
    [ 29.658831] [] virtblk_probe+0xe8/0x87f [virtio_blk]
    [...]
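
    The defensive fix in the vector-freeing path just needs to tolerate a
    NULL msix_affinity_masks; a hedged sketch:

        for (i = 0; i < vp_dev->msix_vectors; i++)
                /* the array itself may be NULL if its allocation failed */
                if (vp_dev->msix_affinity_masks &&
                    vp_dev->msix_affinity_masks[i])
                        free_cpumask_var(vp_dev->msix_affinity_masks[i]);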

    Cc: Gonglei
    Signed-off-by: Longpeng
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Gonglei

    Longpeng
     

07 Mar, 2019

4 commits

  • A virtio transport is free to implement some of the callbacks in
    virtio_config_ops in a manner such that they cannot be called from
    atomic context (e.g. virtio-ccw, which maps a lot of the callbacks
    to channel I/O, which is an inherently asynchronous mechanism).
    This can be very surprising for developers using the much more
    common virtio-pci transport, just to find out that things break
    when used on s390.

    The documentation for virtio_config_ops now contains a comment
    explaining this, but it makes sense to add a might_sleep() annotation
    to various wrapper functions in the virtio core to avoid surprises
    later.

    Note that annotations are NOT added to two classes of calls:
    - direct calls from device drivers (all current callers should be
    fine, however)
    - calls which clearly won't be made from atomic context (such as
    those ultimately coming in via the driver core)
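
    As an example of the annotation, one of the config-space wrappers in
    include/linux/virtio_config.h ends up looking roughly like this:

        static inline void virtio_cread_bytes(struct virtio_device *vdev,
                                              unsigned int offset,
                                              void *buf, size_t len)
        {
                might_sleep();  /* transports such as virtio-ccw may sleep here */
                vdev->config->get(vdev, offset, buf, len);
        }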

    Signed-off-by: Cornelia Huck
    Signed-off-by: Michael S. Tsirkin

    Cornelia Huck
     
  • The vb struct is now allocated with kzalloc(), so there is no need to
    zero-initialize this field a second time.

    Signed-off-by: Wei Wang
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Cornelia Huck

    Wei Wang
     
  • There is no need to update the balloon actual register when there is no
    ballooning request. This patch avoids update_balloon_size when diff is 0.
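
    The change is a one-line early return in the balloon size update work;
    schematically:

        diff = towards_target(vb);
        if (!diff)
                return;   /* no inflate/deflate requested: skip the config write */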

    Signed-off-by: Wei Wang
    Reviewed-by: Cornelia Huck
    Reviewed-by: Halil Pasic
    Signed-off-by: Michael S. Tsirkin

    Wei Wang
     
  • This function returns the maximum segment size for a single
    DMA transaction of a virtio device. The possible limit comes
    from the SWIOTLB implementation in the Linux kernel, which
    has an upper limit of (currently) 256 kB of contiguous
    memory it can map. Other DMA-API implementations might also
    have limits.

    Use the new dma_max_mapping_size() function to determine the
    maximum mapping size when DMA-API is in use for virtio.
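
    The helper introduced here is small; a hedged sketch of its shape:

        size_t virtio_max_dma_size(struct virtio_device *vdev)
        {
                size_t max_segment_size = SIZE_MAX;

                if (vring_use_dma_api(vdev))
                        /* the DMA-capable device is the transport's parent */
                        max_segment_size = dma_max_mapping_size(vdev->dev.parent);

                return max_segment_size;
        }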

    Cc: stable@vger.kernel.org
    Reviewed-by: Konrad Rzeszutek Wilk
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Joerg Roedel
    Signed-off-by: Michael S. Tsirkin

    Joerg Roedel
     

24 Jan, 2019

1 commit

  • This patch introduces the support for VIRTIO_F_ORDER_PLATFORM.
    If this feature is negotiated, the driver must use the barriers
    suitable for hardware devices. Otherwise, the device and driver
    are assumed to be implemented in software, that is, they can be
    assumed to run on identical CPUs in an SMP configuration. Thus
    a weaker form of memory barriers is sufficient to yield better
    performance.

    It is recommended that an add-in card based PCI device offers
    this feature for portability. The device will fail to operate
    further or will operate in a slower emulation mode if this
    feature is offered but not accepted.
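
    On the driver side this mostly amounts to dropping the weak-barrier
    optimization when the feature is negotiated; a sketch from the ring setup
    path:

        if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
                /* real DMA ordering is required: use full memory barriers */
                vq->weak_barriers = false;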

    Signed-off-by: Tiwei Bie
    Signed-off-by: Michael S. Tsirkin

    Tiwei Bie
     

15 Jan, 2019

3 commits

  • virtio-ccw has deadlock issues with reading the config space inside the
    interrupt context, so we tweak the virtballoon_changed implementation
    by moving the config read operations into the related workqueue contexts.
    The config_read_bitmap is used as a flag to the workqueue callbacks
    about the related config fields that need to be read.

    The cmd_id_received is also renamed to cmd_id_received_cache, and
    the value should be obtained via virtio_balloon_cmd_id_received.
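
    The resulting pattern is: the config-changed interrupt only records what
    needs reading, and the workqueue callback performs the actual (possibly
    sleeping) config read. A condensed, hedged sketch:

        /* interrupt context: just note that cmd_id must be (re)read */
        set_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID, &vb->config_read_bitmap);
        queue_work(vb->balloon_wq, &vb->report_free_page_work);

        /* workqueue context: now it is safe to touch the config space */
        if (test_and_clear_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
                               &vb->config_read_bitmap))
                virtio_cread(vb->vdev, struct virtio_balloon_config,
                             free_page_report_cmd_id,
                             &vb->cmd_id_received_cache);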

    Reported-by: Christian Borntraeger
    Signed-off-by: Wei Wang
    Reviewed-by: Cornelia Huck
    Reviewed-by: Halil Pasic
    Signed-off-by: Michael S. Tsirkin
    Cc: stable@vger.kernel.org
    Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
    Tested-by: Christian Borntraeger

    Wei Wang
     
  • Some vqs may not need to be allocated when their related feature bits
    are disabled. So callers may pass in such vqs with "names = NULL".
    Then we skip such vq allocations.
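
    In the transport's find_vqs implementation the skip looks roughly like
    this (sketch):

        for (i = 0; i < nvqs; i++) {
                if (!names[i]) {
                        vqs[i] = NULL;          /* feature disabled: no queue */
                        continue;
                }
                /* ... set up the queue as before ... */
        }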

    Signed-off-by: Wei Wang
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Cornelia Huck
    Cc: stable@vger.kernel.org
    Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")

    Wei Wang
     
  • When find_vqs, there will be no vq[i] allocation if its corresponding
    names[i] is NULL. For example, the caller may pass in names[i] (i=4)
    with names[2] being NULL because the related feature bit is turned off,
    so technically there are 3 queues on the device, and names[4] should
    correspond to the 3rd queue on the device.

    So we use queue_idx as the queue index, which is increased only when the
    queue exists.
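
    Keeping a separate queue_idx makes the device-side queue numbering
    independent of gaps in the names[] array; a sketch based on the virtio-pci
    case:

        int queue_idx = 0;

        for (i = 0; i < nvqs; i++) {
                if (!names[i]) {
                        vqs[i] = NULL;
                        continue;
                }
                /* the device only knows about queues that really exist */
                vqs[i] = vp_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
                                     ctx ? ctx[i] : false, msix_vec);
        }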

    Signed-off-by: Wei Wang
    Signed-off-by: Michael S. Tsirkin

    Wei Wang
     

03 Jan, 2019

1 commit

  • Pull virtio/vhost updates from Michael Tsirkin:
    "Features, fixes, cleanups:

    - discard in virtio blk

    - misc fixes and cleanups"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    vhost: correct the related warning message
    vhost: split structs into a separate header file
    virtio: remove deprecated VIRTIO_PCI_CONFIG()
    vhost/vsock: switch to a mutex for vhost_vsock_hash
    virtio_blk: add discard and write zeroes support

    Linus Torvalds
     

27 Nov, 2018

1 commit