08 Mar, 2020

2 commits

  • Clang warns when CONFIG_BALLOON_COMPACTION is unset:

    ../drivers/virtio/virtio_balloon.c:963:1: warning: unused label
    'out_del_vqs' [-Wunused-label]
    out_del_vqs:
    ^~~~~~~~~~~~
    1 warning generated.

    Move the label within the preprocessor block since it is only used when
    CONFIG_BALLOON_COMPACTION is set.

    Fixes: 1ad6f58ea936 ("virtio_balloon: Fix memory leaks on errors in virtballoon_probe()")
    Link: https://github.com/ClangBuiltLinux/linux/issues/886
    Signed-off-by: Nathan Chancellor
    Link: https://lore.kernel.org/r/20200216004039.23464-1-natechancellor@gmail.com
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: David Hildenbrand

    Nathan Chancellor
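
The label move described in this commit can be illustrated with a small userspace sketch. It is not the driver code: `DEMO_BALLOON_COMPACTION` stands in for `CONFIG_BALLOON_COMPACTION`, and `demo_probe()` and its `fail_compaction` knob are hypothetical. The point is only that a label referenced solely from inside an `#ifdef` block should live under the same condition, so that no unused label remains when the option is off.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for CONFIG_BALLOON_COMPACTION. Uncomment to
 * compile the compaction path (and its error label) in. */
/* #define DEMO_BALLOON_COMPACTION 1 */

/* Returns 0 on success, -1 on failure. fail_compaction is a made-up
 * knob that simulates a failure in the compaction-only setup step. */
int demo_probe(int fail_compaction)
{
	void *vqs = malloc(16);	/* stands in for the virtqueues */

	assert(fail_compaction == 0 || fail_compaction == 1);
	if (!vqs)
		return -1;

#ifdef DEMO_BALLOON_COMPACTION
	if (fail_compaction)
		goto out_del_vqs;
#else
	(void)fail_compaction;	/* nothing can fail without compaction */
#endif

	free(vqs);
	return 0;

#ifdef DEMO_BALLOON_COMPACTION
out_del_vqs:	/* only reachable from the #ifdef block above */
	free(vqs);
	return -1;
#endif
}
```

With the macro left undefined, the label does not exist at all, so `-Wunused-label` has nothing to report.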
     
  • The functions vring_new_virtqueue() and __vring_new_virtqueue() are used
    with split rings, and any allocations within these functions are managed
    outside of the .we_own_ring flag. The commit cbeedb72b97a ("virtio_ring:
    allocate desc state for split ring separately") allocates the desc state
    within the __vring_new_virtqueue() but frees it only when the .we_own_ring
    flag is set. This leads to a memory leak when freeing such allocated
    virtqueues with the vring_del_virtqueue() function.

    Fix this by moving the desc_state free code outside the flag and only
    for split rings. Issue was discovered during testing with remoteproc
    and virtio_rpmsg.

    Fixes: cbeedb72b97a ("virtio_ring: allocate desc state for split ring separately")
    Signed-off-by: Suman Anna
    Link: https://lore.kernel.org/r/20200224212643.30672-1-s-anna@ti.com
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Jason Wang

    Suman Anna
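
The ownership rule behind this fix can be sketched in userspace C. This is a reduced model under stated assumptions, not the virtio_ring code: `struct demo_vq`, `demo_new_vq()`, `demo_del_vq()` and the allocation counter are all hypothetical. It shows the desc state being freed for every split ring, whether or not the ring memory itself is owned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced model of a virtqueue. */
struct demo_vq {
	bool packed_ring;	/* false means split ring */
	bool we_own_ring;	/* ring memory was allocated by us */
	void *ring;		/* only allocated when we_own_ring is set */
	void *desc_state;	/* always allocated for split rings */
};

static int demo_live_allocs;	/* outstanding allocations */

static void *demo_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live_allocs--;
		free(p);
	}
}

struct demo_vq *demo_new_vq(bool we_own_ring)
{
	struct demo_vq *vq = demo_alloc(sizeof(*vq));

	if (!vq)
		return NULL;
	vq->packed_ring = false;
	vq->we_own_ring = we_own_ring;
	vq->ring = we_own_ring ? demo_alloc(64) : NULL;
	vq->desc_state = demo_alloc(64);
	return vq;
}

void demo_del_vq(struct demo_vq *vq)
{
	assert(vq != NULL);
	if (vq->we_own_ring)
		demo_free(vq->ring);
	/* Freed outside the we_own_ring check, for split rings only. */
	if (!vq->packed_ring)
		demo_free(vq->desc_state);
	demo_free(vq);
}

int demo_outstanding(void)
{
	return demo_live_allocs;
}
```

If the `desc_state` free were nested inside the `we_own_ring` branch, a queue created with `we_own_ring == false` would leave one allocation outstanding.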
     

06 Feb, 2020

6 commits

  • We forget to put the inode and unmount the kernfs used for compaction.

    Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Wei Wang
    Cc: Liang Li
    Signed-off-by: David Hildenbrand
    Link: https://lore.kernel.org/r/20200205163402.42627-3-david@redhat.com
    Signed-off-by: Michael S. Tsirkin

    David Hildenbrand
     
  • When unloading the driver while hinting is in progress, we will not
    release the free page blocks back to MM, resulting in a memory leak.

    Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Wei Wang
    Cc: Liang Li
    Signed-off-by: David Hildenbrand
    Link: https://lore.kernel.org/r/20200205163402.42627-2-david@redhat.com
    Signed-off-by: Michael S. Tsirkin

    David Hildenbrand
     
  • Make sure, at build time, that pfn array is big enough to hold a single
    page. It happens to be true since the PAGE_SHIFT value at the moment is
    20, which is 1M - exactly 256 4K balloon pages.

    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: David Hildenbrand

    Michael S. Tsirkin
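
A build-time check of this kind can be sketched with C11 `_Static_assert`. The names below are hypothetical; the 4K balloon page size and the 256-entry array come from the commit text, while the `DEMO_PAGE_SHIFT` value of 12 is an assumption chosen for illustration.

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT		12u	/* assumed system page shift */
#define DEMO_BALLOON_PFN_SHIFT	12u	/* 4K balloon pages */
#define DEMO_ARRAY_PFNS_MAX	256u	/* entries in the pfn array */

/* Number of 4K balloon pages that make up one system page. */
#define DEMO_PAGES_PER_PAGE \
	(1u << (DEMO_PAGE_SHIFT - DEMO_BALLOON_PFN_SHIFT))

/* Fails the build if one system page no longer fits in the array. */
_Static_assert(DEMO_PAGES_PER_PAGE <= DEMO_ARRAY_PFNS_MAX,
	       "pfn array too small to hold a single page");

/* Runtime version of the same arithmetic for an arbitrary page shift. */
unsigned int demo_balloon_pages_per_page(unsigned int page_shift)
{
	assert(page_shift >= DEMO_BALLOON_PFN_SHIFT);
	return 1u << (page_shift - DEMO_BALLOON_PFN_SHIFT);
}
```

A page shift of 20 (1M pages) yields exactly 256 balloon pages, which is the boundary case the commit message mentions.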
     
  • VQs without a name specified are not valid; they are skipped in the
    later loop that assigns MSI-X vectors to queues, but the per_vq_vectors
    loop above that counts the required number of vectors previously still
    counted any queue with a non-NULL callback as needing a vector.

    Add a check to the per_vq_vectors loop so that vectors with no name are
    not counted to make the two loops consistent. This prevents
    over-counting unnecessary vectors (e.g. for features which were not
    negotiated with the device).

    Cc: stable@vger.kernel.org
    Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
    Reviewed-by: Cornelia Huck
    Signed-off-by: Daniel Verkamp
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Wang, Wei W

    Daniel Verkamp
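
The counting rule this commit describes can be sketched as follows. This is an illustration with hypothetical names (`demo_count_vq_vectors()`, `demo_vq_callback`, `demo_noop_cb()`), not the virtio_pci code: a queue contributes a vector only if it has both a name and a callback.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_vq_callback)(void);

/* Hypothetical callback used only to have a non-NULL pointer. */
void demo_noop_cb(void)
{
}

/* Count the queues that need their own vector: a queue without a name
 * is not valid and is skipped, even if its callback is non-NULL. */
unsigned int demo_count_vq_vectors(const char *const names[],
				   const demo_vq_callback callbacks[],
				   size_t nvqs)
{
	unsigned int nvectors = 0;

	assert(names != NULL && callbacks != NULL);
	for (size_t i = 0; i < nvqs; i++)
		if (names[i] && callbacks[i])
			nvectors++;
	return nvectors;
}
```

Checking only `callbacks[i]` would count the second queue in an array such as `{ "inflate", NULL, "stats" }` even though it is never set up.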
     
  • Ensure that elements of the callbacks array that correspond to
    unavailable features are set to NULL; previously, they would be left
    uninitialized.

    Since the corresponding names array elements were explicitly set to
    NULL, the uninitialized callback pointers would not actually be
    dereferenced; however, the uninitialized callbacks elements would still
    be read in vp_find_vqs_msix() and used to calculate the number of MSI-X
    vectors required.

    Cc: stable@vger.kernel.org
    Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
    Reviewed-by: Cornelia Huck
    Signed-off-by: Daniel Verkamp
    Signed-off-by: Michael S. Tsirkin

    Daniel Verkamp
     
  • Use devm_platform_ioremap_resource() to simplify code, which
    contains platform_get_resource, devm_request_mem_region and
    devm_ioremap.

    Signed-off-by: Yangtao Li
    Signed-off-by: Michael S. Tsirkin

    Yangtao Li
     

11 Dec, 2019

3 commits

  • We managed to get confused about the shift direction at least once.
    Let's switch to division/multiplication instead. Add a number of pages
    macro for this purpose. We still keep the order macro around too since
    this is what alloc/free pages want.

    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Wei Wang
    Reviewed-by: David Hildenbrand

    Michael S. Tsirkin
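
Why a pages macro is less error-prone than a shift can be shown with a short sketch. The names and the order value of 10 are assumptions for illustration, not the driver's definitions.

```c
#include <assert.h>

#define DEMO_HINT_BLOCK_ORDER	10u	/* assumed block order */
#define DEMO_HINT_BLOCK_PAGES	(1ul << DEMO_HINT_BLOCK_ORDER)

/* The kind of mistake a shift invites: shifting right converts pages
 * to blocks, not blocks to pages, so small block counts become 0. */
unsigned long demo_blocks_to_pages_wrong_shift(unsigned long blocks)
{
	return blocks >> DEMO_HINT_BLOCK_ORDER;
}

/* With a pages-per-block macro the direction is visible in the operator. */
unsigned long demo_blocks_to_pages(unsigned long blocks)
{
	return blocks * DEMO_HINT_BLOCK_PAGES;
}

unsigned long demo_pages_to_blocks(unsigned long pages)
{
	assert(DEMO_HINT_BLOCK_PAGES != 0);
	return pages / DEMO_HINT_BLOCK_PAGES;
}
```

The order macro stays useful alongside this, since allocation and free interfaces take an order rather than a page count.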
     
  • free_page_order is a confusing name. It's not a page order
    actually, it's the order of the block of memory we are hinting.
    Rename to hint_block_order. Also, rename SIZE to BYTES
    to make it clear it's the block size in bytes.

    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Wei Wang
    Reviewed-by: David Hildenbrand

    Michael S. Tsirkin
     
  • In case we have to migrate a balloon page to a newpage of another zone, the
    managed page count of both zones is wrong. Paired with memory offlining
    (which will adjust the managed page count), we can trigger kernel crashes
    and all kinds of different symptoms.

    One way to reproduce:
    1. Start a QEMU guest with 4GB, no NUMA
    2. Hotplug a 1GB DIMM and online the memory to ZONE_NORMAL
    3. Inflate the balloon to 1GB
    4. Unplug the DIMM (be quick, otherwise unmovable data ends up on it)
    5. Observe /proc/zoneinfo
    Node 0, zone Normal
    pages free 16810
    min 24848885473806
    low 18471592959183339
    high 36918337032892872
    spanned 262144
    present 262144
    managed 18446744073709533486
    6. Do anything that requires some memory (e.g., inflate the balloon some
    more). The OOM goes crazy and the system crashes
    [ 238.324946] Out of memory: Killed process 537 (login) total-vm:27584kB, anon-rss:860kB, file-rss:0kB, shmem-rss:00
    [ 238.338585] systemd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
    [ 238.339420] CPU: 0 PID: 1 Comm: systemd Tainted: G D W 5.4.0-next-20191204+ #75
    [ 238.340139] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
    [ 238.341121] Call Trace:
    [ 238.341337] dump_stack+0x8f/0xd0
    [ 238.341630] dump_header+0x61/0x5ea
    [ 238.341942] oom_kill_process.cold+0xb/0x10
    [ 238.342299] out_of_memory+0x24d/0x5a0
    [ 238.342625] __alloc_pages_slowpath+0xd12/0x1020
    [ 238.343024] __alloc_pages_nodemask+0x391/0x410
    [ 238.343407] pagecache_get_page+0xc3/0x3a0
    [ 238.343757] filemap_fault+0x804/0xc30
    [ 238.344083] ? ext4_filemap_fault+0x28/0x42
    [ 238.344444] ext4_filemap_fault+0x30/0x42
    [ 238.344789] __do_fault+0x37/0x1a0
    [ 238.345087] __handle_mm_fault+0x104d/0x1ab0
    [ 238.345450] handle_mm_fault+0x169/0x360
    [ 238.345790] do_user_addr_fault+0x20d/0x490
    [ 238.346154] do_page_fault+0x31/0x210
    [ 238.346468] async_page_fault+0x43/0x50
    [ 238.346797] RIP: 0033:0x7f47eba4197e
    [ 238.347110] Code: Bad RIP value.
    [ 238.347387] RSP: 002b:00007ffd7c0c1890 EFLAGS: 00010293
    [ 238.347834] RAX: 0000000000000002 RBX: 000055d196a20a20 RCX: 00007f47eba4197e
    [ 238.348437] RDX: 0000000000000033 RSI: 00007ffd7c0c18c0 RDI: 0000000000000004
    [ 238.349047] RBP: 00007ffd7c0c1c20 R08: 0000000000000000 R09: 0000000000000033
    [ 238.349660] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000001
    [ 238.350261] R13: ffffffffffffffff R14: 0000000000000000 R15: 00007ffd7c0c18c0
    [ 238.350878] Mem-Info:
    [ 238.351085] active_anon:3121 inactive_anon:51 isolated_anon:0
    [ 238.351085] active_file:12 inactive_file:7 isolated_file:0
    [ 238.351085] unevictable:0 dirty:0 writeback:0 unstable:0
    [ 238.351085] slab_reclaimable:5565 slab_unreclaimable:10170
    [ 238.351085] mapped:3 shmem:111 pagetables:155 bounce:0
    [ 238.351085] free:720717 free_pcp:2 free_cma:0
    [ 238.353757] Node 0 active_anon:12484kB inactive_anon:204kB active_file:48kB inactive_file:28kB unevictable:0kB iss
    [ 238.355979] Node 0 DMA free:11556kB min:36kB low:48kB high:60kB reserved_highatomic:0KB active_anon:152kB inactivB
    [ 238.358345] lowmem_reserve[]: 0 2955 2884 2884 2884
    [ 238.358761] Node 0 DMA32 free:2677864kB min:7004kB low:10028kB high:13052kB reserved_highatomic:0KB active_anon:0B
    [ 238.361202] lowmem_reserve[]: 0 0 72057594037927865 72057594037927865 72057594037927865
    [ 238.361888] Node 0 Normal free:193448kB min:99395541895224kB low:73886371836733356kB high:147673348131571488kB reB
    [ 238.364765] lowmem_reserve[]: 0 0 0 0 0
    [ 238.365101] Node 0 DMA: 7*4kB (U) 5*8kB (UE) 6*16kB (UME) 2*32kB (UM) 1*64kB (U) 2*128kB (UE) 3*256kB (UME) 2*512B
    [ 238.366379] Node 0 DMA32: 0*4kB 1*8kB (U) 2*16kB (UM) 2*32kB (UM) 2*64kB (UM) 1*128kB (U) 1*256kB (U) 1*512kB (U)B
    [ 238.367654] Node 0 Normal: 1985*4kB (UME) 1321*8kB (UME) 844*16kB (UME) 524*32kB (UME) 300*64kB (UME) 138*128kB (B
    [ 238.369184] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    [ 238.369915] 130 total pagecache pages
    [ 238.370241] 0 pages in swap cache
    [ 238.370533] Swap cache stats: add 0, delete 0, find 0/0
    [ 238.370981] Free swap = 0kB
    [ 238.371239] Total swap = 0kB
    [ 238.371488] 1048445 pages RAM
    [ 238.371756] 0 pages HighMem/MovableOnly
    [ 238.372090] 306992 pages reserved
    [ 238.372376] 0 pages cma reserved
    [ 238.372661] 0 pages hwpoisoned

    In another instance (older kernel), I was able to observe this
    (negative page count :/):
    [ 180.896971] Offlined Pages 32768
    [ 182.667462] Offlined Pages 32768
    [ 184.408117] Offlined Pages 32768
    [ 186.026321] Offlined Pages 32768
    [ 187.684861] Offlined Pages 32768
    [ 189.227013] Offlined Pages 32768
    [ 190.830303] Offlined Pages 32768
    [ 190.833071] Built 1 zonelists, mobility grouping on. Total pages: -36920272750453009

    In another instance (older kernel), I was no longer able to start any
    process:
    [root@vm ~]# [ 214.348068] Offlined Pages 32768
    [ 215.973009] Offlined Pages 32768
    cat /proc/meminfo
    -bash: fork: Cannot allocate memory
    [root@vm ~]# cat /proc/meminfo
    -bash: fork: Cannot allocate memory

    Fix it by properly adjusting the managed page count when migrating if
    the zone changed. The managed page count of the zones now looks after
    unplug of the DIMM (and after deflating the balloon) just like before
    inflating the balloon (and plugging+onlining the DIMM).

    We'll temporarily modify the totalram page count. If this ever becomes a
    problem, we can fine tune by providing helpers that don't touch
    the totalram pages (e.g., adjust_zone_managed_page_count()).

    Please note that fixing up the managed page count is only necessary when
    we adjusted the managed page count when inflating - only if we
    don't have VIRTIO_BALLOON_F_DEFLATE_ON_OOM. With that feature, the
    managed page count is not touched when inflating/deflating.

    Reported-by: Yumei Huang
    Fixes: 3dcc0571cd64 ("mm: correctly update zone->managed_pages")
    Cc: # v3.11+
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Jiang Liu
    Cc: Andrew Morton
    Cc: Igor Mammedov
    Cc: virtualization@lists.linux-foundation.org
    Signed-off-by: David Hildenbrand
    Signed-off-by: Michael S. Tsirkin

    David Hildenbrand
     

20 Nov, 2019

2 commits

  • Instead of multiplying by page order, virtio balloon divided by page
    order. The result is that it can return 0 if there are a bit less
    than MAX_ORDER - 1 pages in use, and then shrinker scan won't be called.

    Cc: stable@vger.kernel.org
    Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
    Signed-off-by: Wei Wang
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: David Hildenbrand

    Wei Wang
     
  • virtio_balloon_shrinker_scan should return number of system pages freed,
    but because it's calling functions that deal with balloon pages, it gets
    confused and sometimes returns the number of balloon pages.

    It does not matter practically as the exact number isn't
    used, but it seems better to be consistent in case someone
    starts using this API.

    Further, if we ever tried to iteratively leak pages as
    virtio_balloon_shrinker_scan tries to do, we'd run into issues - this is
    because freed_pages was accumulating total freed pages, but was also
    subtracted on each iteration from pages_to_free, which can result in
    either leaking less memory than we were supposed to free, or more if
    pages_to_free underruns.

    On a system with 4K pages we are lucky that we are never asked to leak
    more than 128 pages while we can leak up to 256 at a time,
    but it looks like a real issue for systems with page size != 4K.

    Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
    Reported-by: Khazhismel Kumykov
    Reviewed-by: Wei Wang
    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     

19 Nov, 2019

1 commit

  • Commit 780bc7903a32 ("virtio_ring: Support DMA APIs") makes
    virtqueue_add() return -EIO when we fail to map our I/O buffers. This is
    a very realistic scenario for guests with encrypted memory, as swiotlb
    may run out of space, depending on its size and the I/O load.

    The virtio-blk driver interprets -EIO from virtqueue_add() as an IO
    error, despite the fact that swiotlb full is, in absence of bugs, a
    recoverable condition.

    Let us change the return code to -ENOMEM, and make the block layer
    recover from these failures when virtio-blk encounters the condition
    described above.

    Cc: stable@vger.kernel.org
    Fixes: 780bc7903a32 ("virtio_ring: Support DMA APIs")
    Signed-off-by: Halil Pasic
    Tested-by: Michael Mueller
    Signed-off-by: Michael S. Tsirkin

    Halil Pasic
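
How a caller might tell a recoverable mapping failure from a real I/O error can be sketched as below. This is an assumption-laden illustration, not the virtio-blk code: `enum demo_status` and `demo_classify_add_result()` are hypothetical, and only the two error codes named in the commit message are distinguished.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcomes a block driver could report upward. */
enum demo_status {
	DEMO_OK,		/* request queued */
	DEMO_RETRY_LATER,	/* transient resource shortage */
	DEMO_IO_ERROR		/* treated as a hard failure */
};

/* Maps a negative errno from the add operation to an outcome.
 * -ENOMEM (e.g. a full bounce buffer) is treated as retryable. */
enum demo_status demo_classify_add_result(int err)
{
	assert(err <= 0);
	switch (err) {
	case 0:
		return DEMO_OK;
	case -ENOMEM:
		return DEMO_RETRY_LATER;
	default:
		return DEMO_IO_ERROR;
	}
}
```

Under this mapping, returning -EIO for a full swiotlb would surface as a hard error, whereas -ENOMEM lets the caller requeue the request.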
     

28 Oct, 2019

1 commit

  • When VIRTIO_F_RING_EVENT_IDX is negotiated, virtio devices can
    use virtqueue_enable_cb_delayed_packed to reduce the number of device
    interrupts. At the moment, this is the case for virtio-net when the
    napi_tx module parameter is set to false.

    In this case, the virtio driver selects an event offset and expects that
    the device will send a notification when rolling over the event offset
    in the ring. However, if this roll-over happens before the event
    suppression structure update, the notification won't be sent. To address
    this race condition the driver needs to check whether the device rolled
    over the offset after updating the event suppression structure.

    With VIRTIO_F_RING_PACKED, the virtio driver did this by reading the
    flags field of the descriptor at the specified offset.

    Unfortunately, checking at the event offset isn't reliable: if
    descriptors are chained (e.g. when INDIRECT is off) not all descriptors
    are overwritten by the device, so it's possible that the device skipped
    the specific descriptor the driver is checking when writing out used
    descriptors. If this happens, the driver won't detect the race condition
    and will incorrectly expect the device to send a notification.

    For virtio-net, the result will be a TX queue stall, with the
    transmission getting blocked forever.

    With the packed ring, it isn't easy to find a location which is
    guaranteed to change upon the roll-over, except the next device
    descriptor, as described in the spec:

    Writes of device and driver descriptors can generally be
    reordered, but each side (driver and device) are only required to
    poll (or test) a single location in memory: the next device descriptor after
    the one they processed previously, in circular order.

    While this might be sub-optimal, let's do exactly this for now.

    Cc: stable@vger.kernel.org
    Cc: Jason Wang
    Fixes: f51f982682e2a ("virtio_ring: leverage event idx in packed ring")
    Signed-off-by: Marvin Liu
    Signed-off-by: Michael S. Tsirkin

    Marvin Liu
     

09 Sep, 2019

1 commit

  • The function virtqueue_add_split() DMA-maps the scatterlist buffers. In
    case a mapping error occurs the already mapped buffers must be unmapped.
    This happens by jumping to the 'unmap_release' label.

    In case of indirect descriptors the release is wrong and may leak kernel
    memory. Because the implementation assumes that the head descriptor is
    already mapped it starts iterating over the descriptor list starting
    from the head descriptor. However for indirect descriptors the head
    descriptor is never mapped in case of an error.

    The fix is to initialize the start index with zero in case of indirect
    descriptors and use the 'desc' pointer directly for iterating over the
    descriptor chain.

    Signed-off-by: Matthias Lange
    Signed-off-by: Michael S. Tsirkin

    Matthias Lange
     

20 Jul, 2019

1 commit

  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     

19 Jul, 2019

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "Primarily just the virtio_pmem driver:

    - virtio_pmem

    The new virtio_pmem facility introduces a paravirtualized
    persistent memory device that allows a guest VM to use DAX
    mechanisms to access a host-file with host-page-cache. It arranges
    for MAP_SYNC to be disabled and instead triggers a host fsync()
    when a 'write-cache flush' command is sent to the virtual disk
    device.

    - Miscellaneous small fixups"

    * tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    virtio_pmem: fix sparse warning
    xfs: disable map_sync for async flush
    ext4: disable map_sync for async flush
    dax: check synchronous mapping is supported
    dm: enable synchronous dax
    libnvdimm: add dax_dev sync flag
    virtio-pmem: Add virtio pmem driver
    libnvdimm: nd_region flush callback support
    libnvdimm, namespace: Drop uuid_t implementation detail

    Linus Torvalds
     

12 Jul, 2019

1 commit


06 Jul, 2019

1 commit

  • This patch adds virtio-pmem driver for KVM guest.

    Guest reads the persistent memory range information from
    Qemu over VIRTIO and registers it on nvdimm_bus. It also
    creates a nd_region object with the persistent memory
    range information so that existing 'nvdimm/pmem' driver
    can reserve this into system memory map. This way
    'virtio-pmem' driver uses existing functionality of pmem
    driver to register persistent memory compatible for DAX
    capable filesystems.

    This also provides function to perform guest flush over
    VIRTIO from 'pmem' driver when userspace performs flush
    on DAX memory range.

    Signed-off-by: Pankaj Gupta
    Reviewed-by: Yuval Shaia
    Acked-by: Michael S. Tsirkin
    Acked-by: Jakub Staron
    Tested-by: Jakub Staron
    Reviewed-by: Cornelia Huck
    Signed-off-by: Dan Williams

    Pankaj Gupta
     

27 May, 2019

1 commit


26 May, 2019

2 commits

  • Convert the virtio_balloon filesystem to the new internal mount API as the old
    one will be obsoleted and removed. This allows greater flexibility in
    communication of mount parameters between userspace, the VFS and the
    filesystem.

    See Documentation/filesystems/mount_api.txt for more information.

    Signed-off-by: David Howells
    cc: "Michael S. Tsirkin"
    cc: Jason Wang
    cc: virtualization@lists.linux-foundation.org
    Signed-off-by: Al Viro

    David Howells
     
  • Once upon a time we used to set ->d_name of e.g. pipefs root
    so that d_path() on pipes would work. These days it's
    completely pointless - dentries of pipes are not even connected
    to pipefs root. However, mount_pseudo() had set the root
    dentry name (passed as the second argument) and callers
    kept inventing names to pass to it. Including those that
    didn't *have* any non-root dentries to start with...

    All of that had been pointless for about 8 years now; it's
    time to get rid of that cargo-culting...

    Signed-off-by: Al Viro

    Al Viro
     

24 May, 2019

2 commits

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version this program is distributed in the
    hope that it will be useful but without any warranty without even
    the implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details you
    should have received a copy of the gnu general public license along
    with this program if not write to the free software foundation inc
    51 franklin st fifth floor boston ma 02110 1301 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 50 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190523091649.499889647@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 1 normalized pattern(s):

    this work is licensed under the terms of the gnu gpl version 2 or
    later see the copying file in the top level directory

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 6 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520075210.858783702@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

2 commits


20 May, 2019

1 commit


13 May, 2019

2 commits


09 Apr, 2019

1 commit

  • vring_create_virtqueue() allows the caller to specify via the
    may_reduce_num parameter whether the vring code is allowed to
    allocate a smaller ring than specified.

    However, the split ring allocation code tries to allocate a
    smaller ring on allocation failure regardless of what the
    caller specified. This may cause trouble for e.g. virtio-pci
    in legacy mode, which does not support ring resizing. (The
    packed ring code does not resize in any case.)

    Let's fix this by bailing out immediately in the split ring code
    if the requested size cannot be allocated and may_reduce_num has
    not been specified.

    While at it, fix a typo in the usage instructions.

    Fixes: 2a2d1382fe9d ("virtio: Add improved queue allocation API")
    Cc: stable@vger.kernel.org # v4.6+
    Signed-off-by: Cornelia Huck
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Halil Pasic
    Reviewed-by: Jens Freimann

    Cornelia Huck
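
The allocation policy described in this commit can be sketched with a toy allocator. All names here are hypothetical (`demo_try_alloc()`, `demo_create_ring()`, the `limit` parameter that simulates how large an allocation can succeed); the real code allocates ring memory rather than comparing numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocator: succeeds only for rings of up to 'limit' entries. */
static bool demo_try_alloc(unsigned int num, unsigned int limit)
{
	return num <= limit;
}

/* Returns the ring size actually allocated, or 0 on failure. A smaller
 * ring is tried only when the caller allowed it via may_reduce_num. */
unsigned int demo_create_ring(unsigned int num, bool may_reduce_num,
			      unsigned int limit)
{
	assert(num != 0);
	for (; num; num /= 2) {
		if (demo_try_alloc(num, limit))
			return num;
		if (!may_reduce_num)
			return 0;	/* bail out instead of shrinking */
	}
	return 0;
}
```

A caller that cannot handle ring resizing passes `may_reduce_num == false` and gets a clean failure rather than a ring of unexpected size.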
     

08 Apr, 2019

1 commit

  • If allocating msix_affinity_masks fails, we then try to free
    some resources in vp_free_vectors(), which may access the
    unallocated array directly.

    We met the following stack in our production:
    [ 29.296767] BUG: unable to handle kernel NULL pointer dereference at (null)
    [ 29.311151] IP: [] vp_free_vectors+0x6a/0x150 [virtio_pci]
    [ 29.324787] PGD 0
    [ 29.333224] Oops: 0000 [#1] SMP
    [...]
    [ 29.425175] RIP: 0010:[] [] vp_free_vectors+0x6a/0x150 [virtio_pci]
    [ 29.441405] RSP: 0018:ffff9a55c2dcfa10 EFLAGS: 00010206
    [ 29.453491] RAX: 0000000000000000 RBX: ffff9a55c322c400 RCX: 0000000000000000
    [ 29.467488] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9a55c322c400
    [ 29.481461] RBP: ffff9a55c2dcfa20 R08: 0000000000000000 R09: ffffc1b6806ff020
    [ 29.495427] R10: 0000000000000e95 R11: 0000000000aaaaaa R12: 0000000000000000
    [ 29.509414] R13: 0000000000010000 R14: ffff9a55bd2d9e98 R15: ffff9a55c322c400
    [ 29.523407] FS: 00007fdcba69f8c0(0000) GS:ffff9a55c2840000(0000) knlGS:0000000000000000
    [ 29.538472] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 29.551621] CR2: 0000000000000000 CR3: 000000003ce52000 CR4: 00000000003607a0
    [ 29.565886] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 29.580055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 29.594122] Call Trace:
    [ 29.603446] [] vp_request_msix_vectors+0xe2/0x260 [virtio_pci]
    [ 29.618017] [] vp_try_to_find_vqs+0x95/0x3b0 [virtio_pci]
    [ 29.632152] [] vp_find_vqs+0x37/0xb0 [virtio_pci]
    [ 29.645582] [] init_vq+0x153/0x260 [virtio_blk]
    [ 29.658831] [] virtblk_probe+0xe8/0x87f [virtio_blk]
    [...]

    Cc: Gonglei
    Signed-off-by: Longpeng
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Gonglei

    Longpeng
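
The kind of guard that avoids this crash can be sketched in userspace C. The structure and function below are hypothetical stand-ins (`struct demo_dev`, `demo_free_vectors()`), assuming the cleanup path can run after the mask array allocation has failed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device state: an array of per-vector masks. */
struct demo_dev {
	unsigned int nvectors;
	void **affinity_masks;	/* NULL if the array allocation failed */
};

/* Releases per-vector masks and the array itself. Returns how many
 * mask slots were visited, so callers can see the guard at work. */
unsigned int demo_free_vectors(struct demo_dev *dev)
{
	unsigned int released = 0;

	assert(dev != NULL);
	/* Only walk the array if it was actually allocated. */
	if (dev->affinity_masks) {
		for (unsigned int i = 0; i < dev->nvectors; i++) {
			free(dev->affinity_masks[i]);
			released++;
		}
	}
	free(dev->affinity_masks);	/* free(NULL) is a no-op */
	dev->affinity_masks = NULL;
	return released;
}
```

Without the check, the loop would index into a NULL array whenever `nvectors` is non-zero, which matches the NULL pointer dereference in the trace above.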
     

07 Mar, 2019

4 commits

  • A virtio transport is free to implement some of the callbacks in
    virtio_config_ops in a manner that they cannot be called from
    atomic context (e.g. virtio-ccw, which maps a lot of the callbacks
    to channel I/O, which is an inherently asynchronous mechanism).
    This can be very surprising for developers using the much more
    common virtio-pci transport, just to find out that things break
    when used on s390.

    The documentation for virtio_config_ops now contains a comment
    explaining this, but it makes sense to add a might_sleep() annotation
    to various wrapper functions in the virtio core to avoid surprises
    later.

    Note that annotations are NOT added to two classes of calls:
    - direct calls from device drivers (all current callers should be
    fine, however)
    - calls which clearly won't be made from atomic context (such as
    those ultimately coming in via the driver core)

    Signed-off-by: Cornelia Huck
    Signed-off-by: Michael S. Tsirkin

    Cornelia Huck
     
  • We've changed to kzalloc the vb struct, so no need to 0-initialize
    this field one more time.

    Signed-off-by: Wei Wang
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Cornelia Huck

    Wei Wang
     
  • There is no need to update the balloon actual register when there is no
    ballooning request. This patch avoids update_balloon_size when diff is 0.

    Signed-off-by: Wei Wang
    Reviewed-by: Cornelia Huck
    Reviewed-by: Halil Pasic
    Signed-off-by: Michael S. Tsirkin

    Wei Wang
     
  • This function returns the maximum segment size for a single
    dma transaction of a virtio device. The possible limit comes
    from the SWIOTLB implementation in the Linux kernel, that
    has an upper limit of (currently) 256kb of contiguous
    memory it can map. Other DMA-API implementations might also
    have limits.

    Use the new dma_max_mapping_size() function to determine the
    maximum mapping size when DMA-API is in use for virtio.

    Cc: stable@vger.kernel.org
    Reviewed-by: Konrad Rzeszutek Wilk
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Joerg Roedel
    Signed-off-by: Michael S. Tsirkin

    Joerg Roedel
     

06 Feb, 2019

1 commit


24 Jan, 2019

1 commit

  • This patch introduces the support for VIRTIO_F_ORDER_PLATFORM.
    If this feature is negotiated, the driver must use the barriers
    suitable for hardware devices. Otherwise, the device and driver
    are assumed to be implemented in software, that is they can be
    assumed to run on identical CPUs in an SMP configuration. Thus
    a weaker form of memory barriers is sufficient to yield better
    performance.

    It is recommended that an add-in card based PCI device offers
    this feature for portability. The device will fail to operate
    further or will operate in a slower emulation mode if this
    feature is offered but not accepted.

    Signed-off-by: Tiwei Bie
    Signed-off-by: Michael S. Tsirkin

    Tiwei Bie
     

15 Jan, 2019

2 commits

  • virtio-ccw has deadlock issues with reading the config space inside the
    interrupt context, so we tweak the virtballoon_changed implementation
    by moving the config read operations into the related workqueue contexts.
    The config_read_bitmap is used as a flag to the workqueue callbacks
    about the related config fields that need to be read.

    The cmd_id_received is also renamed to cmd_id_received_cache, and
    the value should be obtained via virtio_balloon_cmd_id_received.

    Reported-by: Christian Borntraeger
    Signed-off-by: Wei Wang
    Reviewed-by: Cornelia Huck
    Reviewed-by: Halil Pasic
    Signed-off-by: Michael S. Tsirkin
    Cc: stable@vger.kernel.org
    Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
    Tested-by: Christian Borntraeger

    Wei Wang
     
  • Some vqs may not need to be allocated when their related feature bits
    are disabled. So callers may pass in such vqs with "names = NULL".
    Then we skip such vq allocations.

    Signed-off-by: Wei Wang
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Wei Wang
    Signed-off-by: Wei Wang
    Reviewed-by: Cornelia Huck
    Cc: stable@vger.kernel.org
    Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")

    Wei Wang