18 Dec, 2014

1 commit

  • Pull virtio updates from Rusty Russell:
    "A balloon enhancement, and a minor race-on-module-unload theoretical
    bug which doesn't merit cc: stable.

    All the exciting stuff went via MST this cycle"

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    virtio_balloon: free some memory from balloon on OOM
    virtio_balloon: return the amount of freed memory from leak_balloon()
    virtio_blk: fix race at module removal
    virtio: Fix comment typo 'CONFIG_S_FAILED'

    Linus Torvalds
     

15 Dec, 2014

1 commit

  • Pull driver core update from Greg KH:
    "Here's the set of driver core patches for 3.19-rc1.

    They are dominated by the removal of the .owner field in platform
    drivers. They touch a lot of files, but they are "simple" changes,
    just removing a line in a structure.

    Other than that, a few minor driver core and debugfs changes. There
    are some ath9k patches coming in through this tree that have been
    acked by the wireless maintainers as they relied on the debugfs
    changes.

    Everything has been in linux-next for a while"

    * tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
    Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
    fs: debugfs: add forward declaration for struct device type
    firmware class: Deletion of an unnecessary check before the function call "vunmap"
    firmware loader: fix hung task warning dump
    devcoredump: provide a one-way disable function
    device: Add dev_<level>_once variants
    ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
    ath: use seq_file api for ath9k debugfs files
    debugfs: add helper function to create device related seq_file
    drivers/base: cacheinfo: remove noisy error boot message
    Revert "core: platform: add warning if driver has no owner"
    drivers: base: support cpu cache information interface to userspace via sysfs
    drivers: base: add cpu_device_create to support per-cpu devices
    topology: replace custom attribute macros with standard DEVICE_ATTR*
    cpumask: factor out show_cpumap into separate helper function
    driver core: Fix unbalanced device reference in drivers_probe
    driver core: fix race with userland in device_add()
    sysfs/kernfs: make read requests on pre-alloc files use the buffer.
    sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
    fs: sysfs: return EGBIG on write if offset is larger than file size
    ...

    Linus Torvalds
     

10 Dec, 2014

12 commits


09 Dec, 2014

12 commits


11 Nov, 2014

2 commits

  • Excessive virtio_balloon inflation can invoke the OOM killer when Linux
    is under severe memory pressure. Various mechanisms are responsible for
    correct virtio_balloon memory management. Nevertheless, these control
    tools often do not have enough time to react to a fast-changing memory
    load. As a result, the OS runs out of memory and invokes the OOM killer.
    Balancing memory with the virtio balloon should not cause processes to
    be terminated while there are pages in the balloon. Currently the virtio
    balloon driver has no way to free some memory at the last moment before
    a process gets killed by the OOM killer.

    This does not open a security hole, as the balloon itself runs inside
    the guest OS and works in cooperation with the host. Improvements on
    the guest side should therefore be considered normal.

    To solve the problem, introduce a virtio_balloon callback which is
    expected to be called from the oom notifier call chain in the
    out_of_memory() function. If the virtio balloon can release some memory,
    the system returns and retries the allocation that forced the
    out-of-memory killer to run.

    Allocate a virtio feature bit for this: it is not set by default, so
    the guest will not deflate the virtio balloon on OOM without explicit
    permission from the host.

    Signed-off-by: Raushaniya Maksudova
    Signed-off-by: Denis V. Lunev
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Raushaniya Maksudova
     
  • The next patch uses this value to report the amount of freed memory
    to the OOM killer.

    Signed-off-by: Raushaniya Maksudova
    Signed-off-by: Denis V. Lunev
    CC: Rusty Russell
    CC: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Raushaniya Maksudova
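
The two commits above can be pictured with a small userland model. This is only a sketch under stated assumptions: struct balloon_model, model_leak_balloon() and model_oom_notify() are hypothetical names, not the driver's code. It shows the idea that the leak routine returns how many pages it freed, and that the OOM callback adds that count to a running total only when the host has granted the feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; none of these names are the driver's own. */
struct balloon_model {
    size_t num_pages;     /* pages currently held by the balloon */
    bool deflate_on_oom;  /* models the host-granted feature bit */
};

/* Release up to `want` pages and return how many were actually freed,
 * mirroring the idea that leak_balloon() now reports its result. */
static size_t model_leak_balloon(struct balloon_model *b, size_t want)
{
    size_t freed = want < b->num_pages ? want : b->num_pages;

    b->num_pages -= freed;
    return freed;
}

/* Models an OOM notifier callback: adds the freed page count to *freed.
 * A non-zero total lets the caller retry the allocation instead of killing. */
static void model_oom_notify(struct balloon_model *b, size_t oom_pages,
                             size_t *freed)
{
    assert(freed != NULL);
    if (!b->deflate_on_oom)
        return;           /* no permission from the host: do nothing */
    *freed += model_leak_balloon(b, oom_pages);
}
```

A caller would start with freed set to 0, invoke model_oom_notify(), and treat a non-zero result as "retry the allocation".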
     

20 Oct, 2014

1 commit


19 Oct, 2014

1 commit

  • Pull virtio updates from Rusty Russell:
    "One cc: stable commit, the rest are a series of minor cleanups which
    have been sitting in MST's tree during my vacation. I changed a
    function name and made one trivial change, then they spent two days in
    linux-next"

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
    virtio-rng: refactor probe error handling
    virtio_scsi: drop scan callback
    virtio_balloon: enable VQs early on restore
    virtio_scsi: fix race on device removal
    virito_scsi: use freezable WQ for events
    virtio_net: enable VQs early on restore
    virtio_console: enable VQs early on restore
    virtio_scsi: enable VQs early on restore
    virtio_blk: enable VQs early on restore
    virtio_scsi: move kick event out from virtscsi_init
    virtio_net: fix use after free on allocation failure
    9p/trans_virtio: enable VQs early
    virtio_console: enable VQs early
    virtio_blk: enable VQs early
    virtio_net: enable VQs early
    virtio: add API to enable VQs early
    virtio_net: minor cleanup
    virtio-net: drop config_mutex
    virtio_net: drop config_enable
    virtio-blk: drop config_mutex
    ...

    Linus Torvalds
     

15 Oct, 2014

5 commits

  • The virtio spec requires drivers to set DRIVER_OK before using VQs.
    This is set automatically after resume returns, but virtio balloon
    violated this rule by adding bufs, which causes the VQ to be used
    directly within restore.

    To fix this, call virtio_device_ready() before using the VQ.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • Defer config changed notifications that arrive during
    probe/scan/freeze/restore.

    This will allow drivers to set DRIVER_OK earlier, without worrying about
    racing with config change interrupts.

    This change will also benefit old hypervisors (before 2009)
    that send interrupts without checking DRIVER_OK: previously,
    the callback could race with driver-specific initialization.

    This will also help simplify drivers.

    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Cornelia Huck
    Signed-off-by: Rusty Russell (cosmetic changes)

    Michael S. Tsirkin
     
  • This is in preparation for extending config changed event handling
    in core.
    Wrapping these in an API also seems to make for cleaner code.

    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Cornelia Huck
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • Replace duplicated code in all transports with a single wrapper in
    virtio.c.

    The only functional change is in virtio_mmio.c: if a buggy device sends
    us an interrupt before a driver is set, we previously returned IRQ_NONE;
    now we return IRQ_HANDLED.

    As this must not happen in practice, this does not look like a big deal.

    See also commit 3fff0179e33cd7d0a688dab65700c46ad089e934
    ("virtio-pci: do not oops on config change if driver not loaded")
    for the original motivation behind the driver check.

    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Cornelia Huck
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • On restore, virtio pci does the following:
    + set features
    + init vqs etc - device can be used at this point!
    + set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits

    This is in violation of the virtio spec, which
    requires the following order:
    - ACKNOWLEDGE
    - DRIVER
    - init vqs
    - DRIVER_OK

    This behaviour will break with hypervisors that assume spec-compliant
    behaviour. It seems like a good idea to have this patch applied to
    stable branches to reduce the support burden for the hypervisors.

    Cc: stable@vger.kernel.org
    Cc: Amit Shah
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
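
The status ordering and the deferred config-change handling described in the commits above can be sketched together in a small model. The status bit values are the ones the virtio specification defines; everything else (struct dev_model and the model_* functions) is hypothetical and only illustrates the sequence, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status bit values as defined by the virtio specification. */
enum {
    STATUS_ACKNOWLEDGE = 1,
    STATUS_DRIVER      = 2,
    STATUS_DRIVER_OK   = 4,
};

/* Hypothetical model of a device; field names are illustrative only. */
struct dev_model {
    uint8_t status;
    bool vqs_ready;
    bool config_enabled;         /* false during probe/freeze/restore */
    bool config_change_pending;  /* an event arrived while disabled */
    int config_events_delivered;
};

/* Using a VQ is only legal once DRIVER_OK is set. */
static bool model_can_use_vq(const struct dev_model *d)
{
    return d->vqs_ready && (d->status & STATUS_DRIVER_OK);
}

/* A config change interrupt: deliver it now, or remember it for later. */
static void model_config_changed(struct dev_model *d)
{
    if (d->config_enabled)
        d->config_events_delivered++;
    else
        d->config_change_pending = true;
}

/* Re-enable delivery and flush a deferred event, if there is one. */
static void model_config_enable(struct dev_model *d)
{
    d->config_enabled = true;
    if (d->config_change_pending) {
        d->config_change_pending = false;
        d->config_events_delivered++;
    }
}

/* Restore in the order the spec requires. */
static void model_restore(struct dev_model *d)
{
    d->status = 0;                     /* device reset */
    d->config_enabled = false;         /* defer config events meanwhile */
    d->status |= STATUS_ACKNOWLEDGE;
    d->status |= STATUS_DRIVER;
    d->vqs_ready = true;               /* init vqs */
    d->status |= STATUS_DRIVER_OK;     /* only now may VQs be used */
    assert(model_can_use_vq(d));
    model_config_enable(d);
}
```

In this model a config event that arrives while config_enabled is false is not lost; it is delivered once by model_config_enable().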
     

10 Oct, 2014

3 commits

  • Always mark pages with PageBalloon even if balloon compaction is disabled
    and expose this mark in /proc/kpageflags as KPF_BALLOON.

    This patch also adds three counters to /proc/vmstat: "balloon_inflate",
    "balloon_deflate" and "balloon_migrate". They accumulate balloon
    activity. The current size of the balloon is
    (balloon_inflate - balloon_deflate) pages.

    All generic balloon code is now gathered under the option
    CONFIG_MEMORY_BALLOON. It should be selected by a ballooning driver that
    wants to use this feature. Currently virtio-balloon is the only user.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Rafael Aquini
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Ballooned pages are now detected using PageBalloon(), so the fake mapping
    is no longer required. This patch links ballooned pages to the balloon
    device using the field page->private instead of page->mapping. It also
    embeds balloon_dev_info directly into struct virtio_balloon.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Rafael Aquini
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Sasha Levin reported a KASAN splash inside isolate_migratepages_range().
    The problem is in the function __is_movable_balloon_page(), which tests
    AS_BALLOON_MAP in page->mapping->flags. This function has no protection
    against anonymous pages. As a result, it tried to check address space
    flags inside struct anon_vma.

    Further investigation shows more problems in the current implementation:

    * The special branch in __unmap_and_move() never works:
    balloon_page_movable() checks page flags and page_count. In
    __unmap_and_move() the page is locked and its reference counter is
    elevated, thus balloon_page_movable() always fails. As a result,
    execution goes to the normal migration path. virtballoon_migratepage()
    returns MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS,
    so move_to_new_page() thinks this is an error code and sets
    newpage->mapping to NULL. The newly migrated page loses its connection
    to the balloon and all ability for further migration.

    * lru_lock is erroneously required in isolate_migratepages_range() for
    isolating a ballooned page. This function releases lru_lock
    periodically, which makes migration mostly impossible for some pages.

    * balloon_page_dequeue() has a tight race with balloon_page_isolate():
    balloon_page_isolate() could be executed in parallel with dequeue,
    between picking the page from the list and locking page_lock. The race
    is rare because they use trylock_page() for locking.

    This patch fixes all of them.

    Instead of a fake mapping with a special flag, this patch uses a special
    state of page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256. The buddy
    allocator uses PAGE_BUDDY_MAPCOUNT_VALUE = -128 for a similar purpose.
    Storing the mark directly in struct page makes everything safer and
    easier.

    PagePrivate is used to mark pages present in the page list (i.e. not
    isolated, like PageLRU for normal pages). It replaces the special rules
    for the reference counter and makes balloon migration similar to
    migration of normal pages. This flag is protected by page_lock together
    with the link to the balloon device.

    Signed-off-by: Konstantin Khlebnikov
    Reported-by: Sasha Levin
    Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com
    Cc: Rafael Aquini
    Cc: Andrey Ryabinin
    Cc: stable@vger.kernel.org [3.8+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
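
The marking scheme and the vmstat counters described in the commits above can be illustrated with a toy model. The value -256 comes from the commit message; the use of -1 as the "unmarked" value, and all struct and function names here, are assumptions made for the sketch rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Value quoted in the commit message. */
#define PAGE_BALLOON_MAPCOUNT_VALUE (-256)
/* Assumed "no mark" value for this model. */
#define PAGE_UNMARKED_MAPCOUNT (-1)

/* Hypothetical stand-in for struct page. */
struct page_model {
    int mapcount;
    bool in_list;  /* plays the role PagePrivate has in the commit */
};

/* Hypothetical stand-in for the /proc/vmstat counters. */
struct vmstat_model {
    unsigned long balloon_inflate;
    unsigned long balloon_deflate;
};

/* The mark lives in the page itself, so no mapping is dereferenced. */
static bool model_page_balloon(const struct page_model *p)
{
    return p->mapcount == PAGE_BALLOON_MAPCOUNT_VALUE;
}

static void model_inflate(struct page_model *p, struct vmstat_model *s)
{
    assert(p->mapcount == PAGE_UNMARKED_MAPCOUNT);
    p->mapcount = PAGE_BALLOON_MAPCOUNT_VALUE;
    p->in_list = true;
    s->balloon_inflate++;
}

static void model_deflate(struct page_model *p, struct vmstat_model *s)
{
    assert(model_page_balloon(p));
    p->mapcount = PAGE_UNMARKED_MAPCOUNT;
    p->in_list = false;
    s->balloon_deflate++;
}

/* Current balloon size in pages, as stated in the commit message. */
static unsigned long model_balloon_size(const struct vmstat_model *s)
{
    return s->balloon_inflate - s->balloon_deflate;
}
```

Because the test in model_page_balloon() reads only the page's own field, it stays safe for any kind of page, which is the property the fix relies on.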
     

14 Sep, 2014

2 commits

  • virtqueue_add() populates the virtqueue descriptor table from the sgs
    given. If it uses an indirect descriptor table, then it puts a single
    descriptor in the descriptor table pointing to the kmalloc'ed indirect
    table where the sg is populated.

    Previously vring_add_indirect() did the allocation and the simple
    linear layout. We replace that with alloc_indirect() which allocates
    the indirect table then chains it like the normal descriptor table so
    we can reuse the core logic.

    Going by the figures below, this slows down pktgen (which uses direct
    descriptors) by a little under 1% and vring_bench by about 5%, but it's
    far neater.

    vring_bench before:
    1061485790-1104800648(1.08254e+09+/-6.6e+06)ns
    vring_bench after:
    1125610268-1183528965(1.14172e+09+/-8e+06)ns

    pktgen before:
    787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0

    pktgen after:
    779988-790404(786391+/-2.5e+03)pps 361-366(364.35+/-1.3)Mb/sec (361914432-366747456(3.64885e+08+/-1.2e+06)bps) errors: 0

    Now, if we force indirect descriptors by turning off any_header_sg
    in virtio_net.c:

    pktgen before:
    713773-721062(718374+/-2.1e+03)pps 331-334(332.95+/-0.92)Mb/sec (331190672-334572768(3.33325e+08+/-9.6e+05)bps) errors: 0
    pktgen after:
    710542-719195(714898+/-2.4e+03)pps 329-333(331.15+/-1.1)Mb/sec (329691488-333706480(3.31713e+08+/-1.1e+06)bps) errors: 0

    Signed-off-by: Rusty Russell
    Signed-off-by: David S. Miller

    Rusty Russell
     
  • We used to have several callers which just used arrays. They're
    gone, so we can use sg_next() everywhere, simplifying the code.

    On my laptop, this slowed down vring_bench by 15%:

    vring_bench before:
    936153354-967745359(9.44739e+08+/-6.1e+06)ns
    vring_bench after:
    1061485790-1104800648(1.08254e+09+/-6.6e+06)ns

    However, a more realistic test using pktgen on an AMD FX(tm)-8320 saw
    a few percent improvement:

    pktgen before:
    767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0

    pktgen after:
    787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0

    Signed-off-by: Rusty Russell
    Signed-off-by: David S. Miller

    Rusty Russell
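
The idea of allocating an indirect table and chaining it like the normal descriptor table can be sketched as follows. The descriptor layout and the NEXT flag value follow the virtio specification; struct desc_model, model_alloc_indirect() and model_fill_chain() are hypothetical names for this illustration and do not reproduce virtio_ring.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DESC_F_NEXT 1  /* same value as VRING_DESC_F_NEXT in the spec */

/* Descriptor layout as in the virtio specification. */
struct desc_model {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Allocate an indirect table and pre-chain it: entry i points at i + 1,
 * so it can be walked exactly like the main descriptor table. */
static struct desc_model *model_alloc_indirect(unsigned int total_sg)
{
    struct desc_model *desc = calloc(total_sg, sizeof(*desc));

    if (!desc)
        return NULL;
    for (unsigned int i = 0; i < total_sg; i++)
        desc[i].next = (uint16_t)(i + 1);
    return desc;
}

/* Fill a chained table from (addr, len) pairs by following `next`.
 * The same walk works for a direct or an indirect table.
 * Returns the index of the last descriptor used. */
static unsigned int model_fill_chain(struct desc_model *desc,
                                     unsigned int head,
                                     const uint64_t *addrs,
                                     const uint32_t *lens,
                                     unsigned int n)
{
    unsigned int i = head, prev = head;

    assert(n > 0);
    for (unsigned int k = 0; k < n; k++) {
        desc[i].addr = addrs[k];
        desc[i].len = lens[k];
        desc[i].flags = DESC_F_NEXT;
        prev = i;
        i = desc[i].next;
    }
    desc[prev].flags &= (uint16_t)~DESC_F_NEXT;  /* last entry ends the chain */
    return prev;
}
```

Pre-chaining costs one extra pass over the table at allocation time, which is in line with the small slowdown the commit message reports in exchange for sharing one fill loop.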