18 Dec, 2015

6 commits

  • commit f598282f51 ("PCI: Fix the NIU MSI-X problem in a better way")
    teaches us that dealing with MSI-X can be troublesome.

    Further checks in the MSI-X architecture shows that if the
    PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we
    may not be able to access the BAR (since they are memory regions).

    Since the MSI-X tables are located in there.. that can lead
    to us causing PCIe errors. Inhibit us performing any
    operation on the MSI-X unless the MEMORY bit is set.

    Note that Xen hypervisor with:
    "x86/MSI-X: access MSI-X table only after having enabled MSI-X"
    will return:
    xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!

    When the generic MSI code tries to setup the PIRQ without
    MEMORY bit set. Which means with later versions of Xen
    (4.6) this patch is not neccessary.

    This is part of XSA-157

    CC: stable@vger.kernel.org
    Reviewed-by: Jan Beulich
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • Otherwise just continue on, returning the same values as
    previously (return of 0, and op->result has the PIRQ value).

    This does not change the behavior of XEN_PCI_OP_disable_msi[|x].

    The pci_disable_msi or pci_disable_msix have the checks for
    msi_enabled or msix_enabled so they will error out immediately.

    However the guest can still call these operations and cause
    us to disable the 'ack_intr'. That means the backend IRQ handler
    for the legacy interrupt will not respond to interrupts anymore.

    This will lead to (if the device is causing an interrupt storm)
    for the Linux generic code to disable the interrupt line.

    Naturally this will only happen if the device in question
    is plugged in on the motherboard on shared level interrupt GSI.

    This is part of XSA-157

    CC: stable@vger.kernel.org
    Reviewed-by: David Vrabel
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • Otherwise an guest can subvert the generic MSI code to trigger
    an BUG_ON condition during MSI interrupt freeing:

    for (i = 0; i < entry->nvec_used; i++)
    BUG_ON(irq_has_action(entry->irq + i));

    Xen PCI backed installs an IRQ handler (request_irq) for
    the dev->irq whenever the guest writes PCI_COMMAND_MEMORY
    (or PCI_COMMAND_IO) to the PCI_COMMAND register. This is
    done in case the device has legacy interrupts the GSI line
    is shared by the backend devices.

    To subvert the backend the guest needs to make the backend
    to change the dev->irq from the GSI to the MSI interrupt line,
    make the backend allocate an interrupt handler, and then command
    the backend to free the MSI interrupt and hit the BUG_ON.

    Since the backend only calls 'request_irq' when the guest
    writes to the PCI_COMMAND register the guest needs to call
    XEN_PCI_OP_enable_msi before any other operation. This will
    cause the generic MSI code to setup an MSI entry and
    populate dev->irq with the new PIRQ value.

    Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY
    and cause the backend to setup an IRQ handler for dev->irq
    (which instead of the GSI value has the MSI pirq). See
    'xen_pcibk_control_isr'.

    Then the guest disables the MSI: XEN_PCI_OP_disable_msi
    which ends up triggering the BUG_ON condition in 'free_msi_irqs'
    as there is an IRQ handler for the entry->irq (dev->irq).

    Note that this cannot be done using MSI-X as the generic
    code does not over-write dev->irq with the MSI-X PIRQ values.

    The patch inhibits setting up the IRQ handler if MSI or
    MSI-X (for symmetry reasons) code had been called successfully.

    P.S.
    Xen PCIBack when it sets up the device for the guest consumption
    ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device).
    XSA-120 addendum patch removed that - however when upstreaming said
    addendum we found that it caused issues with qemu upstream. That
    has now been fixed in qemu upstream.

    This is part of XSA-157

    CC: stable@vger.kernel.org
    Reviewed-by: David Vrabel
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • The guest sequence of:

    a) XEN_PCI_OP_enable_msix
    b) XEN_PCI_OP_enable_msix

    results in hitting an NULL pointer due to using freed pointers.

    The device passed in the guest MUST have MSI-X capability.

    The a) constructs and SysFS representation of MSI and MSI groups.
    The b) adds a second set of them but adding in to SysFS fails (duplicate entry).
    'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that
    in a) pdev->msi_irq_groups is still set) and also free's ALL of the
    MSI-X entries of the device (the ones allocated in step a) and b)).

    The unwind code: 'free_msi_irqs' deletes all the entries and tries to
    delete the pdev->msi_irq_groups (which hasn't been set to NULL).
    However the pointers in the SysFS are already freed and we hit an
    NULL pointer further on when 'strlen' is attempted on a freed pointer.

    The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard
    against that. The check for msi_enabled is not stricly neccessary.

    This is part of XSA-157

    CC: stable@vger.kernel.org
    Reviewed-by: David Vrabel
    Reviewed-by: Jan Beulich
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • The guest sequence of:

    a) XEN_PCI_OP_enable_msi
    b) XEN_PCI_OP_enable_msi
    c) XEN_PCI_OP_disable_msi

    results in hitting an BUG_ON condition in the msi.c code.

    The MSI code uses an dev->msi_list to which it adds MSI entries.
    Under the above conditions an BUG_ON() can be hit. The device
    passed in the guest MUST have MSI capability.

    The a) adds the entry to the dev->msi_list and sets msi_enabled.
    The b) adds a second entry but adding in to SysFS fails (duplicate entry)
    and deletes all of the entries from msi_list and returns (with msi_enabled
    is still set). c) pci_disable_msi passes the msi_enabled checks and hits:

    BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));

    and blows up.

    The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard
    against that. The check for msix_enabled is not stricly neccessary.

    This is part of XSA-157.

    CC: stable@vger.kernel.org
    Reviewed-by: David Vrabel
    Reviewed-by: Jan Beulich
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • Double fetch vulnerabilities that happen when a variable is
    fetched twice from shared memory but a security check is only
    performed the first time.

    The xen_pcibk_do_op function performs a switch statements on the op->cmd
    value which is stored in shared memory. Interestingly this can result
    in a double fetch vulnerability depending on the performed compiler
    optimization.

    This patch fixes it by saving the xen_pci_op command before
    processing it. We also use 'barrier' to make sure that the
    compiler does not perform any optimization.

    This is part of XSA155.

    CC: stable@vger.kernel.org
    Reviewed-by: Konrad Rzeszutek Wilk
    Signed-off-by: Jan Beulich
    Signed-off-by: David Vrabel
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     

18 Apr, 2014

1 commit

  • Mostly scripted conversion of the smp_mb__* barriers.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

16 Apr, 2014

1 commit


01 Mar, 2014

1 commit

  • As result of deprecation of MSI-X/MSI enablement functions
    pci_enable_msix() and pci_enable_msi_block() all drivers
    using these two interfaces need to be updated to use the
    new pci_enable_msi_range() or pci_enable_msi_exact()
    and pci_enable_msix_range() or pci_enable_msix_exact()
    interfaces.

    Signed-off-by: Alexander Gordeev
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Konrad Rzeszutek Wilk
    Reviewed-by: Boris Ostrovsky

    Alexander Gordeev
     

28 Jun, 2013

1 commit

  • Convert printks to pr_ (excludes printk(KERN_DEBUG...)
    to be more consistent throughout the xen subsystem.

    Add pr_fmt with KBUILD_MODNAME or "xen:" KBUILD_MODNAME
    Coalesce formats and add missing word spaces
    Add missing newlines
    Align arguments and reflow to 80 columns
    Remove DRV_NAME from formats as pr_fmt adds the same content

    This does change some of the prefixes of these messages
    but it also does make them more consistent.

    Signed-off-by: Joe Perches
    Signed-off-by: Konrad Rzeszutek Wilk

    Joe Perches
     

06 Mar, 2013

1 commit

  • While shuting down a HVM guest with pci devices passed through we
    get this:

    pciback 0000:04:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100002)
    ------------[ cut here ]------------
    WARNING: at drivers/pci/pci.c:1397 pci_disable_device+0x88/0xa0()
    Hardware name: MS-7640
    Device pciback
    disabling already-disabled device
    Modules linked in:
    Pid: 53, comm: xenwatch Not tainted 3.9.0-rc1-20130304a+ #1
    Call Trace:
    [] warn_slowpath_common+0x7a/0xc0
    [] warn_slowpath_fmt+0x41/0x50
    [] pci_disable_device+0x88/0xa0
    [] xen_pcibk_reset_device+0x37/0xd0
    [] ? pcistub_put_pci_dev+0x6f/0x120
    [] pcistub_put_pci_dev+0x8d/0x120
    [] __xen_pcibk_release_devices+0x59/0xa0

    This fixes the bug.

    CC: stable@vger.kernel.org
    Reported-and-Tested-by: Sander Eikelenboom
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     

06 Feb, 2013

1 commit


07 Apr, 2012

1 commit

  • Prior to 2.6.19 and as of 2.6.31, pci_enable_msix() can return a
    positive value to indicate the number of vectors (less than the amount
    requested) that can be set up for a given device. Returning this as an
    operation value (secondary result) is fine, but (primary) operation
    results are expected to be negative (error) or zero (success) according
    to the protocol. With the frontend fixed to match the XenoLinux
    behavior, the backend can now validly return zero (success) here,
    passing the upper limit on the number of vectors in op->value.

    Signed-off-by: Jan Beulich
    Signed-off-by: Konrad Rzeszutek Wilk

    Jan Beulich
     

22 Sep, 2011

1 commit

  • This is a minor bugfix and a set of small cleanups; as it is not clear
    whether this needs splitting into pieces (and if so, at what
    granularity), it is a single combined patch.
    - add a missing return statement to an error path in
    kill_domain_by_device()
    - use pci_is_enabled() rather than raw atomic_read()
    - remove a bogus attempt to zero-terminate an already zero-terminated
    string
    - #define DRV_NAME once uniformly in the shared local header
    - make DRIVER_ATTR() variables static
    - eliminate a pointless use of list_for_each_entry_safe()
    - add MODULE_ALIAS()
    - a little bit of constification
    - adjust a few messages
    - remove stray semicolons from inline function definitions

    Signed-off-by: Jan Beulich
    [v1: Dropped the resource_size fix, altered the description]
    [v2: Fixed cleanpatch.pl comments]
    Signed-off-by: Konrad Rzeszutek Wilk

    Jan Beulich
     

20 Jul, 2011

5 commits

  • - Remove the slot and controller controller backend as they
    are not used.
    - Document the find pciback_[read|write]_config_[byte|word|dword]
    to make it easier to find.
    - Collapse the code from conf_space_capability_msi into pciback_ops.c
    - Collapse conf_space_capability_[pm|vpd].c in conf_space_capability.c
    [and remove the conf_space_capability.h file]
    - Rename all visible functions from pciback to xen_pcibk.
    - Rename all the printk/pr_info, etc that use the "pciback" to say
    "xen-pciback".
    - Convert functions that are not referenced outside the code to be
    static to save on name space.
    - Do the same thing for structures that are internal to the driver.
    - Run checkpatch.pl after the renames and fixup its warnings and
    fix any compile errors caused by the variable rename
    - Cleanup any structs that checkpath.pl commented about or just
    look odd.

    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • If we try to setup an fake IRQ handler for legacy interrupts
    for devices that only have MSI-X (most if not all SR-IOV cards),
    we will fail with this:

    pciback[0000:01:10.0]: failed to install fake IRQ handler for IRQ 0! (rc:-38)

    Since those cards don't have anything in dev->irq.

    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • If the device that is to be shared with a guest is a level device and
    the IRQ is shared with the initial domain we need to take actions.
    Mainly we install a dummy IRQ handler that will ACK on the interrupt
    line so as to not have the initial domain disable the interrupt line.

    This dummy IRQ handler is not enabled when the device MSI/MSI-X lines
    are set, nor for edge interrupts. And also not for level interrupts
    that are not shared amongst devices. Lastly, if the user passes
    to the guest all of the PCI devices on the shared line the we won't
    install the dummy handler either.

    There is also SysFS instrumentation to check its state and turn
    IRQ ACKing on/off if necessary.

    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • In cases where the guest is abruptly killed and has not disabled
    MSI/MSI-X interrupts we want to do it for it.

    Otherwise when the guest is started up and enables MSI, we would
    get a WARN() that the device already had been enabled.

    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • This is the host side counterpart to the frontend driver in
    drivers/pci/xen-pcifront.c. The PV protocol is also implemented by
    frontend drivers in other OSes too, such as the BSDs.

    The PV protocol is rather simple. There is page shared with the guest,
    which has the 'struct xen_pci_sharedinfo' embossed in it. The backend
    has a thread that is kicked every-time the structure is changed and
    based on the operation field it performs specific tasks:

    XEN_PCI_OP_conf_[read|write]:
    Read/Write 0xCF8/0xCFC filtered data. (conf_space*.c)
    Based on which field is probed, we either enable/disable the PCI
    device, change power state, read VPD, etc. The major goal of this
    call is to provide a Physical IRQ (PIRQ) to the guest.

    The PIRQ is Xen hypervisor global IRQ value irrespective of the IRQ
    is tied in to the IO-APIC, or is a vector. For GSI type
    interrupts, the PIRQ==GSI holds. For MSI/MSI-X the
    PIRQ value != Linux IRQ number (thought PIRQ==vector).

    Please note, that with Xen, all interrupts (except those level shared ones)
    are injected directly to the guest - there is no host interaction.

    XEN_PCI_OP_[enable|disable]_msi[|x] (pciback_ops.c)
    Enables/disables the MSI/MSI-X capability of the device. These operations
    setup the MSI/MSI-X vectors for the guest and pass them to the frontend.

    When the device is activated, the interrupts are directly injected in the
    guest without involving the host.

    XEN_PCI_OP_aer_[detected|resume|mmio|slotreset]: In case of failure,
    perform the appropriate AER commands on the guest. Right now that is
    a cop-out - we just kill the guest.

    Besides implementing those commands, it can also

    - hide a PCI device from the host. When booting up, the user can specify
    xen-pciback.hide=(1:0:0)(BDF..) so that host does not try to use the
    device.

    The driver was lifted from linux-2.6.18.hg tree and fixed up
    so that it could compile under v3.0. Per suggestion from Jesse Barnes
    moved the driver to drivers/xen/xen-pciback.

    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Jeremy Fitzhardinge

    Konrad Rzeszutek Wilk