07 Jan, 2012

6 commits

  • During S3 or S4 resume or PCI reset, ATS regs aren't restored correctly.
    This patch enables ATS during device state restore if the PCI device has
    the ATS capability.

    Signed-off-by: Xudong Hao
    Signed-off-by: Xiantao Zhang
    Signed-off-by: Jesse Barnes

    Hao, Xudong
     
  • When runtime PM is active on PCI, if a device switches state
    frequently (e.g. an EHCI controller with autosuspending USB devices
    connected) the PCI configuration traces might be very verbose in the
    kernel log. Guard those traces with a DEBUG condition.

    Acked-by: "Rafael J. Wysocki"
    Signed-off-by: Vincent Palatin
    Signed-off-by: Jesse Barnes

    Vincent Palatin
     
  • The latency timer is read-only and hardwired to zero for all PCIe
    devices, both Type 0 and Type 1, so don't bother trying to update it
    and cluttering the dmesg log with meaningless "setting latency timer
    to 64" messages.

    Signed-off-by: Myron Stowe
    Signed-off-by: Jesse Barnes

    Myron Stowe
     
  • The 'latency timer' of PCI devices, both Type 0 and Type 1,
    is set up in architecture-specific code [see: 'pcibios_set_master()'].
    The architectures take one of two approaches: check whether the
    'latency timer' is currently set between 16 and 255 and, if not,
    bring it within bounds; or do nothing (and then there is the
    gratuitously different PA-RISC implementation).

    There is nothing architecture-specific about PCI's 'latency timer', so
    this patch pulls its setup functionality up into the PCI core by
    creating a generic 'pcibios_set_master()' function with the '__weak'
    attribute. All architectures can use it as a default and, if
    necessary, override it with architecture-specific code.

    No functional change.

    Signed-off-by: Myron Stowe
    Signed-off-by: Jesse Barnes

    Myron Stowe
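
    The bounds check described in this commit can be sketched in user-space
    C. This is only an illustration under stated assumptions, not the kernel
    function: fixup_latency_timer() and its max_lat parameter are
    hypothetical names, and the fallback value of 64 is taken from the
    "setting latency timer to 64" log message quoted in the previous entry.

```c
#include <stdint.h>

/*
 * Return the value the latency timer should hold after master setup.
 * Hypothetical helper: values below 16 are raised to 64 (or to max_lat
 * if that is smaller), values above max_lat are lowered to max_lat,
 * and anything already within bounds is returned unchanged.
 */
static uint8_t fixup_latency_timer(uint8_t lat, uint8_t max_lat)
{
    if (lat < 16)
        return (max_lat >= 64) ? 64 : max_lat;
    if (lat > max_lat)
        return max_lat;
    return lat;
}
```

    A caller would read the register, pass the value through the helper,
    and write it back only if the result differs.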
     
  • These new PCI services allow probing for 2.3-compliant INTx masking
    support and then using the feature from PCI interrupt handlers. The
    services are properly synchronized with concurrent config space access
    via sysfs or on device reset.

    This enables generic PCI device drivers like uio_pci_generic or KVM's
    device assignment to implement the necessary kernel-side IRQ handling
    without any knowledge about device-specific interrupt status and control
    registers.

    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jan Kiszka
    Signed-off-by: Jesse Barnes

    Jan Kiszka
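
    A minimal user-space sketch of the idea follows. It assumes the PCI 2.3
    register layout (Interrupt Disable is bit 10 of the Command register,
    Interrupt Status is bit 3 of the Status register). The struct and
    function names are hypothetical, and the locking against concurrent
    config space access that the commit describes is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_INTX_DISABLE  0x0400u  /* Command register, bit 10 */
#define STATUS_INTERRUPT  0x0008u  /* Status register, bit 3 */

/* Hypothetical model of one PCI function's command/status registers. */
struct fake_pci_fn {
    uint16_t command;
    uint16_t status;
    bool intx_disable_sticks;  /* false models a device that ignores bit 10 */
};

/* Model of a config write: devices without the feature drop bit 10. */
static void write_command(struct fake_pci_fn *fn, uint16_t val)
{
    if (!fn->intx_disable_sticks)
        val &= (uint16_t)~CMD_INTX_DISABLE;
    fn->command = val;
}

/* Probe: toggle bit 10, check that the change reads back, then restore. */
static bool intx_mask_supported(struct fake_pci_fn *fn)
{
    uint16_t orig = fn->command;
    bool ok;

    write_command(fn, orig ^ CMD_INTX_DISABLE);
    ok = (fn->command != orig);
    write_command(fn, orig);
    return ok;
}

/* Handler side: if this function asserts INTx, mask it and claim the IRQ. */
static bool check_and_mask_intx(struct fake_pci_fn *fn)
{
    if (!(fn->status & STATUS_INTERRUPT))
        return false;  /* interrupt came from another device on the line */
    write_command(fn, fn->command | CMD_INTX_DISABLE);
    return true;
}
```

    Because only generic registers are touched, a handler built this way
    needs no knowledge of the device's own interrupt registers.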
     
  • pci_block_user_cfg_access was designed for the use case in which a single
    context, the IPR driver, temporarily delays user space accesses to the
    config space via sysfs. This assumption became invalid when
    pci_dev_reset was added as a locking instance. Today, if you run two loops
    in parallel that reset the same device via sysfs, you end up with a
    kernel BUG as pci_block_user_cfg_access detects the broken assumption.

    This reworks pci_block_user_cfg_access into a sleeping service,
    pci_cfg_access_lock, and an atomic-compatible variant,
    pci_cfg_access_trylock. The former not only blocks user space access as
    before but also waits if access was already locked. The latter service
    just returns false in this case, allowing the caller to resolve the
    conflict instead of raising a BUG.

    Adaptations of the ipr driver were originally written by Brian King.

    Acked-by: Brian King
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jan Kiszka
    Signed-off-by: Jesse Barnes

    Jan Kiszka
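
    The trylock semantics can be sketched as below. This is a
    single-threaded model with hypothetical names, written only to show the
    return-value contract. A real implementation would need a spinlock
    around the flag, and the sleeping variant would additionally need a
    wait queue.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-device state. */
struct fake_pci_dev {
    bool block_cfg_access;  /* true while some context holds the lock */
};

/* Returns false instead of raising a BUG when access is already locked. */
static bool cfg_access_trylock(struct fake_pci_dev *dev)
{
    if (dev->block_cfg_access)
        return false;  /* conflict: the caller decides how to resolve it */
    dev->block_cfg_access = true;
    return true;
}

static void cfg_access_unlock(struct fake_pci_dev *dev)
{
    dev->block_cfg_access = false;
}
```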
     

19 Dec, 2011

1 commit

  • I noticed that hotplug on one setup does not work with a recent change in
    the pci tree.

    After checking the bridge configuration, I noticed that the bridges get
    assigned but do not get enabled.

    The reason is the following commit, which simply ignores bridge
    resources when enabling a pci device:

    | commit bbef98ab0f019f1b0c25c1acdf1683c68933d41b
    | Author: Ram Pai
    | Date: Sun Nov 6 10:33:10 2011 +0800
    |
    | PCI: defer enablement of SRIOV BARS
    |...
    | NOTE: Note, there is subtle change in the pci_enable_device() API. Any
    | driver that depends on SRIOV BARS to be enabled in pci_enable_device()
    | can fail.

    Put back the bridge resource and ROM resource checking to fix the problem.

    That should fix regressions such as the BIOS not assigning correct
    resources to a bridge.

    Discussion can be found at:
    http://www.spinics.net/lists/linux-pci/msg12874.html

    Signed-off-by: Yinghai Lu
    Acked-by: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

15 Dec, 2011

1 commit

  • While testing an IB card in a guest VM, we found that MSI is not
    initialized properly.

    It turns out __write_msi_msg will do nothing if the device's current_state
    is not PCI_D0, and that PCI device does not have pm_cap in the guest VM.

    Setting the power state to PCI_D0 in pci_enable_device() fails, but
    the error is not returned. The code flow is:

    pci_enable_device() --> __pci_enable_device_flags() -->
    do_pci_enable_device() --> pci_set_power_state() -->
    __pci_start_power_transition()

    We have the following condition inside __pci_start_power_transition():
        if (platform_pci_power_manageable(dev)) {
                error = platform_pci_set_power_state(dev, state);
                if (!error)
                        pci_update_current_state(dev, state);
        } else {
                error = -ENODEV;
                /* Fall back to PCI_D0 if native PM is not supported */
                if (!dev->pm_cap)
                        dev->current_state = PCI_D0;
        }

    Here, platform_pci_set_power_state() calls acpi_pci_set_power_state(),
    which fails with -ENODEV because of the following condition:

        if (!handle || ACPI_SUCCESS(acpi_get_handle(handle, "_EJ0", &tmp)))
                return -ENODEV;

    Because of that, pci_update_current_state() is not called.

    With this patch, if the device power state cannot be set via
    platform_pci_set_power_state and the device does not have native PM
    support, the PCI device power state is set to PCI_D0.

    -v2: This also reverts 47e9037ac16637cd7f12b8790ea7ce6680e42168, as it's
    not needed after this change.

    Acked-by: "Rafael J. Wysocki"
    Signed-off-by: Ajaykumar Hotchandani
    Signed-off-by: Yinghai Lu
    Signed-off-by: Jesse Barnes

    Ajaykumar Hotchandani
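
    The intended behavior can be modelled in user-space C as follows. This
    is a restructured sketch, not the patch itself: struct fake_dev, its
    fields and platform_power_transition() are hypothetical, and the
    platform call is reduced to a preset error code.

```c
#include <errno.h>
#include <stdbool.h>

enum power_state { STATE_D0 = 0, STATE_D3HOT = 3, STATE_UNKNOWN = 5 };

/* Hypothetical model of the fields the decision depends on. */
struct fake_dev {
    bool platform_manageable;  /* platform (e.g. ACPI) claims the device */
    int platform_error;        /* result the platform call will return */
    bool has_pm_cap;           /* native PCI PM capability is present */
    enum power_state current_state;
};

static int platform_power_transition(struct fake_dev *dev,
                                     enum power_state state)
{
    int error;

    if (dev->platform_manageable) {
        error = dev->platform_error;
        if (!error)
            dev->current_state = state;
    } else {
        error = -ENODEV;
    }

    /* Fallback: the platform could not set the state and there is no
     * native PM, so treat the device as being in D0. */
    if (error && !dev->has_pm_cap)
        dev->current_state = STATE_D0;

    return error;
}
```

    In the guest VM case from the report, the platform call fails and
    has_pm_cap is false, so current_state ends up as D0 and a later MSI
    setup is no longer skipped.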
     

06 Dec, 2011

1 commit

  • All the PCI BARs of a device are enabled when the device is enabled
    using pci_enable_device(). This unnecessarily enables SRIOV BARs of the
    device.

    On some platforms, which do not support SRIOV as yet, the
    pci_enable_device() fails to enable the device if its SRIOV BARs are not
    allocated resources correctly.

    The following patch fixes the above problem. The SRIOV BARs are now
    enabled when IOV capability of the device is enabled in sriov_enable().

    NOTE: There is a subtle change in the pci_enable_device() API. Any
    driver that depends on SRIOV BARs being enabled in pci_enable_device()
    can fail.

    The patch has been touch tested on Power and x86 platforms.

    Tested-by: Michael Wang
    Signed-off-by: Ram Pai
    Signed-off-by: Jesse Barnes

    Ram Pai
     

28 Oct, 2011

1 commit

  • When configuring the PCIe settings for "performance", we allow parents
    to have a larger Max Payload Size than children and rely on the children's
    Max Read Request Size not being larger than their own MPS to avoid
    having the host bridge generate responses they can't cope with.

    However, various drivers in Linux call pcie_set_readrq() with arbitrary
    values, assuming this to be a simple performance tweak. This breaks
    under our "performance" configuration.

    Fix that by making sure the value programmed by pcie_set_readrq() is
    never larger than the configured MPS for that device.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Jon Mason
    Signed-off-by: Jesse Barnes

    Benjamin Herrenschmidt
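
    The clamp can be sketched as below, assuming the usual PCIe encoding in
    which legal sizes are powers of two from 128 to 4096 bytes and the
    3-bit register field stores 0 for 128, 1 for 256, and so on. Both
    function names are hypothetical, and the kernel applies the clamp only
    in the "performance" configuration, which this model does not track.

```c
#include <errno.h>

/*
 * Return the read request size to program, or -EINVAL for an illegal
 * request. A legal request larger than the device's MPS is lowered to
 * the MPS.
 */
static int clamp_readrq(int rq, int mps)
{
    if (rq < 128 || rq > 4096 || (rq & (rq - 1)))
        return -EINVAL;
    return (rq > mps) ? mps : rq;
}

/* Encode a size in bytes as the register field: 128 -> 0 ... 4096 -> 5. */
static int size_to_field(int bytes)
{
    int field = 0;

    while (bytes > 128) {
        bytes >>= 1;
        field++;
    }
    return field;
}
```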
     

15 Oct, 2011

1 commit

  • The land of PCI power management is a land of sorrow and ugliness,
    especially in the area of signaling events by devices. There are
    devices that set their PME Status bits, but don't really bother
    to send a PME message or assert PME#. There are hardware vendors
    who don't connect PME# lines to the system core logic (they know
    who they are). There are PCI Express Root Ports that don't bother
    to trigger interrupts when they receive PME messages from the devices
    below. There are ACPI BIOSes that forget to provide _PRW methods for
    devices capable of signaling wakeup. Finally, there are BIOSes that
    do provide _PRW methods for such devices, but then don't bother to
    call Notify() for those devices from the corresponding _Lxx/_Exx
    GPE-handling methods. In all of these cases the kernel doesn't have
    a chance to receive a proper notification that it should wake up a
    device, so devices stay in low-power states forever. Worse yet, in
    some cases they continuously send PME Messages that are silently
    ignored, because the kernel simply doesn't know that it should clear
    the device's PME Status bit.

    This problem was first observed for "parallel" (non-Express) PCI
    devices on add-on cards and Matthew Garrett addressed it by adding
    code that polls PME Status bits of such devices, if they are enabled
    to signal PME, to the kernel. Recently, however, it has turned out
    that PCI Express devices are also affected by this issue and that it
    is not limited to add-on devices, so it seems necessary to extend
    the PME polling to all PCI devices, including PCI Express and planar
    ones. Still, it would be wasteful to poll the PME Status bits of
    devices that are known to receive proper PME notifications, so make
    the kernel (1) poll the PME Status bits of all PCI and PCIe devices
    enabled to signal PME and (2) disable the PME Status polling for
    devices for which correct PME notifications are received.

    Tested-by: Sarah Sharp
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Jesse Barnes

    Rafael J. Wysocki
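
    The two rules at the end of this message, poll every PME-enabled device
    and stop polling a device once a proper notification has arrived, can
    be sketched like this. All names are hypothetical, and the periodic
    timer that would drive the polling pass is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device bookkeeping. */
struct fake_pme_dev {
    bool pme_enabled;  /* device is armed to signal PME */
    bool pme_status;   /* PME Status bit is currently set */
    bool pme_poll;     /* device still needs to be polled */
    int wakeups;       /* number of PMEs handled for this device */
};

static void handle_pme(struct fake_pme_dev *dev)
{
    dev->pme_status = false;  /* clear the status bit */
    dev->wakeups++;
}

/* A real notification arrived, so signalling works: stop polling. */
static void pme_notification(struct fake_pme_dev *dev)
{
    dev->pme_poll = false;
    if (dev->pme_status)
        handle_pme(dev);
}

/* One polling pass over all devices; returns how many PMEs it found. */
static int pme_poll_pass(struct fake_pme_dev *devs, size_t n)
{
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        struct fake_pme_dev *d = &devs[i];

        if (d->pme_enabled && d->pme_poll && d->pme_status) {
            handle_pme(d);
            found++;
        }
    }
    return found;
}
```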
     

05 Oct, 2011

1 commit

  • Add the ability to disable PCI-E MPS tuning and use the BIOS-configured
    MPS defaults. Due to the number of issues recently
    discovered on some x86 chipsets, make this the default behavior.

    Also, add the option for peer to peer DMA MPS configuration. Peer to
    peer DMA is outside the scope of this patch, but MPS configuration could
    prevent it from working by having the MPS on one root port different
    than the MPS on another. To work around this, simply make the system
    wide MPS the smallest possible value (128B).

    Signed-off-by: Jon Mason
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Jon Mason
     

10 Sep, 2011

1 commit

  • Setting the Maximum Read Request Size field to 0 (128 bytes) has
    massive negative ramifications on some devices. Without knowing which
    devices have this issue, do not modify the MRRS from its default value
    when walking the PCI-E bus in pcie_bus_safe mode. Also, make
    pcie_bus_safe the default procedure.

    Tested-by: Sven Schnelle
    Tested-by: Simon Kirby
    Tested-by: Stephen M. Cameron
    Reported-and-tested-by: Eric Dumazet
    Reported-and-tested-by: Niels Ole Salscheider
    References: https://bugzilla.kernel.org/show_bug.cgi?id=42162
    Signed-off-by: Jon Mason
    Acked-by: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Jon Mason
     

21 Aug, 2011

1 commit

  • Fix new kernel-doc warning in pci.c:

    Warning(drivers/pci/pci.c:3259): No description found for parameter 'mps'
    Warning(drivers/pci/pci.c:3259): Excess function parameter 'rq' description in 'pcie_set_mps'

    Signed-off-by: Randy Dunlap
    Cc: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

02 Aug, 2011

1 commit

  • On a given PCI-E fabric, each device, bridge, and root port can have a
    different PCI-E maximum payload size. There is a sizable performance
    boost for having the largest possible maximum payload size on each PCI-E
    device. However, if improperly configured, fatal bus errors can occur.
    Thus, it is important to ensure that PCI-E payloads sent by a device
    are never larger than the MPS setting of all devices on the way to the
    destination.

    This can be achieved in two ways:

    - A conservative approach is to use the lowest common denominator of
      the entire tree below a root complex for every device on that fabric.

      This means, for example, that having a 128-byte MPS USB controller on
      one leg of a switch will dramatically reduce the performance of a video
      card or 10GE adapter on another leg of that same switch.

      It also means that any hierarchy supporting hotplug slots (presumably
      including ExpressCard and Thunderbolt; this needs double-checking) will
      have to be entirely clamped to 128 bytes since we cannot predict what
      will be plugged into those slots, and we cannot change the MPS on a
      "live" system.

    - A more optimal way is possible, if it falls within a couple of
      constraints:
      * The top-level host bridge will never generate packets larger than the
        smallest TLP (or if it can be controlled independently from its MPS at
        least)
      * The device will never generate packets larger than MPS (which can be
        configured via MRRS)
      * No support of direct PCI-E to PCI-E transfers between devices without
        some additional code to specifically deal with that case

    Then we can use an approach that basically ignores downstream requests
    and focuses exclusively on upstream requests. In that case, all we need
    to care about is that a device MPS is no larger than its parent MPS,
    which allows us to keep all switches/bridges to the max MPS supported by
    their parent and eventually the PHB.

    In this case, your USB controller would no longer "starve" your 10GE
    Ethernet and your hotplug slots won't affect your global MPS.
    Additionally, the hotplugged devices themselves can be configured to a
    larger MPS up to the value configured in the hotplug bridge.

    To choose between the two available options, two PCI kernel boot
    arguments have been added. "pcie_bus_safe" selects the
    former behavior, while "pcie_bus_perf" selects the latter.
    By default, the latter behavior is used.

    NOTE: due to the location of the enablement, each arch will need to add
    calls to this function. This patch only enables x86.

    This patch includes a number of changes recommended by Benjamin
    Herrenschmidt.

    Tested-by: Jordan_Hargrave@dell.com
    Signed-off-by: Jon Mason
    Signed-off-by: Jesse Barnes

    Jon Mason
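
    The difference between the two strategies can be sketched on a small
    tree. This is a simplified model with hypothetical names: the tree is
    an array in which every parent is stored before its children, n is at
    least 1, and the MRRS handling that the "performance" mode also relies
    on is not modelled.

```c
#include <stddef.h>

struct node {
    int parent;   /* index of the parent, -1 for the root port */
    int mps_cap;  /* largest payload the device supports, in bytes */
    int mps;      /* value chosen by the configuration pass */
};

/* "Safe": every device gets the smallest capability found in the tree. */
static void configure_safe(struct node *t, size_t n)
{
    int smallest = t[0].mps_cap;

    for (size_t i = 1; i < n; i++)
        if (t[i].mps_cap < smallest)
            smallest = t[i].mps_cap;
    for (size_t i = 0; i < n; i++)
        t[i].mps = smallest;
}

/* "Performance": each device is limited only by its own parent. */
static void configure_perf(struct node *t, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int limit = t[i].mps_cap;

        if (t[i].parent >= 0 && t[t[i].parent].mps < limit)
            limit = t[t[i].parent].mps;
        t[i].mps = limit;
    }
}
```

    With a root port and a switch that support 512 bytes, a 128-byte USB
    controller and a 512-byte NIC behind the switch, the safe pass sets
    everything to 128 while the performance pass leaves the NIC at 512.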
     

23 Jul, 2011

1 commit

  • When setting the PCI-E MRRS, pcie_set_readrq queries the current
    settings via a pci_read_config_word call but writes the modified result
    via a pci_write_config_dword. This results in writing 16 more bits than
    were queried.

    Also, the function description comment is slightly incorrect.

    Signed-off-by: Jon Mason
    Signed-off-by: Jesse Barnes

    Jon Mason
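
    Why the width mismatch matters can be shown with a toy model of the
    PCIe capability, assuming the standard layout in which the 16-bit
    Device Control register sits at capability offset 8 and the 16-bit
    Device Status register follows at offset 10. The helpers are
    hypothetical and treat config space as plain little-endian memory;
    real Device Status bits are write-1-to-clear, so the effect on
    hardware differs from the plain overwrite shown here.

```c
#include <stdint.h>

#define DEVCTL_OFF 8   /* Device Control: 16 bits at capability offset 8 */
#define DEVSTA_OFF 10  /* Device Status: the following 16 bits */

static uint16_t read_word(const uint8_t *cfg, int off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static void write_word(uint8_t *cfg, int off, uint16_t v)
{
    cfg[off] = (uint8_t)(v & 0xff);
    cfg[off + 1] = (uint8_t)(v >> 8);
}

/* A 32-bit write also covers the register that follows the target. */
static void write_dword(uint8_t *cfg, int off, uint32_t v)
{
    write_word(cfg, off, (uint16_t)(v & 0xffff));
    write_word(cfg, off + 2, (uint16_t)(v >> 16));
}
```

    Reading Device Control as a word and writing the modified value back
    as a dword sends 16 extra zero bits to Device Status, whereas a word
    write leaves the neighbouring register untouched.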
     

22 Jul, 2011

1 commit

  • The function pci_enable_ari() may mistakenly put the downstream port
    of a v1 PCIe switch into ARI Forwarding mode. This is a PCIe v2 feature,
    and with an SR-IOV device on that switch port believing the switch above
    is ARI capable it may attempt to use functions 8-255, translating into
    invalid (non-zero) device numbers for that bus. This has been seen
    to cause Completion Timeouts and general misbehaviour including hangs
    and panics.

    Cc: stable@kernel.org
    Acked-by: Don Dutile
    Tested-by: Don Dutile
    Signed-off-by: Chris Wright
    Signed-off-by: Jesse Barnes

    Chris Wright
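
    The translation mentioned above follows from how the 8-bit
    device/function field is split. A small sketch, with hypothetical
    helper names, of the conventional interpretation (5-bit device number,
    3-bit function number) that a non-ARI port applies:

```c
#include <stdint.h>

/* Conventional split of the 8-bit devfn field. */
static unsigned slot_of(uint8_t devfn)
{
    return (devfn >> 3) & 0x1f;
}

static unsigned func_of(uint8_t devfn)
{
    return devfn & 0x07;
}
```

    An ARI device treats all 8 bits as a function number, so its function
    8 is seen by a non-ARI port as device 1, function 0, and function 255
    as device 31, function 7.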
     

09 Jul, 2011

1 commit

  • Multiple attempts to dynamically reallocate pci resources have
    unfortunately led to regressions. Though we continue to fix the
    regressions and fine tune the dynamic-reallocation behavior, we have not
    reached an acceptable state yet.

    This patch provides an interim solution. It disables dynamic reallocation
    by default, but adds the ability to enable it through the pci=realloc
    kernel command line parameter.

    Tested-by: Oliver Hartkopp
    Signed-off-by: Ram Pai
    Signed-off-by: Jesse Barnes

    Ram Pai
     

24 Jun, 2011

1 commit


14 Jun, 2011

1 commit


02 Jun, 2011

1 commit

  • Fix pci.c kernel-doc warnings:

    Warning(drivers/pci/pci.c:3292): No description found for parameter 'flags'
    Warning(drivers/pci/pci.c:3292): Excess function parameter 'change_bridge_flags' description in 'pci_set_vga_state'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Jesse Barnes

    Randy Dunlap
     

25 May, 2011

1 commit

  • * 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (169 commits)
    drivers/gpu/drm/radeon/atom.c: fix warning
    drm/radeon/kms: bump kms version number
    drm/radeon/kms: properly set num banks for fusion asics
    drm/radeon/kms/atom: move dig phy init out of modesetting
    drm/radeon/kms/cayman: fix typo in register mask
    drm/radeon/kms: fix typo in spread spectrum code
    drm/radeon/kms: fix tile_config value reported to userspace on cayman.
    drm/radeon/kms: fix incorrect comparison in cayman setup code.
    drm/radeon/kms: add wait idle ioctl for eg->cayman
    drm/radeon/cayman: setup hdp to invalidate and flush when asked
    drm/radeon/evergreen/btc/fusion: setup hdp to invalidate and flush when asked
    agp/uninorth: Fix lockups with radeon KMS and >1x.
    drm/radeon/kms: the SS_Id field in the LCD table if for LVDS only
    drm/radeon/kms: properly set the CLK_REF bit for DCE3 devices
    drm/radeon/kms: fixup eDP connector handling
    drm/radeon/kms: bail early for eDP in hotplug callback
    drm/radeon/kms: simplify hotplug handler logic
    drm/radeon/kms: rewrite DP handling
    drm/radeon/kms/atom: add support for setting DP panel mode
    drm/radeon/kms: atombios.h updates for DP panel mode
    ...

    Linus Torvalds
     

22 May, 2011

2 commits


12 May, 2011

3 commits

  • Latency tolerance reporting allows devices to send messages to the root
    complex indicating their latency tolerance for snooped & unsnooped
    memory transactions. Add support for enabling & disabling this
    feature, along with a routine to set the max latencies a device should
    send upstream.

    Signed-off-by: Jesse Barnes

    Jesse Barnes
     
  • OBFF (optimized buffer flush/fill), where supported, can help improve
    energy efficiency by giving devices information about when interrupts
    and other activity will have a reduced power impact. It requires
    support from both the device and system (i.e. not only does the device
    need to respond to OBFF messages, but the platform must be capable of
    generating and routing them to the end point).

    Signed-off-by: Jesse Barnes

    Jesse Barnes
     
  • Add support to allow drivers to enable/disable ID-based ordering. Where
    supported, ID-based ordering can significantly improve the latency of
    individual requests by preventing them from queueing up behind unrelated
    traffic.

    Signed-off-by: Jesse Barnes

    Jesse Barnes
     

11 May, 2011

1 commit

  • The pci_pm_reset() function is not a very nice interface due to its
    limitations and conditional behavior (e.g. it doesn't affect devices
    in low-power states), but it cannot be simply dropped, because
    existing device drivers may depend on it. However, its behavior and
    limitations should be well documented, so add an appropriate
    kerneldoc comment to it.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Jesse Barnes

    Rafael J. Wysocki
     

04 May, 2011

1 commit

  • So in a lot of modern systems, a GPU will always be below a parent bridge that won't share with any other GPUs. This means VGA arbitration on those GPUs can be controlled by using the bridge routing instead of io/mem decodes.

    The problem is locating which GPUs share which upstream bridges. This patch attempts to identify all the GPUs which can be controlled via bridges, and ones that can't. This patch endeavours to work out the bridge sharing semantics.

    When disabling GPUs via a bridge, it doesn't do irq callbacks or touch the io/mem decodes for the gpu.

    Signed-off-by: Dave Airlie

    Dave Airlie
     

22 Mar, 2011

1 commit

  • v3 -> v2: Moved ASPM enabling logic to pci_set_power_state()
    v2 -> v1: Preserved the logic in pci_raw_set_power_state()
    : Added ASPM enabling logic after scanning Root Bridge
    : http://marc.info/?l=linux-pci&m=130046996216391&w=2
    v1 : http://marc.info/?l=linux-pci&m=130013164703283&w=2

    The assumption made in commit 41cd766b065970ff6f6c89dd1cf55fa706c84a3d
    (PCI: Don't enable aspm before drivers have had a chance to veto it) that
    pci_enable_device() will result in re-configuring ASPM when aspm_policy is
    POWERSAVE is no longer valid. This is due to commit
    97c145f7c87453cec90e91238fba5fe2c1561b32 (PCI: read current power state
    at enable time) which resets dev->current_state to D0. Due to this the
    call to pcie_aspm_pm_state_change() is never made. Note the equality check
    (below) that returns early:
    ./drivers/pci/pci.c: pci_raw_set_power_state()
        /* Check if we're already there */
        if (dev->current_state == state)
                return 0;

    Therefore OSPM never configures the PCIe links for ASPM to turn them "on".

    Fix it by configuring ASPM from the pci_enable_device() code path. This
    also allows a driver such as the e1000e networking driver a chance to
    disable ASPM (L0s, L1), if need be, prior to enabling the device. A
    driver may perform this action if the device is known to misbehave
    with respect to ASPM.

    Signed-off-by: Naga Chumbalkar
    Acked-by: Rafael J. Wysocki
    Cc: Matthew Garrett
    Signed-off-by: Jesse Barnes

    Naga Chumbalkar
     

15 Jan, 2011

2 commits


24 Dec, 2010

1 commit

  • pci_restore_state only ever returns 0, thus there is no benefit in
    having it return any value. Also, a large majority of the callers do
    not check the return code of pci_restore_state. Make
    pci_restore_state return void and avoid the overhead.

    Acked-by: Mauro Carvalho Chehab
    Signed-off-by: Jon Mason
    Signed-off-by: Jesse Barnes

    Jon Mason
     

12 Nov, 2010

1 commit

  • When we enable a PCI device, we avoid doing a lot of the initial setup
    work if the device's enable count is non-zero. If we don't fetch the
    power state though, we may later fail to set up MSI due to the unknown
    status. So pick it up before we short circuit the rest due to a
    pre-existing enable or mismatched enable/disable pair (as happens with
    VGA devices, which are special in a special way).

    Tested-by: Jesse Brandeburg
    Reported-by: Dave Airlie
    Tested-by: Dave Airlie
    Signed-off-by: Jesse Barnes

    Jesse Barnes
     

18 Oct, 2010

1 commit


16 Oct, 2010

1 commit

  • Indent the branch of an if.

    The semantic match that finds this problem is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @r disable braces4@
    position p1,p2;
    statement S1,S2;
    @@

    (
    if (...) { ... }
    |
    if (...) S1@p1 S2@p2
    )

    @script:python@
    p1 << r.p1;
    p2 << r.p2;
    @@

    if (p1[0].column == p2[0].column):
    cocci.print_main("branch",p1)
    cocci.print_secs("after",p2)
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: Jesse Barnes

    Julia Lawall
     

07 Aug, 2010

1 commit

  • * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (30 commits)
    PCI: update for owner removal from struct device_attribute
    PCI: Fix warnings when CONFIG_DMI unset
    PCI: Do not run NVidia quirks related to MSI with MSI disabled
    x86/PCI: use for_each_pci_dev()
    PCI: use for_each_pci_dev()
    PCI: MSI: Restore read_msi_msg_desc(); add get_cached_msi_msg_desc()
    PCI: export SMBIOS provided firmware instance and label to sysfs
    PCI: Allow read/write access to sysfs I/O port resources
    x86/PCI: use host bridge _CRS info on ASRock ALiveSATA2-GLAN
    PCI: remove unused HAVE_ARCH_PCI_SET_DMA_MAX_SEGMENT_{SIZE|BOUNDARY}
    PCI: disable mmio during bar sizing
    PCI: MSI: Remove unsafe and unnecessary hardware access
    PCI: Default PCIe ASPM control to on and require !EMBEDDED to disable
    PCI: kernel oops on access to pci proc file while hot-removal
    PCI: pci-sysfs: remove casts from void*
    ACPI: Disable ASPM if the platform won't provide _OSC control for PCIe
    PCI hotplug: make sure child bridges are enabled at hotplug time
    PCI hotplug: shpchp: Removed check for hotplug of display devices
    PCI hotplug: pciehp: Fixed return value sign for pciehp_unconfigure_device
    PCI: Don't enable aspm before drivers have had a chance to veto it
    ...

    Linus Torvalds
     

31 Jul, 2010

1 commit

  • In 2.6.34, we transformed the PCI DMA API into the generic device
    model. The PCI DMA API is now just a wrapper around the DMA API.

    So we don't need HAVE_ARCH_PCI_SET_DMA_MAX_SEGMENT_SIZE or
    HAVE_ARCH_PCI_SET_DMA_SEGMENT_BOUNDARY (which enable architectures to
    have their own implementations). Neither has been used anyway.

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jesse Barnes

    FUJITA Tomonori
     

19 Jul, 2010

1 commit

  • One of the arguments during the suspend blockers discussion was that
    the mainline kernel didn't contain any mechanisms making it possible
    to avoid races between wakeup and system suspend.

    Generally, there are two problems in that area. First, if a wakeup
    event occurs exactly when /sys/power/state is being written to, it
    may be delivered to user space right before the freezer kicks in, so
    the user space consumer of the event may not be able to process it
    before the system is suspended. Second, if a wakeup event occurs
    after user space has been frozen, it is not generally guaranteed that
    the ongoing transition of the system into a sleep state will be
    aborted.

    To address these issues introduce a new global sysfs attribute,
    /sys/power/wakeup_count, associated with a running counter of wakeup
    events and three helper functions, pm_stay_awake(), pm_relax(), and
    pm_wakeup_event(), that may be used by kernel subsystems to control
    the behavior of this attribute and to request the PM core to abort
    system transitions into a sleep state already in progress.

    The /sys/power/wakeup_count file may be read from or written to by
    user space. Reads will always succeed (unless interrupted by a
    signal) and return the current value of the wakeup events counter.
    Writes, however, will only succeed if the written number is equal to
    the current value of the wakeup events counter. If a write is
    successful, it will cause the kernel to save the current value of the
    wakeup events counter and to abort the subsequent system transition
    into a sleep state if any wakeup events are reported after the write
    has returned.

    [The assumption is that before writing to /sys/power/state user space
    will first read from /sys/power/wakeup_count. Next, user space
    consumers of wakeup events will have a chance to acknowledge or
    veto the upcoming system transition to a sleep state. Finally, if
    the transition is allowed to proceed, /sys/power/wakeup_count will
    be written to and if that succeeds, /sys/power/state will be written
    to as well. Still, if any wakeup events are reported to the PM core
    by kernel subsystems after that point, the transition will be
    aborted.]

    Additionally, put a wakeup events counter into struct dev_pm_info and
    make these per-device wakeup event counters available via sysfs,
    so that it's possible to check the activity of various wakeup event
    sources within the kernel.

    To illustrate how subsystems can use pm_wakeup_event(), make the
    low-level PCI runtime PM wakeup-handling code use it.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Jesse Barnes
    Acked-by: Greg Kroah-Hartman
    Acked-by: markgross
    Reviewed-by: Alan Stern

    Rafael J. Wysocki
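
    The read/write contract of /sys/power/wakeup_count can be modelled as
    below. This is a user-space sketch with hypothetical names. It keeps
    only the counter comparison and ignores in-flight events, signals and
    the per-device counters.

```c
#include <stdbool.h>

/* Hypothetical model of the PM core's wakeup bookkeeping. */
struct fake_pm_core {
    unsigned events;     /* running count of wakeup events */
    unsigned saved;      /* count captured by the last successful write */
    bool check_enabled;  /* true once a write has succeeded */
};

/* What a subsystem reporting a wakeup event boils down to here. */
static void wakeup_event(struct fake_pm_core *pm)
{
    pm->events++;
}

static unsigned read_wakeup_count(const struct fake_pm_core *pm)
{
    return pm->events;
}

/* A write succeeds only if no event arrived since the value was read. */
static bool write_wakeup_count(struct fake_pm_core *pm, unsigned val)
{
    if (val != pm->events)
        return false;
    pm->saved = val;
    pm->check_enabled = true;
    return true;
}

/* Suspend path: abort if events were reported after the write. */
static bool suspend_may_proceed(const struct fake_pm_core *pm)
{
    return !(pm->check_enabled && pm->events != pm->saved);
}
```

    User space would read the count, let event consumers settle, write the
    same value back, and start the transition only if that write succeeds.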
     

23 Jun, 2010

1 commit

  • virtio-pci resets the device at startup by writing to the status
    register, but this does not clear the pci config space,
    specifically msi enable status which affects register
    layout.

    This breaks things like kdump when they try to use e.g. virtio-blk.

    Fix by forcing msi off at startup. Since pci.c already has
    a routine to do this, we export and use it instead of duplicating code.

    Signed-off-by: Michael S. Tsirkin
    Tested-by: Vivek Goyal
    Acked-by: Jesse Barnes
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Rusty Russell
    Cc: stable@kernel.org

    Michael S. Tsirkin
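
    Forcing MSI off amounts to clearing the enable bits in the MSI and
    MSI-X message control words. A sketch with hypothetical helper names,
    assuming the standard bit positions (MSI Enable is bit 0 of the MSI
    Message Control register, MSI-X Enable is bit 15 of the MSI-X Message
    Control register). Locating the capabilities and performing the config
    space accesses are left out.

```c
#include <stdint.h>

#define MSI_FLAGS_ENABLE   0x0001u  /* MSI Message Control, bit 0 */
#define MSIX_FLAGS_ENABLE  0x8000u  /* MSI-X Message Control, bit 15 */

/* Return the MSI control word with the enable bit cleared. */
static uint16_t msi_ctrl_off(uint16_t ctrl)
{
    return (uint16_t)(ctrl & ~MSI_FLAGS_ENABLE);
}

/* Return the MSI-X control word with the enable bit cleared. */
static uint16_t msix_ctrl_off(uint16_t ctrl)
{
    return (uint16_t)(ctrl & ~MSIX_FLAGS_ENABLE);
}
```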