07 Nov, 2011

2 commits

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     
  • * 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    scsi: drop unused Kconfig symbol
    pci: drop unused Kconfig symbol
    stmmac: drop unused Kconfig symbol
    x86: drop unused Kconfig symbol
    powerpc: drop unused Kconfig symbols
    powerpc: 40x: drop unused Kconfig symbol
    mips: drop unused Kconfig symbols
    openrisc: drop unused Kconfig symbols
    arm: at91: drop unused Kconfig symbol
    samples: drop unused Kconfig symbol
    m32r: drop unused Kconfig symbol
    score: drop unused Kconfig symbols
    sh: drop unused Kconfig symbol
    um: drop unused Kconfig symbol
    sparc: drop unused Kconfig symbol
    alpha: drop unused Kconfig symbol

    Fix up trivial conflict in drivers/net/ethernet/stmicro/stmmac/Kconfig
    as per Michal: the STMMAC_DUAL_MAC config variable is still unused and
    should be deleted.

    Linus Torvalds
     

01 Nov, 2011

3 commits


29 Oct, 2011

1 commit

  • * 'next-rebase' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci:
    PCI: Clean-up MPS debug output
    pci: Clamp pcie_set_readrq() when using "performance" settings
    PCI: enable MPS "performance" setting to properly handle bridge MPS
    PCI: Workaround for Intel MPS errata
    PCI: Add support for PASID capability
    PCI: Add implementation for PRI capability
    PCI: Export ATS functions to modules
    PCI: Move ATS implementation into own file
    PCI / PM: Remove unnecessary error variable from acpi_dev_run_wake()
    PCI hotplug: acpiphp: Prevent deadlock on PCI-to-PCI bridge remove
    PCI / PM: Extend PME polling to all PCI devices
    PCI quirk: mmc: Always check for lower base frequency quirk for Ricoh 1180:e823
    PCI: Make pci_setup_bridge() non-static for use by arch code
    x86: constify PCI raw ops structures
    PCI: Add quirk for known incorrect MPSS
    PCI: Add Solarflare vendor ID and SFC4000 device IDs

    Linus Torvalds
     

28 Oct, 2011

4 commits

  • Clean-up MPS debug output to make it a single line and aligned, thus
    making it more readable for a large number of buses and devices in a
    single system.

    Suggested by Benjamin Herrenschmidt

    Signed-off-by: Jon Mason
    Signed-off-by: Jesse Barnes

    Jon Mason
     
  • When configuring the PCIe settings for "performance", we allow parents
    to have a larger Max Payload Size than children and rely on the children's
    Max Read Request Size not being larger than their own MPS to avoid
    having the host bridge generate responses they can't cope with.

    However, various drivers in Linux call pcie_set_readrq() with arbitrary
    values, assuming this to be a simple performance tweak. This breaks
    under our "performance" configuration.

    Fix that by making sure the value programmed by pcie_set_readrq() is
    never larger than the configured MPS for that device.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Jon Mason
    Signed-off-by: Jesse Barnes

    Benjamin Herrenschmidt
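A minimal userspace sketch of the clamp described above, assuming sizes are plain byte counts and a boolean stands in for the "performance" configuration. clamp_readrq() is a hypothetical helper, not the kernel's pcie_set_readrq().

```c
#include <stdbool.h>

/*
 * Hypothetical model: in the "performance" configuration, the Max Read
 * Request Size a driver asks for is capped at the device's configured
 * Max Payload Size. Both values are in bytes.
 */
static int clamp_readrq(int requested_rq, int device_mps, bool perf_mode)
{
    if (perf_mode && requested_rq > device_mps)
        return device_mps;  /* never request more than our own MPS */
    return requested_rq;    /* otherwise honor the driver's request */
}
```

With the clamp, a driver asking for a 4096-byte read request on a device configured for a 256-byte MPS gets 256 bytes in performance mode and its original request otherwise.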
     
  • Rework the "performance" MPS option to configure the device MPS with the
    smaller of the device MPSS or the bridge MPS (which is assumed to be
    properly configured at this point to the largest allowable MPS based on
    its parent bus).

    Also, rework the MRRS setting to report an inability to set the MRRS to
    a valid setting.

    Signed-off-by: Jon Mason
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Jesse Barnes

    Jon Mason
     
  • Intel 5000 and 5100 series memory controllers have a known issue if read
    completion coalescing is enabled and the PCI-E Maximum Payload Size is
    set to 256B. To work around this issue, disable read completion
    coalescing in the memory controller and root complexes. Unfortunately,
    it must always be disabled, even if no 256B MPS devices are present, due
    to the possibility of one being hotplugged.

    Links to errata:
    http://www.intel.com/content/dam/doc/specification-update/5000-chipset-memory-controller-hub-specification-update.pdf
    http://www.intel.com/content/dam/doc/specification-update/5100-memory-controller-hub-chipset-specification-update.pdf

    Thanks to Jesse Brandeburg and Ben Hutchings for providing insight into
    the problem.

    Tested-and-Reported-by: Avi Kivity
    Signed-off-by: Jon Mason
    Signed-off-by: Jesse Barnes

    Jon Mason
     

26 Oct, 2011

1 commit

  • * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, ioapic: Consolidate the explicit EOI code
    x86, ioapic: Restore the mask bit correctly in eoi_ioapic_irq()
    x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC
    iommu: Rename the DMAR and INTR_REMAP config options
    x86, ioapic: Define irq_remap_modify_chip_defaults()
    x86, msi, intr-remap: Use the ioapic set affinity routine
    iommu: Cleanup ifdefs in detect_intel_iommu()
    iommu: No need to set dmar_disabled in check_zero_address()
    iommu: Move IOMMU specific code to intel-iommu.c
    intr_remap: Call dmar_dev_scope_init() explicitly
    x86, x2apic: Enable the bios request for x2apic optout

    Linus Torvalds
     

25 Oct, 2011

1 commit

  • …ci.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

    * 'stable/drivers-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xenbus: don't rely on xen_initial_domain to detect local xenstore
    xenbus: Fix loopback event channel assuming domain 0
    xen/pv-on-hvm:kexec: Fix implicit declaration of function 'xen_hvm_domain'
    xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel
    xen/pv-on-hvm kexec: update xs_wire.h:xsd_sockmsg_type from xen-unstable
    xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel
    xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports
    xen/pv-on-hvm kexec: prevent crash in xenwatch_thread() when stale watch events arrive

    * 'stable/drivers.bugfixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen/pciback: Check if the device is found instead of blindly assuming so.
    xen/pciback: Do not dereference psdev during printk when it is NULL.
    xen: remove XEN_PLATFORM_PCI config option
    xen: XEN_PVHVM depends on PCI
    xen/pciback: double lock typo
    xen/pciback: use mutex rather than spinlock in vpci backend
    xen/pciback: Use mutexes when working with Xenbus state transitions.
    xen/pciback: miscellaneous adjustments
    xen/pciback: use mutex rather than spinlock in passthrough backend
    xen/pciback: use resource_size()

    * 'stable/pci.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen/pci: support multi-segment systems
    xen-swiotlb: When doing coherent alloc/dealloc check before swizzling the MFNs.
    xen/pci: make bus notifier handler return sane values
    xen-swiotlb: fix printk and panic args
    xen-swiotlb: Fix wrong panic.
    xen-swiotlb: Retry up three times to allocate Xen-SWIOTLB
    xen-pcifront: Update warning comment to use 'e820_host' option.

    Linus Torvalds
     

15 Oct, 2011

10 commits

  • Devices supporting Process Address Space Identifiers
    (PASIDs) can use an IOMMU to access multiple IO address
    spaces at the same time. A PCIe device indicates support for
    this feature by implementing the PASID capability. This
    patch adds support for the capability to the Linux kernel.

    Reviewed-by: Bjorn Helgaas
    Signed-off-by: Joerg Roedel
    Signed-off-by: Jesse Barnes

    Joerg Roedel
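Locating the PASID capability (and the PRI and ATS capabilities in the neighboring patches) means walking the PCIe extended capability list, which starts at offset 0x100 of config space. Each 32-bit header holds the capability ID in bits 15:0, a version in bits 19:16 and the next offset in bits 31:20; the IDs below (ATS 0x0F, PRI 0x13, PASID 0x1B) follow the PCIe specification. This is a userspace sketch over an in-memory config-space image, roughly what the kernel's pci_find_ext_capability() does; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT_CAP_START    0x100
#define EXT_CAP_ID_ATS   0x0F
#define EXT_CAP_ID_PRI   0x13
#define EXT_CAP_ID_PASID 0x1B

/* Read a little-endian dword from a config-space image. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
    return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

/* Write a little-endian dword (used to build test images). */
static void cfg_write32(uint8_t *cfg, size_t off, uint32_t val)
{
    cfg[off] = (uint8_t)val;
    cfg[off + 1] = (uint8_t)(val >> 8);
    cfg[off + 2] = (uint8_t)(val >> 16);
    cfg[off + 3] = (uint8_t)(val >> 24);
}

/* Return the offset of extended capability 'id', or 0 if absent. */
static size_t find_ext_cap(const uint8_t *cfg, size_t cfg_len, uint16_t id)
{
    size_t pos = EXT_CAP_START;
    int ttl;

    if (cfg_len < EXT_CAP_START + 4)
        return 0;                                /* no extended space */
    ttl = (int)((cfg_len - EXT_CAP_START) / 8);  /* bound the walk */

    while (pos >= EXT_CAP_START && pos + 4 <= cfg_len && ttl-- > 0) {
        uint32_t header = cfg_read32(cfg, pos);

        if (header == 0)
            return 0;                  /* empty capability list */
        if ((header & 0xffff) == id)
            return pos;
        pos = (header >> 20) & 0xffc;  /* next offset; 0 ends the loop */
    }
    return 0;
}
```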
     
  • Implement the necessary functions to handle PRI capabilities
    on PCIe devices. With PRI, devices behind an IOMMU can signal
    page fault conditions to software and recover from such
    faults.

    Reviewed-by: Bjorn Helgaas
    Signed-off-by: Joerg Roedel
    Signed-off-by: Jesse Barnes

    Joerg Roedel
     
  • This patch makes the ATS functions usable for modules.
    They will be used by a module implementing some advanced
    AMD IOMMU features.

    Reviewed-by: Bjorn Helgaas
    Signed-off-by: Joerg Roedel
    Signed-off-by: Jesse Barnes

    Joerg Roedel
     
  • ATS does not depend on IOV support, so move the code into
    its own file. This file will also include support for the
    PRI and PASID capabilities later.
    Also give ATS its own Kconfig variable to allow selecting it
    without IOV support.

    Reviewed-by: Bjorn Helgaas
    Signed-off-by: Joerg Roedel
    Signed-off-by: Jesse Barnes

    Joerg Roedel
     
  • The result returned by acpi_dev_run_wake() is always either -EINVAL
    or -ENODEV, while obviously it should return 0 on success. The
    problem is that the leftover error variable, which is not really used
    in the function, is initialized with -ENODEV and then returned
    without modification.

    To fix this issue remove the error variable from acpi_dev_run_wake()
    and make the function return 0 on success as appropriate.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Jesse Barnes

    Rafael J. Wysocki
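The bug pattern can be reproduced in isolation. This is a hypothetical before/after model, not the real acpi_dev_run_wake(): the two boolean parameters stand in for the function's actual checks.

```c
#include <errno.h>
#include <stdbool.h>

static int run_wake_buggy(bool can_wakeup, bool has_acpi_dev)
{
    int error = -ENODEV;  /* leftover variable, never updated */

    if (!can_wakeup)
        return -EINVAL;
    if (!has_acpi_dev)
        return -ENODEV;
    /* ... enable or disable run-time wakeup here ... */
    return error;         /* bug: success still reports -ENODEV */
}

static int run_wake_fixed(bool can_wakeup, bool has_acpi_dev)
{
    if (!can_wakeup)
        return -EINVAL;
    if (!has_acpi_dev)
        return -ENODEV;
    /* ... enable or disable run-time wakeup here ... */
    return 0;             /* success is reported as 0 */
}
```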
     
    I originally submitted a patch to work around this by pushing all Ejection
    Requests and Device Checks onto the kacpi_hotplug queue.

    http://marc.info/?l=linux-acpi&m=131678270930105&w=2

    The patch is still insufficient in that Bus Checks also need to be added.

    Rather than add all events, including non-PCI-hotplug events, to the
    hotplug queue, mjg suggested that a better approach would be to modify
    the acpiphp driver so only acpiphp events would be added to the
    kacpi_hotplug queue.

    It's a longer patch, but at least we maintain the benefit of having separate
    queues in ACPI. This, of course, is still only a workaround for the problem.
    As Bjorn and mjg pointed out, we have to refactor a lot of this code to do
    the right thing, but at this point it is better to have this code working.

    The acpi core places all events on the kacpi_notify queue. When the acpiphp
    driver is loaded and a PCI card with a PCI-to-PCI bridge is removed the
    following call sequence occurs:

    cleanup_p2p_bridge()
    -> cleanup_bridge()
    -> acpi_remove_notify_handler()
    -> acpi_os_wait_events_complete()
    -> flush_workqueue(kacpi_notify_wq)

    which is the queue we are currently executing on and the process will hang.

    Move all hotplug acpiphp events onto the kacpi_hotplug workqueue. In
    handle_hotplug_event_bridge() and handle_hotplug_event_func() we can simply
    push the rest of the work onto the kacpi_hotplug queue and then avoid the
    deadlock.

    Signed-off-by: Prarit Bhargava
    Cc: mjg@redhat.com
    Cc: bhelgaas@google.com
    Cc: linux-acpi@vger.kernel.org
    Signed-off-by: Jesse Barnes

    Prarit Bhargava
     
  • The land of PCI power management is a land of sorrow and ugliness,
    especially in the area of signaling events by devices. There are
    devices that set their PME Status bits, but don't really bother
    to send a PME message or assert PME#. There are hardware vendors
    who don't connect PME# lines to the system core logic (they know
    who they are). There are PCI Express Root Ports that don't bother
    to trigger interrupts when they receive PME messages from the devices
    below. There are ACPI BIOSes that forget to provide _PRW methods for
    devices capable of signaling wakeup. Finally, there are BIOSes that
    do provide _PRW methods for such devices, but then don't bother to
    call Notify() for those devices from the corresponding _Lxx/_Exx
    GPE-handling methods. In all of these cases the kernel doesn't have
    a chance to receive a proper notification that it should wake up a
    device, so devices stay in low-power states forever. Worse yet, in
    some cases they continuously send PME Messages that are silently
    ignored, because the kernel simply doesn't know that it should clear
    the device's PME Status bit.

    This problem was first observed for "parallel" (non-Express) PCI
    devices on add-on cards, and Matthew Garrett addressed it by adding
    code to the kernel that polls the PME Status bits of such devices if
    they are enabled to signal PME. Recently, however, it has turned out
    that PCI Express devices are also affected by this issue and that it
    is not limited to add-on devices, so it seems necessary to extend
    the PME polling to all PCI devices, including PCI Express and planar
    ones. Still, it would be wasteful to poll the PME Status bits of
    devices that are known to receive proper PME notifications, so make
    the kernel (1) poll the PME Status bits of all PCI and PCIe devices
    enabled to signal PME and (2) disable the PME Status polling for
    devices for which correct PME notifications are received.

    Tested-by: Sarah Sharp
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Jesse Barnes

    Rafael J. Wysocki
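A hedged model of the resulting policy, assuming a simple array of devices and a periodic timer that calls the poll routine; the struct and field names are illustrative only. Devices enabled to signal PME start out polled, and the first proper notification for a device switches polling off for it.

```c
#include <stdbool.h>
#include <stddef.h>

struct pme_dev {
    bool pme_enabled;  /* device is enabled to signal PME */
    bool pme_poll;     /* PME Status still has to be polled */
    bool pme_status;   /* models the device's PME Status bit */
    int  wakeups;      /* how many times the device was woken */
};

/* Clear a set PME Status bit and wake the device. */
static void pme_handle(struct pme_dev *d)
{
    if (d->pme_status) {
        d->pme_status = false;
        d->wakeups++;
    }
}

/* Periodic poll over all devices; returns how many were woken. */
static size_t pme_poll_all(struct pme_dev *devs, size_t n)
{
    size_t woken = 0;

    for (size_t i = 0; i < n; i++) {
        struct pme_dev *d = &devs[i];

        if (!d->pme_enabled || !d->pme_poll)
            continue;  /* not enabled, or notifications are known to work */
        if (d->pme_status) {
            pme_handle(d);
            woken++;
        }
    }
    return woken;
}

/* A proper PME notification arrived: stop polling this device. */
static void pme_notify(struct pme_dev *d)
{
    d->pme_poll = false;
    pme_handle(d);
}
```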
     
  • Commit 15bed0f2f added a quirk for the e823 Ricoh card reader to lower the
    base frequency. However, the quirk first checks to see if the proprietary
    MMC controller is disabled, and returns if so. On some devices, such as the
    Lenovo X220, the MMC controller apparently is already disabled by firmware,
    but the frequency change is still needed so sdhci-pci can talk to the cards.
    Since the MMC controller is disabled, the frequency fixup was never being run
    on these machines.

    This moves the e823 check above the MMC controller check so that it always
    gets run.

    This fixes https://bugzilla.redhat.com/show_bug.cgi?id=722509

    Signed-off-by: Josh Boyer
    Signed-off-by: Jesse Barnes

    Josh Boyer
     
  • The "powernv" platform of the powerpc architecture needs to assign PCI
    resources using a specific algorithm to fit some HW constraints of
    the IBM "IODA" architecture (related to the ability to create error
    handling domains that encompass specific segments of MMIO space).

    For doing so, it wants to call pci_setup_bridge() from architecture
    specific resource management in order to configure bridges after all
    resources have been assigned. So make it non-static.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Jesse Barnes

    Benjamin Herrenschmidt
     
  • Using legacy interrupts and TLPs > 256 bytes on the SFC4000 (all
    revisions) may cause interrupt messages to be replayed. In some
    systems this results in a non-recoverable MCE. Early boards using the
    SFC4000 set the maximum payload size supported (MPSS) to 1024 bytes
    and we should override that.

    There are probably other devices with similar issues, so give this
    quirk a generic name.

    Signed-off-by: Ben Hutchings
    Signed-off-by: Jesse Barnes

    Ben Hutchings
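The MPSS field uses the same encoding as MPS: a value v means 128 << v bytes, so 1024 bytes is encoding 3 and 256 bytes is encoding 1. A minimal sketch of what such a quirk amounts to, assuming it simply caps the advertised value; the function names are hypothetical.

```c
#include <stdint.h>

/* MPS/MPSS encoding: value v means 128 << v bytes (0 = 128 ... 5 = 4096). */
static int mps_field_to_bytes(uint8_t v)
{
    return 128 << v;
}

/* Cap whatever MPSS the device advertises at encoding 1 (256 bytes). */
static uint8_t quirk_cap_mpss_256(uint8_t advertised_mpss)
{
    return advertised_mpss > 1 ? 1 : advertised_mpss;
}
```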
     

05 Oct, 2011

1 commit

    Add the ability to disable PCI-E MPS tuning and use the BIOS
    configured MPS defaults. Due to the number of issues recently
    discovered on some x86 chipsets, make this the default behavior.

    Also, add the option for peer to peer DMA MPS configuration. Peer to
    peer DMA is outside the scope of this patch, but MPS configuration could
    prevent it from working by having the MPS on one root port different
    than the MPS on another. To work around this, simply make the system-wide
    MPS the smallest possible value (128B).

    Signed-off-by: Jon Mason
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Jon Mason
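The resulting choice per device can be summarized as below. The enum and function names are hypothetical; computed_mps stands for whatever the safe or performance walk would select, and bios_mps for the value firmware left in the device.

```c
enum mps_mode {
    MPS_TUNE_OFF,   /* keep whatever the BIOS programmed (new default) */
    MPS_SAFE,
    MPS_PERF,
    MPS_PEER2PEER,  /* one system-wide MPS of 128 bytes */
};

/* Pick the MPS, in bytes, to program into a device. */
static int choose_mps(enum mps_mode mode, int bios_mps, int computed_mps)
{
    switch (mode) {
    case MPS_TUNE_OFF:
        return bios_mps;
    case MPS_PEER2PEER:
        return 128;
    case MPS_SAFE:
    case MPS_PERF:
    default:
        return computed_mps;
    }
}
```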
     

21 Sep, 2011

1 commit

  • Change the CONFIG_DMAR to CONFIG_INTEL_IOMMU to be consistent
    with the other IOMMU options.

    Rename the CONFIG_INTR_REMAP to CONFIG_IRQ_REMAP to match the
    irq subsystem name.

    And define the CONFIG_DMAR_TABLE for the common ACPI DMAR
    routines shared by both CONFIG_INTEL_IOMMU and CONFIG_IRQ_REMAP.

    Signed-off-by: Suresh Siddha
    Cc: yinghai@kernel.org
    Cc: youquan.song@intel.com
    Cc: joerg.roedel@amd.com
    Cc: tony.luck@intel.com
    Cc: dwmw2@infradead.org
    Link: http://lkml.kernel.org/r/20110824001456.558630224@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     

14 Sep, 2011

1 commit

  • In pcie_find_smpss(), we have the following statement:

    if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) ||
    dev->bus->self->pcie_type != PCI_EXP_TYPE_ROOT_PORT))

    The problem is that, at least on my machine, this gets called for the
    root complex (virtual P2P bridge), and dev->bus->self is NULL since
    the parent bus for this is not itself anchored to a PCI device.

    This adds the necessary NULL check.

    Signed-off-by: Benjamin Herrenschmidt
    Acked-by: Jon Mason
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
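A small model of the fixed condition, with hypothetical struct names; nr_devices == 1 stands in for list_is_singular(), and the value 4 mirrors the PCIe Device/Port Type encoding for a Root Port (PCI_EXP_TYPE_ROOT_PORT).

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_TYPE_ROOT_PORT 4

struct model_bus;

struct model_dev {
    bool is_hotplug_bridge;
    int pcie_type;
    struct model_bus *bus;   /* bus this device sits on */
};

struct model_bus {
    struct model_dev *self;  /* bridge leading to this bus; NULL for a root bus */
    int nr_devices;
};

/* The pcie_find_smpss() condition with the NULL check added. */
static bool hotplug_bridge_needs_clamp(const struct model_dev *dev)
{
    return dev->is_hotplug_bridge &&
           (dev->bus->nr_devices != 1 ||
            (dev->bus->self != NULL &&
             dev->bus->self->pcie_type != MODEL_TYPE_ROOT_PORT));
}
```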
     

10 Sep, 2011

2 commits

    Setting the Maximum Read Request Size to 0 (which encodes 128 bytes) has
    massive negative ramifications on some devices. Without knowing which
    devices have this issue, do not modify from the default value when
    walking the PCI-E bus in pcie_bus_safe mode. Also, make pcie_bus_safe
    the default procedure.

    Tested-by: Sven Schnelle
    Tested-by: Simon Kirby
    Tested-by: Stephen M. Cameron
    Reported-and-tested-by: Eric Dumazet
    Reported-and-tested-by: Niels Ole Salscheider
    References: https://bugzilla.kernel.org/show_bug.cgi?id=42162
    Signed-off-by: Jon Mason
    Acked-by: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Jon Mason
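The MRRS value 0 mentioned above comes from the 3-bit encoding shared by MRRS and MPS, where a field value v means 128 << v bytes. The sketch below decodes and rewrites the MRRS field, which occupies bits 14:12 of the PCIe Device Control register; the helper names are hypothetical.

```c
#include <stdint.h>

#define DEVCTL_READRQ_MASK  0x7000u
#define DEVCTL_READRQ_SHIFT 12

/* Decode the MRRS field of a Device Control value into bytes. */
static int readrq_bytes(uint16_t devctl)
{
    return 128 << ((devctl & DEVCTL_READRQ_MASK) >> DEVCTL_READRQ_SHIFT);
}

/* Return devctl with MRRS set to 'bytes' (rounded up, capped at 4096). */
static uint16_t set_readrq_bytes(uint16_t devctl, int bytes)
{
    uint16_t v = 0;

    while (v < 5 && (128 << v) < bytes)
        v++;
    return (uint16_t)((devctl & ~DEVCTL_READRQ_MASK) |
                      ((unsigned)v << DEVCTL_READRQ_SHIFT));
}
```

Note that the rewrite preserves every other bit of Device Control, which is why the field is masked rather than overwritten.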
     
  • Commit b03e7495a862 ("PCI: Set PCI-E Max Payload Size on fabric")
    introduced a potential NULL pointer dereference in calls to
    pcie_bus_configure_settings due to attempts to access pci_bus self
    variables when the self pointer is NULL.

    To correct this, verify that the self pointer in pci_bus is non-NULL
    before dereferencing it.

    Reported-by: Stanislaw Gruszka
    Signed-off-by: Shyam Iyer
    Signed-off-by: Jon Mason
    Acked-by: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Shyam Iyer
     

27 Aug, 2011

1 commit

  • With Xen changeset 23428 "libxl: Add 'e820_host' option to config file"
    the E820 as seen from the host can now be passed into the guest.
    This means that a PV guest can now:
    - Use the correct PCI I/O gap. Before these patches, a Linux guest would
    boot up and report:
    [ 0.000000] Allocating PCI resources starting at 40000000 (gap: 40000000:c0000000)
    while in actuality the PCI I/O gap should have been:
    [ 0.000000] Allocating PCI resources starting at b0000000 (gap: b0000000:4c000000)

    - The PV domain with PCI devices was limited to 3GB. It now can be booted
    with 4GB, 8GB, or whatever number you want. The PCI devices will now _not_ conflict
    with System RAM, meaning the drivers can load.

    CC: Jesse Barnes
    CC: linux-pci@vger.kernel.org
    CC: stable@kernel.org
    [v2: Made the string less broken up. Suggested by Joe Perches]
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     

21 Aug, 2011

1 commit

  • Fix new kernel-doc warning in pci.c:

    Warning(drivers/pci/pci.c:3259): No description found for parameter 'mps'
    Warning(drivers/pci/pci.c:3259): Excess function parameter 'rq' description in 'pcie_set_mps'

    Signed-off-by: Randy Dunlap
    Cc: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

20 Aug, 2011

1 commit


19 Aug, 2011

1 commit


03 Aug, 2011

1 commit

  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (28 commits)
    ACPI: delete stale reference in kernel-parameters.txt
    ACPI: add missing _OSI strings
    ACPI: remove NID_INVAL
    thermal: make THERMAL_HWMON implementation fully internal
    thermal: split hwmon lookup to a separate function
    thermal: hide CONFIG_THERMAL_HWMON
    ACPI print OSI(Linux) warning only once
    ACPI: DMI workaround for Asus A8N-SLI Premium and Asus A8N-SLI DELUX
    ACPI / Battery: propagate sysfs error in acpi_battery_add()
    ACPI / Battery: avoid acpi_battery_add() use-after-free
    ACPI: introduce "acpi_rsdp=" parameter for kdump
    ACPI: constify ops structs
    ACPI: fix CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS
    ACPI: fix 80 char overflow
    ACPI / Battery: Resolve the race condition in the sysfs_remove_battery()
    ACPI / Battery: Add the check before refresh sysfs in the battery_notify()
    ACPI / Battery: Add the hibernation process in the battery_notify()
    ACPI / Battery: Rename acpi_battery_quirks2 with acpi_battery_quirks
    ACPI / Battery: Change 16-bit signed negative battery current into correct value
    ACPI / Battery: Add the power unit macro
    ...

    Linus Torvalds
     

02 Aug, 2011

7 commits

  • pcie_bus_configure_settings needs to be exported if the PCI hotplug
    driver is being compiled as a module.

    Reported-by: Stephen Rothwell
    Signed-off-by: Jon Mason
    Signed-off-by: Jesse Barnes

    Jon Mason
     
  • a) adjust_resource_sorted() is now called reassign_resource_sorted()
    b) nice-to-have is now called optional
    c) add_list is now called realloc_list.

    Signed-off-by: Ram Pai
    Signed-off-by: Jesse Barnes

    Ram Pai
     
    Allocate resources to cardbus bridges only after all other genuine
    resource requests are satisfied. Don't retry if resource allocation
    for cardbus bridges fails.

    Signed-off-by: Ram Pai
    Signed-off-by: Jesse Barnes

    Ram Pai
     
  • From: Yinghai Lu

    Allocate resources to SRIOV BARs only after all other required
    resource requests are satisfied. Don't retry if resource allocation for SRIOV
    BARs fails.

    Signed-off-by: Ram Pai
    Signed-off-by: Yinghai Lu
    Signed-off-by: Jesse Barnes

    Yinghai Lu
     
  • Currently pci-bridges are allocated enough resources to satisfy their immediate
    requirements. Any additional resource-requests fail if additional free space,
    contiguous to the one already allocated, is not available. This behavior is not
    reasonable when sufficient contiguous resources that can satisfy the request
    are available at a different location.

    This patch provides the ability to expand and relocate an allocated resource.

    v2: Changelog: Fixed size calculation in pci_reassign_resource()
    v3: Changelog : Split this patch. The resource.c changes are already
    upstream. All the pci driver changes are in here.

    Signed-off-by: Ram Pai
    Signed-off-by: Jesse Barnes

    Ram Pai
     
  • git commit c8adf9a3e873eddaaec11ac410a99ef6b9656938
    "PCI: pre-allocate additional resources to devices only after
    successful allocation of essential resources."

    fails to take into consideration the optional-resources needed by children
    devices while calculating the optional-resource needed by the bridge.

    This can be a problem on some setups. For example, if a hotplug bridge has 8
    child hotplug bridges, the bridge should have enough resources to accommodate
    the hotplug requirements for each of its child hotplug bridges. Currently
    this is not the case.

    This patch fixes the problem.

    Signed-off-by: Yinghai Lu
    Reviewed-by: Ram Pai
    Signed-off-by: Jesse Barnes

    Yinghai Lu
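A simplified model of the corrected calculation, assuming each bridge has a fixed hotplug reserve and that a bridge's optional size is its own reserve plus that of every bridge below it. The real code works per resource window and handles alignment, which this sketch ignores; all names are hypothetical.

```c
#include <stddef.h>

struct bridge_node {
    unsigned long own_optional;      /* hotplug reserve for this bridge, bytes */
    struct bridge_node **children;   /* child bridges */
    size_t nr_children;
};

/* Own reserve plus the optional sizes of all child bridges, recursively. */
static unsigned long optional_size(const struct bridge_node *b)
{
    unsigned long total = b->own_optional;

    for (size_t i = 0; i < b->nr_children; i++)
        total += optional_size(b->children[i]);
    return total;
}
```

For the 8-child example above, the parent must reserve nine hotplug windows' worth of space rather than one.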
     
  • On a given PCI-E fabric, each device, bridge, and root port can have a
    different PCI-E maximum payload size. There is a sizable performance
    boost for having the largest possible maximum payload size on each PCI-E
    device. However, if improperly configured, fatal bus errors can occur.
    Thus, it is important to ensure that PCI-E payloads sent by a device
    are never larger than the MPS setting of all devices on the way to the
    destination.

    This can be achieved two ways:

    - A conservative approach is to use the smallest common denominator of
    the entire tree below a root complex for every device on that fabric.

    This means, for example, that having a 128-byte MPS USB controller on one
    leg of a switch will dramatically reduce the performance of a video card or
    10GE adapter on another leg of that same switch.

    It also means that any hierarchy supporting hotplug slots (presumably
    including ExpressCard or Thunderbolt) will have to be
    entirely clamped to 128 bytes since we cannot predict what will be
    plugged into those slots, and we cannot change the MPS on a "live"
    system.

    - A better approach is possible if the system meets a few
    constraints:
    * The top-level host bridge will never generate packets larger than the
    smallest TLP (or if it can be controlled independently from its MPS at
    least)
    * The device will never generate packets larger than MPS (which can be
    configured via MRRS)
    * No support for direct PCI-E to PCI-E transfers between devices without
    some additional code to specifically deal with that case

    Then we can use an approach that basically ignores downstream requests
    and focuses exclusively on upstream requests. In that case, all we need
    to care about is that a device MPS is no larger than its parent MPS,
    which allows us to keep all switches/bridges to the max MPS supported by
    their parent and eventually the PHB.

    In this case, your USB controller would no longer "starve" your 10GE
    Ethernet and your hotplug slots won't affect your global MPS.
    Additionally, the hotplugged devices themselves can be configured to a
    larger MPS up to the value configured in the hotplug bridge.

    To choose between the two available options, two PCI kernel boot args
    have been added to the PCI calls. "pcie_bus_safe" will provide the
    former behavior, while "pcie_bus_perf" will perform the latter behavior.
    By default, the latter behavior is used.

    NOTE: due to the location of the enablement, each arch will need to add
    calls to this function. This patch only enables x86.

    This patch includes a number of changes recommended by Benjamin
    Herrenschmidt.

    Tested-by: Jordan_Hargrave@dell.com
    Signed-off-by: Jon Mason
    Signed-off-by: Jesse Barnes

    Jon Mason
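The two policies can be sketched over a toy tree of devices, with MPSS and MPS given in bytes. This is a hypothetical model for illustration, not the kernel's implementation: safe mode programs the smallest MPSS found anywhere in the tree into every node, while performance mode gives each node the smaller of its own MPSS and its parent's MPS.

```c
#include <stddef.h>

struct fab_node {
    int mpss;                     /* largest MPS the device supports */
    int mps;                      /* value chosen for programming */
    struct fab_node **children;
    size_t nr_children;
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Smallest MPSS anywhere in the subtree rooted at n. */
static int subtree_min_mpss(const struct fab_node *n)
{
    int m = n->mpss;

    for (size_t i = 0; i < n->nr_children; i++)
        m = min_int(m, subtree_min_mpss(n->children[i]));
    return m;
}

/* Safe policy: every node gets the same value. */
static void configure_safe(struct fab_node *n, int mps)
{
    n->mps = mps;
    for (size_t i = 0; i < n->nr_children; i++)
        configure_safe(n->children[i], mps);
}

/* Performance policy: min(own MPSS, parent's MPS), top-down. */
static void configure_perf(struct fab_node *n, int parent_mps)
{
    n->mps = min_int(n->mpss, parent_mps);
    for (size_t i = 0; i < n->nr_children; i++)
        configure_perf(n->children[i], n->mps);
}
```

To configure a root port r, call configure_safe(r, subtree_min_mpss(r)) or configure_perf(r, r->mpss). The performance result is only safe when each device's MRRS is also kept at or below its MPS, as the constraints above require.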
     

30 Jul, 2011

1 commit

  • * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
    PCI: remove printks about disabled bridge windows
    PCI: fold pci_calc_resource_flags() into decode_bar()
    PCI: treat mem BAR type "11" (reserved) as 32-bit, not 64-bit, BAR
    PCI: correct pcie_set_readrq write size
    PCI: pciehp: change wait time for valid configuration access
    x86/PCI: Preserve existing pci=bfsort whitelist for Dell systems
    PCI: ARI is a PCIe v2 feature
    x86/PCI: quirks: Use pci_dev->revision
    PCI: Make the struct pci_dev * argument of pci_fixup_irqs const.
    PCI hotplug: cpqphp: use pci_dev->vendor
    PCI hotplug: cpqphp: use pci_dev->subsystem_{vendor|device}
    x86/PCI: config space accessor functions should not ignore the segment argument
    PCI: Assign values to 'pci_obff_signal_type' enumeration constants
    x86/PCI: reduce severity of host bridge window conflict warnings
    PCI: enumerate the PCI device only removed out PCI hieratchy of OS when re-scanning PCI
    PCI: PCIe AER: add aer_recover_queue
    x86/PCI: select direct access mode for mmconfig option
    PCI hotplug: Rename is_ejectable which also exists in dock.c

    Linus Torvalds