28 Apr, 2014

6 commits

  • Basically, we have 3 types of resets to fulfil PE reset: fundamental,
    hot and PHB reset. For the latter 2 cases, we need the PCI bus reset
    hold and settlement delays as specified by the PCI spec. The PowerNV
    and pSeries platforms run on top of different firmware, and some of
    the delays are already covered by the underlying firmware (PowerNV).

    The patch unifies the delays so that they are done in the backend
    instead of the EEH core.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The issue was detected in a somewhat complicated test case where
    we have multiple hierarchical PEs, as shown in the following figure:

    +-----------------+
    | PE#3     p2p#0  |
    |          p2p#1  |
    +-----------------+
            |
    +-----------------+
    | PE#4     pdev#0 |
    |          pdev#1 |
    +-----------------+

    PE#4 (which has 2 PCI devices) is the child of PE#3, which has 2 p2p
    bridges. We accidentally hit a less-known scenario: PE#4 was removed
    permanently from the system because of permanent failure (e.g.
    exceeding the max allowed failure times in the last hour), then we
    detected EEH errors on PE#3 and tried to recover it. However, the
    eeh_dev instances for pdev#0/1 were not detached from PE#4, which was
    still connected to PE#3. All of that was because we rely on the
    count-based pcibios_release_device(), which isn't reliable enough.
    When doing recovery for PE#3, we still applied hotplug to PE#4 and
    pdev#0/1, which were not valid any more. Eventually, we ran into a
    kernel crash.

    The patch fixes the above issue from two aspects. For unplug, we simply
    skip those permanently removed PEs, whose state is (EEH_PE_STATE_ISOLATED
    && !EEH_PE_STATE_RECOVERING) and whose frozen count is greater than
    EEH_MAX_ALLOWED_FREEZES. For plug, we mark all permanently removed EEH
    devices with EEH_DEV_REMOVED and return 0xFF's on reads of their PCI
    config space so that the PCI core will omit them.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
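
The unplug rule described in the entry above can be sketched in user-space C as follows. This is only an illustration under stated assumptions: the bit values, the reduced struct eeh_pe and the helper name pe_permanently_removed() are hypothetical, and the limit of 5 is taken from the "over 5 times in last hour" entry dated 20 Jun, 2013 in this log.

```c
#include <stdbool.h>

/* Bit values are assumptions chosen for this sketch. */
#define EEH_PE_STATE_ISOLATED   (1 << 0)
#define EEH_PE_STATE_RECOVERING (1 << 1)
/* Assumed limit, based on the "over 5 times in last hour" entry. */
#define EEH_MAX_ALLOWED_FREEZES 5

/* Reduced stand-in for the kernel's struct eeh_pe. */
struct eeh_pe {
	int state;
	int freeze_count;
};

/* Hypothetical helper: true when the PE must be skipped during unplug. */
static bool pe_permanently_removed(const struct eeh_pe *pe)
{
	return (pe->state & EEH_PE_STATE_ISOLATED) &&
	       !(pe->state & EEH_PE_STATE_RECOVERING) &&
	       pe->freeze_count > EEH_MAX_ALLOWED_FREEZES;
}
```

A PE that is isolated but still recovering, or one that has not exceeded the freeze limit, is not treated as permanently removed.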
     
  • There are 2 EEH subsystem variables: eeh_subsystem_enabled and
    eeh_probe_mode. We needn't maintain 2 variables; we can just have
    one variable and introduce different flags. The patch also
    introduces the additional flag EEH_FORCE_DISABLED, which will be used
    to disable the EEH subsystem via a boot parameter ("eeh=off") in
    future. Besides, the patch introduces the flag EEH_ENABLED, which
    will be changed to disable or enable EEH functionality on the fly
    through a debugfs entry in future.

    With the patch applied, the criteria to check whether EEH
    functionality is enabled are changed to:

    !EEH_FORCE_DISABLED && EEH_ENABLED : Enabled
    Other cases                        : Disabled

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
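
The enable check from the entry above can be written as a small predicate. This is a minimal sketch, not the patch itself: the flag bit values and the variable name eeh_subsystem_flags are assumptions, while eeh_enabled() is the accessor named in the 17 Feb, 2014 entry.

```c
#include <stdbool.h>

/* Flag bit values are illustrative assumptions. */
#define EEH_ENABLED        (1 << 0)
#define EEH_FORCE_DISABLED (1 << 1)

/* Hypothetical single variable holding all EEH subsystem flags. */
static int eeh_subsystem_flags;

/* Enabled only when not force-disabled and the enabled flag is set. */
static bool eeh_enabled(void)
{
	if ((eeh_subsystem_flags & EEH_FORCE_DISABLED) ||
	    !(eeh_subsystem_flags & EEH_ENABLED))
		return false;

	return true;
}
```

With this layout, "eeh=off" would only need to set EEH_FORCE_DISABLED once, and later runtime toggling of EEH_ENABLED could not override it.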
     
  • When calling into eeh_gather_pci_data() on the pSeries platform, we
    possibly don't have the pci_dev instance yet, but the eeh_dev is always
    ready. So we use the cached capabilities from eeh_dev instead of pci_dev
    for the log dump there. In order to keep things unified, we also cache
    PCI capability positions in eeh_dev for PowerNV.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • We've observed multiple PE reset failures because of PCI-CFG
    access during that period. Potentially, some device drivers
    don't support EEH very well and can't put the device into a
    motionless state before PE reset, so those device drivers might
    produce PCI-CFG accesses during PE reset. Also, we could have
    PCI-CFG access from user space (e.g. "lspci"). Since access to a
    frozen PE should return 0xFF's, we can block PCI-CFG access
    during the period of PE reset so that we won't get recursive EEH
    errors.

    The patch adds the flag EEH_PE_RESET, which is kept set during PE
    reset. The PowerNV/pSeries PCI-CFG accessors use the flag to block
    PCI-CFG access accordingly.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
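
The blocking behaviour described in the entry above might look roughly like this in a config read accessor. The sketch is a user-space model: cfg_read(), its hw_value parameter (standing in for the real hardware access) and the bit value of EEH_PE_RESET are assumptions made for illustration only.

```c
#include <stdint.h>

#define EEH_PE_RESET (1 << 2)	/* bit value is an assumption */

/* Reduced stand-in for the kernel's struct eeh_pe. */
struct eeh_pe {
	int state;
};

/*
 * Hypothetical config read accessor. hw_value stands in for whatever a
 * real hardware access would return; size is 1, 2 or 4 bytes.
 * Returns 0 on success, -1 on an unsupported size.
 */
static int cfg_read(const struct eeh_pe *pe, int size,
		    uint32_t hw_value, uint32_t *val)
{
	if (size != 1 && size != 2 && size != 4)
		return -1;

	if (pe->state & EEH_PE_RESET) {
		/* PE is under reset: skip the hardware, report all ones. */
		*val = 0xFFFFFFFFu >> (8 * (4 - size));
		return 0;
	}

	*val = hw_value;
	return 0;
}
```

A caller such as "lspci" would then see 0xFF's for the duration of the reset instead of triggering a new EEH error.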
     
  • The PE state (for eeh_pe instance) EEH_PE_PHB_DEAD duplicates
    EEH_PE_ISOLATED. Originally, those PHBs (PHB PE) with EEH_PE_PHB_DEAD
    would be removed from the system. However, it's safe to replace
    that with EEH_PE_ISOLATED.

    The patch also clears EEH_PE_RECOVERING after a fenced PHB has been
    handled, whether the recovery failed or succeeded. It makes the PHB PE
    state consistent with:

    PHB functions normally               NONE
    PHB has been removed                 EEH_PE_ISOLATED
    PHB fenced, recovery in progress     EEH_PE_ISOLATED | RECOVERING

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
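
The three PHB PE states listed in the entry above can be decoded with a small helper. This is a sketch under assumptions: the bit values, the enum phb_status and the function phb_pe_status() are hypothetical names introduced only to illustrate the table.

```c
/* Bit values are assumptions chosen for this sketch. */
#define EEH_PE_ISOLATED   (1 << 0)
#define EEH_PE_RECOVERING (1 << 1)

/* Hypothetical classification matching the table in the commit message. */
enum phb_status {
	PHB_NORMAL,		/* no state bits set */
	PHB_REMOVED,		/* EEH_PE_ISOLATED only */
	PHB_FENCED_RECOVERING,	/* EEH_PE_ISOLATED | EEH_PE_RECOVERING */
	PHB_UNEXPECTED		/* any combination not in the table */
};

static enum phb_status phb_pe_status(int state)
{
	switch (state & (EEH_PE_ISOLATED | EEH_PE_RECOVERING)) {
	case 0:
		return PHB_NORMAL;
	case EEH_PE_ISOLATED:
		return PHB_REMOVED;
	case EEH_PE_ISOLATED | EEH_PE_RECOVERING:
		return PHB_FENCED_RECOVERING;
	default:
		return PHB_UNEXPECTED;
	}
}
```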
     

17 Feb, 2014

1 commit

  • The patch cleans up the variable eeh_subsystem_enabled so that it
    needn't be referenced directly from outside. Instead, we will use the
    functions eeh_enabled() and eeh_set_enable() to operate on the variable.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     

15 Jan, 2014

3 commits

  • For one PCI-error-relevant OPAL event, we possibly have multiple
    EEH errors, for example multiple frozen PEs detected on different
    PHBs. Unfortunately, we didn't cover that case. The patch
    enumerates the return values from eeh_ops::next_error() and changes
    eeh_handle_special_event() and eeh_ops::next_error() to handle all
    existing EEH errors.

    As Ben pointed out, we needn't use list_for_each_entry_safe() since we
    are not deleting any PHB from the hose_list, and the EEH serialized
    lock should be held while purging EEH events. The patch covers those
    suggestions as well.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
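
Handling all pending errors for one event, as the entry above describes, amounts to calling the backend until it reports nothing further. The sketch below models that loop in user space. It assumes that a return value of 0 means "no error found" (as in the 20 Jun, 2013 entry); struct fake_backend, fake_next_error() and handle_special_event() are hypothetical names, and the real recovery work is reduced to a counter.

```c
#include <stddef.h>

/* Reduced stand-in for the backend operations table. */
struct eeh_ops {
	int (*next_error)(void *ctx);
};

/* Hypothetical fake backend: a queue of pending error codes. */
struct fake_backend {
	const int *errors;
	size_t count;
	size_t pos;
};

/* Hands out queued errors one by one, then 0 ("no error found"). */
static int fake_next_error(void *ctx)
{
	struct fake_backend *b = ctx;

	return b->pos < b->count ? b->errors[b->pos++] : 0;
}

/* Keeps asking the backend until no error is left; returns how many
 * errors were seen. Real code would recover the PE or PHB in the loop. */
static int handle_special_event(const struct eeh_ops *ops, void *ctx)
{
	int handled = 0;

	while (ops->next_error(ctx) != 0)
		handled++;

	return handled;
}
```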
     
  • When an EEH error comes to one specific PCI device before its driver
    is loaded, we will apply hotplug to recover from the error. During the
    plug time, the PCI device will be probed and its driver loaded.
    Then we wrongly call the error handlers if the driver supports
    EEH explicitly.

    The patch fixes that by introducing the flag EEH_DEV_NO_HANDLER and
    setting it before we remove the PCI device. In turn, we avoid wrongly
    calling the error handlers of the PCI device after its driver is loaded.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
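
The effect of EEH_DEV_NO_HANDLER from the entry above can be illustrated with a small model. Everything except the flag name is an assumption: the bit value, the reduced struct eeh_dev and the two helper functions are hypothetical, and the point at which the real code clears the flag again is not shown because the commit message does not state it.

```c
#include <stdbool.h>

#define EEH_DEV_NO_HANDLER (1 << 0)	/* bit value is an assumption */

/* Reduced stand-in for the kernel's struct eeh_dev. */
struct eeh_dev {
	int mode;
	bool driver_has_err_handler;
};

/* Hypothetical helper: mark the device before it is removed for hotplug. */
static void eeh_mark_before_unplug(struct eeh_dev *edev)
{
	edev->mode |= EEH_DEV_NO_HANDLER;
}

/* Hypothetical helper: error handlers run only for unmarked devices. */
static bool eeh_may_call_handler(const struct eeh_dev *edev)
{
	return edev->driver_has_err_handler &&
	       !(edev->mode & EEH_DEV_NO_HANDLER);
}
```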
     
  • After reset on the specific PE or PHB, we never configure AER
    correctly on the PowerNV platform. We needn't care about it on the
    pSeries platform. The patch introduces an additional EEH operation,
    eeh_ops::restore_config(), so that we have a chance to configure AER
    correctly on the PowerNV platform.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     

24 Jul, 2013

6 commits

  • The patch introduces the flag EEH_DEV_SYSFS to keep track of whether
    the sysfs entries for the corresponding EEH device (and thus PCI
    device) have been added or removed, in order to avoid a race condition.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • While restoring BARs for one specific PCI device, the pci_dev
    instance may already have been released. So it's not reliable to use
    the pci_dev instance when restoring BARs. However, we still need
    some information (e.g. PCIe capability position, header type) from
    the pci_dev instance. So we have to store that information in the
    EEH device in advance.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • When an EEH error happens to one specific PE, some devices with drivers
    supporting EEH won't accept hotplug on the device. However, there
    might be other devices without a driver, or with a driver without EEH
    support. For that case, we need to do partial hotplug in order to make
    sure that the PE becomes absolutely quiet during reset. Otherwise,
    the PE reset might fail and lead to failure of error recovery.

    The current code doesn't handle that 'mixed' case properly, it either
    uses the error callbacks to the drivers, or tries hotplug, but doesn't
    handle a PE (EEH domain) composed of a combination of the two.

    The patch supports so-called "partial" hotplug for EEH:
    Before we do the reset, we stop and remove those PCI devices without
    an EEH sensitive driver. The corresponding EEH devices are not detached
    from their PE, but are marked with a special flag. After the reset is
    done, those EEH devices with the special flag will be scanned one by one.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
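
The "partial" hotplug flow in the entry above can be modelled in two passes over the devices of a PE. This is a simplified sketch: the commit message only speaks of a "special flag", so the name EEH_DEV_UNPLUGGED, its value, the reduced struct eeh_dev and both helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name and value for the "special flag". */
#define EEH_DEV_UNPLUGGED (1 << 0)

/* Reduced stand-in for the kernel's struct eeh_dev. */
struct eeh_dev {
	bool driver_supports_eeh;
	int mode;
};

/* Before reset: remove devices lacking an EEH-aware driver, but keep
 * their eeh_dev attached to the PE and only flag it. Returns the count. */
static size_t eeh_partial_unplug(struct eeh_dev *devs, size_t n)
{
	size_t i, removed = 0;

	for (i = 0; i < n; i++) {
		if (!devs[i].driver_supports_eeh) {
			devs[i].mode |= EEH_DEV_UNPLUGGED;
			removed++;
		}
	}
	return removed;
}

/* After reset: rescan only the flagged devices and clear the flag. */
static size_t eeh_partial_replug(struct eeh_dev *devs, size_t n)
{
	size_t i, rescanned = 0;

	for (i = 0; i < n; i++) {
		if (devs[i].mode & EEH_DEV_UNPLUGGED) {
			devs[i].mode &= ~EEH_DEV_UNPLUGGED;
			rescanned++;
		}
	}
	return rescanned;
}
```

Devices whose drivers support EEH are untouched by both passes and would go through the driver error callbacks instead.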
     
  • Currently, we're traversing the EEH devices list using list_for_each_entry().
    That's not safe enough because the EEH devices might be removed from
    their parent PE during iteration. The patch replaces that with
    list_for_each_entry_safe().

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
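
The reason the "safe" iterator matters in the entry above is that the next pointer must be saved before the current entry can be unlinked. The user-space sketch below shows the same idea on a plain singly linked list rather than the kernel's list_head; the struct layout and the functions eeh_dev_push() and eeh_pe_prune() are hypothetical.

```c
#include <stdlib.h>

/* Reduced stand-in for an EEH device on a PE's device list. */
struct eeh_dev {
	int removed;		/* non-zero: entry should be dropped */
	struct eeh_dev *next;
};

/* Hypothetical helper: allocate a device and push it onto the list. */
static struct eeh_dev *eeh_dev_push(struct eeh_dev **head, int removed)
{
	struct eeh_dev *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	dev->removed = removed;
	dev->next = *head;
	*head = dev;
	return dev;
}

/* Drops all flagged entries while iterating; returns how many went away. */
static int eeh_pe_prune(struct eeh_dev **head)
{
	struct eeh_dev **link = head;
	struct eeh_dev *dev, *tmp;
	int n = 0;

	for (dev = *head; dev; dev = tmp) {
		tmp = dev->next;	/* saved before dev may be freed */
		if (dev->removed) {
			*link = tmp;	/* unlink the current entry */
			free(dev);
			n++;
		} else {
			link = &dev->next;
		}
	}
	return n;
}
```

Without the saved tmp pointer, the loop would read dev->next from memory that has already been freed.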
     
  • When we do normal hotplug, the PE (shadow EEH structure) shouldn't be
    kept around.

    However, we need to keep it if the hotplug is an artificial one caused
    by EEH error recovery.

    Since we remove the EEH device through the PCI hook pcibios_release_device(),
    the flag "purge_pe" passed to various functions is meaningless. So the patch
    removes the meaningless flag and introduces the new flag "EEH_PE_KEEP"
    to save the PE while doing hotplug during EEH error recovery.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • Make some functions public in order to support hotplug on either specific
    PCI bus or PCI device in future.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     

01 Jul, 2013

1 commit

  • The patch avoids the following build warnings:

    The function .pnv_pci_ioda_fixup() references
    the function __init .eeh_init().
    This is often because .pnv_pci_ioda_fixup lacks a __init

    The function .pnv_pci_ioda_fixup() references
    the function __init .eeh_addr_cache_build().
    This is often because .pnv_pci_ioda_fixup lacks a __init

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     

25 Jun, 2013

1 commit

  • Originally, eeh_mutex was introduced to protect the PE hierarchy
    tree and the attached EEH devices because the EEH core was possibly
    running with multiple threads accessing the PE hierarchy tree.
    However, we now have only one kthread in the EEH core. So we no
    longer need the eeh_mutex and just remove it.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     

20 Jun, 2013

9 commits

  • On the PowerNV platform, the EEH event caused by an interrupt won't
    have a binding PE. The patch enables the EEH core to handle that
    special event. To keep the existing logic intact, eeh_handle_event()
    is renamed to eeh_handle_normal_event(), and eeh_handle_special_event()
    is introduced. The function eeh_handle_event() dispatches to the above
    two functions according to the input parameter. Besides, the new
    backend "next_error" is added to eeh_ops and it's expected to have the
    following return values:

    4 - Dead IOC
    3 - Dead PHB
    2 - Fenced PHB
    1 - Frozen PE
    0 - No error found

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
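
The dispatch and the return codes from the entry above can be sketched as follows. The three handler names come from the commit message; the enum constant names, the enum eeh_handled return type and the rule "no PE means special event" are assumptions made for this illustration (the message only says the dispatch depends on the input parameter and that interrupt-driven events have no binding PE).

```c
#include <stddef.h>

/* Return values of the "next_error" backend; constant names are assumed. */
enum eeh_next_error {
	EEH_NEXT_ERR_NONE       = 0,	/* No error found */
	EEH_NEXT_ERR_FROZEN_PE  = 1,
	EEH_NEXT_ERR_FENCED_PHB = 2,
	EEH_NEXT_ERR_DEAD_PHB   = 3,
	EEH_NEXT_ERR_DEAD_IOC   = 4
};

/* Reduced stand-in for the kernel's struct eeh_pe. */
struct eeh_pe {
	int addr;
};

/* Hypothetical result type so the sketch can report which path ran. */
enum eeh_handled { EEH_HANDLED_NORMAL, EEH_HANDLED_SPECIAL };

static enum eeh_handled eeh_handle_normal_event(struct eeh_pe *pe)
{
	(void)pe;	/* real code would recover this PE */
	return EEH_HANDLED_NORMAL;
}

static enum eeh_handled eeh_handle_special_event(void)
{
	/* real code would query next_error() to find what failed */
	return EEH_HANDLED_SPECIAL;
}

/* Assumed rule: an event with a binding PE is normal, otherwise special. */
static enum eeh_handled eeh_handle_event(struct eeh_pe *pe)
{
	return pe ? eeh_handle_normal_event(pe) : eeh_handle_special_event();
}
```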
     
  • An EEH event is created and queued to the event queue for each
    ingress EEH error. When there are multiple EEH errors, we need to
    serialize the process to keep the PE state (flags) consistent. The
    spinlock "confirm_error_lock" was introduced for that purpose. We'll
    inject EEH events upon error reporting interrupts on the PowerNV
    platform, so we export the spinlock for that code to use.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • We don't expect one specific PE to be frozen more than 5
    times in the last hour. Otherwise, the PE will be removed from the
    system upon newly coming EEH errors. The patch introduces a time
    stamp to trace the first error on a specific PE in the last hour and
    a function to update it accordingly. Besides, the time stamp
    is restored during the PE hotplug path as we did for the frozen count.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
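
The freeze accounting described in the entry above can be sketched as a sliding one-hour window. This is a user-space model under assumptions: the field names, the constant EEH_WINDOW_SECONDS and the function eeh_pe_record_freeze() are hypothetical, and the current time is passed in explicitly so the behaviour is easy to follow.

```c
#include <stdbool.h>
#include <time.h>

#define EEH_MAX_ALLOWED_FREEZES 5	/* "over 5 times in last hour" */
#define EEH_WINDOW_SECONDS      3600	/* hypothetical name for one hour */

/* Reduced stand-in for the kernel's struct eeh_pe. */
struct eeh_pe {
	int freeze_count;
	time_t tstamp;		/* time of the first error in the window */
};

/*
 * Records one freeze at time "now". Starts a new window when this is
 * the first error or the previous window is older than one hour.
 * Returns true when the PE has exceeded the allowed freezes.
 */
static bool eeh_pe_record_freeze(struct eeh_pe *pe, time_t now)
{
	if (pe->freeze_count == 0 || now - pe->tstamp > EEH_WINDOW_SECONDS) {
		pe->tstamp = now;
		pe->freeze_count = 0;
	}

	pe->freeze_count++;
	return pe->freeze_count > EEH_MAX_ALLOWED_FREEZES;
}
```

Five freezes within an hour are tolerated; the sixth reports that the PE should be removed, and a freeze arriving more than an hour after the window started begins a fresh count.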
     
  • The patch adds the new EEH operation post_init. It's used to notify
    the platform that the EEH core has completed the EEH probe. From
    that point on, the PowerNV platform starts to use the services
    supplied by the EEH functionality.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • For EEH on PowerNV platform, we will do EEH probe based on the
    real PCI devices. The PCI devices are available after PCI probe.
    So we have to call eeh_init() explicitly on PowerNV platform
    after PCI probe. The patch also does EEH probe for PowerNV platform
    in eeh_init().

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • Several types of PEs are supported for now: PHB, Bus
    and Device dependent PEs. For a PCI bus dependent PE, tracing the
    corresponding PCI bus from the PE (struct eeh_pe) makes the code
    more efficient. The patch also enables the retrieval of the PCI bus
    based on the PCI bus dependent PE.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • While processing the EEH event interrupt from P7IOC, we need a function
    to retrieve the PE according to the indicated EEH device. The patch
    makes function eeh_pe_get() public so that other source files can call
    it for that purpose. Also, the patch fixes a reference to the wrong BDF
    (Bus/Device/Function) address while searching for the PE in function
    __eeh_pe_get().

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • One of the possible cases indicated by the P7IOC interrupt is a fenced
    PHB. For that case, we need to fetch the PE corresponding to the PHB,
    disable the PHB and all subordinate PCI buses/devices, recover
    from the fenced state and eventually enable the whole PHB. We need
    a function to fetch the PHB PE outside eeh_pe.c, so the patch
    makes eeh_phb_pe_get() public for that purpose.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • Under some special circumstances, the EEH device doesn't have an
    associated device tree node or PCI device. The patch enhances the
    functions that convert an EEH device to its device tree node or PCI
    device accordingly, to avoid unnecessary system crashes.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
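
A defensive conversion helper of the kind the entry above describes might look like this. The sketch is an assumption throughout: the commit message does not name the functions, so eeh_dev_to_of_node() and eeh_dev_to_pci_dev() as well as the reduced struct definitions are illustrative only.

```c
#include <stddef.h>

/* Reduced stand-ins for the kernel structures involved. */
struct device_node {
	const char *name;
};

struct pci_dev {
	int devfn;
};

struct eeh_dev {
	struct device_node *dn;	/* may be NULL */
	struct pci_dev *pdev;	/* may be NULL */
};

/* Returns NULL instead of dereferencing a missing EEH device. */
static struct device_node *eeh_dev_to_of_node(struct eeh_dev *edev)
{
	return edev ? edev->dn : NULL;
}

/* Returns NULL instead of dereferencing a missing EEH device. */
static struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev)
{
	return edev ? edev->pdev : NULL;
}
```

Callers then only need a single NULL check on the result, whether the EEH device itself or its associated object is missing.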
     

10 Jan, 2013

1 commit

  • The DDW code uses an eeh_dev struct from the pci_dev. However, this is
    not set until eeh_add_device_late is called.

    Since pci_bus_add_devices is called before eeh_add_device_late, the PCI
    devices are added to the bus, causing the drivers' probe hooks to be
    called. These will call set_dma_mask, which will call the DDW code,
    which will require the eeh_dev struct from pci_dev. This would result
    in a crash, due to a NULL dereference.

    Calling eeh_add_device_late before pci_bus_add_devices would make the
    system BUG, because device files shouldn't be added to devices that
    were not added to the system. So, a new function is needed to add such
    files only after pci_bus_add_devices has been called.

    Cc: stable@vger.kernel.org
    Signed-off-by: Thadeu Lima de Souza Cascardo
    Acked-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Thadeu Lima de Souza Cascardo
     

04 Jan, 2013

1 commit

  • CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
    markings need to be removed.

    This change removes the use of __devinit, __devexit_p, __devinitdata,
    __devinitconst, and __devexit from these drivers.

    Based on patches originally written by Bill Pemberton, but redone by me
    in order to handle some of the coding style issues better, by hand.

    Cc: Bill Pemberton
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

18 Sep, 2012

2 commits

  • Function eeh_rmv_from_parent_pe() could be called from the path of
    either normal PCI hotplug or EEH recovery. For the former case,
    we need to purge the corresponding PE on removal of the associated
    PE bus.

    The patch covers that by passing more information to function
    pcibios_remove_pci_devices() so that we know whether the corresponding
    PE needs to be purged or be marked as "invalid".

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • When an EEH error happens on a PE whose PCI devices don't have
    attached drivers, function eeh_handle_event() ends up with the
    default value PCI_ERS_RESULT_NONE after iterating over all
    drivers of the PCI devices belonging to the PE, because no drivers
    are actually installed for those PCI devices. Under that
    circumstance, we will remove the corresponding PCI bus of the PE,
    including the associated EEH devices and PE instance. However,
    we still need the information stored in the PE instance to do the PE
    reset after that. So it's unsafe to free the PE instance.

    The patch introduces the EEH_PE_INVALID PE type to address the issue.
    When the PCI bus and the corresponding attached EEH devices are
    removed, we will mark the PE as EEH_PE_INVALID. At a later point,
    the PE will be changed to EEH_PE_DEVICE or EEH_PE_BUS when the
    corresponding EEH devices are attached again.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
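
The life cycle of a PE across removal and re-attachment, as described in the entry above, can be modelled like this. It is a deliberately simplified sketch: representing the PE type as a plain enum, the num_devs counter and the two helper functions are assumptions, and only the three type names come from the commit message.

```c
/* PE types named in the commit message; modelling them as an enum
 * is a simplification made for this sketch. */
enum eeh_pe_type {
	EEH_PE_INVALID,
	EEH_PE_DEVICE,
	EEH_PE_BUS
};

/* Reduced stand-in for the kernel's struct eeh_pe. */
struct eeh_pe {
	enum eeh_pe_type type;
	int num_devs;
	int config_addr;	/* information still needed for PE reset */
};

/* Hypothetical helper: all devices are gone, but the PE instance and
 * its stored information are kept so that the reset can still be done. */
static void eeh_pe_detach_all(struct eeh_pe *pe)
{
	pe->num_devs = 0;
	pe->type = EEH_PE_INVALID;
}

/* Hypothetical helper: a device comes back and the PE becomes valid. */
static void eeh_pe_attach(struct eeh_pe *pe, enum eeh_pe_type type)
{
	pe->type = type;
	pe->num_devs++;
}
```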
     

10 Sep, 2012

9 commits

  • The patch cleans up the EEH PCI address cache based on the fact that
    the EEH core is the only user of the component.

    * Function names are cleaned up so that they all have the prefix
    "eeh" and look shorter.
    * Function printk() has been replaced with pr_debug() or
    pr_warning() accordingly.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The idea comes from Benjamin Herrenschmidt. The eeh cache helps
    fetch the pci device according to the given I/O address. Since
    the eeh cache serves eeh, it's reasonable for the eeh cache
    to trace the eeh device instead of the pci device.

    The patch makes the eeh cache trace the eeh device. Also, the major
    eeh entry function eeh_dn_check_failure has been renamed to
    eeh_dev_check_failure since it takes the eeh device as its input
    parameter.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • While the EEH module is installed, PCI devices are checked one by one
    to see if they support eeh. On different platforms, the PCI devices
    are referred to in different ways when the EEH module is loaded.
    For example, on the pSeries platform, that is done by OF node. However,
    we will do that by real PCI devices (struct pci_dev) on the PowerNV
    platform in future. So we need some mechanism to differentiate
    those cases by classifying them into probe modes, either from OF
    nodes or real PCI devices.

    The patch implements support for the eeh probe mode. Also,
    EEH on pSeries has been set to EEH_PROBE_MODE_DEVTREE. That means
    the probe will be done based on OF nodes on the pSeries platform.

    In addition, the patch moves the probe function from the EEH core to
    the platform dependent backend and applies some cleanup.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The patch removes the eeh related statistics from the eeh device since
    they are maintained by the corresponding eeh PE. Also, the
    flags used to trace the state of the eeh device and PE have been
    reworked a little.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The patch reworks the current implementation so that eeh errors
    are handled based on the PE instead of the eeh device.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The patch introduces a function to traverse the devices of the
    specified PE and its child PEs. Also, restoring device BARs
    is implemented based on the traverse function.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • Originally, all the EEH operations were implemented based on OF node.

    Actually, that explicitly breaks the rule that the operation target
    is the PE instead of the device. Therefore, the patch makes all the
    operations based on the PE instead of the device.

    Unfortunately, the backend for config space has to be kept as it was
    because it doesn't depend on the PE.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • Since we've introduced a dedicated struct to trace individual PEs,
    it's reasonable to trace their state through the dedicated struct
    instead of using "eeh_dev" any more.

    The patch implements the state tracing based on the PE. It's notable
    that the PE state will be applied to the specified PE as well as
    its child PEs. That complies with the rule that a problematic parent
    PE will prevent its child PEs from working properly.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • During PCI hotplug and EEH recovery, the PE hierarchy tree might
    change due to PCI topology changes. At a later point, when the
    PCI device is added, the PE will be created dynamically again.

    The patch introduces a new function to remove EEH devices from the
    associated PE. That can also cause the parent PE to be removed
    from the PE tree if the parent PE doesn't include any valid EEH
    devices or child PEs.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan