22 Mar, 2007

1 commit

  • The EEH event notification system passes around data that is
    not needed, or at least not used properly. Stop passing this
    data; get it in a more reliable fashion.

    Signed-off-by: Linas Vepstas
    Signed-off-by: Paul Mackerras

    Linas Vepstas
     

22 Apr, 2006

1 commit

  • The current PCI error recovery system keeps track of the number of PCI card
    resets, and refuses to bring a card back up if this number is too large.
    The goal of doing this was to avoid an infinite loop of resets if a card is
    obviously dead. However, if the failures are rare, but the machine has a
    high uptime, this mechanism might still be triggered; this is too harsh.

    This patch avoids this problem by decrementing the fail count after an
    hour. Thus, as long as a PCI card fails fewer than 6 times an hour, it
    will continue to be reset indefinitely. If its failure rate is greater
    than that, it will be taken off-line permanently.

    This patch is larger than it might otherwise be because it changes
    indentation by removing a pointless while-loop. The while loop is not
    needed, as the handler is invoked once for each event (by schedule_work());
    the loop is leftover cruft from an earlier implementation.

    Signed-off-by: Linas Vepstas
    Signed-off-by: Andrew Morton
    Signed-off-by: Paul Mackerras

    Linas Vepstas
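
The decaying fail count described above can be modeled in user space. This is a minimal sketch, not the kernel code: the names `card_state` and `card_report_failure` are hypothetical, the threshold of 5 is an assumption inferred from "fewer than 6 times an hour", and time is passed in explicitly as seconds instead of sleeping.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ALLOWED_FAILS 5     /* assumed: a 6th failure within an hour is fatal */
#define DECAY_SECONDS     3600L /* each counted failure is forgotten after an hour */

/* Hypothetical per-card bookkeeping; not the kernel's data structure. */
struct card_state {
    long   expiry[MAX_ALLOWED_FAILS + 1]; /* when each counted failure expires */
    size_t n_pending;                     /* current fail count */
    bool   offline;                       /* set once, never cleared */
};

/* Drop failures that are at least an hour old (the "decrement"). */
void card_expire_failures(struct card_state *c, long now)
{
    size_t kept = 0;
    for (size_t i = 0; i < c->n_pending; i++) {
        if (c->expiry[i] > now)
            c->expiry[kept++] = c->expiry[i];
    }
    c->n_pending = kept;
}

/* Record a failure at time `now`; returns true if the card may be reset. */
bool card_report_failure(struct card_state *c, long now)
{
    if (c->offline)
        return false;

    card_expire_failures(c, now);
    c->expiry[c->n_pending++] = now + DECAY_SECONDS;

    if (c->n_pending > MAX_ALLOWED_FAILS) {
        c->offline = true;  /* too many failures inside one hour */
        return false;
    }
    return true;
}
```

With these assumptions, a card that fails once an hour is reset every time, while six failures within five minutes take it off-line for good.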
     

10 Jan, 2006

1 commit


09 Jan, 2006

1 commit

  • include/asm-ppc/ had #ifdef __KERNEL__ in all header files that
    are not meant for use by user space; include/asm-powerpc does
    not have this yet.

    This patch gets us a lot closer there. There are a few cases
    where I was not sure, so I left them out. I have verified
    that no CONFIG_* symbols are used outside of __KERNEL__
    any more and that there are no obvious compile errors when
    including any of the headers in user space libraries.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

    Arnd Bergmann
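
The guard pattern this commit applies can be illustrated as follows. This is only a sketch: `EXAMPLE_H`, `EXAMPLE_ABI_VERSION`, `EXAMPLE_INTERNAL_SLOTS` and `CONFIG_EXAMPLE_FEATURE` are hypothetical names, not symbols from the headers touched by the patch.

```c
/* Contents of a hypothetical header, shown inline for illustration. */
#ifndef EXAMPLE_H
#define EXAMPLE_H

/* Visible to both kernel and user space. */
#define EXAMPLE_ABI_VERSION 1

#ifdef __KERNEL__
/* Kernel-only section: CONFIG_* symbols may be used only in here. */
#ifdef CONFIG_EXAMPLE_FEATURE
#define EXAMPLE_INTERNAL_SLOTS 64
#else
#define EXAMPLE_INTERNAL_SLOTS 8
#endif
#endif /* __KERNEL__ */

#endif /* EXAMPLE_H */

/* Reports whether the kernel-only section was seen by this translation unit. */
int kernel_only_parts_visible(void)
{
#ifdef EXAMPLE_INTERNAL_SLOTS
    return 1;
#else
    return 0;
#endif
}
```

When built as an ordinary user-space program, `__KERNEL__` is not defined, so only `EXAMPLE_ABI_VERSION` is exposed and no CONFIG_* symbol is evaluated.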
     

10 Nov, 2005

1 commit

  • 12-eeh-event-dispatcher.patch

    ppc64: EEH Recovery dispatcher thread

    This patch adds a mechanism to create recovery threads when an
    EEH event is received. Since an EEH freeze state may be detected
    within an interrupt context, we need to get out of the interrupt
    context before starting recovery. This dispatcher does this in
    two steps: first, it uses a workqueue to get out, and then
    launches a kernel thread, so that the recovery routine can
    sleep for extended periods without upsetting keventd.

    A kernel thread is created with each EEH event, rather than
    having one long-running daemon started at boot time. This is
    because it is anticipated that EEH events will be very rare
    (very very rare, ideally) and so it's pointless to clutter the
    process tables with a daemon that will almost never run.

    Signed-off-by: Linas Vepstas
    Signed-off-by: Paul Mackerras

    Linas Vepstas
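
The hand-off out of interrupt context can be modeled in plain C. This is a single-threaded sketch of the ordering only, under stated assumptions: it uses no kernel API, and `event_queue`, `queue_event_from_interrupt`, `dispatch_one_event` and `record_recovery` are hypothetical names. In the real patch the deferred side is a workqueue item that launches a kernel thread; here a callback stands in for that thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 8 /* hypothetical capacity */

struct event_queue {
    int    device_id[QUEUE_SLOTS];
    size_t head;  /* index of the oldest pending event */
    size_t count; /* number of pending events */
};

/* Interrupt side: only records the event and returns; it never blocks. */
bool queue_event_from_interrupt(struct event_queue *q, int device_id)
{
    if (q->count == QUEUE_SLOTS)
        return false; /* queue full, event dropped */
    q->device_id[(q->head + q->count) % QUEUE_SLOTS] = device_id;
    q->count++;
    return true;
}

/* Stand-in for the recovery routine, which may take a long time. */
typedef void (*recovery_fn)(int device_id, void *ctx);

/* Deferred side: handles exactly one pending event per invocation. */
bool dispatch_one_event(struct event_queue *q, recovery_fn recover, void *ctx)
{
    if (q->count == 0)
        return false;
    int id = q->device_id[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    recover(id, ctx); /* runs outside the "interrupt" path */
    return true;
}

/* Example recovery routine that just logs which devices were recovered. */
struct recovery_log {
    int    ids[QUEUE_SLOTS];
    size_t n;
};

void record_recovery(int device_id, void *ctx)
{
    struct recovery_log *log = ctx;
    if (log->n < QUEUE_SLOTS)
        log->ids[log->n++] = device_id;
}
```

The point of the split is that nothing slow happens on the interrupt side; recovery work starts only when the deferred side runs, once per queued event.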