23 Jun, 2017

6 commits

  • Currently there are trace events for the various RAS
    errors with the exception of ARM processor type errors.
    Add a new trace event for such errors so that the user
    will know when they occur. These trace events are
    consistent with the ARM processor error section type
    defined in UEFI 2.6 spec section N.2.4.4.

    Signed-off-by: Tyler Baicar
    Acked-by: Steven Rostedt
    Reviewed-by: Xie XiuQi
    Signed-off-by: Will Deacon

    Tyler Baicar
     
  • The UEFI spec includes non-standard section type support in the
    Common Platform Error Record. This is defined in section N.2.3 of
    UEFI version 2.5.

    Currently if the CPER section's type (UUID) does not match any
    section type that the kernel knows how to parse, a trace event is
    not generated.

    Generate a trace event which contains the raw error data for
    non-standard section type error records.

    Signed-off-by: Tyler Baicar
    CC: Jonathan (Zhixiong) Zhang
    Tested-by: Shiju Jose
    Signed-off-by: Will Deacon

    Tyler Baicar
     
  • The UEFI spec allows for non-standard sections in the Common Platform
    Error Record. This is defined in section N.2.3 of UEFI version 2.5.

    Currently, if the CPER section's type (UUID) does not match any of
    the section types that the kernel knows how to parse, the section is
    skipped. The user therefore cannot see such CPER data, for instance
    the error record of a non-standard section.

    This change prints out the raw data in hex in the dmesg buffer so
    that non-standard sections are reported to the user. Non-standard
    section type errors should be reported to the user because these
    can include vendor-specific errors. The data length is taken from
    the Error Data Length field of the Generic Error Data Entry.

    The following is a sample output from dmesg:
    Hardware error from APEI Generic Hardware Error Source: 2
    It has been corrected by h/w and requires no further action
    event severity: corrected
    time: precise 2017-03-15 20:37:35
    Error 0, type: corrected
    section type: unknown, d2e2621c-f936-468d-0d84-15a4ed015c8b
    section length: 0x238
    00000000: 4d415201 4d492031 453a4d45 435f4343 .RAM1 IMEM:ECC_C
    00000010: 53515f45 44525f42 00000000 00000000 E_QSB_RD........
    00000020: 00000000 00000000 00000000 00000000 ................
    00000030: 00000000 00000000 01010000 01010000 ................
    00000040: 00000000 00000000 00000005 00000000 ................
    00000050: 01010000 00000000 00000001 00dddd00 ................
    ...

    The raw data from the error can then be decoded using
    vendor-specific tools.

    Signed-off-by: Tyler Baicar
    CC: Jonathan (Zhixiong) Zhang
    Reviewed-by: James Morse
    Signed-off-by: Will Deacon

    Tyler Baicar
     
  • Even if an error status block's severity is fatal, the kernel does not
    honor the severity level and does not panic.

    With the firmware first model, the platform could inform the OS about a
    fatal hardware error through the non-NMI GHES notification type. The OS
    should panic when a hardware error record is received with this
    severity.

    If the severity is fatal, call panic() after the CPER data in the
    error status block is printed and before each error section is handled.

    Signed-off-by: Jonathan (Zhixiong) Zhang
    Signed-off-by: Tyler Baicar
    Reviewed-by: James Morse
    Signed-off-by: Will Deacon

    Jonathan (Zhixiong) Zhang
     
  • The ARM APEI extension proposal added the SEA (Synchronous External
    Abort) notification type for ARMv8.

    Add a new GHES error source handling function for SEA. If an error
    source's notification type is SEA, then this function can be registered
    into the SEA exception handler. That way GHES will parse and report
    SEA exceptions when they occur.

    An SEA can interrupt code that had interrupts masked and is treated as
    an NMI. To aid this, the page of address space for mapping APEI buffers
    while in_nmi() is always reserved, and ghes_ioremap_pfn_nmi() is
    changed to use the helper methods to find the prot_t to map with in
    the same way as ghes_ioremap_pfn_irq().

    Signed-off-by: Tyler Baicar
    CC: Jonathan (Zhixiong) Zhang
    Reviewed-by: James Morse
    Acked-by: Catalin Marinas
    Signed-off-by: Will Deacon

    Tyler Baicar
     
  • SEA exceptions are often caused by an uncorrected hardware
    error, and are handled when data abort and instruction abort
    exception classes have specific values for their Fault Status
    Code.

    When an SEA occurs, before killing the process, report the error
    in the kernel logs.

    Update fault_info[] with specific SEA faults so that the
    new SEA handler is used.

    Signed-off-by: Tyler Baicar
    CC: Jonathan (Zhixiong) Zhang
    Reviewed-by: James Morse
    Acked-by: Catalin Marinas
    [will: use NULL instead of 0 when assigning si_addr]
    Signed-off-by: Will Deacon

    Tyler Baicar
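
The sample dmesg output in the non-standard section commit above prints the raw section data as rows of 32-bit words followed by an ASCII column. As a rough illustration of the first step a vendor-specific decoder might take, the sketch below rebuilds the raw bytes from such rows. It assumes a little-endian host (so each printed word is byte-swapped relative to memory order) and the row layout shown in the sample: an 8-digit offset, up to four 8-digit words, then the ASCII column. The names `decode_hex_dump` and `ascii_column` are hypothetical and are not part of the kernel or of any vendor tool.

```python
import re
import struct

# Two rows copied from the sample dmesg output above.
SAMPLE = """\
00000000: 4d415201 4d492031 453a4d45 435f4343 .RAM1 IMEM:ECC_C
00000010: 53515f45 44525f42 00000000 00000000 E_QSB_RD........
"""

# Assumed row layout: offset, colon, then one to four 8-digit hex words.
ROW = re.compile(r"\s*[0-9a-f]{8}:((?: [0-9a-f]{8}){1,4})")


def decode_hex_dump(text):
    """Rebuild raw section bytes from hex-dump rows of 32-bit words."""
    raw = bytearray()
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # not a hex-dump row (for example "..." or a header)
        for word in match.group(1).split():
            # Assumption: words were printed on a little-endian host.
            raw += struct.pack("<I", int(word, 16))
    return bytes(raw)


def ascii_column(raw):
    """Render bytes the way the dump's ASCII column does."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


print(ascii_column(decode_hex_dump(SAMPLE)))
# .RAM1 IMEM:ECC_CE_QSB_RD........
```

The recovered bytes match the ASCII column in the sample, which is a quick check that the byte order assumption holds for a given log.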
     

22 Jun, 2017

4 commits

  • Add support for the ARM Common Platform Error Record (CPER).
    The UEFI 2.6 specification adds support for ARM-specific
    processor error information to be reported as part of the
    CPER records. This provides more detail for processor error logs.

    Signed-off-by: Tyler Baicar
    CC: Jonathan (Zhixiong) Zhang
    Reviewed-by: James Morse
    Reviewed-by: Ard Biesheuvel
    Signed-off-by: Will Deacon

    Tyler Baicar
     
  • The ACPI 6.1 spec added a timestamp to the generic error data
    entry structure. Print the timestamp out when printing out the
    error information.

    Signed-off-by: Tyler Baicar
    CC: Jonathan (Zhixiong) Zhang
    Signed-off-by: Will Deacon

    Tyler Baicar
     
  • The ACPI 6.1 spec adds a new revision of the generic error data
    entry structure. Add support to handle the new structure as well
    as properly verify and iterate through the generic data entries.

    Signed-off-by: Tyler Baicar
    CC: Jonathan (Zhixiong) Zhang
    Signed-off-by: Will Deacon

    Tyler Baicar
     
  • A RAS (Reliability, Availability, Serviceability) controller
    may be a separate processor running in parallel with OS
    execution, and may generate error records for consumption by
    the OS. If the RAS controller produces multiple error records,
    then they may be overwritten before the OS has consumed them.

    The Generic Hardware Error Source (GHES) v2 structure
    introduces the capability for the OS to acknowledge the
    consumption of the error record generated by the RAS
    controller. A RAS controller supporting GHESv2 shall wait for
    the acknowledgment before writing a new error record, thus
    eliminating the race condition.

    Add support for parsing of GHESv2 sub-tables as well.

    Signed-off-by: Tyler Baicar
    CC: Jonathan (Zhixiong) Zhang
    Reviewed-by: James Morse
    Signed-off-by: Will Deacon

    Tyler Baicar
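
The acknowledgment handshake described in the GHESv2 commit above can be pictured with a toy model. This is only a sketch of the idea, not of the kernel or firmware implementation: the class `RasController`, its single `slot`, and the function `os_consume` are hypothetical names, and the real mechanism works through an ACPI-described acknowledgment register rather than method calls.

```python
from collections import deque


class RasController:
    """Toy model of a firmware-side producer with one shared record slot."""

    def __init__(self):
        self.slot = None        # the error record currently visible to the OS
        self.pending = deque()  # records waiting for the slot to free up

    def report(self, record):
        """Queue a record and publish it if the slot is free."""
        self.pending.append(record)
        self._publish()

    def acknowledge(self):
        """Called by the OS once it has consumed the current record."""
        self.slot = None
        self._publish()

    def _publish(self):
        # GHESv2 rule: never write a new record before the acknowledgment.
        if self.slot is None and self.pending:
            self.slot = self.pending.popleft()


def os_consume(controller):
    """Read the current record, then acknowledge it; None if slot is empty."""
    record = controller.slot
    if record is not None:
        controller.acknowledge()
    return record


controller = RasController()
controller.report("error A")
controller.report("error B")   # held back: "error A" is not yet acknowledged
print(os_consume(controller))  # error A
print(os_consume(controller))  # error B
```

Without the acknowledgment step, the second `report()` would overwrite the slot before the OS had read the first record, which is the race the commit message describes.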
     

21 Jun, 2017

1 commit


09 Jun, 2017

2 commits


08 Jun, 2017

1 commit


07 Jun, 2017

1 commit

  • acpi_evaluate_dsm() and friends take a pointer to a raw buffer of 16
    bytes. Convert them to use the guid_t type instead, and convert the
    current users at the same time.

    acpi_str_to_uuid() becomes useless after the conversion and it's safe to
    get rid of it.

    Acked-by: Rafael J. Wysocki
    Cc: Borislav Petkov
    Acked-by: Dan Williams
    Cc: Amir Goldstein
    Reviewed-by: Jarkko Sakkinen
    Reviewed-by: Jani Nikula
    Acked-by: Jani Nikula
    Cc: Ben Skeggs
    Acked-by: Benjamin Tissoires
    Acked-by: Joerg Roedel
    Acked-by: Adrian Hunter
    Cc: Yisen Zhuang
    Acked-by: Bjorn Helgaas
    Acked-by: Felipe Balbi
    Acked-by: Mathias Nyman
    Reviewed-by: Heikki Krogerus
    Acked-by: Mark Brown
    Signed-off-by: Andy Shevchenko
    Signed-off-by: Christoph Hellwig

    Andy Shevchenko
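
To make the "raw buffer of 16 bytes" in the commit above concrete, the sketch below shows how a GUID string maps to those 16 bytes. It is a Python illustration rather than kernel code, and it assumes the little-endian GUID layout in which the first three fields are stored byte-swapped. The helper names `guid_from_string` and `guid_to_string` are hypothetical; the example GUID is the section type from the sample dmesg output earlier in this log.

```python
import uuid


def guid_from_string(text):
    """Return the 16 bytes a little-endian GUID occupies in memory."""
    return uuid.UUID(text).bytes_le


def guid_to_string(raw):
    """Inverse of guid_from_string: format 16 raw bytes as a GUID string."""
    return str(uuid.UUID(bytes_le=raw))


raw = guid_from_string("d2e2621c-f936-468d-0d84-15a4ed015c8b")
print(raw.hex())            # 1c62e2d236f98d460d8415a4ed015c8b
print(guid_to_string(raw))  # d2e2621c-f936-468d-0d84-15a4ed015c8b
```

A dedicated type for these 16 bytes lets the compiler catch a caller that passes some unrelated buffer, which a bare byte pointer cannot do.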
     

06 Jun, 2017

4 commits


05 Jun, 2017

21 commits