17 May, 2019

1 commit


14 May, 2019

1 commit

  • The function should return NULL in case no device is found, but it
    always returns the last checked mc device from the list even if the
    index did not match. Fix that.

    I analyzed why this did not raise any issues for about 3 years: the
    reason is that edac_mc_find() is mostly used to search for existing
    devices, so the bug is not triggered.

    [ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ]

    Fixes: c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list")
    Signed-off-by: Robert Richter
    Signed-off-by: Borislav Petkov
    Cc: "linux-edac@vger.kernel.org"
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com

    Robert Richter
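
The failure mode described in this commit can be sketched in plain C. This is a minimal user-space illustration, not the kernel code: `struct mc_dev`, `find_buggy()` and `find_fixed()` are hypothetical names, and a simple singly linked list stands in for the kernel's mc_devices list.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's memory controller descriptor;
 * only the fields needed for this sketch are modeled. */
struct mc_dev {
    int mc_idx;
    struct mc_dev *next;
};

/* Buggy pattern: the cursor is returned after the loop, so a miss
 * returns the last node that was checked instead of NULL. */
struct mc_dev *find_buggy(struct mc_dev *head, int idx)
{
    struct mc_dev *cur = NULL;
    struct mc_dev *it;

    for (it = head; it != NULL; it = it->next) {
        cur = it;
        if (cur->mc_idx == idx)
            break;
    }
    return cur;
}

/* Fixed pattern: return from inside the loop on a match, NULL otherwise. */
struct mc_dev *find_fixed(struct mc_dev *head, int idx)
{
    struct mc_dev *it;

    for (it = head; it != NULL; it = it->next) {
        if (it->mc_idx == idx)
            return it;
    }
    return NULL;
}
```

Both variants behave identically when the index exists, which is consistent with the observation that the bug stayed hidden while callers mostly searched for existing devices.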
     

11 May, 2019

1 commit

  • The mpc85xx EDAC driver can be configured as a module but then fails to
    build because it uses two unexported symbols:

    ERROR: ".pci_find_hose_for_OF_device" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
    ERROR: ".early_find_capability" [drivers/edac/mpc85xx_edac_mod.ko] undefined!

    We don't want to export those symbols just for this driver, so make the
    driver only configurable as a built-in.

    This seems to have been broken since at least

    c92132f59806 ("edac/85xx: Add PCIe error interrupt edac support")

    (Nov 2013).

    [ bp: make it depend on EDAC=y so that the EDAC core doesn't get built
    as a module. ]

    Signed-off-by: Michael Ellerman
    Signed-off-by: Borislav Petkov
    Acked-by: Johannes Thumshirn
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Cc: linuxppc-dev@ozlabs.org
    Cc: morbidrsa@gmail.com
    Link: https://lkml.kernel.org/r/20190502141941.12927-1-mpe@ellerman.id.au

    Michael Ellerman
     

07 May, 2019

1 commit

  • Pull RAS updates from Borislav Petkov:

    - Support for varying MCA bank numbers per CPU: this is in preparation
    for future CPU enablement (Yazen Ghannam)

    - MCA banks read race fix (Tony Luck)

    - Facility to filter MCEs which should not be logged (Yazen Ghannam)

    - The usual round of cleanups and fixes

    * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h models
    x86/MCE: Add an MCE-record filtering function
    RAS/CEC: Increment cec_entered under the mutex lock
    x86/mce: Fix debugfs_simple_attr.cocci warnings
    x86/mce: Remove mce_report_event()
    x86/mce: Handle varying MCA bank counts
    x86/mce: Fix machine_check_poll() tests for error types
    MAINTAINERS: Fix file pattern for X86 MCE INFRASTRUCTURE
    x86/MCE: Group AMD function prototypes in

    Linus Torvalds
     

25 Apr, 2019

1 commit

  • This reverts commit 0a227af521d6df5286550b62f4b591417170b4ea.

    Unfortunately, this commit caused wrong detection of chip select sizes
    on some F17h client machines:

    --- 00-rc6+ 2019-02-14 14:28:03.126622904 +0100
    +++ 01-rc4+ 2019-04-14 21:06:16.060614790 +0200
    EDAC amd64: MC: 0: 0MB 1: 0MB
    -EDAC amd64: MC: 2: 16383MB 3: 16383MB
    +EDAC amd64: MC: 2: 0MB 3: 2097151MB
    EDAC amd64: MC: 4: 0MB 5: 0MB
    EDAC amd64: MC: 6: 0MB 7: 0MB
    EDAC MC: UMC1 chip selects:
    EDAC amd64: MC: 0: 0MB 1: 0MB
    -EDAC amd64: MC: 2: 16383MB 3: 16383MB
    +EDAC amd64: MC: 2: 0MB 3: 2097151MB
    EDAC amd64: MC: 4: 0MB 5: 0MB
    EDAC amd64: MC: 6: 0MB 7: 0MB

    Revert it for now until it has been solved properly.

    Signed-off-by: Borislav Petkov
    Cc: Yazen Ghannam

    Borislav Petkov
     

24 Apr, 2019

1 commit

  • AMD family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
    errors under certain conditions. The errors are benign and can safely be
    ignored. However, the high error rate may cause the MCA threshold
    counter to overflow causing a high rate of thresholding interrupts.

    In addition, users may see the errors reported through the AMD MCE
    decoder module, even with the interrupt disabled, due to MCA polling.

    Clear the "Counter Present" bit in the Instruction Fetch bank's
    MCA_MISC0 register. This will prevent enabling MCA thresholding on this
    bank which will prevent the high interrupt rate due to this error.

    Define an AMD-specific function to filter these errors from the MCE
    event pool so that they don't get reported during early boot.

    Rename filter function in EDAC/mce_amd to avoid a naming conflict, while
    at it.

    [ bp: Move function prototype to the internal header and
    massage/cleanup, fix typos. ]

    Reported-by: Rafał Miłecki
    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: "clemej@gmail.com"
    Cc: Arnd Bergmann
    Cc: Ingo Molnar
    Cc: James Morse
    Cc: Kees Cook
    Cc: Mauro Carvalho Chehab
    Cc: Pu Wen
    Cc: Qiuxu Zhuo
    Cc: Shirish S
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vishal Verma
    Cc: linux-edac
    Cc: x86-ml
    Cc: # 5.0.x: c95b323dcd35: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
    Cc: # 5.0.x: 30aa3d26edb0: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
    Cc: # 5.0.x: 9308fd407455: x86/MCE: Group AMD function prototypes in
    Cc: # 5.0.x
    Link: https://lkml.kernel.org/r/20190325163410.171021-2-Yazen.Ghannam@amd.com

    Yazen Ghannam
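
The two measures described in this commit (clearing a register bit and filtering records) can be sketched as follows. This is a hedged illustration only: the bit position 62 for "Counter Present" is an assumption, and `struct mce_sample`, `clear_counter_present()` and `should_filter()` are hypothetical names that do not appear in the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: "Counter Present" sits at bit 62 of MCA_MISC0. */
#define MISC0_CNTP_BIT 62

/* Return the register value with the "Counter Present" bit cleared. */
uint64_t clear_counter_present(uint64_t misc0)
{
    return misc0 & ~(UINT64_C(1) << MISC0_CNTP_BIT);
}

/* Hypothetical, simplified view of an MCE record. */
struct mce_sample {
    unsigned int family;
    unsigned int model;
    bool is_if_bank;       /* error came from the Instruction Fetch bank */
    bool is_l1_btb_error;  /* error is of the L1 BTB type */
};

/* Filter only the benign case: family 17h, models 10h-2Fh, L1 BTB error
 * reported by the Instruction Fetch bank. */
bool should_filter(const struct mce_sample *m)
{
    return m->family == 0x17 &&
           m->model >= 0x10 && m->model <= 0x2F &&
           m->is_if_bank && m->is_l1_btb_error;
}
```

Keeping the filter predicate narrow means that other errors from the same bank, or the same error on other models, are still logged.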
     

02 Apr, 2019

1 commit

  • Reserve ECC Double Bit Error SMC call to alert U-Boot that a DBE has
    occurred. Move the call from local EDAC header file to a common header.

    [ bp: Merge the two patches. ]

    Signed-off-by: Thor Thayer
    Signed-off-by: Borislav Petkov
    Reviewed-by: Richard Gong
    Reviewed-by: Alan Tull # firmware
    Cc: Greg KH
    Cc: James Morse
    Cc: linux-edac
    Cc: mchehab@kernel.org
    Link: https://lkml.kernel.org/r/1553870639-23895-1-git-send-email-thor.thayer@linux.intel.com

    Thor Thayer
     

29 Mar, 2019

2 commits

  • The FIFO memory and ECC initialization doesn't need to be
    done as a separate operation early in the startup.

    Improve the Arria10 and Stratix10 peripheral FIFO init
    by initializing memory and enabling ECC as part of the
    device driver initialization.

    Signed-off-by: Thor Thayer
    Signed-off-by: Borislav Petkov
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/1553635771-32693-2-git-send-email-thor.thayer@linux.intel.com

    Thor Thayer
     
  • Improve the Arria10 and Stratix10 error injection routine
    by reading the data and changing just 1 bit before writing
    it back out. The previous routine overwrote the first bytes
    with 0 and then changed 1 bit; the new method is less intrusive.

    Signed-off-by: Thor Thayer
    Signed-off-by: Borislav Petkov
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/1553635771-32693-1-git-send-email-thor.thayer@linux.intel.com

    Thor Thayer
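
The difference between the two injection approaches can be sketched on an ordinary memory word. This is an assumption-laden illustration, not the driver code: `inject_overwrite()` and `inject_flip_one_bit()` are hypothetical names, and a plain `uint32_t` stands in for the device memory.

```c
#include <stdint.h>

/* Older approach as described: zero the word, then change one bit.
 * The previous contents are lost. Caller must pass bit < 32. */
void inject_overwrite(uint32_t *word, unsigned int bit)
{
    *word = 0;
    *word ^= UINT32_C(1) << bit;
}

/* Newer approach as described: read, change exactly one bit, write back.
 * All other bits keep their value. Caller must pass bit < 32. */
void inject_flip_one_bit(uint32_t *word, unsigned int bit)
{
    uint32_t val = *word;        /* read the current contents */

    val ^= UINT32_C(1) << bit;   /* change just one bit */
    *word = val;                 /* write the result back */
}
```

With the read-modify-write variant, the injected word differs from the original data by a single bit, which is what makes it less intrusive.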
     

27 Mar, 2019

7 commits

  • AMD systems may support chip select interleaving. However, on family
    17h+ this was not taken into account when printing the chip select
    sizes.

    Add support to detect if chip selects are interleaved on family 17h+,
    and adjust the sizes accordingly.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Tested-by: Kim Phillips
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190228153558.127292-6-Yazen.Ghannam@amd.com

    Yazen Ghannam
     
  • The struct chip_select array that's used for saving chip select bases
    and masks is fixed at length of two. There should be one struct
    chip_select for each controller, so this array should be increased to
    support systems that may have more than two controllers.

    Increase the size of the struct chip_select array to eight, which is the
    largest number of controllers per die currently supported on AMD
    systems.

    Also, carve out the Family 17h+ reading of the bases/masks into a
    separate function. This effectively reverts the original bases/masks
    reading code to before Family 17h support was added.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Tested-by: Kim Phillips
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190228153558.127292-5-Yazen.Ghannam@amd.com

    Yazen Ghannam
     
  • Future AMD systems may support x16 symbol sizes.

    Recognize if a system is using x16 symbol size. Also, simplify the print
    statement.

    Note that a x16 syndrome vector table is not necessary like with x4 or
    x8 syndromes. This is because systems that support x16 symbol sizes are
    SMCA systems and in that case, the syndrome can be directly extracted
    from the MCA_SYND[Syndrome] field.

    [ bp: massage. ]

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Tested-by: Kim Phillips
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190228153558.127292-4-Yazen.Ghannam@amd.com

    Yazen Ghannam
     
  • The AMD64 EDAC module currently hardcodes the EDAC channel layer size
    count to two. Future AMD systems may have more channels than this.

    Set the EDAC channel layer size equal to the maximum number of channels
    possible for the system. On Family 17h and later, this is set in the
    num_umcs variable. Older systems will continue to use two as the
    default.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190325203319.7603-1-Yazen.Ghannam@amd.com

    Yazen Ghannam
     
  • The first few models of Family 17h all had 2 Unified Memory Controllers
    per Die, so this was treated as a fixed value. However, future systems
    may have more Unified Memory Controllers per Die.

    Related to this, the channel number and base address of a Unified Memory
    Controller were found by matching on fixed, known values. However,
    current and future systems follow this pattern for the channel number
    and base address of a Unified Memory Controller: 0xYXXXXX, where Y is
    the channel number. So matching on hardcoded values is not necessary.

    Set the number of Unified Memory Controllers at driver init time based
    on the family/model. Also, update the functions that find the channel
    number and base address of a Unified Memory Controller to support more
    than two.

    [ bp: Move num_umcs into the .c file and simplify comment. ]

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Tested-by: Kim Phillips
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190228153558.127292-3-Yazen.Ghannam@amd.com

    Yazen Ghannam
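
The 0xYXXXXX pattern mentioned in this commit lends itself to simple shift arithmetic. The sketch below assumes the value fits in six hex digits, so Y occupies bits 20 to 23; `umc_channel()` and `umc_addr()` are hypothetical helper names, and the offsets used with them are illustrative.

```c
#include <stdint.h>

/* Extract Y from 0xYXXXXX: the channel number is the sixth hex digit. */
unsigned int umc_channel(uint32_t addr)
{
    return (addr >> 20) & 0xF;
}

/* Build 0xYXXXXX from a channel number and a five-digit offset. */
uint32_t umc_addr(unsigned int channel, uint32_t offset)
{
    return ((uint32_t)(channel & 0xF) << 20) | (offset & 0xFFFFF);
}
```

Because the channel is derived from the value itself, no table of known addresses is needed when the number of controllers grows.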
     
  • Define and use a macro for looping over the number of Unified Memory
    Controllers.

    No functional change.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Tested-by: Kim Phillips
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190228153558.127292-2-Yazen.Ghannam@amd.com

    Yazen Ghannam
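
A loop macro of the kind described here might look like the sketch below. The macro name `for_each_umc`, the variable `num_umcs` as a plain global, and the helper `count_enabled_umcs()` are assumptions made for illustration.

```c
/* Hypothetical stand-in for the driver's controller count. */
unsigned int num_umcs = 2;

/* Iterate i over all Unified Memory Controllers. */
#define for_each_umc(i) \
    for ((i) = 0; (i) < num_umcs; (i)++)

/* Example user: count the controllers flagged as enabled.
 * The array must hold at least num_umcs entries. */
unsigned int count_enabled_umcs(const int *enabled)
{
    unsigned int i, n = 0;

    for_each_umc(i) {
        if (enabled[i])
            n++;
    }
    return n;
}
```

Hiding the bound inside the macro means call sites do not change when the controller count stops being a fixed value.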
     
  • Add the new Family 17h Model 30h PCI IDs to the AMD64 EDAC module.

    This also fixes a probe failure that appeared when some other PCI IDs
    for Family 17h Model 30h were added to the AMD NB code.

    Fixes: be3518a16ef2 ("x86/amd_nb: Add PCI device IDs for family 17h, model 30h")
    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Tested-by: Kim Phillips
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190228153558.127292-1-Yazen.Ghannam@amd.com

    Yazen Ghannam
     

23 Mar, 2019

2 commits

  • The Stratix10 Double Bit Error (DBE) address was always read from the
    SDRAM address register instead of each device's address register.

    To determine which device had the DBE, cycle through the EDAC devices
    comparing the DBE value to the db_irq value. Once found, report the DBE
    Address from the device registers as well as the device name.

    Finally, notify the system via an SMC call and indicate the panic should
    result in a system reboot. Change a run-time check to a Stratix10
    compile-time check for a clean SMC notification.

    Fixes: d5fc9125566c ("EDAC, altera: Combine Stratix10 and Arria10 probe functions")
    Signed-off-by: Thor Thayer
    Signed-off-by: Borislav Petkov
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/1552490842-25440-1-git-send-email-thor.thayer@linux.intel.com

    Thor Thayer
     
  • The following Kconfig constellations fail randconfig builds:

    CONFIG_ACPI_NFIT=y
    CONFIG_EDAC_DEBUG=y
    CONFIG_EDAC_SKX=m
    CONFIG_EDAC_I10NM=y

    or

    CONFIG_ACPI_NFIT=y
    CONFIG_EDAC_DEBUG=y
    CONFIG_EDAC_SKX=y
    CONFIG_EDAC_I10NM=m

    with:
    ...
    CC [M] drivers/edac/skx_common.o
    ...
    .../skx_common.o:.../skx_common.c:672: undefined reference to `__this_module'

    That is because if one of the two drivers - skx_edac or i10nm_edac - is
    built-in and the other one is a module, the shared file skx_common.c
    gets linked into a module object by kbuild. Therefore, when linking that
    same file into vmlinux, the '__this_module' symbol used in debugfs isn't
    defined, leading to the above error.

    Fix it by moving all debugfs code from skx_common.c to both skx_base.c
    and i10nm_base.c respectively. Thus, skx_common.c doesn't refer to the
    '__this_module' symbol anymore.

    Clarify skx_common.c's purpose at the top of the file for future
    reference, while at it.

    [ bp: Make text more readable. ]

    Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors")
    Reported-by: Arnd Bergmann
    Signed-off-by: Qiuxu Zhuo
    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190321221339.GA32323@agluck-desk

    Qiuxu Zhuo
     

09 Mar, 2019

2 commits

  • Pull RAS updates from Borislav Petkov:
    "This time around we have in store:

    - Disable MC4_MISC thresholding banks on all AMD family 0x15 models
    (Shirish S)

    - AMD MCE error descriptions update and error decode improvements
    (Yazen Ghannam)

    - The usual smaller conversions and fixes"

    * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mce: Improve error message when kernel cannot recover, p2
    EDAC/mce_amd: Decode MCA_STATUS in bit definition order
    EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit
    EDAC, mce_amd: Print ExtErrorCode and description on a single line
    EDAC, mce_amd: Match error descriptions to latest documentation
    x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank types
    x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units
    x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types
    RAS: Add a MAINTAINERS entry
    RAS: Use consistent types for UUIDs
    x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
    x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
    x86/MCE: Switch to use the new generic UUID API

    Linus Torvalds
     
  • Pull EDAC updates from Borislav Petkov:

    - A new EDAC AST 2500 SoC driver (Stefan M Schaeckeler)

    - New i10nm EDAC driver for Intel 10nm CPUs (Qiuxu Zhuo and Tony Luck)

    - Altera SDRAM functionality carveout for separate enablement of RAS
    and SDRAM capabilities on some Altera chips. (Thor Thayer)

    - The usual round of cleanups and fixes

    And last but not least: recruit James Morse as a reviewer for the ARM
    side.

    * tag 'edac_for_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
    EDAC/altera: Add separate SDRAM EDAC config
    EDAC, altera: Add missing of_node_put()
    EDAC, skx_common: Add code to recognise new compound error code
    EDAC, i10nm: Fix randconfig builds
    EDAC, i10nm: Add a driver for Intel 10nm server processors
    EDAC, skx_edac: Delete duplicated code
    EDAC, skx_common: Separate common code out from skx_edac
    EDAC: Do not check return value of debugfs_create() functions
    EDAC: Add James Morse as a reviewer
    dt-bindings, EDAC: Add Aspeed AST2500
    EDAC, aspeed: Add an Aspeed AST2500 EDAC driver

    Linus Torvalds
     

26 Feb, 2019

1 commit

  • The CONFIG_ALTERA_EDAC Kconfig symbol always enables the SDRAM EDAC
    functionality. On the newer architectures, however, there are cases
    where the peripheral EDAC functionality is enabled but SDRAM needs to be
    disabled.

    Move the SDRAM functions so that they are contained inside the
    conditional CONFIG, and create a new CONFIG option just for SDRAM.

    [ bp: Massage commit message. ]

    Signed-off-by: Thor Thayer
    Signed-off-by: Borislav Petkov
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: dinguyen@kernel.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Cc: linux@armlinux.org.uk
    Link: https://lkml.kernel.org/r/1551121006-4657-2-git-send-email-thor.thayer@linux.intel.com

    Thor Thayer
     

15 Feb, 2019

3 commits

  • Sort the MCA_STATUS bits in decode output to follow how they are defined
    in the register.

    The order is as follows:

    Bit | Decode
    ------------
    62 | Over
    61 | UC
    59 | MiscV
    58 | AddrV
    57 | PCC
    55 | TCC
    53 | SyndV
    46 | CECC
    45 | UECC
    44 | Deferred
    43 | Poison
    40 | Scrub

    [ bp: Massage a bit. ]

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Cc: x86@kernel.org
    Link: https://lkml.kernel.org/r/20190212212417.107049-2-Yazen.Ghannam@amd.com

    Yazen Ghannam
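
Decoding in register order can be sketched with a table sorted from the highest bit to the lowest. The bit numbers and names come from the table in this commit; the function name `decode_status()`, the `struct status_flag` type and the '|' separator are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct status_flag {
    unsigned int bit;
    const char *name;
};

/* Entries are listed in the order the bits are defined in the register. */
static const struct status_flag status_flags[] = {
    { 62, "Over" },   { 61, "UC" },    { 59, "MiscV" },
    { 58, "AddrV" },  { 57, "PCC" },   { 55, "TCC" },
    { 53, "SyndV" },  { 46, "CECC" },  { 45, "UECC" },
    { 44, "Deferred" }, { 43, "Poison" }, { 40, "Scrub" },
};

/* Write the names of all set bits into buf, separated by '|'.
 * Returns the number of names written; stops early if buf is full. */
size_t decode_status(uint64_t status, char *buf, size_t len)
{
    size_t i, used = 0, count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < sizeof(status_flags) / sizeof(status_flags[0]); i++) {
        const char *name = status_flags[i].name;
        size_t name_len = strlen(name);
        size_t need = name_len + (count ? 1 : 0);

        if (!(status & (UINT64_C(1) << status_flags[i].bit)))
            continue;
        if (used + need >= len)
            break;                      /* no room left, keep what fits */
        if (count)
            buf[used++] = '|';
        memcpy(buf + used, name, name_len);
        used += name_len;
        buf[used] = '\0';
        count++;
    }
    return count;
}
```

Driving the output from a single ordered table keeps the decode order and the register layout in sync by construction.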
     
  • Previous AMD systems have had a bit in MCA_STATUS to indicate that an
    error was detected on a scrub operation. However, this bit was defined
    differently within different banks and families/models.

    Starting with Family 17h, MCA_STATUS[40] is either Reserved/Read-as-Zero
    or defined as "Scrub", for all MCA banks and CPU models. Therefore, this
    bit can be defined as the "Scrub" bit.

    Define MCA_STATUS[40] as "Scrub" and decode it in the AMD MCE decoding
    module for Family 17h and newer systems.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: James Morse
    Cc: linux-edac
    Cc: Mauro Carvalho Chehab
    Cc: Pu Wen
    Cc: Qiuxu Zhuo
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vishal Verma
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20190212212417.107049-1-Yazen.Ghannam@amd.com

    Yazen Ghannam
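
The family-dependent treatment of MCA_STATUS[40] can be sketched as a small predicate. The macro name `STATUS_SCRUB_BIT` and the function `status_reports_scrub()` are hypothetical; only the bit position and the Family 17h cutoff come from the commit text.

```c
#include <stdbool.h>
#include <stdint.h>

/* MCA_STATUS[40], treated as "Scrub" on Family 17h and newer. */
#define STATUS_SCRUB_BIT (UINT64_C(1) << 40)

/* Report the Scrub bit only where its meaning is uniform. On older
 * families the bit was defined differently per bank and model, so it
 * is not interpreted here. */
bool status_reports_scrub(unsigned int family, uint64_t status)
{
    if (family < 0x17)
        return false;
    return (status & STATUS_SCRUB_BIT) != 0;
}
```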
     
  • The call to of_parse_phandle() returns a node pointer with its refcount
    incremented, so it must be explicitly decremented with of_node_put()
    after the last use.

    Signed-off-by: Huang Zijiang
    Signed-off-by: Borislav Petkov
    Reviewed-by: Thor Thayer
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Cc: wang.yi59@zte.com.cn
    Link: https://lkml.kernel.org/r/1550126347-27984-1-git-send-email-huang.zijiang@zte.com.cn

    Huang Zijiang
     

06 Feb, 2019

2 commits

  • A new error code for systems that use DRAM as an extra level of cache
    looks like:

    000F 0010 1MMM CCCC

    where the MMM and CCCC bits are used for the same purpose as the
    original code. For this new class of errors the ADXL translation will
    provide details of both the DIMM used as cache for the error location
    and the component that is being cached.

    Note: This new error code is first supported in Skylake. Older EDAC
    drivers do not need to be updated.

    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Cc: Aristeu Rozanski
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: Qiuxu Zhuo
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190205182109.27828-1-tony.luck@intel.com

    Tony Luck
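
Matching the bit pattern 000F 0010 1MMM CCCC amounts to masking out the variable fields and comparing the rest. The sketch below derives the mask and value directly from that pattern; the macro and function names are hypothetical and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ignore F (bit 12), MMM (bits 6..4) and CCCC (bits 3..0);
 * every other bit must match 0000 0010 1000 0000. */
#define DRAM_CACHE_ERR_MASK  0xEF80u
#define DRAM_CACHE_ERR_VALUE 0x0280u

/* True if the 16-bit error code has the form 000F 0010 1MMM CCCC. */
bool is_dram_cache_error(uint16_t code)
{
    return (code & DRAM_CACHE_ERR_MASK) == DRAM_CACHE_ERR_VALUE;
}

/* Extract the three MMM bits. */
unsigned int err_mmm(uint16_t code)
{
    return (code >> 4) & 0x7;
}

/* Extract the four CCCC bits. */
unsigned int err_cccc(uint16_t code)
{
    return code & 0xF;
}
```

A code of the older form, with bit 9 clear, does not match this mask and value pair, so the two classes can be told apart with one comparison each.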
     
  • I10NM_EDAC depends on CONFIG_ACPI so make that dependency explicit.

    Reported-by: Borislav Petkov
    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Cc: Aristeu Rozanski
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: Qiuxu Zhuo
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190205180200.26865-1-tony.luck@intel.com

    Tony Luck
     

05 Feb, 2019

1 commit

  • Save a log line by printing the extended error code and the description
    on a single line. This is similar to how errors are printed in other
    subsystems, e.g. "#, description". If we don't have a valid description
    then only the number/code is printed.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: linux-edac
    Cc: Mauro Carvalho Chehab
    Cc: Tony Luck
    Cc: x86@kernel.org
    Link: https://lkml.kernel.org/r/20190201225534.8177-6-Yazen.Ghannam@amd.com

    Yazen Ghannam
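
The "#, description" layout with a fallback to the bare number can be sketched with `snprintf()`. The function name `format_xec()` is hypothetical, and the real log line may carry additional prefix text that is left out here.

```c
#include <stddef.h>
#include <stdio.h>

/* Format "<code>, <description>" on one line, or just "<code>" when no
 * valid description is available. Returns what snprintf() returns. */
int format_xec(char *buf, size_t len, unsigned int xec, const char *desc)
{
    if (desc != NULL && desc[0] != '\0')
        return snprintf(buf, len, "%u, %s", xec, desc);
    return snprintf(buf, len, "%u", xec);
}
```

Treating both a NULL pointer and an empty string as "no description" avoids printing a dangling comma.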
     

03 Feb, 2019

4 commits

  • Update the error descriptions to match the latest documentation for
    easier searching. In some cases the changes are small and in other cases
    the changes may be total rewording of the description.

    No functional changes.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: linux-edac
    Cc: Mauro Carvalho Chehab
    Cc: Tony Luck
    Cc: x86@kernel.org
    Link: https://lkml.kernel.org/r/20190201225534.8177-5-Yazen.Ghannam@amd.com

    Yazen Ghannam
     
  • Some SMCA bank types on future systems will report new error types even
    though the bank type is not treated as a new version. These new error
    types will be reported by bits that are reserved in past systems.

    Add the new error descriptions to the lists in edac_mce_amd.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: linux-edac
    Cc: Mauro Carvalho Chehab
    Cc: Shirish S
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20190201225534.8177-4-Yazen.Ghannam@amd.com

    Yazen Ghannam
     
  • The existing CS, PSP, and SMU SMCA bank types will see new versions (as
    indicated by their McaTypes) in future SMCA systems.

    Add the new (HWID, MCATYPE) tuples for these new versions. Reuse the
    same names as the older versions, since they are logically the same to
    the user. SMCA systems won't mix and match IP blocks with different
    McaType versions in the same system, so there isn't a need to
    distinguish them. The MCA_IPID register is saved when logging an MCA
    error, and that can be used to triage the error.

    Also, add the new error descriptions to edac_mce_amd. Some error types
    (positions in the list) are overloaded compared to the previous
    McaTypes. Therefore, just create new lists of the error descriptions to
    keep things simple even if some of the error descriptions are the same
    between versions.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: Arnd Bergmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: linux-edac
    Cc: Mauro Carvalho Chehab
    Cc: Pu Wen
    Cc: Qiuxu Zhuo
    Cc: Shirish S
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vishal Verma
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20190201225534.8177-3-Yazen.Ghannam@amd.com

    Yazen Ghannam
     
  • Add the (HWID, MCATYPE) tuples and names for the new MP5, NBIO, and
    PCIE SMCA bank types.

    Also, add their respective error descriptions to the MCE decoding module
    edac_mce_amd.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Cc: Arnd Bergmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: linux-edac
    Cc: Mauro Carvalho Chehab
    Cc: Pu Wen
    Cc: Qiuxu Zhuo
    Cc: Shirish S
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vishal Verma
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20190201225534.8177-2-Yazen.Ghannam@amd.com

    Yazen Ghannam
     

02 Feb, 2019

3 commits

  • This driver supports the Intel 10nm series server integrated memory
    controller. It gets the memory capacity and topology information by
    reading the registers in PCI configuration space and memory-mapped I/O.

    It decodes the memory error address to the platform specific address
    by using the ACPI Address Translation (ADXL) Device Specific Method
    (DSM).

    Co-developed-by: Tony Luck
    Signed-off-by: Qiuxu Zhuo
    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190130191519.15393-5-tony.luck@intel.com

    Qiuxu Zhuo
     
  • Delete the duplicated code from skx_edac.c and rename skx_edac.c to
    skx_base.c. Update the Makefile to build the skx_edac driver from
    skx_base.c and skx_common.c.

    Add SPDX to skx_base.c and clean out unnecessary #include lines.

    [ bp: Drop the license boilerplate - there's an SPDX identifier now. ]

    Co-developed-by: Tony Luck
    Signed-off-by: Qiuxu Zhuo
    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190130191519.15393-4-tony.luck@intel.com

    Qiuxu Zhuo
     
  • Parts of skx_edac can be shared with the Intel 10nm server EDAC driver.

    Carve out the common parts from skx_edac in preparation for supporting
    both the skx_edac and i10nm_edac drivers.

    Co-developed-by: Tony Luck
    Signed-off-by: Qiuxu Zhuo
    Signed-off-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190130191519.15393-3-tony.luck@intel.com

    Qiuxu Zhuo
     

25 Jan, 2019

1 commit

  • Correct the persistent register offset where address and status are
    stored.

    Fixes: 08f08bfb7b4c ("EDAC, altera: Merge Stratix10 into the Arria10 SDRAM probe routine")
    Signed-off-by: Thor Thayer
    Signed-off-by: Borislav Petkov
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: devicetree@vger.kernel.org
    Cc: dinguyen@kernel.org
    Cc: linux-edac
    Cc: mark.rutland@arm.com
    Cc: robh+dt@kernel.org
    Cc: stable
    Link: https://lkml.kernel.org/r/1548179287-21760-2-git-send-email-thor.thayer@linux.intel.com

    Thor Thayer
     

23 Jan, 2019

1 commit

  • When calling debugfs functions, there is no need to ever check the
    return value. The function can work or not, but the code logic should
    never do something different based on this.

    [ bp: Make edac_debugfs_init() return void too, while at it. ]

    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Borislav Petkov
    Cc: James Morse
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20190122152151.16139-17-gregkh@linuxfoundation.org

    Greg Kroah-Hartman
     

18 Jan, 2019

1 commit

  • Add support for the Aspeed AST2500 SoC.

    Signed-off-by: Stefan M Schaeckeler
    Signed-off-by: Borislav Petkov
    Cc: Andrew Jeffery
    Cc: Joel Stanley
    Cc: Mark Rutland
    Cc: Mauro Carvalho Chehab
    Cc: Rob Herring
    Cc: devicetree@vger.kernel.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-aspeed@lists.ozlabs.org
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/1547743097-5236-2-git-send-email-schaecsn@gmx.net

    Stefan M Schaeckeler
     

19 Dec, 2018

1 commit

  • The Freescale ddr driver also works on the LS1021A board.

    Signed-off-by: Patrick Havelange
    Signed-off-by: Borislav Petkov
    Cc: Mauro Carvalho Chehab
    Cc: York Sun
    Cc: arnout.vandecappelle@essensium.com
    Cc: linux-edac
    Cc: matthew.weber@rockwellcollins.com
    Cc: patrick.havelange@essensium.com
    Link: https://lkml.kernel.org/r/20181219104323.10324-1-patrick.havelange@essensium.com

    Patrick Havelange
     

11 Dec, 2018

1 commit

  • Remove unused local variables as reported by gcc's -Wunused-but-set-variable option.

    [ bp: simplify commit message. ]

    Signed-off-by: YueHaibing
    Signed-off-by: Borislav Petkov
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: https://lkml.kernel.org/r/20181211095207.25936-1-yuehaibing@huawei.com

    YueHaibing
     

21 Nov, 2018

1 commit