05 Oct, 2016

1 commit

  • Pull EDAC updates from Borislav Petkov:
    "A lot of movement in the EDAC tree this time around, coarse summary
    below:

    - Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO
    buffers (Thor Thayer)

    - split the memory controller part out of mpc85xx and share it with a
    new Freescale ARM Layerscape driver (York Sun)

    - amd64_edac fixes (Yazen Ghannam)

    - misc cleanups, refactoring and fixes all over the place"

    * tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (37 commits)
    EDAC, altera: Add IRQ Flags to disable IRQ while handling
    EDAC, altera: Correct EDAC IRQ error message
    EDAC, amd64: Autoload module using x86_cpu_id
    EDAC, sb_edac: Remove NULL pointer check on array pci_tad
    EDAC: Remove NO_IRQ from powerpc-only drivers
    EDAC, fsl_ddr: Fix error return code in fsl_mc_err_probe()
    EDAC, fsl_ddr: Add entry to MAINTAINERS
    EDAC: Move Doug Thompson to CREDITS
    EDAC, I3000: Orphan driver
    EDAC, fsl_ddr: Replace simple_strtoul() with kstrtoul()
    EDAC, layerscape: Add Layerscape EDAC support
    EDAC, fsl_ddr: Fix IRQ dispose warning when module is removed
    EDAC, fsl_ddr: Add support for little endian
    EDAC, fsl_ddr: Add missing DDR DRAM types
    EDAC, fsl_ddr: Rename macros and names
    EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx
    EDAC, mpc85xx: Replace printk() with pr_* format
    EDAC, mpc85xx: Drop setting/clearing RFXE bit in HID1
    EDAC, altera: Rename MC trigger to common name
    EDAC, altera: Rename device trigger to common name
    ...

    Linus Torvalds
     

23 Sep, 2016

2 commits

  • Add the IRQF_ONESHOT and IRQF_TRIGGER_HIGH flags to disable the IRQ
    while executing the IRQ handler. Remove IRQF_SHARED because these
    are not shared IRQs in the domain. The problem was exposed when
    flooding IRQs.

    Signed-off-by: Thor Thayer
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1474582419-7053-2-git-send-email-tthayer@opensource.altera.com
    Signed-off-by: Borislav Petkov

    Thor Thayer
     
  • Correct the error message sent out in the case of a single bit error IRQ
    allocation.

    Signed-off-by: Thor Thayer
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1474582419-7053-1-git-send-email-tthayer@opensource.altera.com
    Signed-off-by: Borislav Petkov

    Thor Thayer
     

21 Sep, 2016

1 commit


13 Sep, 2016

7 commits

  • Bank 4 is reserved on family 0x17 and shouldn't generate any MCE
    records. However, broken hardware and software are not unheard of,
    so warn about bank 4 errors. They shouldn't be coming from bank 4
    naturally, but users can still use mce_amd_inj to simulate errors from
    it for testing purposes.

    Also, avoid special handling in the injector mce_amd_inj like it is
    being done on the older families.

    [ bp: Rewrite commit message and merge into one patch. Use boot_cpu_data. ]

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Reviewed-by: Aravind Gopalakrishnan
    Link: http://lkml.kernel.org/r/1473384591-5323-1-git-send-email-Yazen.Ghannam@amd.com
    Link: http://lkml.kernel.org/r/1473384591-5323-2-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Thomas Gleixner

    Yazen Ghannam
     
  • The MCA_SYND and MCA_IPID registers contain valuable information and
    should be included in MCE output. The MCA_SYND register contains
    syndrome and other error information, and the MCA_IPID register will
    uniquely identify the MCA bank's type without having to rely on system
    software.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/1472680624-34221-2-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Thomas Gleixner

    Yazen Ghannam
     
  • Scalable MCA defines a number of IP types. An MCA bank on an SMCA
    system is defined as one of these IP types. A bank's type is uniquely
    identified by the combination of the HWID and MCATYPE values read from
    its MCA_IPID register.

    Add the required tables in order to be able to lookup error descriptions
    based on a bank's type and the error's extended error code.

    [ bp: Align comments, simplify a bit. ]

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/1472741832-1690-1-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Thomas Gleixner

    Yazen Ghannam
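
The lookup described in the commit above can be sketched in userspace C. This is only an illustration, not the kernel's tables: the bit positions used for HWID (bits 43:32) and MCATYPE (bits 63:48) are an assumption about the MCA_IPID layout, and the enum names and the (HWID, MCATYPE) pairs in the table are placeholders invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bank-type identifiers, for illustration only. */
enum bank_type { BANK_UNKNOWN = 0, BANK_TYPE_A, BANK_TYPE_B };

struct hwid_mcatype {
    enum bank_type type;
    uint16_t hwid;     /* hardware ID of the IP block */
    uint16_t mcatype;  /* MCA type within that IP block */
};

/* Placeholder (HWID, MCATYPE) pairs; these are not real hardware values. */
static const struct hwid_mcatype bank_table[] = {
    { BANK_TYPE_A, 0x010, 0x0 },
    { BANK_TYPE_B, 0x010, 0x1 },
};

/* Assumed layout: HWID in bits 43:32 of MCA_IPID. */
static uint16_t ipid_hwid(uint64_t ipid)
{
    return (uint16_t)((ipid >> 32) & 0xFFF);
}

/* Assumed layout: MCATYPE in bits 63:48 of MCA_IPID. */
static uint16_t ipid_mcatype(uint64_t ipid)
{
    return (uint16_t)((ipid >> 48) & 0xFFFF);
}

/* A bank's type is identified by the (HWID, MCATYPE) combination. */
static enum bank_type lookup_bank_type(uint64_t ipid)
{
    size_t i;

    for (i = 0; i < sizeof(bank_table) / sizeof(bank_table[0]); i++) {
        if (bank_table[i].hwid == ipid_hwid(ipid) &&
            bank_table[i].mcatype == ipid_mcatype(ipid))
            return bank_table[i].type;
    }
    return BANK_UNKNOWN;
}
```

Once the type is known, an error description table for that type could be indexed by the extended error code in the same way.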
     
  • The error descriptions defined for Fam17h can be reused for other SMCA
    systems, so their names should reflect this.

    Change f17h prefix to smca for error descriptions.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/1472673994-12235-4-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Thomas Gleixner

    Yazen Ghannam
     
  • Add missing SMCA error descriptions to the error descriptions arrays.

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/1472673994-12235-3-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Thomas Gleixner

    Yazen Ghannam
     
  • Print SyndV bit status and print the raw value of the MCA_SYND register.
    Further decoding of the syndrome from struct mce.synd can be done in
    other places where appropriate, e.g. DRAM ECC.

    Boris: make the error stanza more compact by putting the error address
    and syndrome on the same line:

    [Hardware Error]: Corrected error, no action required.
    [Hardware Error]: CPU:2 (17:0:0) MC4_STATUS[-|CE|-|PCC|AddrV|-|-|SyndV|CECC]: 0x96204100001e0117
    [Hardware Error]: Error Addr: 0x000000007f4c52e3, Syndrome: 0x0000000000000000
    [Hardware Error]: Invalid IP block specified.
    [Hardware Error]: cache level: L3/GEN, tx: DATA, mem-tx: RD

    Signed-off-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/1467633035-32080-2-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Thomas Gleixner

    Yazen Ghannam
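
The flag list in the sample MC4_STATUS line above can be reproduced with a few bit tests. The sketch below is a userspace illustration rather than the decoder's code; the bit positions (UC 61, AddrV 58, PCC 57, SyndV 53, CECC 46) are an assumption about the status register layout, chosen because they are consistent with the value 0x96204100001e0117 and the flags printed for it. The macro and function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MCA status bit positions (see lead-in). */
#define STATUS_UC     (1ULL << 61)  /* uncorrected error */
#define STATUS_ADDRV  (1ULL << 58)  /* error address valid */
#define STATUS_PCC    (1ULL << 57)  /* processor context corrupt */
#define STATUS_SYNDV  (1ULL << 53)  /* syndrome register valid */
#define STATUS_CECC   (1ULL << 46)  /* corrected ECC error */

/* Return true when the given flag is set in the raw status value. */
static bool status_has(uint64_t status, uint64_t flag)
{
    return (status & flag) != 0;
}
```

With the value from the log line, `status_has(0x96204100001e0117ULL, STATUS_SYNDV)` is true, which is the condition under which printing the raw MCA_SYND value is meaningful.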
     
  • pvt->pci_tad is a NUM_CHANNELS array of struct pci_dev pointers and
    hence cannot be NULL, so the NULL pointer check on pci_tad is redundant.
    Remove it.

    Signed-off-by: Colin Ian King
    Acked-by: Tony Luck
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/20160908083801.14766-1-colin.king@canonical.com
    Signed-off-by: Borislav Petkov

    Colin Ian King
     

12 Sep, 2016

1 commit

  • We'd like to eventually remove NO_IRQ on powerpc, so remove usages of it
    from powerpc-only drivers.

    The pdata structs are kzalloc'ed, so we don't need to initialise those
    to 0; we can just drop the assignments entirely.

    Signed-off-by: Michael Ellerman
    Cc: Arnd Bergmann
    Cc: Johannes Thumshirn
    Cc: linux-edac
    Cc: linuxppc-dev@ozlabs.org
    Link: http://lkml.kernel.org/r/1473674436-19467-1-git-send-email-mpe@ellerman.id.au
    Signed-off-by: Borislav Petkov

    Michael Ellerman
     

10 Sep, 2016

1 commit


01 Sep, 2016

11 commits

  • Replace obsolete simple_strtoul() with kstrtoul().

    Signed-off-by: York Sun
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1471990593-27536-1-git-send-email-york.sun@nxp.com
    Signed-off-by: Borislav Petkov

    York Sun
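
For context on the commit above: simple_strtoul() stops silently at the first invalid character, while kstrtoul() returns 0 on success or a negative errno and rejects trailing garbage (a single trailing newline is allowed). Neither function exists outside the kernel, so the sketch below is a userspace approximation of the stricter behaviour built on strtoul(). The function name is hypothetical, and it handles decimal input only.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Strict decimal parse, loosely modelled on kstrtoul() semantics:
 * returns 0 and stores the value on success, -EINVAL on malformed
 * input, -ERANGE on overflow.
 */
static int parse_decimal_ulong(const char *s, unsigned long *res)
{
    char *end;
    unsigned long val;

    /* strtoul() would skip whitespace and accept a sign; reject both. */
    if (!isdigit((unsigned char)s[0]))
        return -EINVAL;

    errno = 0;
    val = strtoul(s, &end, 10);
    if (errno == ERANGE)
        return -ERANGE;

    /* Allow one trailing newline, as sysfs writes usually carry one. */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;

    *res = val;
    return 0;
}
```

The point of the conversion is that input such as "42abc" is reported as an error instead of being accepted as 42.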
     
  • Add DDR EDAC driver for ARM-based compatible controllers. Both
    big-endian and little-endian are supported, as specified in device tree.

    Signed-off-by: York Sun
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1471990465-27443-1-git-send-email-york.sun@nxp.com
    Signed-off-by: Borislav Petkov

    York Sun
     
  • When the driver is compiled as a module, removing it causes kernel
    warnings when irq_dispose_mapping() is called. Instead of calling
    irq_of_parse_and_map(), use platform_get_irq() to acquire the IRQ
    number.

    Signed-off-by: York Sun
    Cc: linux-edac
    Cc: morbidrsa@gmail.com
    Cc: oss@buserror.net
    Cc: stuart.yoder@nxp.com
    Link: http://lkml.kernel.org/r/1470779760-16483-8-git-send-email-york.sun@nxp.com
    Signed-off-by: Borislav Petkov

    York Sun
     
  • Get endianness from device tree. Both big endian and little endian are
    supported. Default to big endian for backwards compatibility to MPC85xx.

    Signed-off-by: York Sun
    Acked-by: Rob Herring
    Cc: devicetree@vger.kernel.org
    Cc: linux-edac
    Cc: morbidrsa@gmail.com
    Cc: oss@buserror.net
    Cc: stuart.yoder@nxp.com
    Link: http://lkml.kernel.org/r/1470779760-16483-7-git-send-email-york.sun@nxp.com
    Signed-off-by: Borislav Petkov

    York Sun
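
The effect of the endianness setting can be shown with a small userspace sketch. It is not the driver's code: the function name and the boolean flag are hypothetical, and a plain byte array stands in for the memory-mapped register. The flag would be false by default, matching the big-endian default described above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assemble a 32-bit register value from its four bytes in memory
 * order, interpreting them as big endian unless little_endian is set.
 */
static uint32_t ddr_read32(const uint8_t reg[4], bool little_endian)
{
    if (little_endian)
        return (uint32_t)reg[0] |
               (uint32_t)reg[1] << 8 |
               (uint32_t)reg[2] << 16 |
               (uint32_t)reg[3] << 24;

    return (uint32_t)reg[0] << 24 |
           (uint32_t)reg[1] << 16 |
           (uint32_t)reg[2] << 8 |
           (uint32_t)reg[3];
}
```

Routing every register access through one helper like this keeps the rest of the driver independent of the controller's byte order.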
     
  • The compatible DDR controllers may support DDR, DDR2, DDR3, DDR4 DRAM.
    An individual controller doesn't support all of them. The EDAC driver
    reads SDRAM_CFG to determine which mode is configured.

    Add DDR4 and drop the defines used only in the mtype assignment.

    Signed-off-by: York Sun
    Cc: linux-edac
    Cc: morbidrsa@gmail.com
    Cc: oss@buserror.net
    Cc: stuart.yoder@nxp.com
    Link: http://lkml.kernel.org/r/1470779760-16483-6-git-send-email-york.sun@nxp.com
    Signed-off-by: Borislav Petkov

    York Sun
     
  • Use FSL-specific prefix for macros, variables and functions.

    Signed-off-by: York Sun
    Cc: Johannes Thumshirn
    Cc: linux-edac
    Cc: oss@buserror.net
    Cc: stuart.yoder@nxp.com
    Link: http://lkml.kernel.org/r/1470779760-16483-5-git-send-email-york.sun@nxp.com
    Signed-off-by: Borislav Petkov

    York Sun
     
  • The mpc85xx-compatible DDR controllers are used on ARM-based SoCs too.
    Carve out the DDR part from the mpc85xx EDAC driver in preparation to
    support both architectures.

    Signed-off-by: York Sun
    Cc: Johannes Thumshirn
    Cc: linux-edac
    Cc: oss@buserror.net
    Cc: stuart.yoder@nxp.com
    Link: http://lkml.kernel.org/r/1470946525-3410-1-git-send-email-york.sun@nxp.com
    Signed-off-by: Borislav Petkov

    York Sun
     
  • Replace printk() with pr_err/pr_warn/pr_info macros.

    Signed-off-by: York Sun
    Cc: Johannes Thumshirn
    Cc: linux-edac
    Cc: oss@buserror.net
    Cc: stuart.yoder@nxp.com
    Link: http://lkml.kernel.org/r/1470779760-16483-3-git-send-email-york.sun@nxp.com
    [ Boris: unbreak strings for easier greppability. ]
    Signed-off-by: Borislav Petkov

    York Sun
     
  • On e500v1, read fault exception enable (RFXE) controls whether assertion
    of core_fault_in causes a machine check interrupt. Assertion of
    core_fault_in can result from uncorrectable data error, such as an L2
    multi-bit ECC error. It can also occur from a system error if logic on
    the integrated device signals a fault for nonfatal errors. The RFXE bit
    is cleared out of reset and should be left clear for normal operation,
    in which case assertion of core_fault_in does not cause a machine check.

    RFXE is set specifically for RIO (Rapid IO) and PCI on Book E so that
    those errors are caught by machine check. With this bit set, the EDAC
    driver can't get the interrupt on an uncorrectable error, so the driver
    cleared the bit in favor of EDAC. However, the benefit of catching such
    uncorrectable errors doesn't outweigh the risk of the other errors,
    which may hang the system.
    Besides, e500v2 has different errors masked by RFXE, and e500mc doesn't
    support this bit. It is more reasonable to leave RFXE as is in the EDAC
    driver, and leave the uncorrectable errors triggering machine check for
    e500v1.

    Suggested-by: Scott Wood
    Signed-off-by: York Sun
    Cc: Johannes Thumshirn
    Cc: linux-edac
    Cc: oss@buserror.net
    Cc: stuart.yoder@nxp.com
    Link: http://lkml.kernel.org/r/1470779760-16483-2-git-send-email-york.sun@nxp.com
    Signed-off-by: Borislav Petkov

    York Sun
     
  • Rename the Memory Controller debug trigger to the same common name as
    the EDAC devices.

    Signed-off-by: Thor Thayer
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1471622666-15197-3-git-send-email-tthayer@opensource.altera.com
    Signed-off-by: Borislav Petkov

    Thor Thayer
     
  • The L2 and OCRAM devices have different ecc trigger names than the other
    EDAC devices (FIFO peripherals). Make them all the same and remove the
    character array from the device structure.

    Signed-off-by: Thor Thayer
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1471622666-15197-2-git-send-email-tthayer@opensource.altera.com
    Signed-off-by: Borislav Petkov

    Thor Thayer
     

22 Aug, 2016

1 commit

  • This is an entirely new driver instead of yet another set of patches
    to sb_edac.c because:

    1) Mapping from PCI devices to socket/memory controller is significantly
    different. Skylake scatters devices on a socket across a number of
    PCI buses.
    2) There is an extra level of interleaving via the "mcroute" register
    that would be a little messy to squeeze into the old driver.
    3) Validation is getting too expensive. Changes to sb_edac need to
    be checked against Sandy Bridge, Ivy Bridge, Haswell, Broadwell and
    Knights Landing.

    Acked-by: Aristeu Rozanski
    Acked-by: Borislav Petkov
    Signed-off-by: Tony Luck
    Signed-off-by: Linus Torvalds

    Tony Luck
     

18 Aug, 2016

1 commit

  • According to the reference manual of MPC8572 and T4240, bit 31 of
    PEX_ERR_CAP_STAT is W1C (write 1 to clear).

    Add the corresponding write to PEX_ERR_CAP_STAT in order to fix the PCIe
    error capture.

    Tested on a T4240 processor.

    Signed-off-by: Tillmann Heidsieck
    Acked-by: Johannes Thumshirn
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/20160815190849.29327-1-theidsieck@leenox.de
    Signed-off-by: Borislav Petkov

    Tillmann Heidsieck
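
W1C semantics can be modelled in a few lines of userspace C. This is a simulation for illustration only: the struct and function names are hypothetical, and the mask passed by the caller is a stand-in, since the message above does not say which bit-numbering convention "bit 31" follows.

```c
#include <stdint.h>

/* Software model of a write-1-to-clear (W1C) status register. */
struct w1c_reg {
    uint32_t value;
};

/*
 * Writing 1 to a bit clears it; writing 0 leaves it unchanged.
 * Bits set in val are therefore removed from the stored value.
 */
static void w1c_write(struct w1c_reg *reg, uint32_t val)
{
    reg->value &= ~val;
}
```

This is why a write is needed at all: on such a register, reading the status or writing zeros never resets the capture bit, so later errors would not be captured.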
     

15 Aug, 2016

1 commit

  • Replace the deprecated create_singlethread_workqueue() with
    alloc_ordered_workqueue() with WQ_MEM_RECLAIM. This is the identity
    conversion.

    The workqueue should not stall under memory pressure, so
    WQ_MEM_RECLAIM is set to guarantee forward progress.

    Signed-off-by: Bhaktipriya Shridhar
    Cc: Tejun Heo
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/20160813164124.GA9077@Karyakshetra
    Signed-off-by: Borislav Petkov

    Bhaktipriya Shridhar
     

11 Aug, 2016

1 commit

  • Fix the following sparse warning:

    drivers/edac/altera_edac.c:1649:23: warning:
    symbol 'a10_eccmgr_ic_ops' was not declared. Should it be static?

    Signed-off-by: Wei Yongjun
    Reviewed-by: Thor Thayer
    Cc: linux-edac
    Cc: lkml
    Link: http://lkml.kernel.org/r/1470836667-11822-1-git-send-email-weiyj.lk@gmail.com
    Signed-off-by: Borislav Petkov

    Wei Yongjun
     

10 Aug, 2016

1 commit

  • Add Altera Arria10 SD-MMC FIFO memory EDAC support. The SD-MMC is a
    dual port RAM implementation which is different than any of the other
    peripherals and therefore requires additional code.

    Signed-off-by: Thor Thayer
    Cc: dinguyen@opensource.altera.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1470753653-23465-3-git-send-email-tthayer@opensource.altera.com
    Signed-off-by: Borislav Petkov

    Thor Thayer
     

08 Aug, 2016

6 commits

  • Fam15hMod60h systems are using the channel decode of Fam15hMod30h which
    gives incorrect results. Fam15hMod60h systems should use the generic
    channel decode method plus a couple more cases.

    Signed-off-by: Yazen Ghannam
    Cc: Aravind Gopalakrishnan
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1470236355-30039-1-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Borislav Petkov

    Yazen Ghannam
     
  • Add Altera Arria10 QSPI FIFO memory support.

    Signed-off-by: Thor Thayer
    Cc: dinguyen@opensource.altera.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1468512408-5156-9-git-send-email-tthayer@opensource.altera.com
    Signed-off-by: Borislav Petkov

    Thor Thayer
     
  • Add Altera Arria10 USB FIFO memory support.

    Signed-off-by: Thor Thayer
    Cc: dinguyen@opensource.altera.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1468512408-5156-8-git-send-email-tthayer@opensource.altera.com
    Signed-off-by: Borislav Petkov

    Thor Thayer
     
  • Add Altera Arria10 DMA FIFO memory support.

    Signed-off-by: Thor Thayer
    Cc: dinguyen@opensource.altera.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1468512408-5156-7-git-send-email-tthayer@opensource.altera.com
    Signed-off-by: Borislav Petkov

    Thor Thayer
     
  • Add Altera Arria10 NAND FIFO memory support.

    Signed-off-by: Thor Thayer
    Cc: dinguyen@opensource.altera.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1468512408-5156-6-git-send-email-tthayer@opensource.altera.com
    [ Reformat loop in altr_edac_a10_probe() for better readability. ]
    Signed-off-by: Borislav Petkov

    Thor Thayer
     
  • On the Intel Xeon Phi Knights Landing processor family, the channels of
    the memory controller have an atypical arrangement: MC0 is mapped to
    CH3,4,5 and MC1 is mapped to CH0,1,2. This causes the EDAC driver to
    report the channel name incorrectly.

    We missed this change earlier, so the code already contains a similar
    comment, but the translation function is incorrect.

    Without this patch:
    errors in DIMM_A and DIMM_D were reported in DIMM_D
    errors in DIMM_B and DIMM_E were reported in DIMM_E
    errors in DIMM_C and DIMM_F were reported in DIMM_F

    Correct this.

    Hubert Chrzaniuk:
    - rebased to 4.8
    - comments and code cleanup

    Fixes: d0cdf9003140 ("sb_edac: Add Knights Landing (Xeon Phi gen 2) support")
    Reviewed-by: Tony Luck
    Cc: Mauro Carvalho Chehab
    Cc: Hubert Chrzaniuk
    Cc: linux-edac
    Cc: lukasz.anaczkowski@intel.com
    Cc: lukasz.odzioba@intel.com
    Cc: mchehab@kernel.org
    Cc: # v4.5..
    Link: http://lkml.kernel.org/r/1469231089-22837-1-git-send-email-lukasz.odzioba@intel.com
    Signed-off-by: Lukasz Odzioba
    [ Boris: Simplify a bit by removing char mc. ]
    Signed-off-by: Borislav Petkov

    Lukasz Odzioba
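
The mapping stated in the commit above (MC0 to CH3,4,5 and MC1 to CH0,1,2) can be written as a small translation function. This is a userspace sketch with a hypothetical name and signature, not the function from sb_edac.c; the range checks and the -1 error value are additions for the example.

```c
/*
 * Translate a per-controller channel (0..2) on memory controller
 * mc (0 or 1) to the global channel number (0..5).
 * Returns -1 for out-of-range input.
 */
static int knl_global_channel(int mc, int chan)
{
    if (mc < 0 || mc > 1 || chan < 0 || chan > 2)
        return -1;

    /* MC0 owns channels 3..5, MC1 owns channels 0..2. */
    return mc == 0 ? chan + 3 : chan;
}
```

If both controllers resolved to the same channel range, errors from two different DIMMs would be reported against a single DIMM, which matches the symptom listed in the commit message.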
     

28 Jul, 2016

1 commit

  • Pull EDAC updates from Borislav Petkov:
    "This last cycle, Thor was busy adding Arria10 eth FIFO support to the
    altera_edac driver along with other improvements. We have two
    cleanups/fixes too.

    Summary:

    - Altera Arria10 ethernet FIFO buffer support (Thor Thayer)

    - Minor cleanups"

    * tag 'edac_for_4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
    ARM: dts: Add Arria10 Ethernet EDAC devicetree entry
    EDAC, altera: Add Arria10 Ethernet EDAC support
    EDAC, altera: Add Arria10 ECC memory init functions
    Documentation: dt: socfpga: Add Arria10 Ethernet binding
    EDAC, altera: Drop some ifdeffery
    EDAC, altera: Add panic flag check to A10 IRQ
    EDAC, altera: Check parent status for Arria10 EDAC block
    EDAC, altera: Make all private data structures static
    EDAC: Correct channel count limit
    EDAC, amd64_edac: Init opstate at the proper time during init
    EDAC, altera: Handle Arria10 SDRAM child node
    EDAC, altera: Add ECC Manager IRQ controller support
    Documentation: dt: socfpga: Add interrupt-controller to ecc-manager

    Linus Torvalds
     

16 Jul, 2016

1 commit

  • In commit 2c1ea4c700af ("EDAC, sb_edac: Use cpu family/model in driver
    detection") I broke Knights Landing because I failed to notice that it
    called a wrapper macro "sbridge_get_all_devices_knl" instead of
    "sbridge_get_all_devices" like all the other types.

    Now that we include the processor type in the pci_id_table structure we
    can skip the wrappers and just have the sbridge_get_all_devices() check
    the type to decide whether to allow duplicate devices and controllers to
    have registers spread across buses.

    Fixes: 2c1ea4c700af ("EDAC, sb_edac: Use cpu family/model in driver detection")
    Tested-by: Lukasz Odzioba
    Acked-by: Aristeu Rozanski
    Signed-off-by: Tony Luck
    Signed-off-by: Linus Torvalds

    Tony Luck
     

25 Jun, 2016

2 commits

  • Add Altera Arria10 Ethernet FIFO memory EDAC support. Update to support
    a common compatibility string for all Ethernet FIFOs in the DT.

    Signed-off-by: Thor Thayer
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1466603939-7526-8-git-send-email-tthayer@opensource.altera.com
    Signed-off-by: Borislav Petkov

    Thor Thayer
     
  • In preparation for additional memory module ECCs, add the memory
    initialization functions and helpers.

    Signed-off-by: Thor Thayer
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1466603939-7526-7-git-send-email-tthayer@opensource.altera.com
    Signed-off-by: Borislav Petkov

    Thor Thayer
     

24 Jun, 2016

1 commit

  • Make the IRQ and check_deps() functions available to all the memory
    buffers by moving them outside of the OCRAM only area.

    Signed-off-by: Thor Thayer
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1466603939-7526-5-git-send-email-tthayer@opensource.altera.com
    Signed-off-by: Borislav Petkov

    Thor Thayer