07 Nov, 2015

1 commit

  • Pull asm-generic cleanups from Arnd Bergmann:
    "The asm-generic changes for 4.4 are mostly a series from Christoph
    Hellwig to clean up various abuses of headers in there. The patch to
    rename the io-64-nonatomic-*.h headers caused some conflicts with new
    users, so I added a workaround that we can remove in the next merge
    window.

    The only other patch is a warning fix from Marek Vasut"

    * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
    asm-generic: temporarily add back asm-generic/io-64-nonatomic*.h
    asm-generic: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations
    gpio-mxc: stop including
    n_tracesink: stop including
    n_tracerouter: stop including
    mlx5: stop including
    hifn_795x: stop including
    drbd: stop including
    move count_zeroes.h out of asm-generic
    move io-64-nonatomic*.h out of asm-generic

    Linus Torvalds
     

04 Nov, 2015

1 commit

  • Pull RAS changes from Ingo Molnar:
    "The main system reliability related changes were from x86, but also
    some generic RAS changes:

    - AMD MCE error injection subsystem enhancements. (Aravind
    Gopalakrishnan)

    - Fix MCE and CPU hotplug interaction bug. (Ashok Raj)

    - kcrash bootup robustness fix. (Baoquan He)

    - kcrash cleanups. (Borislav Petkov)

    - x86 microcode driver rework: simplify it by unmodularizing it and
    other cleanups. (Borislav Petkov)"

    * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init()
    x86/mce: Add a Scalable MCA vendor flags bit
    MAINTAINERS: Unify the microcode driver section
    x86/microcode/intel: Move #ifdef DEBUG inside the function
    x86/microcode/amd: Remove maintainers from comments
    x86/microcode: Remove modularization leftovers
    x86/microcode: Merge the early microcode loader
    x86/microcode: Unmodularize the microcode driver
    x86/mce: Fix thermal throttling reporting after kexec
    kexec/crash: Say which char is the unrecognized
    x86/setup/crash: Check memblock_reserve() retval
    x86/setup/crash: Cleanup some more
    x86/setup/crash: Remove alignment variable
    x86/setup: Cleanup crashkernel reservation functions
    x86/amd_nb, EDAC: Rename amd_get_node_id()
    x86/setup: Do not reserve crashkernel high memory if low reservation failed
    x86/microcode/amd: Do not overwrite final patch levels
    x86/microcode/amd: Extract current patch level read to a function
    x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC
    x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts
    ...

    Linus Torvalds
     

23 Oct, 2015

1 commit

  • The PAGES_TO_MiB macro is used for unit conversion but the
    trace_mc_event() tracepoint expects a page address. Fix that.

    Signed-off-by: Tan Xiaojun
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1445341538-24271-1-git-send-email-tanxiaojun@huawei.com
    Signed-off-by: Borislav Petkov

    Tan Xiaojun
     

21 Oct, 2015

1 commit

  • This function doesn't give us the "Node ID" as the function name
    suggests. Rather, it receives a PCI device as argument, checks
    the available F3 PCI device IDs in the system and returns the
    index of the matching Bus/Device IDs.

    Rename it to amd_pci_dev_to_node_id().

    No functional change is introduced.

    Suggested-by: Ingo Molnar
    Signed-off-by: Aravind Gopalakrishnan
    Signed-off-by: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Mauro Carvalho Chehab
    Cc: Peter Zijlstra
    Cc: Suravee Suthikulpanit
    Cc: Thomas Gleixner
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1445246268-26285-3-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Aravind Gopalakrishnan
     

15 Oct, 2015

3 commits

  • The bootloader may or may not enable the ECC_CORR_EN bit. If
    ECC_CORR_EN is not enabled, it is the user's responsibility to
    perform a full SDRAM scrub when an error happens.

    Remove the check for ECC_CORR_EN.

    Signed-off-by: Dinh Nguyen
    Cc: linux-edac
    Cc: Mauro Carvalho Chehab
    Cc: Thor Thayer
    Link: http://lkml.kernel.org/r/1444864456-21778-1-git-send-email-dinguyen@opensource.altera.com
    Signed-off-by: Borislav Petkov

    Dinh Nguyen
     
  • These are not implementations of default architecture code but helpers
    for drivers. Move them to where they belong.

    Signed-off-by: Christoph Hellwig
    Acked-by: Darren Hart
    Acked-by: Hitoshi Mitake
    Signed-off-by: Arnd Bergmann

    Christoph Hellwig
     
  • debugfs_remove() is used to remove a file or a directory from the
    debugfs filesystem, but mci->debugfs might not be empty.

    This can be triggered by the following sequence:

    1) Enable CONFIG_EDAC_DEBUG
    2) insmod an EDAC module (like i3000_edac or similar)
    3) rmmod this module
    4) files such as "fake_inject" still remain under /edac/.

    Removing edac_core then causes a NULL pointer dereference.

    Reported-by: Yun Wu (Abel)
    Signed-off-by: Tan Xiaojun
    Cc: Doug Thompson
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1444787364-104353-1-git-send-email-tanxiaojun@huawei.com
    Signed-off-by: Borislav Petkov

    Tan Xiaojun
     

03 Oct, 2015

1 commit


29 Sep, 2015

2 commits

  • Git provides all the changelogs anyway, so trim the comments section
    here. Update the copyright info while at it.

    Signed-off-by: Aravind Gopalakrishnan
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1443440593-2316-3-git-send-email-Aravind.Gopalakrishnan@amd.com
    Signed-off-by: Borislav Petkov

    Aravind Gopalakrishnan
     
  • The scrub rate control register has moved to function 2 in PCI config
    space and is at a different offset on family 0x15, models 0x60 and
    later. The minimum recommended scrub rate has also changed. (Refer to
    D18F2x1c9_dct[1:0][DramScrub] in Fam15hM60h BKDG).

    Adjust set_scrub_rate() and get_scrub_rate() functions to accommodate
    this.

    Tested on F15hM60h, Fam15h, models 00h-0fh and Fam10h systems.

    Signed-off-by: Aravind Gopalakrishnan
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1443440593-2316-2-git-send-email-Aravind.Gopalakrishnan@amd.com
    [ Cleanup conditionals. ]
    Signed-off-by: Borislav Petkov

    Aravind Gopalakrishnan
     

28 Sep, 2015

1 commit

  • Updating dimm_label to an empty string does not make much sense. Change
    the sysfs dimm_label store operation to fail a request when an input
    string is empty.

    Suggested-by: Borislav Petkov
    Signed-off-by: Toshi Kani
    Cc: elliott@hpe.com
    Cc: Mauro Carvalho Chehab
    Cc: Tony Luck
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1443124767.25474.172.camel@hpe.com
    Signed-off-by: Borislav Petkov

    Toshi Kani
     

26 Sep, 2015

2 commits

  • Sysfs "dimm_label" and "chX_dimm_label" nodes have the following issues
    in their store operation:

    1) A newline-terminated input string causes redundant newlines:

    # echo "test" > /sys/bus/mc0/devices/dimm0/dimm_label
    # cat /sys/bus/mc0/devices/dimm0/dimm_label
    test

    # od -bc /sys/bus/mc0/devices/dimm0/dimm_label
    0000000 164 145 163 164 012 012
    t e s t \n \n
    0000006

    2) The original label string (31 characters) cannot be stored due to
    an improper size check:

    # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0" > /sys/bus/mc0/devices/dimm0/dimm_label
    # cat /sys/bus/mc0/devices/dimm0/dimm_label

    # od -bc /sys/bus/mc0/devices/dimm0/dimm_label
    0000000 012 012
    \n \n
    0000002

    3) An input string longer than the buffer size results in a wrong
    label, as the store operation allows a retry with the remaining string:

    # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0_TEST" > /sys/bus/mc0/devices/dimm0/dimm_label
    # cat /sys/bus/mc0/devices/dimm0/dimm_label
    _TEST

    Fix these issues by making the following changes:
    1) Replace a trailing newline character with a null. This also
    ensures that the string is null-terminated in the label buffer.
    2) Check the label buffer size with 'sizeof(dimm->label)'.
    3) Fail a request if its string exceeds the label buffer size.

    Signed-off-by: Toshi Kani
    Acked-by: Tony Luck
    Cc: linux-edac
    Cc: Mauro Carvalho Chehab
    Cc: Robert Elliott
    Link: http://lkml.kernel.org/r/1443121564.25474.160.camel@hpe.com
    Signed-off-by: Borislav Petkov

    Toshi Kani
     
  • After

    7d375bffa524 ("sb_edac: Fix support for systems with two home agents per socket")

    sysfs "dimm_label" and "chX_dimm_label" show their label string without a
    newline "\n" at the end.

    [root@orange ~]# cat /sys/bus/mc0/devices/dimm0/dimm_label
    CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#

    [root@orange ~]# cat /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label
    CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#

    The label strings now have 31 characters, which is the same as
    EDAC_MC_LABEL_LEN. Since the snprintf()s in channel_dimm_label_show()
    and dimmdev_label_show() limit the whole length by EDAC_MC_LABEL_LEN,
    the newline in the format "%s\n" is ignored.

    [root@orange ~]# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
    0000000 103 120 125 137 123 162 143 111 104 043 060 137 110 141 043 060
    C P U _ S r c I D # 0 _ H a # 0
    0000020 137 103 150 141 156 043 060 137 104 111 115 115 043 060 000
    _ C h a n # 0 _ D I M M # 0 \0
    0000037

    Fix it by using 'sizeof(dimm->label) + 1' as the whole length in the
    snprintf()s in channel_dimm_label_show() and dimmdev_label_show().

    Reported-by: Robert Elliott
    Signed-off-by: Toshi Kani
    Acked-by: Tony Luck
    Cc: linux-edac
    Cc: Mauro Carvalho Chehab
    Link: http://lkml.kernel.org/r/1442933883-21587-2-git-send-email-toshi.kani@hpe.com
    Signed-off-by: Borislav Petkov

    Toshi Kani
     

25 Sep, 2015

4 commits

  • Add support for the SoC component.

    Signed-off-by: Loc Ho
    Cc: Arnd Bergmann
    Cc: devicetree@vger.kernel.org
    Cc: ijc+devicetree@hellion.org.uk
    Cc: jcm@redhat.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Cc: mark.rutland@arm.com
    Cc: Mauro Carvalho Chehab
    Cc: patches@apm.com
    Cc: robh+dt@kernel.org
    Link: http://lkml.kernel.org/r/1443055261-8613-4-git-send-email-lho@apm.com
    Signed-off-by: Borislav Petkov

    Loc Ho
     
  • Replace sprintf() with snprintf() to avoid possible string array
    overflow.

    Signed-off-by: Loc Ho
    Cc: Arnd Bergmann
    Cc: devicetree@vger.kernel.org
    Cc: ijc+devicetree@hellion.org.uk
    Cc: jcm@redhat.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Cc: mark.rutland@arm.com
    Cc: Mauro Carvalho Chehab
    Cc: patches@apm.com
    Cc: robh+dt@kernel.org
    Link: http://lkml.kernel.org/r/1443116287-11752-1-git-send-email-lho@apm.com
    Signed-off-by: Borislav Petkov

    Loc Ho
     
  • Add EDAC support for the L3 component.

    Signed-off-by: Loc Ho
    Cc: Arnd Bergmann
    Cc: devicetree@vger.kernel.org
    Cc: ijc+devicetree@hellion.org.uk
    Cc: jcm@redhat.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac
    Cc: mark.rutland@arm.com
    Cc: Mauro Carvalho Chehab
    Cc: patches@apm.com
    Cc: robh+dt@kernel.org
    Link: http://lkml.kernel.org/r/1443055261-8613-3-git-send-email-lho@apm.com
    Signed-off-by: Borislav Petkov

    Loc Ho
     
  • In commit

    7d375bffa524 ("sb_edac: Fix support for systems with two home agents per socket")

    NUM_CHANNELS was changed to 8 and the channel space was renumbered to
    handle EN, EP, and EX configurations.

    The *_mci_bind_devs() functions - except for sbridge_mci_bind_devs() -
    got a new device presence check in the form of saw_chan_mask. However,
    sbridge_mci_bind_devs() still uses the NUM_CHANNELS for loop.

    With the increase in NUM_CHANNELS, this loop fails at index 4 since
    SB only has 4 TADs. This results in the following error on SB machines:

    EDAC sbridge: Some needed devices are missing
    EDAC sbridge: Couldn't find mci handler
    EDAC sbridge: Couldn't find mci handle

    This patch adapts the saw_chan_mask logic for sbridge_mci_bind_devs() as
    well.

    After this patch:

    EDAC MC0: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#0: DEV 0000:3f:0e.0 (POLLED)
    EDAC MC1: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#1: DEV 0000:7f:0e.0 (POLLED)

    Signed-off-by: Seth Jennings
    Acked-by: Aristeu Rozanski
    Acked-by: Tony Luck
    Tested-by: Borislav Petkov
    Cc: # v4.2
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1438798561-10180-1-git-send-email-sjenning@redhat.com
    Signed-off-by: Borislav Petkov

    Seth Jennings
     

23 Sep, 2015

5 commits


22 Sep, 2015

1 commit


12 Sep, 2015

1 commit


09 Sep, 2015

2 commits


02 Sep, 2015

1 commit


01 Sep, 2015

1 commit

  • Pull RAS updates from Ingo Molnar:
    "MCE handling updates, but also some generic drivers/edac/ changes to
    better organize the Kconfig space"

    * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/ras: Move AMD MCE injector to arch/x86/ras/
    x86/mce: Add a wrapper around mce_log() for injection
    x86/mce: Rename rcu_dereference_check_mce() to mce_log_get_idx_check()
    RAS: Add a menuconfig option with descriptive text
    x86/mce: Reenable CMCI banks when swiching back to interrupt mode
    x86/mce: Clear Local MCE opt-in before kexec
    x86/mce: Remove unused function declarations
    x86/mce: Kill drain_mcelog_buffer()
    x86/mce: Avoid potential deadlock due to printk() in MCE context
    x86/mce: Remove the MCE ring for Action Optional errors
    x86/mce: Don't use percpu workqueues
    x86/mce: Provide a lockless memory pool to save error records
    x86/mce: Reuse one of the u16 padding fields in 'struct mce'

    Linus Torvalds
     

13 Aug, 2015

4 commits

  • This is an x86-specific module and would benefit from being
    closer to the arch code. Move it there. Update copyright while
    at it.

    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Link: http://lkml.kernel.org/r/1439396985-12812-14-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • This used to flush out MCEs logged during early boot which were
    left in the MCA registers from a previous system run. There is no
    need for that now, since we've moved to a genpool.

    Suggested-by: Tony Luck
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1439396985-12812-7-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Use unified genpool to save Action Optional error events and put
    Action Optional error handling in the same notification chain as
    MCE error decoding.

    Signed-off-by: Chen, Gong
    [ Fold in subsequent patch from Boris for early boot logging. ]
    Signed-off-by: Tony Luck
    [ Correct a lot. ]
    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1439396985-12812-5-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Chen, Gong
     
  • The commit

    de3910eb79ac ("edac: change the mem allocation scheme to
    make Documentation/kobject.txt happy")

    changed the memory allocation for the csrows member. But ppc4xx_edac was
    forgotten in the patch. Fix it.

    Signed-off-by: Michael Walle
    Cc:
    Cc: linux-edac
    Cc: Mauro Carvalho Chehab
    Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.cc
    Signed-off-by: Borislav Petkov

    Michael Walle
     

14 Jul, 2015

1 commit

  • Currently, when decoding an MCE, we display 'CE' for a Deferred error, like
    this:

    [Hardware Error]: CPU:0 (15:2:0) MC4_STATUS[Over|CE|MiscV|-|AddrV|Deferred|-|UECC]: 0xdc04b00095080813

    When the 'UC' bit in the MCx_STATUS register is clear, the error status
    is either a Corrected error or Deferred error as determined by the
    'Deferred' bit. So do not print 'CE' on a deferred error.

    Refer to AMD Error Scope Hierarchy table in a newer BKDG (example:
    49125_15h_Models_30h-3Fh_BKDG.pdf, section "RAS Features").

    Signed-off-by: Aravind Gopalakrishnan
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1436788382-6463-1-git-send-email-aravind.gopalakrishnan@amd.com
    Signed-off-by: Borislav Petkov

    Aravind Gopalakrishnan
     

10 Jul, 2015

1 commit


04 Jul, 2015

1 commit


02 Jul, 2015

1 commit

  • Commit

    debe6a623d3c ("MIPS: OCTEON: Update octeon-model.h code for new SoCs.")

    renamed some SoC model helper functions, but forgot to update the EDAC
    drivers resulting in build failures. Fix that.

    Cc: stable@vger.kernel.org # v4.0+
    Signed-off-by: Aaro Koskinen
    Acked-by: David Daney
    Cc: Mauro Carvalho Chehab
    Cc: Ralf Baechle
    Cc: linux-edac
    Cc: linux-mips@linux-mips.org
    Link: http://lkml.kernel.org/r/1435747132-10954-1-git-send-email-aaro.koskinen@nokia.com
    Signed-off-by: Borislav Petkov

    Aaro Koskinen
     

27 Jun, 2015

1 commit

  • Pull driver core updates from Greg KH:
    "Here is the driver core / firmware changes for 4.2-rc1.

    A number of small changes all over the place in the driver core, and
    in the firmware subsystem. Nothing really major, full details in the
    shortlog. Some of it is a bit of churn, given that the platform
    driver probing changes were found not to work well, so they were
    reverted.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
    Revert "base/platform: Only insert MEM and IO resources"
    Revert "base/platform: Continue on insert_resource() error"
    Revert "of/platform: Use platform_device interface"
    Revert "base/platform: Remove code duplication"
    firmware: add missing kfree for work on async call
    fs: sysfs: don't pass count == 0 to bin file readers
    base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
    base/platform: Remove code duplication
    of/platform: Use platform_device interface
    base/platform: Continue on insert_resource() error
    base/platform: Only insert MEM and IO resources
    firmware: use const for remaining firmware names
    firmware: fix possible use after free on name on asynchronous request
    firmware: check for file truncation on direct firmware loading
    firmware: fix __getname() missing failure check
    drivers: of/base: move of_init to driver_init
    drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
    sysfs: disambiguate between "error code" and "failure" in comments
    driver-core: fix build for !CONFIG_MODULES
    driver-core: make __device_attach() static
    ...

    Linus Torvalds
     

26 Jun, 2015

1 commit

  • Pull edac updates from Mauro Carvalho Chehab:
    "Some fixes and additions to the EDAC driver used on modern Intel x86
    CPUs. It includes support for Broadwell EP/EX platforms and fixes for
    motherboards with more than 2 CPU sockets"

    * tag 'edac/v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
    sb_edac: support for Broadwell -EP and -EX
    sb_edac: Fix support for systems with two home agents per socket
    sb_edac: Fix a typo and a thinko in address handling for Haswell
    EDAC: Remove arbitrary limit on number of channels

    Linus Torvalds
     

25 Jun, 2015

2 commits