17 Jul, 2017

1 commit

  • It is a write-only variable so get rid of it.

    Signed-off-by: Borislav Petkov
    Acked-by: Robert Richter
    Acked-by: Michal Simek
    Acked-by: Thor Thayer
    Acked-by: Tony Luck
    Cc: Mark Gross
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Jason Baron
    Cc: "Sören Brinkmann"
    Cc: Ralf Baechle
    Cc: David Daney
    Cc: Loc Ho
    Cc: linux-edac@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-mips@linux-mips.org

    Borislav Petkov
     

10 Apr, 2017

5 commits

  • Change them to have the edac_ prefix.

    No functionality change.

    Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • Move the remaining functionality to edac_mc.c. Convert "edac_report=" to
    a module parameter.

    Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • ... and the glue around it. It is not needed anymore.

    Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • Use mc_devices list instead to check whether we have EDAC driver
    instances successfully registered with EDAC core.

    Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • Apparently, some machines used to report DRAM errors through a PCI SERR
    NMI. This is why we have a call into EDAC in the NMI handler. See

    c0d121720220 ("drivers/edac: add new nmi rescan").

    From looking at the patch above, that's two drivers: e752x_edac.c and
    e7xxx_edac.c. Now, I wanna say those are old machines which are probably
    decommissioned already.

    Tony says that "[t]he newest CPU supported by either of those drivers
    is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some
    folks are still using these ... but people that hold onto h/w for 7
    years generally cling to old s/w too ... so I'd guess it unlikely that
    we will get complaints for breaking these in upstream."

    So even if there is a small number still in use, we did load EDAC with
    edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact)
    which means a default EDAC setup without any parameters supplied on the
    command line or otherwise would never even log the error in the NMI
    handler because we're polling by default:

    inline int edac_handler_set(void)
    {
            if (edac_op_state == EDAC_OPSTATE_POLL)
                    return 0;

            return atomic_read(&edac_handlers);
    }

    So, long story short, I'd like to get rid of that nastiness called
    edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If
    we ever have to do stuff like that again, it should be notifiers we're
    using and not some insanity like this one.

    Signed-off-by: Borislav Petkov
    Acked-by: Thomas Gleixner
    Cc: Tony Luck

    Borislav Petkov
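
    The effect of the polling default described above can be modeled in
    user space. This is only a sketch, not the kernel code: the enum
    values, the plain int standing in for the atomic counter, and the
    nmi_would_call_edac() helper are assumptions made for illustration.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's op-state constants; numeric values are assumed. */
enum edac_opstate {
    EDAC_OPSTATE_POLL,
    EDAC_OPSTATE_NMI,
    EDAC_OPSTATE_INT,
};

/* Polling is the default mode, as the commit message states. */
static int edac_op_state = EDAC_OPSTATE_POLL;

/* Plain int in place of the kernel's atomic handler count. */
static int edac_handlers;

/* Same logic as the function quoted in the commit message. */
static int edac_handler_set(void)
{
    if (edac_op_state == EDAC_OPSTATE_POLL)
        return 0;

    return edac_handlers;
}

/* Hypothetical helper: would an NMI path consult EDAC at all? */
static bool nmi_would_call_edac(void)
{
    return edac_handler_set() != 0;
}
```

    With the default op state, the helper reports false even when a
    handler is registered, which is the behavior the commit message
    relies on.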
     

06 Jan, 2017

1 commit


15 Dec, 2016

4 commits


21 Nov, 2016

2 commits

  • Currently, deferred errors are classified as correctable in EDAC. Add a
    new error type for deferred errors so that they are correctly reported
    to the user.

    Signed-off-by: Yazen Ghannam
    Cc: Aravind Gopalakrishnan
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Borislav Petkov

    Yazen Ghannam
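
    A separate error type for deferred errors might look roughly like
    the sketch below. The enumerator names and the strings are modeled
    on the EDAC header from memory and should be treated as assumptions,
    not as the exact contents of the patch.

```c
/* Assumed error classification; DEFERRED is the newly added type. */
enum hw_event_mc_err_type {
    HW_EVENT_ERR_CORRECTED,
    HW_EVENT_ERR_UNCORRECTED,
    HW_EVENT_ERR_DEFERRED,
    HW_EVENT_ERR_FATAL,
    HW_EVENT_ERR_INFO,
};

/* Map an error type to the label shown to the user. */
static const char *mc_event_error_type(unsigned int err_type)
{
    switch (err_type) {
    case HW_EVENT_ERR_CORRECTED:
        return "Corrected";
    case HW_EVENT_ERR_UNCORRECTED:
        return "Uncorrected";
    case HW_EVENT_ERR_DEFERRED:
        return "Deferred";
    case HW_EVENT_ERR_FATAL:
        return "Fatal";
    default:
        return "Info";
    }
}
```

    With a dedicated enumerator, a deferred error no longer has to be
    reported under the "Corrected" label.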
     
  • AMD Fam17h systems can support Load-Reduced DDR4 DIMMs. So add this new
    type to edac.h in preparation for the Fam17h EDAC update. Also, let's
    fix a format issue with the LRDDR3 line while we're here.

    Signed-off-by: Yazen Ghannam
    Cc: Aravind Gopalakrishnan
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1479423463-8536-3-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Borislav Petkov

    Yazen Ghannam
     

11 Dec, 2015

2 commits


03 Dec, 2015

1 commit

  • Make EDAC aware of DDR4/RDDR4 mem types.

    Signed-off-by: Jim Snow
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Cc: lukasz.anaczkowski@intel.com
    Link: http://lkml.kernel.org/r/1449136134-23706-2-git-send-email-hubert.chrzaniuk@intel.com
    [ Rebase to 4.4-rc3. ]
    Signed-off-by: Hubert Chrzaniuk
    Signed-off-by: Borislav Petkov

    Jim Snow
     

05 Nov, 2015

1 commit

  • Pull driver core updates from Greg KH:
    "Here's the "big" driver core updates for 4.4-rc1. Primarily a bunch
    of debugfs updates, with a smattering of minor driver core fixes and
    updates as well.

    All have been in linux-next for a long time"

    * tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    debugfs: Add debugfs_create_ulong()
    of: to support binding numa node to specified device in devicetree
    debugfs: Add read-only/write-only bool file ops
    debugfs: Add read-only/write-only size_t file ops
    debugfs: Add read-only/write-only x64 file ops
    debugfs: Consolidate file mode checks in debugfs_create_*()
    Revert "mm: Check if section present during memory block (un)registering"
    driver-core: platform: Provide helpers for multi-driver modules
    mm: Check if section present during memory block (un)registering
    devres: fix a for loop bounds check
    CMA: fix CONFIG_CMA_SIZE_MBYTES overflow in 64bit
    base/platform: assert that dev_pm_domain callbacks are called unconditionally
    sysfs: correctly handle short reads on PREALLOC attrs.
    base: soc: siplify ida usage
    kobject: move EXPORT_SYMBOL() macros next to corresponding definitions
    kobject: explain what kobject's sd field is
    debugfs: document that debugfs_remove*() accepts NULL and error values
    debugfs: Pass bool pointer to debugfs_create_bool()
    ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'

    Linus Torvalds
     

04 Oct, 2015

1 commit

  • It's a bit odd that debugfs_create_bool() takes 'u32 *' as an argument,
    when all it needs is a boolean pointer.

    It would be better to update this API to make it accept 'bool *'
    instead, as that will make it more consistent and often more convenient.
    On top of that, bool takes just a byte.

    This required updating all call sites in the same commit that
    updates the API. The regmap core was also using
    debugfs_{read|write}_file_bool() directly, and the variable types
    there were updated to bool as well.

    Signed-off-by: Viresh Kumar
    Acked-by: Mark Brown
    Acked-by: Charles Keepax
    Signed-off-by: Greg Kroah-Hartman

    Viresh Kumar
     

22 Sep, 2015

1 commit


20 Oct, 2014

1 commit


27 Jun, 2014

1 commit


12 Dec, 2013

1 commit

  • This new parameter controls how HW errors are reported, especially on
    newer Intel platforms such as Ivybridge-EX, which contain enhanced
    error decoding functionality in the firmware, i.e. eMCA.

    Signed-off-by: Chen, Gong
    Acked-by: Tony Luck
    Link: http://lkml.kernel.org/r/1386310630-12529-2-git-send-email-gong.chen@linux.intel.com
    [ Boris: massage commit message. ]
    Signed-off-by: Borislav Petkov

    Chen, Gong
     

24 Oct, 2013

1 commit


24 Jul, 2013

1 commit

  • Fix the following:

    BUG: key ffff88043bdd0330 not in .data!
    ------------[ cut here ]------------
    WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
    DEBUG_LOCKS_WARN_ON(1)
    Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
    CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
    Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
    0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
    ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
    ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
    Call Trace:
    dump_stack
    warn_slowpath_common
    warn_slowpath_fmt
    lockdep_init_map
    ? trace_hardirqs_on_caller
    ? trace_hardirqs_on
    debug_mutex_init
    __mutex_init
    bus_register
    edac_create_sysfs_mci_device
    edac_mc_add_mc
    sbridge_probe
    pci_device_probe
    driver_probe_device
    __driver_attach
    ? driver_probe_device
    bus_for_each_dev
    driver_attach
    bus_add_driver
    driver_register
    __pci_register_driver
    ? 0xffffffffa0010fff
    sbridge_init
    ? 0xffffffffa0010fff
    do_one_initcall
    load_module
    ? unset_module_init_ro_nx
    SyS_init_module
    tracesys
    ---[ end trace d24a70b0d3ddf733 ]---
    EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
    EDAC sbridge: Driver loaded.

    What happens is that bus_register needs a statically allocated lock_key
    because that key is handed to lockdep. However, struct mem_ctl_info
    embeds struct bus_type (the whole struct, not a pointer to it) and the
    whole thing gets dynamically allocated.

    Fix this by using a statically allocated struct bus_type for the MC bus.

    Signed-off-by: Borislav Petkov
    Acked-by: Mauro Carvalho Chehab
    Cc: Markus Trippelsdorf
    Cc: stable@kernel.org # v3.10
    Signed-off-by: Tony Luck

    Borislav Petkov
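
    The shape of the fix can be sketched in user space as follows. The
    struct layouts, MAX_MCS, mci_alloc() and bus_is_static() are
    simplified, hypothetical stand-ins, not the kernel definitions; the
    point is only that the controller holds a pointer into a static
    array instead of embedding the bus in a heap allocation.

```c
#include <stdlib.h>

#define MAX_MCS 4   /* hypothetical limit for this sketch */

/* Minimal stand-in for struct bus_type. */
struct bus_type {
    const char *name;
    int lock_key;   /* stands in for the key that lockdep inspects */
};

/* Statically allocated buses, one per memory controller. */
static struct bus_type mc_bus[MAX_MCS];

struct mem_ctl_info {
    int mc_idx;
    struct bus_type *bus;   /* pointer, not an embedded struct */
};

/* Allocate a controller and attach it to its static bus. */
static struct mem_ctl_info *mci_alloc(int mc_idx)
{
    struct mem_ctl_info *mci;

    if (mc_idx < 0 || mc_idx >= MAX_MCS)
        return NULL;

    mci = calloc(1, sizeof(*mci));
    if (!mci)
        return NULL;

    mci->mc_idx = mc_idx;
    mci->bus = &mc_bus[mc_idx];
    return mci;
}

/* Return 1 if the bus is one of the static mc_bus entries. */
static int bus_is_static(const struct bus_type *bus)
{
    for (int i = 0; i < MAX_MCS; i++) {
        if (bus == &mc_bus[i])
            return 1;
    }
    return 0;
}
```

    The controller itself is still allocated dynamically; only the bus,
    and with it the key, lives in static storage.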
     

16 Mar, 2013

2 commits

  • Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
    memory controller is csrows based. Merge both fields into one.

    There's no need for the driver to actually fill it, as the core detects
    it by checking if one of the layers has the csrows type as part of the
    memory hierarchy:

    if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
            per_rank = true;

    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Borislav Petkov

    Mauro Carvalho Chehab
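
    The detection quoted above can be wrapped into a small,
    self-contained function. This is a user-space sketch: only
    EDAC_MC_LAYER_CHIP_SELECT comes from the commit message, while the
    other enumerators, the struct fields and the function name are
    assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Layer types; all but CHIP_SELECT are assumed for this sketch. */
enum edac_mc_layer_type {
    EDAC_MC_LAYER_BRANCH,
    EDAC_MC_LAYER_CHANNEL,
    EDAC_MC_LAYER_SLOT,
    EDAC_MC_LAYER_CHIP_SELECT,
};

/* Simplified layer description. */
struct edac_mc_layer {
    enum edac_mc_layer_type type;
    unsigned int size;
};

/* A controller is csrow based if any layer is a chip-select layer. */
static bool layers_are_per_rank(const struct edac_mc_layer *layers,
                                size_t n_layers)
{
    bool per_rank = false;

    for (size_t i = 0; i < n_layers; i++) {
        if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
            per_rank = true;
    }

    return per_rank;
}
```

    Because the answer follows from the layer description alone, a
    driver does not need to set a separate flag.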
     
  • We were filling the csrow size with a wrong value. 16a528ee3975 ("EDAC:
    Fix csrow size reported in sysfs") tried to address the issue. It fixed
    the report with the old API but not with the new one. Correct it for the
    new API too.

    Signed-off-by: Mauro Carvalho Chehab
    [ make it a per-csrow accounting regardless of ->channel_count ]
    Signed-off-by: Borislav Petkov

    Mauro Carvalho Chehab
     

22 Feb, 2013

2 commits


21 Feb, 2013

2 commits


21 Dec, 2012

1 commit


28 Nov, 2012

2 commits


27 Jun, 2012

1 commit


12 Jun, 2012

4 commits

  • Kernel kobjects have rigid rules: each container object should be
    dynamically allocated, and can't be allocated into a single kmalloc.

    EDAC never obeyed this rule: it has a single malloc function that
    allocates all needed data into a single kzalloc.

    As this is no longer accepted, change the allocation scheme of the
    EDAC *_info structs to comply with this kernel rule.

    Acked-by: Chris Metcalf
    Cc: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Greg K H
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: Shaohui Xie
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
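
    The difference between one big allocation and one allocation per
    object can be sketched in user space as below. The struct names are
    reduced to a bare minimum and calloc()/free() stand in for the
    kernel allocators; none of this is the actual patch.

```c
#include <stdlib.h>

/* Simplified per-DIMM object; in the kernel it would carry a kobject. */
struct dimm_info {
    int idx;
};

struct mem_ctl_info {
    unsigned int n_dimms;
    struct dimm_info **dimms;   /* array of pointers, one allocation each */
};

/* Free every object individually, then the pointer array and the parent. */
static void mci_free(struct mem_ctl_info *mci)
{
    if (!mci)
        return;

    if (mci->dimms) {
        for (unsigned int i = 0; i < mci->n_dimms; i++)
            free(mci->dimms[i]);
    }
    free(mci->dimms);
    free(mci);
}

/* Allocate the parent, the pointer array, and each DIMM separately. */
static struct mem_ctl_info *mci_alloc(unsigned int n_dimms)
{
    struct mem_ctl_info *mci = calloc(1, sizeof(*mci));

    if (!mci)
        return NULL;

    mci->dimms = calloc(n_dimms ? n_dimms : 1, sizeof(*mci->dimms));
    if (!mci->dimms) {
        free(mci);
        return NULL;
    }
    mci->n_dimms = n_dimms;

    for (unsigned int i = 0; i < n_dimms; i++) {
        mci->dimms[i] = calloc(1, sizeof(*mci->dimms[i]));
        if (!mci->dimms[i]) {
            mci_free(mci);   /* unallocated slots are NULL, so this is safe */
            return NULL;
        }
        mci->dimms[i]->idx = (int)i;
    }

    return mci;
}
```

    Since each object has its own allocation, each one can be released
    on its own, which is what reference-counted container objects
    require.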
     
  • Sometimes, it is useful to have a mechanism that generates fake
    errors, in order to test the EDAC core code, and the userspace
    tools.

    Provide such a mechanism by adding a few debugfs nodes.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Now that all users for the old kobj raw access are gone,
    we can get rid of the legacy kobj-based structures and
    data.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Michal Marek
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The EDAC subsystem uses the old struct sysdev approach,
    creating all nodes using the raw sysfs API. This is bad,
    as the API is deprecated.

    As we'll be changing the EDAC API, let's first port the existing
    code to struct device.

    There's one drawback to this patch: driver-specific sysfs
    nodes, used by mpc85xx_edac, amd64_edac and i7core_edac
    won't be created anymore. While it would be possible to
    also port the device-specific code, that would mix kobj with
    struct device, which is not recommended. Also, it is easier and nicer
    to move the code to the drivers instead, as the core can get rid
    of some complex logic that just emulates what device_add()
    and device_create_file() already do.

    The next patches will convert the driver-specific code to use
    the device-specific calls. Then, the remaining bits of the old
    sysfs API will be removed.

    NOTE: a per-MC bus is required, otherwise devices with more than
    one memory controller will hit a bug like the one below:

    [ 819.094946] EDAC DEBUG: find_mci_by_dev: find_mci_by_dev()
    [ 819.094948] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device() idx=1
    [ 819.094952] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device(): creating device mc1
    [ 819.094967] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device creating dimm0, located at channel 0 slot 0
    [ 819.094984] ------------[ cut here ]------------
    [ 819.100142] WARNING: at fs/sysfs/dir.c:481 sysfs_add_one+0xc1/0xf0()
    [ 819.107282] Hardware name: S2600CP
    [ 819.111078] sysfs: cannot create duplicate filename '/bus/edac/devices/dimm0'
    [ 819.119062] Modules linked in: sb_edac(+) edac_core ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm microcode pcspkr iTCO_wdt iTCO_vendor_support igb i2c_i801 i2c_core sg ioatdma dca sr_mod cdrom sd_mod crc_t10dif ahci libahci isci libsas libata scsi_transport_sas scsi_mod wmi dm_mod [last unloaded: scsi_wait_scan]
    [ 819.175748] Pid: 10902, comm: modprobe Not tainted 3.3.0-0.11.el7.v12.2.x86_64 #1
    [ 819.184113] Call Trace:
    [ 819.186868] warn_slowpath_common+0x7f/0xc0
    [ 819.193573] warn_slowpath_fmt+0x46/0x50
    [ 819.200000] sysfs_add_one+0xc1/0xf0
    [ 819.206025] sysfs_do_create_link+0x135/0x220
    [ 819.212944] ? sysfs_create_group+0x13/0x20
    [ 819.219656] sysfs_create_link+0x13/0x20
    [ 819.226109] bus_add_device+0xe6/0x1b0
    [ 819.232350] device_add+0x2db/0x460
    [ 819.238300] edac_create_dimm_object+0x84/0xf0 [edac_core]
    [ 819.246460] edac_create_sysfs_mci_device+0xe8/0x290 [edac_core]
    [ 819.255215] edac_mc_add_mc+0x5a/0x2c0 [edac_core]
    [ 819.262611] sbridge_register_mci+0x1bc/0x279 [sb_edac]
    [ 819.270493] sbridge_probe+0xef/0x175 [sb_edac]
    [ 819.277630] ? pm_runtime_enable+0x58/0x90
    [ 819.284268] local_pci_probe+0x5c/0xd0
    [ 819.290508] __pci_device_probe+0xf1/0x100
    [ 819.297117] pci_device_probe+0x3a/0x60
    [ 819.303457] really_probe+0x73/0x270
    [ 819.309496] driver_probe_device+0x4e/0xb0
    [ 819.316104] __driver_attach+0xab/0xb0
    [ 819.322337] ? driver_probe_device+0xb0/0xb0
    [ 819.329151] bus_for_each_dev+0x56/0x90
    [ 819.335489] driver_attach+0x1e/0x20
    [ 819.341534] bus_add_driver+0x1b0/0x2a0
    [ 819.347884] ? 0xffffffffa0346fff
    [ 819.353641] driver_register+0x76/0x140
    [ 819.359980] ? printk+0x51/0x53
    [ 819.365524] ? 0xffffffffa0346fff
    [ 819.371291] __pci_register_driver+0x56/0xd0
    [ 819.378096] sbridge_init+0x54/0x1000 [sb_edac]
    [ 819.385231] do_one_initcall+0x3f/0x170
    [ 819.391577] sys_init_module+0xbe/0x230
    [ 819.397926] system_call_fastpath+0x16/0x1b
    [ 819.404633] ---[ end trace 1654fdd39556689f ]---

    This happens because the bus is not being properly initialized.
    Instead of putting the memory sub-devices inside the memory controller,
    it is putting everything under the same directory:

    $ tree /sys/bus/edac/
    /sys/bus/edac/
    ├── devices
    │ ├── all_channel_counts -> ../../../devices/system/edac/mc/mc0/all_channel_counts
    │ ├── csrow0 -> ../../../devices/system/edac/mc/mc0/csrow0
    │ ├── csrow1 -> ../../../devices/system/edac/mc/mc0/csrow1
    │ ├── csrow2 -> ../../../devices/system/edac/mc/mc0/csrow2
    │ ├── dimm0 -> ../../../devices/system/edac/mc/mc0/dimm0
    │ ├── dimm1 -> ../../../devices/system/edac/mc/mc0/dimm1
    │ ├── dimm3 -> ../../../devices/system/edac/mc/mc0/dimm3
    │ ├── dimm6 -> ../../../devices/system/edac/mc/mc0/dimm6
    │ ├── inject_addrmatch -> ../../../devices/system/edac/mc/mc0/inject_addrmatch
    │ ├── mc -> ../../../devices/system/edac/mc
    │ └── mc0 -> ../../../devices/system/edac/mc/mc0
    ├── drivers
    ├── drivers_autoprobe
    ├── drivers_probe
    └── uevent

    On a multi-memory controller system, the names "csrow%d" and "dimm%d"
    should be under "mc%d", and not at the main hierarchy level.

    So, we need to create a per-MC bus, in order to have its own namespace.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Cc: Greg K H
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
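
    The name clash behind the "duplicate filename" warning can be
    illustrated with a small user-space sketch. Both helper functions
    are hypothetical and only build strings; they do not reflect how
    sysfs constructs its paths internally.

```c
#include <stdio.h>

/* Shared namespace: the controller index is not part of the name. */
static int flat_name(char *buf, size_t len, unsigned int mc,
                     unsigned int dimm)
{
    (void)mc;   /* ignored, which is exactly the problem */
    return snprintf(buf, len, "dimm%u", dimm);
}

/* Per-MC namespace: the same DIMM index is unique under its controller. */
static int per_mc_name(char *buf, size_t len, unsigned int mc,
                       unsigned int dimm)
{
    return snprintf(buf, len, "mc%u/dimm%u", mc, dimm);
}
```

    With the flat scheme, the first DIMM of a second controller gets
    the same name as the first DIMM of the first one; scoping names by
    controller avoids that.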
     

11 Jun, 2012

2 commits

  • No functional changes. Just comment improvements.

    Reviewed-by: Aristeu Rozanski
    Cc: Doug Thompson
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • As EDAC doesn't use struct device itself, it created a parent dev
    pointer called "pdev". Now that we'll be converting it to use
    struct device instead of struct sysdev, this needs to be fixed.

    No functional changes.

    Reviewed-by: Aristeu Rozanski
    Acked-by: Chris Metcalf
    Cc: Doug Thompson
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab