26 Jul, 2008

14 commits

  • Convert PCI err device from platform to open firmware of_dev to comply
    with powerpc schemes.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Dave Jiang
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Fixup of missing bit 0 on 64360 PCIx_ERR_MASK and errata FEr-#11 and
    FEr-#16 for the 64460. Bit 0 must remain 0.

    Signed-off-by: Dave Jiang
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
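
    A minimal sketch of the "bit 0 must remain 0" rule, not taken from the
    patch itself. The macro and function names are hypothetical; only the
    constraint on bit 0 comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical name: bit 0 of the error mask register must stay 0. */
#define ERR_MASK_RESERVED_BIT0 0x1u

/* Clear the reserved bit before a mask value is written to the register. */
static inline uint32_t err_mask_sanitize(uint32_t requested)
{
	return requested & ~ERR_MASK_RESERVED_BIT0;
}
```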
     
  • Update the get_property() call to use of_get_property() in order to fix a
    compile error.

    Signed-off-by: Dave Jiang
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • This module harvests more than just memory errors; it also harvests
    various bus and DMA errors that the chipset detects. Previously, it would
    report all such errors, which made the output too noisy.

    This patch therefore adds a parameter that turns off NON-MEMORY error
    reports by default. The reporting can be re-enabled via the same
    parameter.

    Also did code style cleanup to follow the 80-characters-per-line rule.

    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
     
  • The channel DIMM label does not seem to be used much in the edac code.
    However, where it is used (in the core code), it is assumed to not have a
    newline embedded. This leaves the sysfs file newline free which looks
    funny when cat'ing it. Here we just add the trailing newline to the sysfs
    chX_dimm_label output...

    [Doug Thompson note: the DIMM label is one of the primary uses of EDAC.
    User space daemon scripts (edac-utils@sourceforge) populate the DIMM label
    fields, via /sys/devices/system/edac attributes, with the silk screen
    labels of the motherboard in use. dmidecode accesses BIOS tables, but BIOS
    tables are well known to be incorrect and useless in these respects.
    edac-utils will strip off any newlines before its use of the output, when
    displaying DIMM slot silk screen labels.]

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
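
    The label change above amounts to appending "\n" in the sysfs show
    routine. A userspace sketch under that assumption follows; the function
    name and signature are hypothetical and do not reproduce the edac core.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical show routine: copy the label into buf followed by a
 * trailing newline and return the number of bytes stored (excluding the
 * terminating NUL), truncating if the buffer is too small.
 */
static int show_dimm_label(char *buf, size_t size, const char *label)
{
	int n = snprintf(buf, size, "%s\n", label);

	if (n < 0)
		return n;
	if ((size_t)n >= size)
		return size ? (int)(size - 1) : 0;
	return n;
}
```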
     
  • Static kobjects and ksets are not supported in the Linux kernel. Convert the
    mc_kset from static to dynamic. This patch depends on my previous patch
    to remove the module parameter attributes from mc...

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
     
  • /sys/devices/system/edac/mc has a few files which are duplicated in
    /sys/module/edac_core/parameters. Now that all the functionality is
    duplicated between these two locations, we remove the former kobject
    attributes and update the documentation.

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
     
  • When updating the edac_mc_poll_msec module parameter from the sysfs
    /sys/module/edac_core/parameters/edac_mc_poll_msec file, we don't update
    the workq timers. As a result, if we move from a big poll time to a small
    one, the small one won't take effect until the big one has timed out.

    Here we provide a new module parameter set method to call out to the
    update routine. This brings the /sys/module/edac_core/parameters
    functionality up to that provided by the /sys/devices/system/edac/mc sysfs
    module parameter files so that we can remove them or at least link to the
    /sys/module files...

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
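
    A userspace sketch of what such a "set" method has to do: store the new
    interval and re-arm the timer from the current time, so a shorter
    interval does not wait for the old timeout. The struct and function are
    hypothetical stand-ins, not the kernel's delayed-work API.

```c
/* Hypothetical stand-in for the polling work item. */
struct poll_timer {
	unsigned long interval_msec;	/* current poll period */
	unsigned long expires_msec;	/* absolute time of the next poll */
};

/*
 * Called when the parameter is written. Rejects a zero interval,
 * otherwise records the new period and reschedules relative to "now".
 */
static int poll_timer_set_interval(struct poll_timer *t,
				   unsigned long now_msec,
				   unsigned long new_msec)
{
	if (new_msec == 0)
		return -1;

	t->interval_msec = new_msec;
	t->expires_msec = now_msec + new_msec;
	return 0;
}
```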
     
  • Static kobjects are not supported in the Linux kernel. Convert the
    edac_pci_top_main_kobj from static to dynamic. This avoids the double
    free of the edac_pci_top_main_kobj.name that we see on module reload of
    the e752x edac driver (and probably others as well).

    In addition, Greg KH has pointed out that this code may be
    cleaned up significantly. I will look at that as a follow-on patch; for
    now, I just want the minimum fix to get this double-free oops bug
    squashed...

    Many thanks to Greg KH for his patience in showing me what the
    Documentation/kobject.txt already said (oops)...

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
     
  • Some code cleanliness issues found by Andrew Morton (thanks!) which should
    not affect functionality, but which should help make the code more
    maintainable.

    In particular, we now:

    * convert all #define's w/ a parameter to static inlines
    * use 1UL rather than 1ULL when calculating an unsigned long
    * use pci_disable_device

    The resulting code is tested and seems to work fine...

    Signed-off-by: Arthur Jones
    Cc: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
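
    To illustrate the first two cleanup items, here is a sketch of a
    function-like macro rewritten as a static inline that shifts 1UL. The
    macro shown in the comment and the function name are hypothetical, not
    the ones in the driver.

```c
#include <limits.h>

/*
 * Before (hypothetical): #define LOW_BITS_MASK(n) ((1ULL << (n)) - 1)
 *
 * After: a static inline gives the argument a type and evaluates it once.
 * The result is an unsigned long, so the shift uses 1UL, not 1ULL.
 */
static inline unsigned long low_bits_mask(unsigned int nbits)
{
	/* Shifting by the full width is undefined, so handle it explicitly. */
	if (nbits >= sizeof(unsigned long) * CHAR_BIT)
		return ~0UL;
	return (1UL << nbits) - 1;
}
```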
     
  • Explicitly unmask ECC errors we are interested in reporting.

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
     
  • It is possible that the BIOS did not enable ECC at boot time. We check
    for that case and fail to load if it is true.

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
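
    A sketch of the "refuse to load when ECC is off" check. The register
    layout and bit position are assumptions made for illustration; the
    commit message does not name them.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit position for "ECC enabled" in a control register. */
#define EXAMPLE_ECC_ENABLE_BIT (1u << 5)

/* Return 0 if ECC is enabled, -ENODEV so the probe fails otherwise. */
static int probe_check_ecc(uint32_t ctrl_reg)
{
	if (!(ctrl_reg & EXAMPLE_ECC_ENABLE_BIT))
		return -ENODEV;
	return 0;
}
```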
     
  • The error mask we use to trigger ECC notifications is missing many bits of
    interest. We add these bits here so that all possible ECC errors can be
    reported.

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
     
  • Preliminary support for the Intel 5100 MCH. CE and UE errors are reported
    along with the current DIMM label information and other memory parameters.

    Reasons why this is preliminary:

    1) This chip has 2 independent memory controllers which, for best
    performance, use interleaved accesses to the DDR2 memory. This
    architecture does not map very well to the current edac data structures
    which depend on symmetric channel access to the interleaved data.
    Without core changes, the best I could do for now is to map both memory
    controllers to different csrows (first all ranks of controller 0, then
    all ranks of controller 1). Someone much more familiar with the edac
    core than I will probably need to come up with a more general data
    structure to handle the interleaving and de-interleaving of the two
    memory controllers.

    2) I have not yet tackled the de-interleaving of the rank/controller
    address space into the physical address space of the CPU. There is
    nothing fundamentally missing; it is just turning out to be a lot of
    code, and I'd rather keep it separate for now, especially since it
    doesn't work yet...

    3) The code depends on a particular i5100 chip select to DIMM mainboard
    chip select mapping. This mapping seems obvious to me in order to
    support dual and single ranked memory, but it is not unique and DIMM
    labels could be wrong on other mainboards. There is no way to query
    this mapping that I know of.

    4) The code requires that the i5100 is in 32GB mode. Only 4 ranks per
    controller, 2 ranks per DIMM are supported. I do not have hardware
    (nor do I expect to have hardware anytime soon) for the 48GB (6 ranks
    per controller) mode.

    5) The serial presence detect code should be broken out into a "real"
    i2c driver so that decode-dimms.pl can work.

    Signed-off-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Jones
     

22 Jul, 2008

1 commit


25 May, 2008

1 commit

  • including of causes build problems since it doesn't exist.

    Also removed warning:
    drivers/edac/mpc85xx_edac.c:45: warning: 'mpc85xx_ctl_name' defined but not used

    Signed-off-by: Kumar Gala
    Acked-by: Doug Thompson
    Acked-by: Dave Jiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kumar Gala
     

06 May, 2008

1 commit

  • Commit 06916639e2fed9ee475efef2747a1b7429f8fe76 ("driver-core: add
    dev_name() to help transition away from using bus_id") added a static
    inline dev_name() and used it in dev_printk.

    Unfortunately, drivers/edac/edac_core.h defines a macro called
    dev_name(). Rename the latter.

    Diagnosis by Tony Breeds and Michael Ellerman.

    Signed-off-by: Stephen Rothwell
    Acked-by: Doug Thompson
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     

30 Apr, 2008

1 commit

  • Commit c3c52bce6993c6d37af2c2de9b482a7013d646a7 ("edac: fix module
    initialization on several modules 2nd time") added a call to opstate_init
    but did not include linux/edac.h that declares it.

    Signed-off-by: Stephen Rothwell
    Acked-by: Olof Johansson
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     

29 Apr, 2008

5 commits

  • I implemented opstate_init() as an inline function in linux/edac.h.

    Added calls to opstate_init() in:
    i82443bxgx_edac.c
    i82860_edac.c
    i82875p_edac.c
    i82975x_edac.c

    I wrote a fixed version of
    edac-fix-module-initialization-on-several-modules.patch
    and tested building 2.6.25-rc7 with it applied. The build succeeded.
    I think the patch is now correct.

    Cc: Alan Cox
    Signed-off-by: Hitoshi Mitake
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hitoshi Mitake
     
  • Collection of patches, merged into one, from Adrian that do the following:

    1) This patch makes the following needlessly global functions static:
    - edac_pci_get_log_pe()
    - edac_pci_get_log_npe()
    - edac_pci_get_panic_on_pe()
    - edac_pci_unregister_sysfs_instance_kobj()
    - edac_pci_main_kobj_setup()

    2) Remove unneeded function edac_device_find()

    3) Added #if 0 around function edac_pci_find()

    4) make the needlessly global edac_pci_generic_check() static

    5) Removed function edac_check_mc_devices()

    Doug Thompson modified Adrian's patches to better represent
    the direction of EDAC, and made them one patch.

    Cc: Alan Cox
    Signed-off-by: Adrian Bunk
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Signed-off-by: Robert P. J. Day
    Acked-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • Add a module parameter "sysbus_parity" to allow forcing system bus parity
    error checking on or off. Also add support to automatically disable system
    bus parity errors for processors which do not support it.

    If the sysbus_parity parameter is specified, sysbus parity detection will be
    forced on or off. If it is not specified, the driver will attempt to look at
    the CPU identifier string and determine if the CPU supports system bus parity.
    A blacklist was used instead of a whitelist so that system bus parity would
    be enabled by default and to minimize the chances of breaking things for those
    people already using the driver who for some reason have a processor that
    does not have a valid CPU identifier string.

    [akpm@linux-foundation.org: coding-style fixes]
    Cc: Alan Cox
    Signed-off-by: Peter Tyser
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Tyser
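
    The decision logic described above can be sketched as follows. The
    blacklist entries and function name are placeholders; the actual CPU
    strings used by the driver are not given in the commit message.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder entries: CPUs assumed not to support system bus parity. */
static const char *const no_parity_cpus[] = {
	"Example CPU A",
	"Example CPU B",
};

/*
 * param: -1 = not specified, 0 = forced off, 1 = forced on.
 * Returns 1 if system bus parity checking should be enabled.
 */
static int sysbus_parity_enabled(int param, const char *cpu_id)
{
	size_t i;

	if (param != -1)
		return param != 0;

	/* No usable identifier string: keep the default (enabled). */
	if (cpu_id == NULL)
		return 1;

	for (i = 0; i < sizeof(no_parity_cpus) / sizeof(no_parity_cpus[0]); i++)
		if (strstr(cpu_id, no_parity_cpus[i]) != NULL)
			return 0;

	return 1;
}
```

    Because only listed CPUs are turned off, an unknown or missing
    identifier string leaves parity checking enabled, which matches the
    stated reason for preferring a blacklist.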
     
  • Add Intel 3100 chipset support to e752x EDAC driver.

    Cc: Alan Cox
    Signed-off-by: Andrei Konovalov
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrei Konovalov
     

08 Feb, 2008

13 commits

  • By popular request, add a comment documenting the implicit type promotion
    here.

    Signed-off-by: Jason Uhlenkott
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Uhlenkott
     
  • There is a missing sequence of initialization code during startup.

    Signed-off-by: Hitoshi Mitake
    Signed-off-by: Jason Uhlenkott
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hitoshi Mitake
     
  • Made a previously global variable static in scope

    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
     
  • Modified to run on x86_64 as well as x86

    i3000_edac builds (and runs) fine on x86_64.

    Signed-off-by: Jason Uhlenkott
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Uhlenkott
     
  • Using the EDAC code in kernel.org kernel version 2.6.23.8 I am seeing the
    following problem:

    In the kernel there is a pci device attribute located in sysfs that is
    checked by the EDAC PCI scanning code. If that attribute is set,
    PCI parity/error scanning is skipped for that device. The attribute
    is:

    broken_parity_status

    and is located in the /sys/devices/pci/0000:XX:YY.Z directories for
    PCI devices.

    I don't think this check was actually implemented. I have a misbehaved card
    that reports a parity error every 1000 ms:

    Nov 25 07:28:43 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
    Nov 25 07:28:44 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
    Nov 25 07:28:45 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0

    Setting that card's broken_parity_status bit did not mask the error:

    echo "1" > /sys/bus/pci/devices/0000:05:01.0/broken_parity_status

    I looked through the EDAC code and did not readily see any reference to
    broken_parity_status at all (which makes sense based on the behavior I am
    seeing). I applied the following patch as a proof-of-concept and now EDAC's
    PCI parity error reporting behaves as documented:

    bryan

    Good regression find, bryan. It used to work. sigh.
    I added more logic to your patch, for more coverage of the error.

    Doug T

    Signed-off-by: Bryan Boatright
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bryan Boatright
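
    A sketch of the check the report asks for: skip parity reporting when
    the device is flagged as having broken parity. The struct and the mask
    value are illustrative stand-ins, not the kernel's struct pci_dev or
    the patch that was applied.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the per-device state the scan looks at. */
struct example_pci_dev {
	bool broken_parity_status;	/* set via the sysfs attribute */
	uint16_t status;		/* snapshot of the status register */
};

/* Illustrative mask for the parity-related status bits. */
#define EXAMPLE_PARITY_BITS 0x8100u

/* Report a parity error only if the device is not marked as broken. */
static bool should_report_parity_error(const struct example_pci_dev *dev)
{
	if (dev->broken_parity_status)
		return false;
	return (dev->status & EXAMPLE_PARITY_BITS) != 0;
}
```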
     
  • Marvell mv64x60 SoC support for EDAC. Used on PPC and MIPS platforms.
    Development and testing done on PPC Motorola prpmc2800 ATCA board.

    [akpm@linux-foundation.org: make mv64x60_ctl_name static]
    Signed-off-by: Dave Jiang
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • EDAC chip driver support for Freescale MPC85xx platforms. PPC based.

    Signed-off-by: Dave Jiang
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Replace function-like macros with functions.

    Signed-off-by: Jason Uhlenkott
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Uhlenkott
     
  • Style cleanup, mostly just 80-column fixes.

    Signed-off-by: Jason Uhlenkott
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Uhlenkott
     
  • Adds a driver for the Cell memory controller when used without a
    hypervisor, such as on the IBM Cell blades. There might still be some
    improvements to make, such as finding out whether it's possible to
    properly obtain more details about the address of the error, but it's
    good enough already to report CE counts, which is our main priority at
    the moment.

    Signed-off-by: Benjamin Herrenschmidt
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • Add the definitions for the Rambus XDR memory type used by the Cell processor.
    It's a pre-requisite for the followup Cell EDAC patch.

    Signed-off-by: Benjamin Herrenschmidt
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • When rounding a relative timeout we need to use round_jiffies_relative().

    Signed-off-by: Anton Blanchard
    Acked-by: Arjan van de Ven
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
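
    Why a relative timeout needs its own rounding helper, as a simplified
    userspace sketch. The tick rate and the always-round-up policy are
    assumptions made for illustration; the kernel's round_jiffies() family
    uses a different rounding rule and adds a per-CPU skew.

```c
/* Illustrative tick rate; the kernel's HZ depends on configuration. */
#define TICKS_PER_SEC 1000UL

/* Simplified absolute rounding: round up to the next whole second. */
static unsigned long round_ticks_abs(unsigned long t)
{
	unsigned long rem = t % TICKS_PER_SEC;

	return rem ? t + (TICKS_PER_SEC - rem) : t;
}

/*
 * Relative rounding: convert the delay to an absolute time, round that,
 * then convert back, so that now + result lands on a whole second.
 */
static unsigned long round_ticks_rel(unsigned long delta, unsigned long now)
{
	return round_ticks_abs(now + delta) - now;
}
```

    With now = 1250 and a delay of 500, the relative helper returns 750, so
    the timer expires at 2000. Applying the absolute helper to the delay
    instead gives 1000, and the timer expires at 2250, off the boundary.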
     
  • ENABLE the 'logging' of CE and UE events for the EDAC_DEVICE class of error
    harvester in EDAC

    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
     

03 Feb, 2008

1 commit


31 Jan, 2008

1 commit


25 Jan, 2008

2 commits