17 Dec, 2009

2 commits


16 Dec, 2009

3 commits

  • Add support for 6 ranks per channel to the i5100 chipset. I have tested
    the patch as far as possible with correctable errors and everything
    appears to work. The DIMM mapping is correct for our board, but other
    boards may differ.

    Signed-off-by: Nils Carlson
    Acked-by: Arthur Jones
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nils Carlson
     
  • Add scrubbing to the i5100 chipset. The i5100 chipset only supports one
    scrubbing rate, which is not constant but dependent on memory load. The
    rate returned by this driver is an estimate based on some experimentation,
    but is substantially closer to the truth than the speed supplied in the
    documentation.

    Also, scrubbing is done once, and then a done-bit is set. This means that
    to accomplish continuous scrubbing a re-enabling mechanism must be used.
    I have created the simplest possible such mechanism in the form of a
    work-queue which will check every five minutes. This interval is quite
    arbitrary but should be sufficient for all sizes of system memory.

    Signed-off-by: Nils Carlson
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nils Carlson
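
The re-enabling mechanism described above can be sketched in userspace C. This is only an illustration of the idea: the register layout, the bit positions (`SCRUB_EN`, `SCRUB_DONE`) and the names `scrub_state` and `scrub_refresh` are hypothetical and are not taken from the i5100 driver or its datasheet. In the driver the equivalent check would run from the work-queue every five minutes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions in a simulated scrub control register. */
#define SCRUB_EN   (UINT32_C(1) << 0)
#define SCRUB_DONE (UINT32_C(1) << 1)

struct scrub_state {
    uint32_t reg;       /* simulated control register */
    bool want_scrub;    /* whether the user asked for scrubbing */
};

/*
 * One pass of the periodic check. If scrubbing is wanted and the
 * hardware has flagged the previous pass as done, clear the done bit
 * and set the enable bit again. Returns true when a new pass was started.
 */
static bool scrub_refresh(struct scrub_state *s)
{
    if (!s->want_scrub)
        return false;
    if (!(s->reg & SCRUB_DONE))
        return false;   /* previous pass still running */

    s->reg &= ~SCRUB_DONE;
    s->reg |= SCRUB_EN;
    return true;
}
```

Calling `scrub_refresh()` from a periodic timer gives continuous scrubbing without polling the hardware more often than the check interval.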
     
  • The i5100 driver uses the word controller instead of channel in a lot of
    places; this patch is simply a cleanup.

    Signed-off-by: Nils Carlson
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nils Carlson
     

15 Dec, 2009

1 commit


12 Dec, 2009

1 commit

  • The current rd/wrmsr_on_cpus helpers assume that the supplied
    cpumasks are contiguous. However, there are machines out there
    like some K8 multinode Opterons which have a non-contiguous core
    enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see
    http://www.gossamer-threads.com/lists/linux/kernel/1160268.

    This patch fixes out-of-bounds writes (see URL above) by adding per-CPU
    msr structs which are used on the respective cores.

    Additionally, two helpers, msrs_{alloc,free}, are provided for use by
    the callers of the MSR accessors.

    Cc: H. Peter Anvin
    Cc: Mauro Carvalho Chehab
    Cc: Aristeu Rozanski
    Cc: Randy Dunlap
    Cc: Doug Thompson
    Signed-off-by: Borislav Petkov
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Borislav Petkov
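
Why a non-contiguous cpumask leads to an out-of-bounds access can be shown with a small userspace sketch. It assumes that the old helpers located a core's slot by its offset from the first CPU in the mask, in an array sized by the number of CPUs in the mask; that is an assumption made for illustration, and all names here (`MAX_CPUS`, `offset_slot`, `percpu_slot`, and so on) are hypothetical, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 32u    /* hypothetical upper bound on CPU ids */

/* Number of CPUs set in the mask. */
static unsigned int mask_weight(uint32_t mask)
{
    unsigned int n = 0;

    for (; mask; mask &= mask - 1)
        n++;
    return n;
}

/* Lowest CPU id set in the mask (MAX_CPUS if the mask is empty). */
static unsigned int mask_first(uint32_t mask)
{
    unsigned int cpu = 0;

    while (cpu < MAX_CPUS && !(mask & (UINT32_C(1) << cpu)))
        cpu++;
    return cpu;
}

/* Slot computed under the "mask is contiguous" assumption. */
static unsigned int offset_slot(uint32_t mask, unsigned int cpu)
{
    return cpu - mask_first(mask);
}

/* True if that slot fits in an array of mask_weight(mask) entries. */
static bool offset_slot_in_bounds(uint32_t mask, unsigned int cpu)
{
    return offset_slot(mask, cpu) < mask_weight(mask);
}

/* Per-CPU storage: every possible CPU id has its own slot. */
static unsigned int percpu_slot(unsigned int cpu)
{
    return cpu;
}
```

With the mask for cores 0 and 2 (`0x5`), core 2 maps to slot 2 of a two-entry array under the offset scheme, whereas per-CPU storage always has a valid slot for it.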
     

08 Dec, 2009

21 commits


04 Dec, 2009

1 commit


04 Nov, 2009

2 commits


29 Oct, 2009

3 commits

  • Allow csrows to properly initialize when the topology only has active
    channels on 2 and 3. This new check allows proper detection and
    initialization in this topology. Only checking the first MTR, which
    represents channels 0 and 1, is not sufficient.

    I also fixed up the related debug information path. I can submit as a 2nd
    patch if needed.

    Signed-off-by: Keith Mannthey
    Acked-by: Aristeu Rozanski
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keith Mannthey
     
  • When building without CONFIG_PCI the edac_pci_idx variable is unused,
    causing a build-time warning. Wrap the variable in #ifdef CONFIG_PCI,
    just like the rest of the PCI support.

    Signed-off-by: Ira W. Snyder
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira W. Snyder
     
  • The i5400 EDAC driver has several bugs with chip-select row computation
    which most likely lead to bugs in detailed error reporting. Attempts to
    contact the authors have gone mostly unanswered so I am presenting my diff
    here. I do not subscribe to lkml and would appreciate being kept in the
    cc.

    The most egregious problem was miscalculating the addresses of MTR
    registers after register 0 by assuming they are 32 bits wide rather than
    16. This caused the driver to miss half of the memory. Most motherboards
    tend to have only 8 DIMM slots and not 16, so this may not have been
    noticed before.

    Further, the row calculations multiplied the number of DIMMs several
    times, ultimately ending up with a maximum row of 32. The chipset only
    supports 4 DIMMs in each of 4 channels, so csrow could not be higher than
    4 unless you use a row per rank with dual-rank DIMMs. I opted to
    eliminate this behavior as it is confusing to the user and the error
    reporting works by slot and not rank. This gives a much clearer view of
    memory by slot and channel in /sys.

    Signed-off-by: Jeff Roberson
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Roberson
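
The MTR addressing problem described above comes down to the stride used when stepping from one register to the next. A minimal sketch, assuming a hypothetical base offset (`MTR_BASE` is made up for illustration and is not the real i5400 register offset):

```c
#include <stdint.h>

/* Hypothetical base offset of MTR register 0, for illustration only. */
#define MTR_BASE 0x80u

/* Buggy: steps through the registers as if each were 32 bits wide. */
static uint32_t mtr_offset_buggy(unsigned int index)
{
    return MTR_BASE + index * (uint32_t)sizeof(uint32_t);
}

/* Fixed: the registers are 16 bits wide, so the stride is 2 bytes. */
static uint32_t mtr_offset_fixed(unsigned int index)
{
    return MTR_BASE + index * (uint32_t)sizeof(uint16_t);
}
```

Both functions agree for register 0, which is why the bug only shows up for later registers: with the 4-byte stride every second 16-bit register is skipped.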
     

17 Oct, 2009

1 commit


12 Oct, 2009

1 commit

  • Add an atomic notifier which ensures proper locking when conveying
    MCE info to EDAC for decoding. The actual notifier call overrides a
    default, negative priority notifier.

    Note: make sure we register the default decoder only once since
    mcheck_init() runs on each CPU.

    Signed-off-by: Borislav Petkov
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

09 Oct, 2009

1 commit


07 Oct, 2009

3 commits

  • When injecting DRAM ECC errors (F3xBC_x8), EccVector[15:0] is a bitmask
    of the bits to inject errors into when written, and holds the payload
    of the 16-bit DRAM word when read.

    Add sysfs members to show the DRAM ECC section/word/vector.

    Reject invalid injection values entered over sysfs instead of truncating
    them.

    Signed-off-by: Borislav Petkov

    Borislav Petkov
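
Rejecting an out-of-range value rather than truncating it can be sketched as follows. Only the 16-bit width of EccVector comes from the text above; the function name `set_ecc_vector` and the constant `ECC_VECTOR_MAX` are hypothetical, and the real sysfs store handlers may be structured differently.

```c
#include <errno.h>
#include <stdint.h>

/* EccVector is 16 bits wide, so anything above 0xFFFF cannot be stored. */
#define ECC_VECTOR_MAX 0xFFFFul

/*
 * Store a user-supplied injection vector. Returns 0 on success, or
 * -EINVAL (leaving *vector untouched) if the value does not fit,
 * instead of silently keeping only the low 16 bits.
 */
static int set_ecc_vector(unsigned long value, uint16_t *vector)
{
    if (value > ECC_VECTOR_MAX)
        return -EINVAL;

    *vector = (uint16_t)value;
    return 0;
}
```

Returning an error lets the user notice a mistyped value instead of injecting an error pattern they did not ask for.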
     
  • On Fam10h and above, F1x[1, 0][7C:40] are DRAM Base/Limit registers
    which specify the destination node of a DRAM address. Those address
    boundaries are being extracted into ->dram_base[] and ->dram_limit[].
    Correct the extraction masks to match the respective address bits.

    Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • Different processor families support a different number of chip selects.
    Handle this in a family-dependent way with the proper values assigned at
    init time (see amd64_set_dct_base_and_mask).

    Remove the _DCSM_COUNT defines since they are used in only one place and
    originate from public documentation.

    CC: Keith Mannthey
    Signed-off-by: Borislav Petkov

    Borislav Petkov