22 Feb, 2013

2 commits


21 Feb, 2013

12 commits

  • APEI GHES and i7core_edac/sb_edac currently can be loaded at
    the same time, but those are Highlander modules:
    "There can be only one".

    There are two reasons for that:

    1) Each driver assumes that it is the only one registering at
    the EDAC core, as it is the driver's responsibility to number
    the memory controllers, and all of them start from 0;

    2) If BIOS is handling the memory errors, the OS can't also be
    doing it, as one will interfere with the other.

    So, we need to add a module owner's lock at the EDAC core,
    in order to avoid having two different modules handling memory
    errors at the same time. The best way to implement this lock seems
    to be to use the driver's name, as this is unique, and won't require
    changes on every driver.
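
    A minimal userspace sketch of an owner lock keyed on the driver's name,
    as described above. The names edac_claim_owner() and edac_release_owner()
    are hypothetical and are not taken from the EDAC core; the sketch is
    single-threaded and leaves out the locking real kernel code would need.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the core's owner state; NULL means unowned. */
static const char *edac_owner;

/*
 * Try to claim the EDAC core for the driver called mod_name.
 * Returns 0 on success (or if the same driver already owns it),
 * -1 if a different driver is the current owner.
 */
int edac_claim_owner(const char *mod_name)
{
    if (edac_owner && strcmp(edac_owner, mod_name) != 0)
        return -1;
    edac_owner = mod_name;
    return 0;
}

/* Release ownership, but only if the caller is the current owner. */
void edac_release_owner(const char *mod_name)
{
    if (edac_owner && strcmp(edac_owner, mod_name) == 0)
        edac_owner = NULL;
}
```

    With this shape, a second driver that calls edac_claim_owner() while
    another one holds the core simply gets an error and can refuse to load.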

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • There are some cases where the memory controller layout is
    completely hidden. This is the case of firmware-driven error
    code, like the one provided by GHES. Add a new layer to be
    used on such memory error report mechanisms.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • In order for ghes_edac to work when built in, the EDAC core should
    be initialized earlier; otherwise the ghes_edac driver initializes
    before edac_mc_sysfs_init() is called:

    ...
    [ 4.998373] EDAC MC0: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
    ...
    [ 4.998373] EDAC MC1: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
    [ 6.519495] EDAC MC: Ver: 3.0.0
    [ 6.523749] EDAC DEBUG: edac_mc_sysfs_init: device mc created

    The net result is that no EDAC sysfs nodes will appear.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • It is hard to find what's wrong without a proper error
    report. Improve it, in debug mode.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • The last changeset introduced a few checkpatch warnings:

    WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required
    261: FILE: drivers/edac/i5100_edac.c:1207:
    + if (priv->debugfs)
    + debugfs_remove_recursive(priv->debugfs);

    WARNING: debugfs_remove(NULL) is safe this check is probably not required
    290: FILE: drivers/edac/i5100_edac.c:1250:
    + if (i5100_debugfs)
    + debugfs_remove(i5100_debugfs);

    Get rid of them.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Create a debugfs directory i5100_edac/mcX for each memory controller and
    add nodes to control how fault injection is performed.

    After configuring an injection using inject_channel, inject_deviceptr1,
    inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel,
    trigger the injection by writing anything to inject_enable.

    Example of a CE injection:

    echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
    echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
    echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
    echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

    Example of UE injection:

    echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
    echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
    echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
    echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2
    echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1
    echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2
    echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

    Sometimes the injection must be enabled more than once (by echoing to
    the inject_enable node) before it happens; I am not sure why.
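
    The CE sequence above can also be replayed from a small C helper. This is
    only a sketch: inject_write() and inject_ce() are hypothetical names, and
    the directory is passed in so the code can be exercised against a scratch
    directory instead of /sys/kernel/debug/i5100_edac/mc0.

```c
#include <stdio.h>

/*
 * Write a decimal value to <dir>/<node>, like `echo value > dir/node`.
 * Returns 0 on success, -1 on failure.
 */
int inject_write(const char *dir, const char *node, unsigned int value)
{
    char path[256];
    FILE *f;
    int n;

    n = snprintf(path, sizeof(path), "%s/%s", dir, node);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fprintf(f, "%u\n", value) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Replay the CE example: channel 0, hlinesel 1, eccmask1 0xf000 (61440). */
int inject_ce(const char *dir)
{
    if (inject_write(dir, "inject_channel", 0) ||
        inject_write(dir, "inject_hlinesel", 1) ||
        inject_write(dir, "inject_eccmask1", 0xf000) ||
        inject_write(dir, "inject_enable", 1))
        return -1;
    return 0;
}
```

    Writing the masks in hex makes the bit patterns easier to read: 61440 is
    0xf000 and 65535 is 0xffff.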

    Signed-off-by: Niklas Söderlund
    Signed-off-by: Mauro Carvalho Chehab

    Niklas Söderlund
     
  • Add fault injection based on information from the i5100 datasheet, see
    [1]. In addition to the i5100 datasheet, some missing information on the
    injection functions was found through experimentation and the i7300
    datasheet, see [2].

    [1] Intel 5100 Memory Controller Hub Chipset
    Doc.Nr: 318378
    http://www.intel.com/content/dam/doc/datasheet/5100-memory-controller-hub-chipset-datasheet.pdf

    [2] Intel 7300 Chipset Memory Controller Hub (MCH)
    Doc.Nr: 318082
    http://www.intel.com/assets/pdf/datasheet/318082.pdf

    Signed-off-by: Niklas Söderlund
    Signed-off-by: Mauro Carvalho Chehab

    Niklas Söderlund
     
  • Probe and store the device handle for the device 19 function 0 during
    driver initialization. The device is used during fault injection.

    Signed-off-by: Niklas Söderlund
    Signed-off-by: Mauro Carvalho Chehab

    Niklas Söderlund
     
  • Currently, the sdram_scrub_rate sysfs node is created even if the device
    doesn't support getting or setting the scrub rate. Change the logic to
    only create this device node when the operation is supported.
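
    One way to express "only create the node when the operation is supported"
    is to derive the node's permissions from the callbacks a driver provides
    and to skip creation when the result is 0. The sketch below assumes that
    approach; struct scrub_ops, scrub_node_mode() and the demo callbacks are
    hypothetical and are not the EDAC core's actual definitions.

```c
#include <stddef.h>

/* Hypothetical permission bits, mirroring the usual octal file modes. */
#define MODE_READ  0444u
#define MODE_WRITE 0200u

/* Hypothetical subset of a memory controller's scrub-rate callbacks. */
struct scrub_ops {
    int (*get_rate)(void);
    int (*set_rate)(unsigned int bw);
};

/*
 * Return the mode for the sdram_scrub_rate node,
 * or 0 when the node should not be created at all.
 */
unsigned int scrub_node_mode(const struct scrub_ops *ops)
{
    unsigned int mode = 0;

    if (ops->get_rate)
        mode |= MODE_READ;
    if (ops->set_rate)
        mode |= MODE_WRITE;
    return mode;
}

/* Demo callbacks, present only so the sketch can be exercised. */
int demo_get_rate(void)
{
    return 0;
}

int demo_set_rate(unsigned int bw)
{
    (void)bw;
    return 0;
}
```

    A driver with neither callback yields mode 0, so no node is created; a
    driver with only a getter would get a read-only node.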

    Reported-by: Felipe Balbi
    Acked-by: Borislav Petkov
    Reviewed-by: Felipe Balbi
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • After running a series of tests on an HP DL320, filled with different
    memory sizes, it was noticed that, when filled with just one DIMM
    on such hardware, the driver wrongly detects twice the memory, and
    thinks that both channels 0 and 1 are filled.

    It seems to be partially caused by the BIOS and partially by the driver.

    The current i3200_edac logic would work fine if the BIOS disabled
    the unused second channel when just one DIMM is connected,
    to save power, as recommended in this chipset's datasheet.

    However, the BIOS on this particular machine doesn't do it:

    [ 16.741421] EDAC DEBUG: how_many_channels: In dual channel mode
    [ 16.741424] EDAC DEBUG: how_many_channels: 2 DIMMS per channel enabled

    So, the driver was assuming that 2 channels are enabled (well, they are,
    but the second is unused).

    Combined with that, I found two issues in the logic that creates the
    EDAC data, which failed when the two channels are not equally
    filled (AFAICT, that happens only when just 1 DIMM is plugged).

    The first one is that a 0 at DRB means that nothing is filled. The
    driver's logic, however, does some calculation with that.

    The second one is that the logic that fills the DIMM data currently
    assumes that both channels are equally filled.

    I tested the system already with the current configuration and my
    patch and it is now working fine. So, for a 2R single DIMM 2Gb memory
    at dimm slot 01 (channel 0), it is now displaying:

    [ 16.741406] EDAC DEBUG: i3200_get_drbs: drb[0][0] = 16, drb[1][0] = 0
    [ 16.741410] EDAC DEBUG: i3200_get_drbs: drb[0][1] = 32, drb[1][1] = 0
    [ 16.741413] EDAC DEBUG: i3200_get_drbs: drb[0][2] = 32, drb[1][2] = 0
    [ 16.741416] EDAC DEBUG: i3200_get_drbs: drb[0][3] = 32, drb[1][3] = 0
    ...
    [ 16.741896] EDAC DEBUG: i3200_probe1: csrow 0, channel 0, size = 1024 Mb
    [ 16.741899] EDAC DEBUG: i3200_probe1: csrow 1, channel 0, size = 1024 Mb

    and the corresponding sysfs nodes are now properly filled.
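
    The debug output above is consistent with the DRB values being cumulative
    rank boundaries within a channel. The sketch below converts them to
    per-rank sizes under two assumptions that are inferred from the log and
    not stated in the commit: one DRB unit is 64 MB (16 maps to 1024 Mb), and
    a value that does not grow past the previous boundary means an empty
    rank. drb_to_rank_sizes() is a hypothetical name.

```c
#include <stddef.h>

#define RANKS_PER_CHANNEL 4
#define DRB_UNIT_MB 64 /* assumed granularity, inferred from 16 -> 1024 Mb */

/*
 * Convert one channel's cumulative DRB values into per-rank sizes in MB.
 * A DRB of 0, or one that does not exceed the previous boundary,
 * is treated as an unpopulated rank.
 */
void drb_to_rank_sizes(const unsigned int drb[RANKS_PER_CHANNEL],
                       unsigned int size_mb[RANKS_PER_CHANNEL])
{
    unsigned int prev = 0;
    size_t i;

    for (i = 0; i < RANKS_PER_CHANNEL; i++) {
        if (drb[i] <= prev) {
            size_mb[i] = 0;
        } else {
            size_mb[i] = (drb[i] - prev) * DRB_UNIT_MB;
            prev = drb[i];
        }
    }
}
```

    Each channel is handled independently, so an all-zero channel 1 yields
    no memory regardless of what channel 0 reports.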

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Currently, it is not possible to know, when debug is enabled,
    if the driver is using 2 DIMMS per channel mode or not. It is
    not possible to know the values of the drbs registers, used
    to identify the memory rank sizes.

    Add debug for both, as it helps to track issues on the driver.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Linux 3.8-rc7

    * tag 'v3.8-rc7': (12052 commits)
    Linux 3.8-rc7
    net: sctp: sctp_endpoint_free: zero out secret key data
    net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
    atm/iphase: rename fregt_t -> ffreg_t
    ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
    ARM: DMA mapping: fix bad atomic test
    ARM: realview: ensure that we have sufficient IRQs available
    ARM: GIC: fix GIC cpumask initialization
    net: usb: fix regression from FLAG_NOARP code
    l2tp: dont play with skb->truesize
    net: sctp: sctp_auth_key_put: use kzfree instead of kfree
    netback: correct netbk_tx_err to handle wrap around.
    xen/netback: free already allocated memory on failure in xen_netbk_get_requests
    xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
    xen/netback: shutdown the ring if it contains garbage.
    drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try
    virtio_console: Don't access uninitialized data.
    net: qmi_wwan: add more Huawei devices, including E320
    net: cdc_ncm: add another Huawei vendor specific device
    ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
    ...

    Mauro Carvalho Chehab
     

30 Jan, 2013

2 commits


10 Jan, 2013

1 commit


08 Jan, 2013

3 commits

  • Use device_unregister to replace put_device + device_del for
    cleanup, and fix the potential use after free.

    Signed-off-by: Lans Zhang
    Signed-off-by: Borislav Petkov

    Lans Zhang
     
  • After f65aad41772f("MIPS: Cavium: Add EDAC support."), when entering
    the "Device Drivers" toplevel menu in menuconfig, the suboptions behind
    EDAC appeared merged with the rest of the device drivers types. This was
    because the menuconfig option EDAC is querying an EDAC_SUPPORT Kconfig
    bool which was defined after the menu definition.

    When pushing EDAC_SUPPORT up, before the menu definition, the variable
    is defined earlier and the above menuconfig artifact doesn't happen.

    Drop a useless menuconfig comment while at it.

    Cc: Ralf Baechle
    Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • This patch fixes use-after-free and double-free bugs in
    edac_mc_sysfs_exit(). mci_pdev has single reference and put_device()
    calls mc_attr_release() which calls kfree(). The following
    device_del() works with already released memory. Another kfree() in
    edac_mc_sysfs_exit() releases the same memory again. Great.
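
    The ordering problem can be shown with a userspace refcount sketch. In
    the kernel, device_unregister() performs device_del() followed by
    put_device(); the struct obj type and the obj_*() helpers below are
    hypothetical stand-ins for that pattern, not kernel code.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a struct device. */
struct obj {
    int refs;
    int registered;
    int *release_count; /* bumped each time release runs, for illustration */
};

/* Runs when the last reference is dropped; frees the object. */
static void obj_release(struct obj *o)
{
    (*o->release_count)++;
    free(o);
}

/* Create a registered object holding a single reference. */
struct obj *obj_create(int *release_count)
{
    struct obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    o->refs = 1;
    o->registered = 1;
    o->release_count = release_count;
    return o;
}

/* Unlink the object; it must still be alive at this point. */
void obj_del(struct obj *o)
{
    o->registered = 0;
}

/* Drop one reference; the object may be freed by this call. */
void obj_put(struct obj *o)
{
    if (--o->refs == 0)
        obj_release(o);
}

/* Correct teardown: unlink first, drop the last reference last. */
void obj_unregister(struct obj *o)
{
    obj_del(o);
    obj_put(o);
    /* o must not be touched or freed again after this point. */
}
```

    Calling obj_put() before obj_del(), or freeing the object again after
    obj_put(), reproduces the use-after-free and double-free described above.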

    Signed-off-by: Konstantin Khlebnikov
    Cc: stable@vger.kernel.org # 3.[67]
    Cc: Denis Kirjanov
    Cc: Mauro Carvalho Chehab
    Link: http://lkml.kernel.org/r/20121214110310.11019.21098.stgit@zurg
    Signed-off-by: Borislav Petkov

    Konstantin Khlebnikov
     

04 Jan, 2013

1 commit

  • CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
    markings need to be removed.

    This change removes the use of __devinit, __devexit_p, and __devexit
    from these drivers.

    Based on patches originally written by Bill Pemberton, but redone by me
    in order to handle some of the coding style issues better, by hand.

    Cc: Bill Pemberton
    Cc: Doug Thompson
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Mauro Carvalho Chehab
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Ralf Baechle
    Cc: David Daney
    Cc: Egor Martovetsky
    Cc: Olof Johansson
    Cc: Chris Metcalf
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

21 Dec, 2012

4 commits

  • It is easy to trigger this crash on 3.7.0:

    root@intel_westmere_ep-3:~# modprobe -r i7core_edac
    EDAC PCI: Removed device 0 for i7core_edac EDAC PCI controller: DEV 0000:fe:03.0
    EDAC MC: Removed device 1 for i7core_edac.c i7 core #1: DEV 0000:fe:03.0
    EDAC PCI: Removed device 1 for i7core_edac EDAC PCI controller: DEV 0000:ff:03.0
    EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:ff:03.0
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
    IP: [] __blocking_notifier_call_chain+0x29/0x80
    PGD 1eaae7067 PUD 1e96e4067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: minix acpi_cpufreq freq_table mperf ioatdma processor edac_core(-) iTCO_wdt coretemp evdev hwmon lpc_ich dca mfd_core crc32c_intel ioapic [last unloaded: i7core_edac]
    CPU 3
    Pid: 1268, comm: modprobe Not tainted 3.7.0-WR5.0.1.0_standard+ #30 Intel Corporation S5520HC/S5520HC
    RIP: 0010:[] [] __blocking_notifier_call_chain+0x29/0x80
    RSP: 0018:ffff8801eb12de28 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 00000000000000f0 RCX: 00000000ffffffff
    RDX: ffff88012b452800 RSI: 0000000000000002 RDI: 00000000000000f0
    RBP: ffff8801eb12de68 R08: 0000000000000000 R09: ffffea0004ad1118
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: ffff8801eb12dee8 R14: ffff88012b452800 R15: 000000000060e518
    FS: 00007f9ea95a9700(0000) GS:ffff8801efc20000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000110 CR3: 00000001262f1000 CR4: 00000000000007e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process modprobe (pid: 1268, threadinfo ffff8801eb12c000, task ffff8801e8421690)
    Stack:
    ffff88012c802a00 ffff88012b445ec0 ffff88012c802300 ffff88012b452800
    0000000000000000 ffff8801eb12dee8 000000000060e080 000000000060e518
    ffff8801eb12de78 ffffffff82069f56 ffff8801eb12dea8 ffffffff824ead7c
    Call Trace:
    [] blocking_notifier_call_chain+0x16/0x20
    [] device_del+0x3c/0x1d0
    [] edac_mc_sysfs_exit+0x1c/0x2f [edac_core]
    [] edac_exit+0x4f/0x56 [edac_core]
    [] sys_delete_module+0x17a/0x240
    [] ? vm_munmap+0x5c/0x80
    [] system_call_fastpath+0x16/0x1b
    Code: 90 90 55 48 89 e5 48 83 ec 40 48 89 5d d8 4c 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 66 66 66 66 90 31 c0 49 89 d6 48 89 fb 8b 57 20 49 89 f5 41 89 cf 4c 8d 67 20 48 85 d2 74 2c 4c 89
    RIP [] __blocking_notifier_call_chain+0x29/0x80
    RSP
    CR2: 0000000000000110
    ---[ end trace b69acf12ccad1c0d ]---

    Usually, edac_subsys is grabbed one time by pci at initialization.
    But edac_subsys may be released several times if multiple pci MCs exist.
    The fix just makes the operations balanced.
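
    A sketch of what balanced get/put operations can look like, assuming the
    shared subsystem reference is taken when the first PCI controller
    registers and dropped when the last one goes away. The counters and
    function names are hypothetical and do not reproduce the actual patch.

```c
static int subsys_refs; /* references held on the shared subsystem */
static int pci_users;   /* currently registered PCI controllers */

/* Register a PCI controller; only the first one grabs the subsystem. */
void pci_ctl_add(void)
{
    if (pci_users++ == 0)
        subsys_refs++;
}

/* Remove a PCI controller; only the last one releases the subsystem. */
void pci_ctl_remove(void)
{
    if (pci_users > 0 && --pci_users == 0)
        subsys_refs--;
}

/* Report how many subsystem references are currently held. */
int subsys_ref_count(void)
{
    return subsys_refs;
}
```

    With two controllers registered, removing both releases the subsystem
    exactly once, and a stray extra removal is ignored.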

    Signed-off-by: Lans Zhang
    Signed-off-by: Mauro Carvalho Chehab

    Lans Zhang
     
  • Remove size from lookup arrays and mark them as const.

    Reviewed-by: Jesper Juhl
    Signed-off-by: Niklas Söderlund
    Signed-off-by: Mauro Carvalho Chehab

    Niklas Söderlund
     
  • [ 17.024963] EDAC DEBUG: get_memory_layout: TOHM: 132.160 GB (0x0000002043ffffff)
    [ 17.024971] EDAC DEBUG: get_memory_layout: SAD#0 DRAM up to 33.792 GB (0x0000000840000000) Interleave: 8:6 reg=0x000083c3

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • There are no more embedded kobjects in struct mem_ctl_info. Remove a header and
    a comment that does not reflect the code anymore.

    Signed-off-by: Shaun Ruffell
    Signed-off-by: Mauro Carvalho Chehab

    Shaun Ruffell
     

15 Dec, 2012

1 commit

  • Pull MIPS updates from Ralf Baechle:
    "The MIPS bits for 3.8. This also includes a bunch of fixes that were
    sitting in the linux-mips.org git tree for a long time. This pull
    request contains updates to several OCTEON drivers and the board
    support code for BCM47XX, BCM63XX, XLP, XLR, XLS, lantiq, Loongson1B,
    updates to the SSB bus support, MIPS kexec code and adds support for
    kdump.

    When pulling this, there are two expected merge conflicts in
    include/linux/bcma/bcma_driver_chipcommon.h which are trivial to
    resolve, just remove the conflict markers and keep both alternatives."

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (90 commits)
    MIPS: PMC-Sierra Yosemite: Remove support.
    VIDEO: Newport Fix console crashes
    MIPS: wrppmc: Fix build of PCI code.
    MIPS: IP22/IP28: Fix build of EISA code.
    MIPS: RB532: Fix build of prom code.
    MIPS: PowerTV: Fix build.
    MIPS: IP27: Correct fucked grammar in ops-bridge.c
    MIPS: Highmem: Fix build error if CONFIG_DEBUG_HIGHMEM is disabled
    MIPS: Fix potencial corruption
    MIPS: Fix for warning from FPU emulation code
    MIPS: Handle COP3 Unusable exception as COP1X for FP emulation
    MIPS: Fix poweroff failure when HOTPLUG_CPU configured.
    MIPS: MT: Fix build with CONFIG_UIDGID_STRICT_TYPE_CHECKS=y
    MIPS: Remove unused smvp.h
    MIPS/EDAC: Improve OCTEON EDAC support.
    MIPS: OCTEON: Add definitions for OCTEON memory contoller registers.
    MIPS: OCTEON: Add OCTEON family definitions to octeon-model.h
    ata: pata_octeon_cf: Use correct byte order for DMA in when built little-endian.
    MIPS/OCTEON/ata: Convert pata_octeon_cf.c to use device tree.
    MIPS: Remove usage of CEVT_R4K_LIB config option.
    ...

    Linus Torvalds
     

14 Dec, 2012

1 commit


12 Dec, 2012

2 commits

  • Drivers for EDAC on Cavium. Supported subsystems are:

    o CPU primary caches. These are parity protected only, so only error
    reporting.
    o Second level cache - ECC protected, provides SECDED.
    o Memory: ECC / SECDED if used with suitable DRAM modules. The driver
    will only initialize if ECC is enabled on a system, so it is safe to run
    on non-ECC memory.
    o PCI: Parity error reporting

    Since it is very hard to test this sort of code, the implementation is
    very conservative and uses polling where possible for now.

    Signed-off-by: Ralf Baechle
    Reviewed-by: Borislav Petkov

    Ralf Baechle
     
  • Pull EDAC fixes from Borislav Petkov:

    - EDAC core error path fix, from Denis Kirjanov.

    - Generalization of AMD MCE bank names and some minor error reporting
    improvements.

    - EDAC core cleanups and simplifications, from Wei Yongjun.

    - amd64_edac fixes for sysfs-reported values, from Josh Hunt.

    - some heavy amd64_edac error reporting path shaving, leading to
    removing a bunch of code.

    - amd64_edac error injection method improvements.

    - EDAC core cleanups and fixes

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (24 commits)
    EDAC, pci_sysfs: Use for_each_pci_dev to simplify the code
    EDAC: Handle error path in edac_mc_sysfs_init() properly
    MCE, AMD: Dump error status
    MCE, AMD: Report decoded error type first
    MCE, AMD: Dump CPU f/m/s triple with the error
    MCE, AMD: Remove functional unit references
    EDAC: Convert to use simple_open()
    EDAC, Calxeda highbank: Convert to use simple_open()
    EDAC: Fix mc size reported in sysfs
    EDAC: Fix csrow size reported in sysfs
    EDAC: Pass mci parent
    EDAC: Add memory controller flags
    amd64_edac: Fix csrows size and pages computation
    amd64_edac: Use DBAM_DIMM macro
    amd64_edac: Fix K8 chip select reporting
    amd64_edac: Reorganize error reporting path
    amd64_edac: Do not check whether error address is valid
    amd64_edac: Improve error injection
    amd64_edac: Cleanup error injection code
    amd64_edac: Small fixlets and cleanups
    ...

    Linus Torvalds
     

04 Dec, 2012

2 commits


28 Nov, 2012

9 commits