10 Apr, 2017

6 commits


28 Jan, 2017

1 commit


25 Dec, 2016

1 commit


15 Dec, 2016

2 commits


14 Nov, 2016

1 commit

  • When accessing the mc_devices list of memory controller descriptors, we
    need to hold mem_ctls_mutex. This was not always the case; fix that.

    Make all external callers call a version which grabs the mutex, since the
    mutex itself is local to edac_mc.c.

    Reported-by: Yazen Ghannam
    Signed-off-by: Borislav Petkov

    Borislav Petkov
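
The locking rule in the commit above (unlocked helper kept file-local, locked wrapper exposed to everyone else) can be sketched in userspace C. This is only an illustration under assumptions: struct mc_desc, find_mc_unlocked(), find_mc() and add_mc() are hypothetical names, and a pthread mutex stands in for the kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory controller descriptor. */
struct mc_desc {
    int mc_idx;
    struct mc_desc *next;
};

/* Both the list head and its mutex are file-local, as the commit describes. */
static struct mc_desc *mc_devices;
static pthread_mutex_t mem_ctls_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Unlocked lookup: the caller must already hold mem_ctls_mutex. */
static struct mc_desc *find_mc_unlocked(int idx)
{
    struct mc_desc *mc;

    for (mc = mc_devices; mc != NULL; mc = mc->next)
        if (mc->mc_idx == idx)
            return mc;
    return NULL;
}

/* Locked lookup: the only variant callers outside this file can use. */
struct mc_desc *find_mc(int idx)
{
    struct mc_desc *mc;

    pthread_mutex_lock(&mem_ctls_mutex);
    mc = find_mc_unlocked(idx);
    pthread_mutex_unlock(&mem_ctls_mutex);
    return mc;
}

/* Registration already holds the mutex, so it reuses the unlocked helper. */
int add_mc(struct mc_desc *mc)
{
    int ret = -1;

    assert(mc != NULL);
    pthread_mutex_lock(&mem_ctls_mutex);
    if (find_mc_unlocked(mc->mc_idx) == NULL) {
        mc->next = mc_devices;
        mc_devices = mc;
        ret = 0;
    }
    pthread_mutex_unlock(&mem_ctls_mutex);
    return ret;
}
```

The sketch only protects the list walk; keeping a returned descriptor alive after the mutex is dropped is a separate concern that it does not address.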
     

03 Jun, 2016

1 commit

  • After the workqueue cleanup, we're registering workqueues based on
    the presence of an ->edac_check function. When that is the case,
    we're setting OP_RUNNING_POLL. But we forgot to check that in
    edac_mc_reset_delay_period(), leading to:

    BUG: unable to handle kernel paging request at 0000000000015d10
    IP: [ .. ] queued_spin_lock_slowpath
    PGD 3ffcc8067 PUD 3ffc56067 PMD 0
    Oops: 0002 [#1] SMP
    Modules linked in: ...
    CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1
    Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
    Stack:
    Call Trace:
    ? _raw_spin_lock_irqsave
    ? lock_timer_base.isra.34
    ? del_timer
    ? try_to_grab_pending
    ? mod_delayed_work_on
    ? edac_mc_reset_delay_period
    ? edac_set_poll_msec
    ? param_attr_store
    ? module_attr_store
    ? kernfs_fop_write
    ? __vfs_write
    ? __vfs_read
    ? __alloc_fd
    ? vfs_write
    ? SyS_write
    ? entry_SYSCALL_64_fastpath
    Code:
    RIP [ .. ] queued_spin_lock_slowpath
    RSP <>
    CR2: 0000000000015d10
    ---[ end trace 3f286bc71cca15d1 ]---
    Kernel panic - not syncing: Fatal exception

    Fix it.

    Signed-off-by: Nicholas Krause
    Cc: # 4.5
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com
    [ Rewrite commit message. ]
    Signed-off-by: Borislav Petkov

    Nicholas Krause
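
The missing check described above can be sketched as follows. This is a simplified userspace model, not the kernel code: the enum only mirrors the OP_RUNNING_POLL name from the commit text, while struct mc_desc, its poll_msec field and reset_delay_period() are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of operational states; only OP_RUNNING_POLL is named in the commit. */
enum op_state { OP_ALLOC, OP_RUNNING_POLL, OP_RUNNING_INTERRUPT, OP_OFFLINE };

struct mc_desc {
    enum op_state op_state;
    unsigned long poll_msec;   /* stands in for the period of the delayed work */
};

/*
 * Apply a new polling period, but only to controllers that actually poll.
 * Controllers in any other state never had a work item set up, so touching
 * their (uninitialized) work is what the oops above corresponds to.
 * Returns how many controllers were updated.
 */
size_t reset_delay_period(struct mc_desc *mcs, size_t n, unsigned long msec)
{
    size_t i, updated = 0;

    assert(mcs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (mcs[i].op_state != OP_RUNNING_POLL)
            continue;
        mcs[i].poll_msec = msec;
        updated++;
    }
    return updated;
}
```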
     

24 Apr, 2016

1 commit

  • Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of
    ce_noinfo_count.

    Signed-off-by: Emmanouil Maroudas
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Fixes: 4275be635597 ("edac: Change internal representation to work with layers")
    Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.com
    Signed-off-by: Borislav Petkov

    Emmanouil Maroudas
     

02 Feb, 2016

3 commits


11 Dec, 2015

2 commits

  • Hide the EDAC workqueue pointer in a separate compilation unit and add
    accessors for the workqueue manipulations needed.

    Remove edac_pci_reset_delay_period() which wasn't used by anything. It
    seems it got added without a user with

    91b99041c1d5 ("drivers/edac: updated PCI monitoring")

    Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • EDAC workqueue destruction is really fragile. We cancel delayed work
    but if it is still running and requeues itself, we still go ahead and
    destroy the workqueue and the queued work explodes when workqueue core
    attempts to run it.

    Make the destruction more robust by switching op_state to offline so
    that requeuing stops. Cancel any pending work *synchronously* too.

    EDAC i7core: Driver loaded.
    general protection fault: 0000 [#1] SMP
    CPU 12
    Modules linked in:
    Supported: Yes
    Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7
    RIP: 0010:[] [] __queue_work+0x17/0x3f0
    < ... regs ...>
    Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
    Stack:
    ...
    Call Trace:
    call_timer_fn
    run_timer_softirq
    __do_softirq
    call_softirq
    do_softirq
    irq_exit
    smp_apic_timer_interrupt
    apic_timer_interrupt
    intel_idle
    cpuidle_idle_call
    cpu_idle
    Code: ...
    RIP __queue_work
    RSP

    Signed-off-by: Borislav Petkov
    Cc:

    Borislav Petkov
     

23 Oct, 2015

1 commit

  • The PAGES_TO_MiB macro is used for unit conversion but the
    trace_mc_event() tracepoint expects a page address. Fix that.

    Signed-off-by: Tan Xiaojun
    Cc: Mauro Carvalho Chehab
    Cc: linux-edac
    Link: http://lkml.kernel.org/r/1445341538-24271-1-git-send-email-tanxiaojun@huawei.com
    Signed-off-by: Borislav Petkov

    Tan Xiaojun
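
The difference between the two computations can be shown with a small sketch. It assumes 4 KiB pages (PAGE_SHIFT of 12) and a PAGES_TO_MiB definition that shifts a page count down to mebibytes; error_address() and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Unit conversion: a number of pages expressed in MiB. */
#define PAGES_TO_MiB(pages) ((pages) >> (20 - PAGE_SHIFT))

/* Address reconstruction: page frame number plus offset within the page. */
uint64_t error_address(uint64_t page_frame_number, uint64_t offset_in_page)
{
    assert(offset_in_page < (UINT64_C(1) << PAGE_SHIFT));
    return (page_frame_number << PAGE_SHIFT) | offset_in_page;
}
```

With a page frame number of 0x1234 and an offset of 0x10, error_address() gives 0x1234010, whereas PAGES_TO_MiB(0x1234) gives 0x12, which is a size in MiB and not an address.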
     

28 May, 2015

1 commit

  • So first of all, this atomic_scrub() function's naming is bad. It looks
    like an atomic_t helper. Change it to edac_atomic_scrub().

    The bigger problem is that this function is arch-specific and every new
    arch which doesn't necessarily need that functionality still needs to
    define it, otherwise EDAC doesn't compile.

    So instead of doing that and including arch-specific headers, have each
    arch define an EDAC_ATOMIC_SCRUB symbol which can be used in edac_mc.c
    for ifdeffery. Much cleaner.

    And we already are doing this with another symbol - EDAC_SUPPORT. This
    is also much cleaner than having CONFIG_EDAC enumerate all the arches
    which need/have EDAC support and drivers.

    This way I can kill the useless edac.h header in tile too.

    Acked-by: Ralf Baechle
    Acked-by: Michael Ellerman
    Acked-by: Chris Metcalf
    Acked-by: Ingo Molnar
    Acked-by: Russell King
    Cc: Benjamin Herrenschmidt
    Cc: Doug Thompson
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-edac@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: "Maciej W. Rozycki"
    Cc: Markos Chandras
    Cc: Mauro Carvalho Chehab
    Cc: Paul Mackerras
    Cc: "Steven J. Hill"
    Cc: x86@kernel.org
    Signed-off-by: Borislav Petkov

    Borislav Petkov
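
A minimal sketch of the ifdeffery described above, assuming the symbol is visible to the preprocessor as EDAC_ATOMIC_SCRUB. The scrub_block() caller, its return value and the exact edac_atomic_scrub() signature are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#ifdef EDAC_ATOMIC_SCRUB
/* Provided by an architecture that supports scrubbing (signature assumed). */
void edac_atomic_scrub(void *va, size_t size);
#endif

/*
 * Generic caller: compiles on every architecture. Returns 1 if a scrub was
 * issued and 0 if the architecture does not define EDAC_ATOMIC_SCRUB.
 */
int scrub_block(void *va, size_t size)
{
    assert(va != NULL);
#ifdef EDAC_ATOMIC_SCRUB
    edac_atomic_scrub(va, size);
    return 1;
#else
    (void)size;   /* nothing to do without architecture support */
    return 0;
#endif
}
```

Architectures without the symbol then need neither a stub function nor an arch-specific header.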
     

23 Feb, 2015

1 commit

  • Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info
    object with the optional attribute groups. This allows drivers to
    pass additional sysfs entries without manual (and racy)
    device_create_file() and co calls.

    edac_mc_add_mc() is kept as is, just calling edac_mc_add_mc_with_groups()
    with NULL groups.

    Signed-off-by: Takashi Iwai
    Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de
    Signed-off-by: Borislav Petkov

    Takashi Iwai
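
The wrapper relationship described above can be sketched like this. The types and function names are simplified stand-ins (struct attr_group, struct mc_desc, add_mc_with_groups(), add_mc()), not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an attribute group and a controller object. */
struct attr_group {
    const char *name;
};

struct mc_desc {
    const struct attr_group **groups;   /* NULL-terminated array, or NULL */
    size_t ngroups;
};

/* Full variant: records the optional groups before the device goes live. */
int add_mc_with_groups(struct mc_desc *mci, const struct attr_group **groups)
{
    size_t n = 0;

    assert(mci != NULL);
    if (groups != NULL)
        while (groups[n] != NULL)
            n++;
    mci->groups = groups;
    mci->ngroups = n;
    return 0;
}

/* Legacy entry point: same behavior as before, simply passes no groups. */
int add_mc(struct mc_desc *mci)
{
    return add_mc_with_groups(mci, NULL);
}
```

Handing the groups over at registration time is what removes the window in which the device exists but its extra attributes do not.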
     

20 Oct, 2014

2 commits


02 Sep, 2014

1 commit


24 Jun, 2014

1 commit

  • To avoid confusion and conflicting usage of RAS-related trace events,
    add a unified RAS trace event stub.

    Start a RAS subsystem menu which will be fleshed out in time, when more
    features get added to it.

    Signed-off-by: Chen, Gong
    Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com
    Signed-off-by: Borislav Petkov
    Signed-off-by: Tony Luck

    Chen, Gong
     

09 May, 2014

1 commit


14 Feb, 2014

2 commits

  • We're using edac_mc_workq_setup() both on the init path, when
    we load an edac driver and when we change the polling period
    (edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec.

    On that second path we don't need to init the workqueue which has been
    initialized already.

    Thanks to Tejun for workqueue insights.

    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
    Cc:

    Borislav Petkov
     
  • Sanitize code even more to accept unsigned longs only and to not allow
    polling intervals below 1 second as this is unnecessary and doesn't make
    much sense anyway for polling errors.

    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
    Cc: Doug Thompson
    Cc:

    Borislav Petkov
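
The validation described in the last commit above (unsigned values only, nothing below one second) might look roughly like this in userspace C. It assumes the interval is expressed in milliseconds, as the edac_mc_poll_msec name suggests; parse_poll_msec() and MIN_POLL_MSEC are hypothetical names, and strtoul() stands in for whatever parsing helper the kernel uses.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#define MIN_POLL_MSEC 1000UL   /* assumption: 1 second, expressed in ms */

/* Parse a polling interval; returns 0 on success or -EINVAL on bad input. */
int parse_poll_msec(const char *val, unsigned long *out)
{
    char *end;
    unsigned long l;

    assert(out != NULL);

    /* Reject NULL, empty strings, signs and leading whitespace up front. */
    if (val == NULL || !isdigit((unsigned char)val[0]))
        return -EINVAL;

    errno = 0;
    l = strtoul(val, &end, 0);
    if (errno != 0)
        return -EINVAL;

    /* Allow one trailing newline, as written by "echo". */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;

    /* Polling more often than once per second is refused. */
    if (l < MIN_POLL_MSEC)
        return -EINVAL;

    *out = l;
    return 0;
}
```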
     

05 Nov, 2013

1 commit


24 Jul, 2013

1 commit

  • Fix the following:

    BUG: key ffff88043bdd0330 not in .data!
    ------------[ cut here ]------------
    WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
    DEBUG_LOCKS_WARN_ON(1)
    Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
    CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
    Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
    0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
    ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
    ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
    Call Trace:
    dump_stack
    warn_slowpath_common
    warn_slowpath_fmt
    lockdep_init_map
    ? trace_hardirqs_on_caller
    ? trace_hardirqs_on
    debug_mutex_init
    __mutex_init
    bus_register
    edac_create_sysfs_mci_device
    edac_mc_add_mc
    sbridge_probe
    pci_device_probe
    driver_probe_device
    __driver_attach
    ? driver_probe_device
    bus_for_each_dev
    driver_attach
    bus_add_driver
    driver_register
    __pci_register_driver
    ? 0xffffffffa0010fff
    sbridge_init
    ? 0xffffffffa0010fff
    do_one_initcall
    load_module
    ? unset_module_init_ro_nx
    SyS_init_module
    tracesys
    ---[ end trace d24a70b0d3ddf733 ]---
    EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
    EDAC sbridge: Driver loaded.

    What happens is that bus_register needs a statically allocated lock_key
    because the latter is handed to lockdep. However, struct mem_ctl_info
    embeds struct bus_type (the whole struct, not a pointer to it) and the
    whole thing gets dynamically allocated.

    Fix this by using a statically allocated struct bus_type for the MC bus.

    Signed-off-by: Borislav Petkov
    Acked-by: Mauro Carvalho Chehab
    Cc: Markus Trippelsdorf
    Cc: stable@kernel.org # v3.10
    Signed-off-by: Tony Luck

    Borislav Petkov
     

16 Mar, 2013

1 commit

  • Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
    memory controller is csrows based. Merge both fields into one.

    There's no need for the driver to actually fill it, as the core detects
    it by checking if one of the layers has the csrows type as part of the
    memory hierarchy:

    if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
            per_rank = true;

    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Borislav Petkov

    Mauro Carvalho Chehab
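
The detection quoted above can be expanded into a self-contained sketch. The enum is redeclared locally for illustration (only EDAC_MC_LAYER_CHIP_SELECT appears in the commit text), and struct layer and is_csrow_based() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local redeclaration for the sketch; only CHIP_SELECT is named in the commit. */
enum layer_type {
    EDAC_MC_LAYER_BRANCH,
    EDAC_MC_LAYER_CHANNEL,
    EDAC_MC_LAYER_SLOT,
    EDAC_MC_LAYER_CHIP_SELECT,
};

struct layer {
    enum layer_type type;
    unsigned int size;
};

/* The core derives the flag from the layers, so drivers need not set it. */
bool is_csrow_based(const struct layer *layers, size_t n_layers)
{
    bool per_rank = false;
    size_t i;

    assert(layers != NULL || n_layers == 0);
    for (i = 0; i < n_layers; i++)
        if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
            per_rank = true;
    return per_rank;
}
```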
     

22 Feb, 2013

2 commits


21 Feb, 2013

3 commits

  • APEI GHES and i7core_edac/sb_edac currently can be loaded at
    the same time, but those are Highlander modules:
    "There can be only one".

    There are two reasons for that:

    1) Each driver assumes that it is the only one registering with
    the EDAC core, as it is the driver's responsibility to number
    the memory controllers, and all of them start from 0;

    2) If the BIOS is handling the memory errors, the OS can't also be
    doing it, as one would interfere with the other.

    So, we need to add a module owner's lock to the EDAC core,
    in order to avoid having two different modules handling memory
    errors at the same time. The best way to implement this lock seems
    to be to use the driver's name, as this is unique and won't require
    changes to every driver.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • There are some cases where the memory controller layout is
    completely hidden. This is the case with firmware-driven error-handling
    code, like the one provided by GHES. Add a new layer to be
    used by such memory error reporting mechanisms.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Linux 3.8-rc7

    * tag 'v3.8-rc7': (12052 commits)
    Linux 3.8-rc7
    net: sctp: sctp_endpoint_free: zero out secret key data
    net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
    atm/iphase: rename fregt_t -> ffreg_t
    ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
    ARM: DMA mapping: fix bad atomic test
    ARM: realview: ensure that we have sufficient IRQs available
    ARM: GIC: fix GIC cpumask initialization
    net: usb: fix regression from FLAG_NOARP code
    l2tp: dont play with skb->truesize
    net: sctp: sctp_auth_key_put: use kzfree instead of kfree
    netback: correct netbk_tx_err to handle wrap around.
    xen/netback: free already allocated memory on failure in xen_netbk_get_requests
    xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
    xen/netback: shutdown the ring if it contains garbage.
    drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try
    virtio_console: Don't access uninitialized data.
    net: qmi_wwan: add more Huawei devices, including E320
    net: cdc_ncm: add another Huawei vendor specific device
    ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
    ...

    Mauro Carvalho Chehab
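
The module owner's lock from the first commit in this group (the "Highlander" one) can be sketched as an owner name that the first registering driver claims. Everything here is an assumption made for illustration: claim_mc_core(), release_mc_core(), the -EPERM error code, and the use of strcmp() to compare names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Name of the driver that currently owns the core; NULL when unowned. */
static const char *mc_owner;

/*
 * Called on every controller registration. The first driver claims the core;
 * the same driver may register further controllers; any other driver is
 * refused. The name string must outlive the registration.
 */
int claim_mc_core(const char *mod_name)
{
    assert(mod_name != NULL);
    if (mc_owner != NULL && strcmp(mc_owner, mod_name) != 0)
        return -EPERM;
    mc_owner = mod_name;
    return 0;
}

/* Called once the owning driver has removed its last controller. */
void release_mc_core(void)
{
    mc_owner = NULL;
}
```

Keying the lock on the driver name means existing drivers need no changes, since each already supplies its name when registering.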
     

30 Jan, 2013

1 commit


21 Dec, 2012

1 commit


12 Dec, 2012

1 commit

  • Pull EDAC fixes from Borislav Petkov:

    - EDAC core error path fix, from Denis Kirjanov.

    - Generalization of AMD MCE bank names and some minor error reporting
    improvements.

    - EDAC core cleanups and simplifications, from Wei Yongjun.

    - amd64_edac fixes for sysfs-reported values, from Josh Hunt.

    - some heavy amd64_edac error reporting path shaving, leading to
    removing a bunch of code.

    - amd64_edac error injection method improvements.

    - EDAC core cleanups and fixes

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (24 commits)
    EDAC, pci_sysfs: Use for_each_pci_dev to simplify the code
    EDAC: Handle error path in edac_mc_sysfs_init() properly
    MCE, AMD: Dump error status
    MCE, AMD: Report decoded error type first
    MCE, AMD: Dump CPU f/m/s triple with the error
    MCE, AMD: Remove functional unit references
    EDAC: Convert to use simple_open()
    EDAC, Calxeda highbank: Convert to use simple_open()
    EDAC: Fix mc size reported in sysfs
    EDAC: Fix csrow size reported in sysfs
    EDAC: Pass mci parent
    EDAC: Add memory controller flags
    amd64_edac: Fix csrows size and pages computation
    amd64_edac: Use DBAM_DIMM macro
    amd64_edac: Fix K8 chip select reporting
    amd64_edac: Reorganize error reporting path
    amd64_edac: Do not check whether error address is valid
    amd64_edac: Improve error injection
    amd64_edac: Cleanup error injection code
    amd64_edac: Small fixlets and cleanups
    ...

    Linus Torvalds
     

04 Dec, 2012

1 commit