15 Dec, 2011

1 commit


27 May, 2011

1 commit


31 Mar, 2011

1 commit


24 Sep, 2009

1 commit

  • Module edac_core.ko uses call_rcu() callbacks in edac_device.c, edac_mc.c
    and edac_pci.c.

    They all use a wait_for_completion() scheme, but this scheme is not 100%
    safe on multiple CPUs. See the _rcu_barrier() implementation, which
    explains why extra precaution is needed.

    The patch adds a comment about rcu_barrier() and, as a precaution, calls
    rcu_barrier(). A maintainer needs to look at removing the
    wait_for_completion code.

    [dougthompson@xmission.com: remove the wait_for_completion code]
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Dangaard Brouer
     

19 Jun, 2009

1 commit

  • Add edac_device_alloc_index(), because on the MAPLE platform there may
    be several EDAC driver modules that use the edac_device_ctl_info
    structure at the same time. The index allocation for these structures
    should be taken care of by the EDAC core.
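    A minimal userspace sketch of a core-owned index allocator, assuming the
    index is a simple atomic counter that is never reused
    (device_alloc_index() is a hypothetical name, not the kernel symbol). The
    atomic increment is what keeps two driver modules that load at the same
    time from receiving the same index.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared counter owned by the "core", not by any one driver. */
static atomic_int device_indexes = 0;

/*
 * Return a unique index. atomic_fetch_add() returns the value held
 * *before* the increment, so the first caller gets 0, the next 1, etc.,
 * even when several callers race.
 */
static int device_alloc_index(void)
{
    return atomic_fetch_add(&device_indexes, 1);
}
```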

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Harry Ciao
    Cc: Doug Thompson
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Kumar Gala
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harry Ciao
     

14 Apr, 2009

1 commit

  • The edac-core driver includes code which assumes that the work_struct
    which is included in every delayed_work is the first member of that
    structure. This is currently the case but might change in the future, so
    use to_delayed_work() instead, which doesn't make such an assumption.

    linux-2.6.30-rc1 has the to_delayed_work() function that will allow this
    patch to work.
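    The difference between a plain cast and to_delayed_work() can be sketched
    in userspace C. The structure layouts and helper names below are
    hypothetical stand-ins; the point is that container_of() subtracts the
    member's real offset, so it keeps working if the member is moved, while
    the cast silently breaks.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-implementation of the kernel's container_of() idea. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the kernel structures (hypothetical layouts). */
struct work_struct {
    void (*func)(struct work_struct *work);
};

struct delayed_work {
    long timer_expires;         /* deliberately placed BEFORE 'work' */
    struct work_struct work;
};

/* Robust: derive the outer structure from the member's offset. */
static struct delayed_work *to_delayed_work_sketch(struct work_struct *work)
{
    return container_of(work, struct delayed_work, work);
}

/* Fragile: only correct while 'work' is the first member. */
static struct delayed_work *cast_delayed_work(struct work_struct *work)
{
    return (struct delayed_work *)work;
}
```

    With 'work' not at offset 0, only the container_of() version recovers
    the address of the enclosing delayed_work.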

    Signed-off-by: Jean Delvare
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean Delvare
     

07 Jan, 2009

1 commit

  • This patch is part of a larger patch series which will remove the "char
    bus_id[20]" name string from struct device. The device name is managed in
    the kobject anyway, without any size limitation, and is just needlessly
    copied into "struct device".

    [akpm@linux-foundation.org: coding-style fixes]
    Acked-by: Greg Kroah-Hartman
    Acked-by: Doug Thompson
    Signed-off-by: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kay Sievers
     

24 Dec, 2008

1 commit

  • When deleting an edac device, we have to wait for its edac_dev.work to
    complete before freeing the whole edac_dev structure. Since we have no
    idea which work item in edac_poller's current workqueue is the one we are
    concerned about, we wait for all work in edac_poller's workqueue to be
    processed. This is done via flush_cpu_workqueue(), which inserts a
    wq_barrier at the tail of the workqueue and then sleeps on the
    completion of this wq_barrier. The edac_poller wakes up the sleepers when
    it reaches the barrier.

    EDAC core creates only one kernel worker thread, edac_poller, to run the
    work items of all current edac devices. They share the same callback
    function, edac_device_workq_function(), which grabs device_ctls_mutex
    before it checks the device. This is exactly where edac_poller and rmmod
    have a good chance of deadlocking.

    In the following call trace of rmmod:

    rmmod > ... > edac_device_del_device > edac_device_workq_teardown >
    flush_workqueue > flush_cpu_workqueue

    device_ctls_mutex has already been grabbed by edac_device_del_device().
    So, on one hand, rmmod sleeps on the completion of a wq_barrier while
    holding device_ctls_mutex; on the other hand, edac_poller blocks on the
    same mutex when it runs the work of any existing edac device (note that
    this edac_dev.work is likely to be totally unrelated to the device being
    removed right now), so it never gets a chance to run the wq_barrier work
    that would wake rmmod up.

    edac_device_workq_teardown() should not be called within the critical
    region of device_ctls_mutex. This matches what is already done in
    edac_pci_del_device() and edac_mc_del_mc(), where
    edac_pci_workq_teardown() and edac_mc_workq_teardown() are called after
    the related mutexes are released.

    Moreover, an edac_dev.work should first check whether its device is being
    removed, and if so, bail out immediately. Since not all existing edac
    devices are being removed, this "shutting down" flag must be specific to
    the edac device being removed. The existing edac_dev.op_state can serve
    this purpose.
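    A rough userspace analogue of the fix, using a pthread mutex in place of
    device_ctls_mutex and pthread_join() in place of flushing the workqueue.
    All names here are hypothetical. The removal path marks the device
    offline under the mutex, releases the mutex, and only then waits for the
    poller, which takes the mutex, sees the offline state and stops
    requeueing itself.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical states, loosely modeled on edac_dev.op_state. */
enum op_state { OP_RUNNING_POLL, OP_OFFLINE };

struct fake_edac_dev {
    enum op_state op_state;     /* protected by device_ctls_mutex */
    int polls;                  /* times the work actually did something */
};

static pthread_mutex_t device_ctls_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Shared poll callback. Returns 1 to requeue, 0 if the device is going away. */
static int workq_function(struct fake_edac_dev *dev)
{
    int requeue;

    pthread_mutex_lock(&device_ctls_mutex);
    requeue = (dev->op_state != OP_OFFLINE);    /* bail out if shutting down */
    if (requeue)
        dev->polls++;                           /* stand-in for a hardware check */
    pthread_mutex_unlock(&device_ctls_mutex);
    return requeue;
}

/* Stand-in for the single edac_poller thread; bounded so the demo ends. */
static void *poller_thread(void *arg)
{
    for (int i = 0; i < 1000000; i++)
        if (!workq_function(arg))
            break;
    return NULL;
}

/*
 * Removal path: set the offline flag under the mutex, then drop the mutex
 * BEFORE waiting for the poller. Waiting while still holding the mutex is
 * the pattern that deadlocked rmmod against edac_poller.
 */
static int del_device(struct fake_edac_dev *dev, pthread_t poller)
{
    pthread_mutex_lock(&device_ctls_mutex);
    dev->op_state = OP_OFFLINE;
    pthread_mutex_unlock(&device_ctls_mutex);

    return pthread_join(poller, NULL);          /* "teardown" outside the mutex */
}
```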

    The original deadlock problem and the solution have been observed and
    tested on actual hardware. Without the fix, running rmmod on an edac
    driver results in the following deadlock:

    root@localhost:/root> rmmod mv64x60_edac
    EDAC DEBUG: mv64x60_dma_err_remove()
    EDAC DEBUG: edac_device_del_device()
    EDAC DEBUG: find_edac_device_by_dev()

    (hang for a moment)

    INFO: task edac-poller:2030 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    edac-poller D 00000000 0 2030 2
    Call Trace:
    [df159dc0] [c0071e3c] free_hot_cold_page+0x17c/0x304 (unreliable)
    [df159e80] [c000a024] __switch_to+0x6c/0xa0
    [df159ea0] [c03587d8] schedule+0x2f4/0x4d8
    [df159f00] [c03598a8] __mutex_lock_slowpath+0xa0/0x174
    [df159f40] [e1030434] edac_device_workq_function+0x28/0xd8 [edac_core]
    [df159f60] [c003beb4] run_workqueue+0x114/0x218
    [df159f90] [c003c674] worker_thread+0x5c/0xc8
    [df159fd0] [c004106c] kthread+0x5c/0xa0
    [df159ff0] [c0013538] original_kernel_thread+0x44/0x60
    INFO: task rmmod:2062 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    rmmod D 0ff2c9fc 0 2062 1839
    Call Trace:
    [df119c00] [c0437a74] 0xc0437a74 (unreliable)
    [df119cc0] [c000a024] __switch_to+0x6c/0xa0
    [df119ce0] [c03587d8] schedule+0x2f4/0x4d8
    [df119d40] [c03591dc] schedule_timeout+0xb0/0xf4

    Signed-off-by: Linus Torvalds

    Harry Ciao
     

06 May, 2008

1 commit

  • Commit 06916639e2fed9ee475efef2747a1b7429f8fe76 ("driver-core: add
    dev_name() to help transition away from using bus_id") added a static
    inline dev_name() and used it in dev_printk.

    Unfortunately, drivers/edac/edac_core.h defines a macro called
    dev_name(). Rename the latter.

    Diagnosis by Tony Breeds and Michael Ellerman.

    Signed-off-by: Stephen Rothwell
    Acked-by: Doug Thompson
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     

29 Apr, 2008

2 commits

  • Collection of patches, merged into one, from Adrian that do the following:

    1) This patch makes the following needlessly global functions static:
    - edac_pci_get_log_pe()
    - edac_pci_get_log_npe()
    - edac_pci_get_panic_on_pe()
    - edac_pci_unregister_sysfs_instance_kobj()
    - edac_pci_main_kobj_setup()

    2) Remove unneeded function edac_device_find()

    3) Added #if 0 around function edac_pci_find()

    4) Make the needlessly global edac_pci_generic_check() static

    5) Removed function edac_check_mc_devices()

    Doug Thompson modified Adrian's patches to better represent
    the direction of EDAC, and merged them into one patch.

    Cc: Alan Cox
    Signed-off-by: Adrian Bunk
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Signed-off-by: Robert P. J. Day
    Acked-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

08 Feb, 2008

2 commits


20 Jul, 2007

17 commits

  • Some simple fixes to properly reference counter values from the block
    attribute level of edac_device objects. The array pointer is now
    sequenced properly, so block-level attributes are correctly identified
    from their base class functions.

    Added more verbose debug statements for event tracking.

    Also, corner-case testing found a bug in the store/show sequence of
    operations for the block attribute/controls management.

    An old intermediate structure for 'blocks' was still in the processing
    pipeline. This patch removes that old structure and correctly utilizes the
    new struct edac_dev_sysfs_block_attribute for passing control from the sysfs
    to the low level store/show function of the edac driver.

    Now the proper kobj pointer is passed downward to the store/show
    functions.

    Signed-off-by: Doug Thompson
    Cc: Greg KH
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
     
  • Fix a mutex locking deadlock on the device controller linked list. The
    code took a lock and then called a function that could take the same
    lock. The cancel-workqueue call is now made outside the lock.

    Added some short-circuit logic in the workqueue code.

    Added descriptive comments.

    Code tidying.
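    The self-deadlock described above can be reproduced in userspace with an
    error-checking POSIX mutex, which reports EDEADLK instead of hanging when
    a thread relocks a mutex it already holds. The kernel mutex has no such
    mode, so this is only an illustration; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t list_mutex;

/* Error-checking type: relocking from the owner returns EDEADLK. */
static void init_list_mutex(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&list_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Hypothetical "cancel the workqueue" helper that takes the lock itself. */
static int cancel_workq(void)
{
    int err = pthread_mutex_lock(&list_mutex);

    if (err)
        return err;             /* EDEADLK: the caller already holds it */
    /* ... cancel pending work here ... */
    pthread_mutex_unlock(&list_mutex);
    return 0;
}

/* Buggy ordering: take the lock, then call a function that takes it again. */
static int del_device_buggy(void)
{
    pthread_mutex_lock(&list_mutex);
    int err = cancel_workq();
    pthread_mutex_unlock(&list_mutex);
    return err;
}

/* Fixed ordering: unlink under the lock, cancel the work outside it. */
static int del_device_fixed(void)
{
    pthread_mutex_lock(&list_mutex);
    /* ... unlink the device from the controller list here ... */
    pthread_mutex_unlock(&list_mutex);
    return cancel_workq();
}
```

    With a default (non-error-checking) mutex the buggy path would simply
    hang, which is what the kernel code did.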

    Signed-off-by: Doug Thompson
    Cc: Greg KH
    Cc: Alan Cox
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
     
  • With feedback, this patch corrects operation of the kobject release operation
    on kobjects, attributes and controls for the edac_device.
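    A minimal sketch of the release pattern, assuming a simple
    single-threaded reference count: memory is freed only from the release
    callback when the last reference is dropped, never directly by the
    owner. This is not the kobject API; the names are hypothetical, and real
    kernel code uses atomic reference counts.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object carrying its own release callback. */
struct refobj {
    int refcount;
    void (*release)(struct refobj *obj);
};

static int releases;            /* how many times release() has run */

static void refobj_release(struct refobj *obj)
{
    releases++;
    free(obj);                  /* the only place the memory is freed */
}

static struct refobj *refobj_create(void)
{
    struct refobj *obj = malloc(sizeof(*obj));

    if (obj) {
        obj->refcount = 1;      /* the creator holds the first reference */
        obj->release = refobj_release;
    }
    return obj;
}

static void refobj_get(struct refobj *obj)
{
    obj->refcount++;
}

/* Drop a reference; the final put invokes the release callback. */
static void refobj_put(struct refobj *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount == 0)
        obj->release(obj);
}
```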

    Cc: Alan Cox alan@lxorguk.ukuu.org.uk
    Signed-off-by: Doug Thompson
    Acked-by: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • Refactoring of the sysfs code required refactoring the
    edac_device_alloc() and edac_device_add_device() APIs to move the index
    value into the alloc() function. This patch converts the in-tree drivers
    to the new API signature.

    Supplying the index value later created a chicken-and-egg issue.
    Moving it to the alloc() function allows the necessary sysfs entries to
    be created with the proper index number.
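    A sketch of why the index belongs in alloc(), assuming the sysfs
    directory name is derived from the index. The struct, the function and
    the "device%d" naming are hypothetical: once the index is known at
    allocation time, the name can be built before anything is registered.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical control structure; only the fields this sketch needs. */
struct dev_ctl {
    int dev_idx;
    char sysfs_name[32];
};

/*
 * The index arrives with the allocation, so the sysfs-style name is ready
 * immediately instead of waiting for a later add_device() step.
 */
static struct dev_ctl *dev_ctl_alloc(const char *prefix, int dev_idx)
{
    struct dev_ctl *ctl = calloc(1, sizeof(*ctl));

    if (!ctl)
        return NULL;
    ctl->dev_idx = dev_idx;
    snprintf(ctl->sysfs_name, sizeof(ctl->sysfs_name), "%s%d", prefix, dev_idx);
    return ctl;
}
```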

    Cc: Alan Cox alan@lxorguk.ukuu.org.uk
    Signed-off-by: Doug Thompson
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
     
  • This patch fixes and enhances the driver level set of sysfs attributes that
    can be added to the 'block' level of an edac_device type of driver.

    There is a controller information structure, which contains one or more
    instances of device. Each instance will have one or more blocks of device
    specific counters. This patch fixes the ability to have more detailed
    attributes/controls for each of the 'blocks', providing for the addition of
    controls/attributes from the low level driver to user space via sysfs.
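    A sketch of driver-supplied block attributes dispatched by name, loosely
    modeled on how sysfs show() callbacks work. The structures and names are
    hypothetical, and for illustration the attributes simply expose the
    block's counters. The key detail is that the core passes the block object
    itself down to the callback, so each callback reads the right block.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical block with two counters. */
struct block {
    const char *name;
    unsigned int ce_count;
    unsigned int ue_count;
};

/* A driver-supplied attribute: a file name plus a show() callback. */
struct block_attribute {
    const char *name;
    int (*show)(const struct block *blk, char *buf, size_t len);
};

static int show_ce(const struct block *blk, char *buf, size_t len)
{
    return snprintf(buf, len, "%u\n", blk->ce_count);
}

static int show_ue(const struct block *blk, char *buf, size_t len)
{
    return snprintf(buf, len, "%u\n", blk->ue_count);
}

static const struct block_attribute block_attrs[] = {
    { "ce_count", show_ce },
    { "ue_count", show_ue },
};

/* Core-side dispatch: find the attribute by name and hand it the block.
 * Returns the number of characters written, or -1 if no such attribute. */
static int block_attr_show(const struct block *blk, const char *attr,
                           char *buf, size_t len)
{
    for (size_t i = 0; i < sizeof(block_attrs) / sizeof(block_attrs[0]); i++)
        if (strcmp(block_attrs[i].name, attr) == 0)
            return block_attrs[i].show(blk, buf, len);
    return -1;
}
```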

    Cc: Alan Cox alan@lxorguk.ukuu.org.uk
    Signed-off-by: Douglas Thompson
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • A previous patch changed the edac_mc source file from semaphore usage to
    mutexes. This patch changes the edac_device source file as well, from
    semaphore use to mutex operation.

    Use a mutex primitive for mutex operations, as a semaphore is not
    required.

    Cc: Alan Cox alan@lxorguk.ukuu.org.uk
    Signed-off-by: Doug Thompson
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
     
  • Renamed the function edac_op_state_toString() to edac_op_state_to_string()
    for consistent style, and updated its callers.

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • Refactor the edac_align_ptr() function to reduce the noise of casting the
    aligned pointer to the various types of data objects, and modify its
    callers to use its new signature.
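    A sketch of an alignment helper that returns void *, so callers can
    assign the result to any object pointer without casting. The alignment
    rule (choose it from the object size, with long long as the strictest
    default alignment) is an assumption for illustration, not a copy of
    edac_align_ptr(); align_ptr() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round ptr up to the alignment implied by an object of 'size' bytes.
 * Returning void * lets callers write "struct foo *f = align_ptr(p, n);"
 * with no cast at the call site.
 */
static void *align_ptr(void *ptr, unsigned int size)
{
    uintptr_t align, r;

    if (size > sizeof(long))
        align = sizeof(long long);
    else if (size > sizeof(int))
        align = sizeof(long);
    else if (size > sizeof(short))
        align = sizeof(int);
    else if (size > sizeof(char))
        align = sizeof(short);
    else
        return ptr;                     /* single bytes need no alignment */

    r = (uintptr_t)ptr % align;         /* distance past the last boundary */
    if (r == 0)
        return ptr;
    return (char *)ptr + (align - r);   /* advance to the next boundary */
}
```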

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • For the file edac_device.c, perform some coding style enhancements.
    Add some function header comments.
    Improve the readability of comments.

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • This patch fixes some remnant spaces inserted by the use of Lindent,
    which seems to add spaces where it shouldn't. In addition, the goto
    labels had issues, which are also fixed in this patch.

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • The error handling output strings needed to be refactored to display
    the error information better.

    The offset_value also needed to be added to the output.

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • This code originates from patches at SourceForge that allow EDAC to be
    used with various kernel versions. Kernel version 2.6.20 introduced a new
    workqueue system, so those patches needed #ifdefs based on the kernel
    version. For submission to the latest kernel.org tree, those #ifdefs
    are removed.

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • Run the EDAC CORE files through Lindent for cleanup

    Signed-off-by: Douglas Thompson
    Signed-off-by: Dave Jiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • Moving PCI to a per-instance device model

    This should include the correct sysfs setup as well. Please review.

    Signed-off-by: Dave Jiang
    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Move the memory controller object from a kernel-thread-based
    implementation to a workqueue-based one.

    Signed-off-by: Dave Jiang
    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Move dev_name() macro to a more generic interface since it's not possible
    to determine whether a device is pci, platform, or of_device easily.

    Now each low level driver sets the name into the control structure, and
    the EDAC core references the control structure for the information.

    Better abstraction.
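    A sketch of the abstraction, assuming a trimmed-down control structure
    with hypothetical field and function names: the low-level driver stores
    its device name once, and the core only ever reads that string, so it
    never needs to know whether the device is PCI, platform or of_device
    based.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down control structure. */
struct ctl_info {
    const char *mod_name;       /* driver module name */
    const char *dev_name;       /* filled in by the low-level driver */
};

/* Core-side helper: reads only the stored strings, never the bus type. */
static int core_format_location(const struct ctl_info *ctl, char *buf,
                                size_t len)
{
    return snprintf(buf, len, "%s (%s)", ctl->mod_name, ctl->dev_name);
}
```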

    Signed-off-by: Dave Jiang
    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • This patch adds the new 'class' of object to be managed, named: 'edac_device'.

    As a peer of the 'edac_mc' class of object, it provides a non-memory-centric
    view of an ERROR DETECTING device in hardware. It provides a sysfs interface
    and an abstraction for various EDAC type devices.

    Multiple 'instances' within the class are possible, with each 'instance'
    able to have multiple 'blocks', and each 'block' having 'attributes'.

    At the 'block' level there are the 'ce_count' and 'ue_count' fields,
    which the device driver can update directly or through the
    edac_device_handle_XX() functions. At each higher level are additional
    'total' count fields, which are a summation of the counts below that
    level.
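    A sketch of how the totals can stay consistent, assuming a fixed-size
    controller/instance/block hierarchy. The sizes, struct layouts and the
    handle_ce()/handle_ue() names are hypothetical, modeled loosely on
    edac_device_handle_XX(): each error bumps the block counter and every
    total above it.

```c
#include <assert.h>

#define NR_INSTANCES 2          /* hypothetical sizes for the sketch */
#define NR_BLOCKS    2

/* Controller -> instances -> blocks, each level with its own totals. */
struct block    { unsigned int ce_count, ue_count; };
struct instance { unsigned int ce_count, ue_count; struct block blocks[NR_BLOCKS]; };
struct ctl      { unsigned int ce_count, ue_count; struct instance inst[NR_INSTANCES]; };

/* Record a correctable error in one block and propagate it upward. */
static void handle_ce(struct ctl *ctl, int inst_nr, int block_nr)
{
    ctl->inst[inst_nr].blocks[block_nr].ce_count++;
    ctl->inst[inst_nr].ce_count++;
    ctl->ce_count++;
}

/* Record an uncorrectable error in one block and propagate it upward. */
static void handle_ue(struct ctl *ctl, int inst_nr, int block_nr)
{
    ctl->inst[inst_nr].blocks[block_nr].ue_count++;
    ctl->inst[inst_nr].ue_count++;
    ctl->ue_count++;
}
```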

    This 'edac_device' has been used to capture and present ECC errors
    which are found in an L1 and L2 cache system on a per CORE/CPU basis.

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson