27 Jul, 2007

1 commit

  • This fixes a deadlock that could occur on a 'setup' and 'teardown' sequence of
    the workq for an edac_mc control structure instance. A similar fix was
    previously implemented for the edac_device code.

    In addition, the edac_mc device code was missing code to allow the workq
    period value to be altered via sysfs control.

    This patch applies that fix to the edac_mc code, and allows for the changing
    of the period value as well.

    Cc: Alan Cox
    Signed-off-by: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
     

20 Jul, 2007

18 commits

  • Fix mutex locking deadlock on the device controller linked list. The code
    took a lock and then called a function that could take the same lock. Moved
    the cancel workq function to outside the lock.

    Added some short circuit logic in the workq code

    Added descriptive comments

    Code tidying

    Signed-off-by: Doug Thompson
    Cc: Greg KH
    Cc: Alan Cox
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
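
The locking problem described above can be sketched in user space. This is only an illustration under stated assumptions: `toy_lock`, `cancel_work_sketch()` and the two `remove_device_*()` functions are hypothetical names, the lock is a toy that reports a would-be self-deadlock instead of hanging, and the exact ordering used in the real patch is not shown in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a second acquire models a self-deadlock. */
struct toy_lock { bool held; };

/* Returns 0 on success, -1 if the caller would deadlock on itself. */
static int toy_lock_acquire(struct toy_lock *l)
{
    if (l->held)
        return -1;
    l->held = true;
    return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

static struct toy_lock list_lock;   /* stands in for the list mutex */

/* Hypothetical cancel routine that may need the list lock itself. */
static int cancel_work_sketch(void)
{
    if (toy_lock_acquire(&list_lock) != 0)
        return -1;
    toy_lock_release(&list_lock);
    return 0;
}

/* Problem ordering: the cancel runs while list_lock is still held. */
static int remove_device_buggy(void)
{
    int rc;

    if (toy_lock_acquire(&list_lock) != 0)
        return -1;
    rc = cancel_work_sketch();      /* re-takes list_lock: fails */
    toy_lock_release(&list_lock);
    return rc;
}

/* Fixed ordering: drop list_lock first, then cancel outside the lock. */
static int remove_device_fixed(void)
{
    if (toy_lock_acquire(&list_lock) != 0)
        return -1;
    /* list manipulation would happen here */
    toy_lock_release(&list_lock);
    return cancel_work_sketch();
}
```

In this model `remove_device_buggy()` reports the self-deadlock, while `remove_device_fixed()` completes because the cancel runs after the lock is dropped.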
     
  • This patch refactors the 'releasing' of kobjects for the edac_mc type of
    device. The correct pattern of kobject release is followed.

    As internal kobjs are allocated they bump a ref count on the top level kobj.
    It in turn has a module ref count on the edac_core module. When internal
    kobjects are released, they dec the ref count on the top level kobj. When the
    top level kobj's ref count reaches zero, it decrements the ref count on the
    edac_core module, allowing it to be unloaded, as all resources have now been
    released.

    Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
    Signed-off-by: Doug Thompson
    Acked-by: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
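
The reference chain described above (internal kobjects pin the top level kobj, which pins the module) can be modelled with plain counters. This is a minimal sketch, not the kobject API: all `*_sketch` types and functions are hypothetical, and the assumption that the top level object takes its module reference at creation is mine.

```c
#include <assert.h>

struct module_sketch { int refs; };

struct top_kobj_sketch {
    int refs;
    struct module_sketch *owner;
};

/* Creating the top-level object takes one module reference. */
static void top_init_sketch(struct top_kobj_sketch *top,
                            struct module_sketch *owner)
{
    top->owner = owner;
    top->refs = 1;          /* the creator's own reference */
    owner->refs++;          /* pin the module while 'top' lives */
}

/* Dropping the last reference releases the module reference. */
static void top_put_sketch(struct top_kobj_sketch *top)
{
    assert(top->refs > 0);
    if (--top->refs == 0)
        top->owner->refs--;
}

/* Each internal object bumps the top-level count when allocated... */
static void child_alloc_sketch(struct top_kobj_sketch *top)
{
    top->refs++;
}

/* ...and drops it again from its own release path. */
static void child_release_sketch(struct top_kobj_sketch *top)
{
    top_put_sketch(top);
}
```

With this ordering the module count only returns to zero after every internal object has been released, whatever order the releases arrive in.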
     
  • Refactoring of the sysfs code necessitated refactoring the edac_mc_alloc()
    and edac_mc_add_mc() APIs, moving the index value to the alloc() function.
    This patch alters the in-tree drivers to use the new API signature.

    Assigning the index value later created a chicken-and-egg issue.
    Moving it to the alloc() function allows for creating the necessary sysfs
    entries with the proper index number.

    Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
    Signed-off-by: Doug Thompson
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
     
  • Refactor the edac_align_ptr() function to reduce the noise of casting the
    aligned pointer to the various types of data objects, and modify its callers
    to use its new signature.

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
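
One way a pointer-alignment helper can avoid casts at its call sites is to return `void *`, which C converts implicitly to any object pointer type. The sketch below illustrates that idea only; `align_ptr_sketch()` and `first_long_after()` are hypothetical names and this is not the actual edac_align_ptr() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round 'ptr' up to the next multiple of 'align' (a power of two). */
static void *align_ptr_sketch(void *ptr, size_t align)
{
    uintptr_t p = (uintptr_t)ptr;

    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)((p + align - 1) & ~(uintptr_t)(align - 1));
}

/* Caller side: the void * result needs no cast when assigned. */
static long *first_long_after(char *base, size_t header_size)
{
    return align_ptr_sketch(base + header_size, sizeof(long));
}
```

A helper that returns `char *` instead would force every caller to cast the result back to its own type, which is the noise the commit refers to.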
     
  • This patch fixes some remnant spaces inserted by the use of Lindent.
    Lindent seems to add some spaces where it shouldn't. These have been fixed.
    In addition, goto targets had issues; these have also been fixed
    in this patch.

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • This code originates from patches at sourceforge that allow EDAC to be
    updated to various kernels. Kernel version 2.6.20 introduced a new workq
    system, so the patches needed to be modified based on the kernel version.
    For submission to the latest kernel.org tree, those #ifdefs are removed.

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • Run the EDAC CORE files through Lindent for cleanup

    Signed-off-by: Douglas Thompson
    Signed-off-by: Dave Jiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • Fixup poll values for MC and PCI.
    Also make mc function names unique to mc.

    Signed-off-by: Dave Jiang
    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Change error check and clear variable from an atomic to an int

    Signed-off-by: Dave Jiang
    Signed-off-by: Douglas Thompson
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Move the memory controller object from the kernel thread based
    implementation to a work queue based implementation.

    Signed-off-by: Dave Jiang
    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Move dev_name() macro to a more generic interface since it's not possible
    to determine whether a device is pci, platform, or of_device easily.

    Now each low level driver sets the name into the control structure, and
    the EDAC core references the control structure for the information.

    Better abstraction.

    Signed-off-by: Dave Jiang
    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • In the refactoring of edac_mc.c into several subsystem files,
    the header file edac_mc.h became meaningless. A new header file
    edac_core.h was created. All the files that previously included
    "edac_mc.h" are changed to include "edac_core.h".

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • Provides a way for NMI reported errors on x86 to notify the EDAC
    subsystem of pending ECC errors by writing to a software state variable.

    Here's the reworked patch. I added an EDAC stub to the kernel so we can
    have variables that are in the kernel even if EDAC is a module. I also
    implemented the idea of using the chip driver to select error detection
    mode via module parameter and eliminate the kernel compile option.
    Please review/test. Thx!

    Also, I only made changes to some of the chipset drivers since I am
    unfamiliar with the other ones. We can add similar changes as we go.

    Signed-off-by: Dave Jiang
    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • The EDAC core code uses a semaphore as a mutex. Use the mutex API
    instead of the (binary) semaphore.

    Matthias wrote this, but since I had some patches ahead of it,
    I needed to modify it to follow my patches.

    Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthias Kaehlcke
     
  • This patch adds the new 'class' of object to be managed, named: 'edac_device'.

    As a peer of the 'edac_mc' class of object, it provides a non-memory centric
    view of an ERROR DETECTING device in hardware. It provides a sysfs interface
    and an abstraction for various EDAC type devices.

    Multiple 'instances' within the class are possible, with each 'instance'
    able to have multiple 'blocks', and each 'block' having 'attributes'.

    At the 'block' level there are the 'ce_count' and 'ue_count' fields
    which the device driver can update and/or call edac_device_handle_XX()
    functions. At each higher level are additional 'total' count fields,
    which are a summation of counts below that level.

    This 'edac_device' has been used to capture and present ECC errors
    which are found in an L1 and L2 system on a per CORE/CPU basis.

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
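
The instance/block counter hierarchy described above can be pictured as nested structures whose totals are bumped together with the per-block count. This is a rough sketch with hypothetical names and fixed array sizes; it is not the edac_device data layout, and the `handle_*_sketch()` helpers only stand in for the edac_device_handle_XX() functions mentioned in the message.

```c
#include <assert.h>

#define NR_INSTANCES_SKETCH 2
#define NR_BLOCKS_SKETCH    2

struct block_sketch {
    unsigned ce_count;      /* correctable errors in this block */
    unsigned ue_count;      /* uncorrectable errors in this block */
};

struct instance_sketch {
    unsigned ce_total, ue_total;    /* sums over this instance's blocks */
    struct block_sketch blocks[NR_BLOCKS_SKETCH];
};

struct device_sketch {
    unsigned ce_total, ue_total;    /* sums over all instances */
    struct instance_sketch instances[NR_INSTANCES_SKETCH];
};

/* Record one correctable error and propagate it to every level. */
static void handle_ce_sketch(struct device_sketch *dev,
                             unsigned inst, unsigned blk)
{
    assert(inst < NR_INSTANCES_SKETCH && blk < NR_BLOCKS_SKETCH);
    dev->instances[inst].blocks[blk].ce_count++;
    dev->instances[inst].ce_total++;
    dev->ce_total++;
}

/* Same propagation for an uncorrectable error. */
static void handle_ue_sketch(struct device_sketch *dev,
                             unsigned inst, unsigned blk)
{
    assert(inst < NR_INSTANCES_SKETCH && blk < NR_BLOCKS_SKETCH);
    dev->instances[inst].blocks[blk].ue_count++;
    dev->instances[inst].ue_total++;
    dev->ue_total++;
}
```

Updating the totals at report time keeps each level's count equal to the sum of the counts below it without walking the hierarchy on every read.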
     
  • This is a large patch to refactor the original EDAC module in the kernel
    and to break it up into better file granularity, such that each source
    file contains a given subsystem of the EDAC CORE.

    Originally, the EDAC 'core' was contained in one source file: edac_mc.c
    with its corresponding edac_mc.h file.

    Now, there are the following files:

    edac_module.c     The main module init/exit function and other overhead
    edac_mc.c         Code handling the edac_mc class of object
    edac_mc_sysfs.c   Code handling for sysfs presentation
    edac_pci_sysfs.c  Code handling for PCI sysfs presentation
    edac_core.h       CORE .h include file for 'edac_mc' and 'edac_device' drivers
    edac_module.h     Internal CORE .h include file

    This forms a foundation upon which a later patch can create the 'edac_device'
    class of object code in a new file 'edac_device.c'.

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     
  • This patch makes needlessly global code static, in the edac core

    Signed-off-by: Adrian Bunk
    Cc: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This simple patch adds an important CORE API for EDAC that EDAC drivers can
    use to find their edac_mc control structure by passing a mem_ctl_info
    'instance' value

    Needed for subsequent patches

    Signed-off-by: Douglas Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Douglas Thompson
     

18 Jul, 2007

1 commit

  • Currently, the freezer treats all tasks as freezable, except for the kernel
    threads that explicitly set the PF_NOFREEZE flag for themselves. This
    approach is problematic, since it requires every kernel thread to either
    set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
    care for the freezing of tasks at all.

    It seems better to only require the kernel threads that want to or need to
    be frozen to use some freezer-related code and to remove any
    freezer-related code from the other (nonfreezable) kernel threads, which is
    done in this patch.

    The patch causes all kernel threads to be nonfreezable by default (ie. to
    have PF_NOFREEZE set by default) and introduces the set_freezable()
    function that should be called by the freezable kernel threads in order to
    unset PF_NOFREEZE. It also makes all of the currently freezable kernel
    threads call set_freezable(), so it shouldn't cause any (intentional)
    change of behaviour to appear. Additionally, it updates documentation to
    describe the freezing of tasks more accurately.

    [akpm@linux-foundation.org: build fixes]
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Nigel Cunningham
    Cc: Pavel Machek
    Cc: Oleg Nesterov
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
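
The opt-in model described above can be sketched with a single flag bit. The names below are hypothetical stand-ins: `PF_NOFREEZE_SKETCH` is an illustrative bit rather than the kernel's value, and `set_freezable_sketch()` only mimics the effect the message attributes to set_freezable().

```c
#include <assert.h>
#include <stdbool.h>

#define PF_NOFREEZE_SKETCH 0x1u     /* illustrative bit, not the kernel's */

struct task_sketch { unsigned flags; };

/* New default: every kernel thread starts out nonfreezable. */
static void kthread_init_sketch(struct task_sketch *t)
{
    t->flags = PF_NOFREEZE_SKETCH;
}

/* A thread that wants to be frozen opts in by clearing the flag. */
static void set_freezable_sketch(struct task_sketch *t)
{
    t->flags &= ~PF_NOFREEZE_SKETCH;
}

/* The freezer skips any task that still has the flag set. */
static bool freezer_may_freeze_sketch(const struct task_sketch *t)
{
    return (t->flags & PF_NOFREEZE_SKETCH) == 0;
}
```

A thread that never touches freezer code stays nonfreezable, so only threads that opt in need a try_to_freeze() call in their main loop.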
     

13 Feb, 2007

2 commits

  • Eric Wollesen ported the Bluesmoke Memory Controller driver for the Intel
    5000X/V/P (Blackford/Greencreek) chipset to the in kernel EDAC model.

    This patch incorporates the required changes to the edac_mc.c and edac_mc.h
    core files by adding a new Fully Buffered DIMM interface to the EDAC Core
    module.

    Signed-off-by: eric wollesen
    Signed-off-by: doug thompson
    Acked-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    eric wollesen
     
  • This is an attempt to provide an interface for memory scrubbing control in
    EDAC.

    This patch modifies the EDAC Core to provide the interface for memory
    controller modules to implement.

    The following things are still outstanding:

    - K8 is the first implementation,

    The patch provides a method of configuring the K8 hardware memory scrubber
    via the 'mcX' sysfs directory. There should be some fallback to a generic
    scrubber implemented in software if the hardware does not support
    scrubbing.

    Or .. the scrubbing sysfs entry should not be visible at all.

    - Only works with SDRAM, not cache,

    The K8 can scrub cache and l2cache also - but I think this is not so
    useful as the cache is busy all the time (one hopes).

    One would also expect that cache scrubbing requires hardware support.

    - Error Handling,

    I would like that errors are returned to the user in "terms of file
    system".

    - Presentation,

    I chose Bandwidth in Bytes/Second as a representation of the scrubbing
    rate for the following reasons:

    I like that the sysfs entries are sort-of textual, related to something
    that makes sense instead of magical values that must be looked up.

    "My People" want "% main memory scrubbed per hour"; others prefer "%
    memory bandwidth used" as representation. "Bandwidth used" makes it easy to
    calculate both versions in one-liner scripts.

    If one later wants to scrub cache, the scaling becomes weird for K8
    changing from "blocks of 64 byte memory" to "blocks of 64 cache lines" to
    "blocks of 64 bit". Using "bandwidth used" makes sense in all three cases,
    (I.M.O. anyway ;-).

    - Discovery,

    There is no way to discover the possible settings and what they do
    without reading the code and the documentation.

    *I* do not know how to make that work in a practical way.

    - Bugs(??),

    other tools can set invalid values in the memory scrub control register,
    those will read back as '-1', requiring the user to reset the scrub rate.
    This is how *I* think it should be.

    - Afflicting other areas of code,

    I made changes to edac_mc.c and edac_mc.h which will show up globally -
    this is not nice; it would be better if the memory scrubbing functionality
    and interface could be entirely contained within the memory controller it
    applies to.

    Frithiof Jensen

    edac_mc.c and its .h file are a CORE helper module for EDAC
    driver modules. This provides the abstraction for device specific
    drivers. It is fine to modify this CORE to provide help for
    new features of the drivers.

    doug thompson

    Signed-off-by: Frithiof Jensen
    Signed-off-by: doug thompson
    Acked-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frithiof Jensen
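
The two presentations mentioned above can both be derived from a scrub rate in bytes per second. The arithmetic is sketched below under stated assumptions: the function names are hypothetical, and the total memory size and total memory bandwidth are inputs the caller must supply, since the message does not say where they would come from.

```c
#include <assert.h>
#include <stdint.h>

/* "% main memory scrubbed per hour" from a rate in bytes/second. */
static double pct_memory_per_hour(uint64_t scrub_bytes_per_sec,
                                  uint64_t mem_bytes)
{
    assert(mem_bytes != 0);
    return 100.0 * 3600.0 * (double)scrub_bytes_per_sec / (double)mem_bytes;
}

/* "% memory bandwidth used" from the same rate in bytes/second. */
static double pct_bandwidth_used(uint64_t scrub_bytes_per_sec,
                                 uint64_t bus_bytes_per_sec)
{
    assert(bus_bytes_per_sec != 0);
    return 100.0 * (double)scrub_bytes_per_sec / (double)bus_bytes_per_sec;
}
```

For example, scrubbing at 1 MiB/s covers a 3600 MiB memory exactly once per hour, so the first function yields 100 for those inputs.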
     

08 Dec, 2006

1 commit


04 Nov, 2006

1 commit

  • Call sysdev_class_unregister() on failure in edac_sysfs_memctrl_setup()
    and decrease indentation level for clearer logic.

    Acked-by: Doug Thompson
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

11 Jul, 2006

1 commit

  • When EDAC was first introduced into the kernel it had a sysfs interface,
    but due to some problems it was disabled in 2.6.16 and remained disabled in
    2.6.17.

    Several of the control and attribute files of that interface received
    good constructive feedback. PCI Blacklist/Whitelist was a major
    set which had design issues and it has been removed in this patch. Instead
    of storing PCI broken parity status in EDAC, it has been moved to the
    pci_dev structure itself by a previous PCI patch. A future patch will
    enable that feature in EDAC by utilizing the pci_dev info.

    The sysfs is now enabled in this patch, with a minimal set of control and
    attribute files for examining EDAC state and for enabling/disabling the
    memory and PCI operations.

    The Documentation for EDAC has also been updated to reflect the new state
    of EDAC operation.

    Signed-off-by: Doug Thompson
    Cc: Greg KH
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
     

01 Jul, 2006

4 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
    Remove obsolete #include
    remove obsolete swsusp_encrypt
    arch/arm26/Kconfig typos
    Documentation/IPMI typos
    Kconfig: Typos in net/sched/Kconfig
    v9fs: do not include linux/version.h
    Documentation/DocBook/mtdnand.tmpl: typo fixes
    typo fixes: specfic -> specific
    typo fixes in Documentation/networking/pktgen.txt
    typo fixes: occuring -> occurring
    typo fixes: infomation -> information
    typo fixes: disadvantadge -> disadvantage
    typo fixes: aquire -> acquire
    typo fixes: mecanism -> mechanism
    typo fixes: bandwith -> bandwidth
    fix a typo in the RTC_CLASS help text
    smb is no longer maintained

    Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S

    Linus Torvalds
     
  • Remove add_mc_to_global_list(). In the next patch, this function will be
    reimplemented with different semantics.

    1. Reimplement add_mc_to_global_list() with semantics that allow the caller
    to determine the ID number for a mem_ctl_info structure. Then modify
    edac_mc_add_mc() so that the caller specifies the ID number for the new
    mem_ctl_info structure. Platform-specific code should be able to assign the
    ID numbers in a platform-specific manner. For instance, on Opteron it makes
    sense to have the ID of the mem_ctl_info structure match the ID of the node
    that the memory controller belongs to.

    2. Modify callers of edac_mc_add_mc() so they use the new semantics.

    Signed-off-by: Doug Thompson
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
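
Caller-assigned ID numbers, as described above, imply that registration has to reject an ID that is already in use. The sketch below models that with a small fixed table; the names are hypothetical and the table is my simplification, not the list that add_mc_to_global_list() maintains.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MC_SKETCH 8

struct mci_sketch { int mc_idx; };

/* One slot per possible ID; NULL means the ID is free. */
static struct mci_sketch *mc_table_sketch[MAX_MC_SKETCH];

/* Register 'mci' under the caller-chosen ID.
 * Returns 0 on success, -1 if the ID is out of range or already taken. */
static int add_mc_sketch(struct mci_sketch *mci, int mc_idx)
{
    assert(mci != NULL);
    if (mc_idx < 0 || mc_idx >= MAX_MC_SKETCH)
        return -1;
    if (mc_table_sketch[mc_idx] != NULL)
        return -1;
    mci->mc_idx = mc_idx;
    mc_table_sketch[mc_idx] = mci;
    return 0;
}

/* Look a controller up by its ID; NULL if nothing is registered there. */
static struct mci_sketch *find_mc_sketch(int mc_idx)
{
    if (mc_idx < 0 || mc_idx >= MAX_MC_SKETCH)
        return NULL;
    return mc_table_sketch[mc_idx];
}
```

A platform driver could then pass, say, its node number as the ID, which is the Opteron case the message mentions.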
     
  • Change MC drivers from using CVS revision strings for their version number.
    Now each driver has its own local string.

    Remove some PCI dependencies from the core EDAC module. Made the code 'struct
    device' centric instead of 'struct pci_dev' centric. Most of the code changes
    here are from a patch by Dave Jiang. It may be best to eventually move the
    PCI-specific code into a separate source file.

    Signed-off-by: Doug Thompson
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Thompson
     
  • Signed-off-by: Jörn Engel
    Signed-off-by: Adrian Bunk

    Jörn Engel
     

29 Mar, 2006

1 commit


27 Mar, 2006

10 commits

  • Change all instances of EXPORT_SYMBOL() in the core EDAC module to
    EXPORT_SYMBOL_GPL().

    Signed-off-by: David S. Peterson
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Peterson
     
  • Cosmetic indentation/formatting cleanup for EDAC code. Make sure we
    are using tabs rather than spaces to indent, etc.

    Signed-off-by: David S. Peterson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Peterson
     
  • Fix EDAC code so EXPORT_SYMBOL comes after the function that is being
    exported. This is to maintain consistency with the rest of the kernel.

    Signed-off-by: David S. Peterson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Peterson
     
  • - Fix code so we always hold mem_ctls_mutex while we are stepping
    through the list of mem_ctl_info structures. Otherwise bad things
    may happen if one task is stepping through the list while another
    task is modifying it. We may eventually want to use reference
    counting to manage the mem_ctl_info structures. In the meantime we
    may as well fix this bug.

    - Don't disable interrupts while we are walking the list of
    mem_ctl_info structures in check_mc_devices(). This is unnecessary.

    Signed-off-by: David S. Peterson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Peterson
     
  • - After we unregister a kobject, wait for our kobject release method
    to call complete(). This causes us to wait until the kobject
    reference count reaches 0. Otherwise, a task accessing the EDAC
    sysfs interface can hold the reference count above 0 until after the
    EDAC module has been unloaded. When the reference count finally
    drops to 0, this will result in an attempt to call our release
    method inside the EDAC module after the module has already been
    unloaded.

    This isn't the best fix, since a process can get stuck sleeping forever
    uninterruptibly if the user does the following:

    rmmod my_module < /sys/my_sysfs/file

    I'll go back and implement a better fix later. However this should
    be ok for now.

    - Call edac_remove_sysfs_mci_device() from edac_mc_del_mc() rather
    than from edac_mc_free(). Since edac_mc_add_mc() calls
    edac_create_sysfs_mci_device(), edac_mc_del_mc() should call
    edac_remove_sysfs_mci_device().

    Signed-off-by: David S. Peterson
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Peterson
     
  • - Remove calls to kobject_init(). These are unnecessary because
    kobject_register() calls kobject_init().

    - Remove extra calls to kobject_put(). When we call
    kobject_unregister(), this releases our reference to the kobject.
    The extra calls to kobject_put() may cause the reference count to
    drop to 0 while a kobject is still in use.

    Signed-off-by: David S. Peterson
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Peterson
     
  • This is part 2 of a 2-part patch set.

    Fix edac_mc_add_mc() so it cleans up properly if call to
    edac_create_sysfs_mci_device() fails.

    Signed-off-by: David S. Peterson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Peterson
     
  • This is part 1 of a 2-part patch set. The code changes are split into
    two parts to make the patches more readable.

    Move complete_mc_list_del() and del_mc_from_global_list() so we can
    call del_mc_from_global_list() from edac_mc_add_mc() without forward
    declarations. Perhaps using forward declarations would be better?
    I'm doing things this way because the rest of the code is missing
    them.

    Signed-off-by: David S. Peterson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Peterson
     
  • Fix xxx_probe1() functions so they call xxx_get_error_info() functions
    to clear initial errors. This is simpler and cleaner than duplicating
    the low-level code for accessing PCI config space.

    Signed-off-by: David S. Peterson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Peterson
     
  • This implements the following idea:

    On Monday 30 January 2006 19:22, Eric W. Biederman wrote:
    > One piece missing from this conversation is the issue that we need errors
    > in a uniform format. That is why edac_mc has helper functions.
    >
    > However there will always be errors that don't fit any particular model.
    > Could we add a edac_printk(dev, ); That is similar to dev_printk but
    > prints out an EDAC header and the device on which the error was found?
    > Letting the rest of the string be user specified.
    >
    > For actual control that interface may be to blunt, but at least for people
    > looking in the logs it allows all of the errors to be detected and
    > harvested.

    Signed-off-by: David S. Peterson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Peterson
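
The idea quoted above (a uniform EDAC header plus the device, followed by a caller-supplied message) can be sketched in user space with a formatting helper. This is an illustration only: `edac_format_sketch()` is a hypothetical name, the "EDAC <device>: " prefix is an assumed layout, and it writes to a buffer rather than the kernel log.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Format "EDAC <dev>: <message>" into buf.
 * Returns the length written, or -1 if the result does not fit. */
static int edac_format_sketch(char *buf, size_t len, const char *dev,
                              const char *fmt, ...)
{
    va_list ap;
    int n, m;

    assert(buf != NULL && len > 0);

    /* Uniform header: subsystem tag and the device name. */
    n = snprintf(buf, len, "EDAC %s: ", dev);
    if (n < 0 || (size_t)n >= len)
        return -1;

    /* The rest of the line is whatever the caller specifies. */
    va_start(ap, fmt);
    m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    if (m < 0 || (size_t)m >= len - (size_t)n)
        return -1;

    return n + m;
}
```

Because every line starts with the same header, errors that fit no particular model can still be found in the logs by searching for the prefix.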