17 Jan, 2012

1 commit

  • When suspending, there was a large list of warnings going something like:

    Device 'machinecheck1' does not have a release() function, it is broken and must be fixed

    This patch turns the static mce_devices into dynamically allocated ones
    and properly frees them when they are removed from the system. It fixes
    the warning messages on my laptop here.

    Reported-by: "Srivatsa S. Bhat"
    Reported-by: Linus Torvalds
    Tested-by: Djalal Harouni
    Cc: Kay Sievers
    Cc: Tony Luck
    Cc: Borislav Petkov
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Linus Torvalds

    Greg Kroah-Hartman
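
The warning above comes from devices that have no release() callback. A minimal userspace sketch of the idea, assuming a hand-rolled refcount: the object is allocated dynamically and freed by its release() callback when the last reference is dropped. The names toy_device, toy_device_create(), toy_device_get() and toy_device_put() are hypothetical stand-ins, not the driver core API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted device; not the kernel's struct device. */
struct toy_device {
    int refcount;
    unsigned int cpu;
    void (*release)(struct toy_device *dev); /* runs when refcount hits zero */
};

int toy_released; /* counts release() calls, for demonstration only */

void toy_release(struct toy_device *dev)
{
    toy_released++;
    free(dev);
}

/* Allocate dynamically so the refcount, not a static array, owns the lifetime. */
struct toy_device *toy_device_create(unsigned int cpu)
{
    struct toy_device *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    dev->refcount = 1;
    dev->cpu = cpu;
    dev->release = toy_release;
    return dev;
}

void toy_device_get(struct toy_device *dev)
{
    assert(dev && dev->refcount > 0);
    dev->refcount++;
}

/* Drop a reference; the last put frees the object through release(). */
void toy_device_put(struct toy_device *dev)
{
    assert(dev && dev->refcount > 0);
    if (--dev->refcount == 0)
        dev->release(dev);
}
```

A statically allocated object cannot be handed to free(), which is one reason a release() callback only makes sense once the objects are allocated at run time.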
     

08 Jan, 2012

1 commit

  • * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
    arm: fix up some samsung merge sysdev conversion problems
    firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
    Drivers:hv: Fix a bug in vmbus_driver_unregister()
    driver core: remove __must_check from device_create_file
    debugfs: add missing #ifdef HAS_IOMEM
    arm: time.h: remove device.h #include
    driver-core: remove sysdev.h usage.
    clockevents: remove sysdev.h
    arm: convert sysdev_class to a regular subsystem
    arm: leds: convert sysdev_class to a regular subsystem
    kobject: remove kset_find_obj_hinted()
    m86k: gpio - convert sysdev_class to a regular subsystem
    mips: txx9_sram - convert sysdev_class to a regular subsystem
    mips: 7segled - convert sysdev_class to a regular subsystem
    sh: dma - convert sysdev_class to a regular subsystem
    sh: intc - convert sysdev_class to a regular subsystem
    power: suspend - convert sysdev_class to a regular subsystem
    power: qe_ic - convert sysdev_class to a regular subsystem
    power: cmm - convert sysdev_class to a regular subsystem
    s390: time - convert sysdev_class to a regular subsystem
    ...

    Fix up conflicts with 'struct sysdev' removal from various platform
    drivers that got changed:
    - arch/arm/mach-exynos/cpu.c
    - arch/arm/mach-exynos/irq-eint.c
    - arch/arm/mach-s3c64xx/common.c
    - arch/arm/mach-s3c64xx/cpu.c
    - arch/arm/mach-s5p64x0/cpu.c
    - arch/arm/mach-s5pv210/common.c
    - arch/arm/plat-samsung/include/plat/cpu.h
    - arch/powerpc/kernel/sysfs.c
    and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h

    Linus Torvalds
     

07 Jan, 2012

1 commit

  • This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
    and it fixes the build error in the arch/x86/kernel/microcode_core.c
    file, that the merge did not catch.

    The microcode_core.c patch was provided by Stephen Rothwell
    who was invaluable in the merge issues involved
    with the large sysdev removal process in the driver-core tree.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

22 Dec, 2011

1 commit

  • This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
    and converts the devices to regular devices. The sysdev drivers are
    implemented as subsystem interfaces now.

    After all sysdev classes are ported to regular driver core entities, the
    sysdev implementation will be entirely removed from the kernel.

    Userspace relies on events and generic sysfs subsystem infrastructure
    from sysdev devices, which are made available with this conversion.

    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Borislav Petkov
    Cc: Tigran Aivazian
    Cc: Len Brown
    Cc: Zhang Rui
    Cc: Dave Jones
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Andrew Morton
    Cc: Arjan van de Ven
    Cc: "Rafael J. Wysocki"
    Cc: "Srivatsa S. Bhat"
    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

18 Dec, 2011

1 commit


17 Dec, 2011

1 commit

  • mce-inject provides a mechanism to simulate errors so that test
    scripts can check for correct operation of the kernel without
    requiring any specialized hardware to create rare events.

    The existing code can simulate events in normal process context
    and also in NMI context - but not in IRQ context. This patch
    fills that gap.

    Link: https://lkml.org/lkml/2011/12/7/537
    Signed-off-by: Chen Gong
    Signed-off-by: Tony Luck

    Chen Gong
     

14 Dec, 2011

1 commit


08 Nov, 2011

1 commit

  • Arjan would like to make struct file_operations const, but
    mce-inject directly writes to the mce_chrdev_ops to install its
    write handler. In an ideal world mce-inject would have its own
    character device, but we have a sizable legacy of test scripts
    that hardwire "/dev/mcelog", so it would be painful to switch to
    a separate device now. Instead, this patch switches to a stub
    function in the mce code, with a registration helper that
    mce-inject can call when it is loaded.

    Note that this would also allow for a sane process to allow
    mce-inject to be unloaded again (with an unregister function,
    and appropriate module_{get,put}() calls), but that is left for
    potential future patches.

    Reported-by: Arjan van de Ven
    Signed-off-by: Tony Luck
    Link: http://lkml.kernel.org/r/4eb2e1971326651a3b@agluck-desktop.sc.intel.com
    Signed-off-by: Ingo Molnar

    Luck, Tony
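
The stub-plus-registration pattern described above can be sketched in userspace C. This is an illustration under assumptions, not the kernel code: toy_file_ops, chrdev_write(), register_injector_write() and the -EINVAL fallback are hypothetical choices made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

typedef ssize_t (*write_fn)(const char *buf, size_t len);

/* Hook installed by the injector; NULL until it registers. */
static write_fn injector_write;

/* Registration helper the injector would call when it is loaded. */
void register_injector_write(write_fn fn)
{
    assert(fn != NULL);
    injector_write = fn;
}

/* Stub stored in the ops table: forwards to the hook if one is installed. */
ssize_t chrdev_write(const char *buf, size_t len)
{
    if (injector_write)
        return injector_write(buf, len);
    return -EINVAL; /* assumed fallback when no injector is present */
}

struct toy_file_ops {
    ssize_t (*write)(const char *buf, size_t len);
};

/* The table itself can be const; only the separate hook variable changes. */
const struct toy_file_ops toy_chrdev_ops = {
    .write = chrdev_write,
};

/* Example handler that pretends to consume the whole buffer. */
ssize_t example_inject_write(const char *buf, size_t len)
{
    (void)buf;
    return (ssize_t)len;
}
```

The point of the design is that nothing writes to the ops table after initialization, so making it const becomes possible without giving the injector its own device node.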
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

16 Jun, 2011

2 commits

  • There are many functions named mce_* so use a new prefix for the subset
    of functions related to sysfs support.

    And since f3c6ea1b06c71b43f751b36bd99345369fe911af introduces
    syscore_ops, use the prefix mce_syscore for some functions related to
    power management which were in sysdev_class before.

    Before:                 After:
    mce_device              mce_sysdev
    mce_sysclass            mce_sysdev_class
    mce_attrs               mce_sysdev_attrs
    mce_dev_initialized     mce_sysdev_initialized
    mce_create_device       mce_sysdev_create
    mce_remove_device       mce_sysdev_remove

    mce_suspend             mce_syscore_suspend
    mce_shutdown            mce_syscore_shutdown
    mce_resume              mce_syscore_resume

    Signed-off-by: Hidetoshi Seto
    Acked-by: Tony Luck
    Link: http://lkml.kernel.org/r/4DEED81B.8020506@jp.fujitsu.com
    Signed-off-by: Borislav Petkov

    Hidetoshi Seto
     
  • Follow other MCi register defines. Plus define MCI_MISC_ADDR_LSB() and
    MCI_MISC_ADDR_MODE().

    Signed-off-by: Hidetoshi Seto
    Acked-by: Tony Luck
    Link: http://lkml.kernel.org/r/4DEED6E8.9090509@jp.fujitsu.com
    Signed-off-by: Borislav Petkov

    Hidetoshi Seto
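
As a rough illustration of what the two helpers named above extract, here is a sketch that assumes the IA32_MCi_MISC layout from the Intel SDM (bits 5:0 hold the least significant valid address bit, bits 8:6 the address mode). The macro bodies and the helper mci_valid_addr() are assumptions made for this example and should be checked against the header.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout assumed from the Intel SDM description of IA32_MCi_MISC. */
#define MCI_MISC_ADDR_LSB(m)   ((m) & 0x3f)      /* bits 5:0 */
#define MCI_MISC_ADDR_MODE(m)  (((m) >> 6) & 7)  /* bits 8:6 */

/* Compile-time sanity checks on the assumed field positions. */
static_assert(MCI_MISC_ADDR_LSB(0xffULL) == 0x3f, "LSB field is 6 bits wide");
static_assert(MCI_MISC_ADDR_MODE(0x1c0ULL) == 7, "mode field is bits 8:6");

/* Hypothetical helper: clear address bits below the reported valid LSB. */
uint64_t mci_valid_addr(uint64_t addr, uint64_t misc)
{
    unsigned int lsb = MCI_MISC_ADDR_LSB(misc);

    return (addr >> lsb) << lsb;
}
```

With misc = 0x8c the LSB field is 12 and the mode field is 2, so an address is rounded down to a 4 KiB boundary.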
     

21 Apr, 2011

1 commit

  • The default notifier doesn't make a lot of sense to call in the
    correctable errors case. Drop it and emit the mcelog decoding
    hint only in the uncorrectable errors case and when no notifier
    is registered. Also, limit issuing the "mcelog --ascii" message
    to the rare case when we dump unreported CEs before panicking.

    While at it, remove unused old x86_mce_decode_callback from the
    header.

    Signed-off-by: Borislav Petkov
    Signed-off-by: Prarit Bhargava
    Cc: Tony Luck
    Cc: Nagananda Chumbalkar
    Cc: Russ Anderson
    Link: http://lkml.kernel.org/r/20110420102349.GB1361@aftab
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

04 Jan, 2011

1 commit

    This patch adds code to therm_throt.c to report core thermal threshold
    events. These thresholds are supported by the IA32_THERM_INTERRUPT register.
    The status/log for them is monitored using the IA32_THERM_STATUS register.
    The necessary #defines are in msr-index.h. A callback is added to mce.h to
    notify the thermal stack about the threshold events.

    Signed-off-by: Durgadoss R
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    R, Durgadoss
     

11 Jun, 2010

2 commits

    It is reported that CMCI is not raised when the number of corrected
    errors reaches the preset threshold. After inspection, it was found that
    the MSR_IA32_MCI_CTL2 threshold field is not set up properly. This patch
    fixes it.

    The value of MCI_CTL2_CMCI_THRESHOLD_MASK is also corrected according to
    the x86_64 Software Developer's Manual.

    Reported-by: Shaohui Zheng
    Signed-off-by: Huang Ying
    LKML-Reference:
    Reviewed-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Huang Ying
     
  • Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to
    MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent.

    Signed-off-by: Huang Ying
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Huang Ying
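
A small sketch of programming the threshold field, assuming the IA32_MCi_CTL2 layout from the Intel SDM (threshold in bits 14:0, CMCI enable in bit 30). The helper cmci_ctl2_set_threshold() is hypothetical and works on a plain value; reading and writing the MSR itself is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions assumed from the Intel SDM description of IA32_MCi_CTL2. */
#define MCI_CTL2_CMCI_EN              (1ULL << 30)
#define MCI_CTL2_CMCI_THRESHOLD_MASK  0x7fffULL   /* bits 14:0 */

/* Hypothetical helper: returns the CTL2 value to write back. */
uint64_t cmci_ctl2_set_threshold(uint64_t ctl2, uint64_t threshold)
{
    assert(threshold <= MCI_CTL2_CMCI_THRESHOLD_MASK);
    ctl2 &= ~MCI_CTL2_CMCI_THRESHOLD_MASK; /* clear the old threshold */
    ctl2 |= threshold;                     /* install the new one */
    ctl2 |= MCI_CTL2_CMCI_EN;              /* request CMCI delivery */
    return ctl2;
}
```

Clearing the field before ORing in the new value matters: leftover bits from an earlier threshold would otherwise produce a much larger count than intended.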
     

20 May, 2010

1 commit

    Generic Hardware Error Source (GHES) provides a way to report platform
    hardware errors (such as those from the chipset). It works in so-called
    "Firmware First" mode: hardware errors are reported to firmware first
    and then reported to Linux by the firmware. This way, non-standard
    hardware error registers or non-standard hardware links can be checked
    by firmware to produce more valuable hardware error information for
    Linux.

    For now, only the SCI notification type and memory errors are supported.
    More notification types and hardware error types will be added later.
    These memory errors are reported to user space through /dev/mcelog by
    faking a corrected Machine Check, so that the faulty memory page can be
    offlined by /sbin/mcelog if the error count for one page exceeds the
    threshold.

    On some machines, Machine Check cannot report the physical address for
    some corrected memory errors, but GHES can. So this simplified GHES
    is implemented first.

    Signed-off-by: Huang Ying
    Signed-off-by: Andi Kleen
    Signed-off-by: Len Brown

    Huang Ying
     

13 Jan, 2010

1 commit


10 Nov, 2009

1 commit

    On platforms where the BIOS handles the thermal monitor interrupt,
    APIC_LVTTHMR on each logical CPU is programmed to generate an SMI
    and the OS must not touch it.

    Unfortunately, the AP bringup sequence using INIT-SIPI-SIPI clears all
    the LVT entries except the mask bit. Essentially this results in
    all LVT entries, including the thermal monitoring interrupt, being set
    to masked (clearing the BIOS-programmed value for APIC_LVTTHMR).

    This leads to the kernel taking over the thermal monitoring
    interrupt on the APs but not on the BSP (leaving the BIOS-programmed
    value only on the BSP).

    As a result of this, we have seen system hangs when the thermal
    monitoring interrupt is generated.

    Fix this by reading the initial value of the thermal LVT entry on
    the BSP and, if the BIOS has taken over control, programming the
    same value on all APs and leaving the thermal monitoring
    interrupt control on all the logical CPUs to the BIOS.

    Signed-off-by: Yong Wang
    Reviewed-by: Suresh Siddha
    Cc: Borislav Petkov
    Cc: Arjan van de Ven
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Cc: stable@kernel.org

    Yong Wang
     

16 Oct, 2009

1 commit

  • Prefix global/setup routines with "mcheck_" thus differentiating
    from the internal facilities prefixed with "mce_". Also, prefix
    the per cpu calls with mcheck_cpu and rename them to reflect the
    MCE setup hierarchy of calls better.

    There should be no functionality change resulting from this
    patch.

    Signed-off-by: Borislav Petkov
    Cc: Andi Kleen
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

12 Oct, 2009

1 commit

  • Add an atomic notifier which ensures proper locking when conveying
    MCE info to EDAC for decoding. The actual notifier call overrides a
    default, negative priority notifier.

    Note: make sure we register the default decoder only once since
    mcheck_init() runs on each CPU.

    Signed-off-by: Borislav Petkov
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Borislav Petkov
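
The priority mechanism described above can be sketched as a sorted callback list in which a decoder with a higher priority runs before, and can pre-empt, a negative-priority default. This is a single-threaded illustration with hypothetical names (toy_notifier, toy_chain_register(), toy_chain_call()); the locking that the kernel's atomic notifier chain provides is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NOTIFY_OK    0
#define TOY_NOTIFY_STOP  1

struct toy_notifier {
    int (*call)(unsigned long event, void *data);
    int priority;                 /* higher priority runs first */
    struct toy_notifier *next;
};

/* Insert the block so the list stays sorted by descending priority. */
void toy_chain_register(struct toy_notifier **head, struct toy_notifier *nb)
{
    assert(head && nb && nb->call);
    while (*head && (*head)->priority >= nb->priority)
        head = &(*head)->next;
    nb->next = *head;
    *head = nb;
}

/* Walk the chain until a callback asks to stop. */
int toy_chain_call(struct toy_notifier *head, unsigned long event, void *data)
{
    int ret = TOY_NOTIFY_OK;

    for (; head; head = head->next) {
        ret = head->call(event, data);
        if (ret == TOY_NOTIFY_STOP)
            break;
    }
    return ret;
}

/* Fallback decoder: registered with a negative priority. */
int toy_default_decode(unsigned long event, void *data)
{
    (void)event;
    *(int *)data = 1;   /* 1 = fallback hint printed */
    return TOY_NOTIFY_STOP;
}

/* Stand-in for a real decoder: handles the event and stops the chain. */
int toy_edac_decode(unsigned long event, void *data)
{
    (void)event;
    *(int *)data = 2;   /* 2 = decoded by the registered decoder */
    return TOY_NOTIFY_STOP;
}
```

Registering the default block only once, as the note in the commit message requires, avoids inserting the same list node into the chain twice.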
     

02 Oct, 2009

1 commit

  • Make decoding of MCEs happen only on AMD hardware by registering a
    non-default callback only on CPU families which support it.

    While looking at the interaction of decode_mce() with the other MCE
    code I also noticed a few other things and made the following
    cleanups/fixes:

    - Fixed the mce_decode() weak alias - a weak alias is really not
    good here, it should be a proper callback. A weak alias will be
    overridden if a piece of code is built into the kernel - not
    good, obviously.

    - The patch initializes the callback on AMD family 10h and 11h.

    - Added the more correct fallback printk of:

    No support for human readable MCE decoding on this CPU type.
    Transcribe the message and run it through 'mcelog --ascii' to decode.

    on CPUs that don't have a decoder.

    - Made the surrounding code more readable.

    Note that the callback allows us to have a default fallback -
    without having to check the CPU versions during the printout
    itself. When an EDAC module registers itself, it can install the
    decode-print function.

    (there's no unregister needed as this is core code.)

    version -v2 by Borislav Petkov:

    - add K8 to the set of supported CPUs

    - always build in edac_mce_amd since we use an early_initcall now

    - fix checkpatch warnings

    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Andi Kleen
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

11 Aug, 2009

2 commits

    The raise mode is either raising as an exception or raising as a poll;
    it is specified via the mce.inject_flags field.

    This can be used to specify the raise mode of UCNA, which is a UC error
    that is not raised as an exception. It can also be used to test the
    filter code of the poll handler or the exception handler, for example by
    enforcing a poll raise mode for a fatal MCE.

    ChangeLog:

    v2:

    - Re-base on latest x86-tip.git/mce3

    Signed-off-by: Huang Ying
    Signed-off-by: H. Peter Anvin

    Huang Ying
     
  • The cpu context is specified via the new mce.inject_flags fields.
    This allows more realistic machine check testing in different
    situations. "RANDOM" context is implemented via NMI broadcasting to
    add randomization to testing.

    AK: Fix NMI broadcasting check. Fix 32-bit building. Some race
    fixes. Move to module. Various changes

    ChangeLog:

    v3:

    - Re-based on latest x86-tip.git/mce4

    - Fix 32-bit building

    v2:

    - Re-base on latest x86-tip.git/mce3

    Signed-off-by: Huang Ying
    Signed-off-by: Andi Kleen
    Signed-off-by: H. Peter Anvin

    Huang Ying
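
To illustrate how one flags word can carry both the CPU context and the raise mode, here is a sketch. The MCJ_* names and numeric values are assumed to mirror the header and should be treated as illustrative; mcj_ctx_name() and mcj_raise_as_exception() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed flag layout; check the kernel's mce.h for the real values. */
#define MCJ_CTX_MASK       3u
#define MCJ_CTX(flags)     ((flags) & MCJ_CTX_MASK)
#define MCJ_CTX_RANDOM     0u   /* inject context: random */
#define MCJ_CTX_PROCESS    1u   /* inject context: process */
#define MCJ_CTX_IRQ        2u   /* inject context: IRQ */
#define MCJ_NMI_BROADCAST  4u   /* do NMI broadcasting */
#define MCJ_EXCEPTION      8u   /* raise as exception, otherwise as poll */

/* Behaviour flags must not overlap the two-bit context field. */
static_assert(((MCJ_NMI_BROADCAST | MCJ_EXCEPTION) & MCJ_CTX_MASK) == 0,
              "behaviour flags overlap the context field");

const char *mcj_ctx_name(uint32_t flags)
{
    switch (MCJ_CTX(flags)) {
    case MCJ_CTX_RANDOM:
        return "random";
    case MCJ_CTX_PROCESS:
        return "process";
    case MCJ_CTX_IRQ:
        return "irq";
    default:
        return "invalid";
    }
}

bool mcj_raise_as_exception(uint32_t flags)
{
    return (flags & MCJ_EXCEPTION) != 0;
}
```

Under these assumptions, forcing a poll raise for a fatal MCE amounts to leaving MCJ_EXCEPTION clear in inject_flags.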
     

10 Jul, 2009

2 commits


21 Jun, 2009

1 commit


17 Jun, 2009

5 commits


11 Jun, 2009

1 commit

  • This patch introduces three boot options (no_cmci, dont_log_ce
    and ignore_ce) to control handling for corrected errors.

    The "mce=no_cmci" boot option disables the CMCI feature.

    Since CMCI is a new feature, having a boot control to disable
    it will help if the hardware is misbehaving.

    The "mce=dont_log_ce" boot option disables logging for corrected
    errors. All reported corrected errors will be cleared silently.
    This option will be useful if you never care about corrected
    errors.

    The "mce=ignore_ce" boot option disables features for corrected
    errors, i.e. polling timer and cmci. All corrected events are
    not cleared and kept in bank MSRs.

    Usually disabling these is not recommended, but it helps if
    there is a conflict with the BIOS or with hardware monitoring
    applications etc. that clear corrected events in the banks
    instead of the OS.

    [ And trivial cleanup (space -> tab) for doc is included. ]

    Signed-off-by: Hidetoshi Seto
    Reviewed-by: Andi Kleen
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
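
A sketch of how the three option strings might map onto configuration switches. The struct toy_mce_config and toy_parse_mce_option() are hypothetical; the real kernel parser handles more options and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical configuration record for corrected-error handling. */
struct toy_mce_config {
    bool cmci_disabled;   /* mce=no_cmci */
    bool dont_log_ce;     /* mce=dont_log_ce */
    bool ignore_ce;       /* mce=ignore_ce */
};

/* Returns 1 if the option was recognised, 0 otherwise. */
int toy_parse_mce_option(struct toy_mce_config *cfg, const char *str)
{
    assert(cfg && str);
    if (*str == '=')      /* accept both "=opt" and "opt" */
        str++;
    if (!strcmp(str, "no_cmci"))
        cfg->cmci_disabled = true;
    else if (!strcmp(str, "dont_log_ce"))
        cfg->dont_log_ce = true;
    else if (!strcmp(str, "ignore_ce"))
        cfg->ignore_ce = true;
    else
        return 0;
    return 1;
}
```

The flags are kept independent here so that each option can be tested on its own; how ignore_ce interacts with the other two is a policy decision the sketch does not model.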
     

04 Jun, 2009

8 commits

  • Newer Intel CPUs support a new class of machine checks called recoverable
    action optional.

    Action Optional means that the CPU detected some form of corruption in
    the background and tells the OS about it using a machine check
    exception. The OS can then take appropriate action, like killing the
    process with the corrupted data or logging the event properly to disk.

    This is done by the new generic high level memory failure handler added
    in an earlier patch. The high level handler takes the address of the
    failed memory and takes the appropriate action, like killing the process.

    In this version of the patch the high level handler is stubbed out
    with a weak function to not create a direct dependency on the hwpoison
    branch.

    The high level handler cannot be directly called from the machine check
    exception though, because it has to run in a defined process context to
    be able to sleep when taking VM locks (it is not expected to sleep for a
    long time, just do so in some exceptional cases like lock contention)

    Thus the MCE handler has to queue a work item for process context,
    trigger process context and then call the high level handler from there.

    This patch adds two paths to process context: through a per thread kernel
    exit notify_user() callback or through a high priority work item.
    The first runs when the process exits back to user space, the other when
    it goes to sleep and there is no higher priority process.

    The machine check handler will schedule both, and whoever runs first
    will grab the event. This is done because quick reaction to this
    event is critical to avoid a potentially more fatal machine check
    when the corruption is consumed.

    There is a simple lockless ring buffer to queue the corrupted
    addresses between the exception handler and the process context handler.
    Then in process context it just calls the high level VM code with
    the corrupted PFNs.

    The patch adds the required code to extract the failed address from
    the CPU's machine check registers. It doesn't try to handle all
    possible cases -- the specification has 6 different ways to specify
    a memory address -- but only the linear address.

    Most of the required checking has been already done earlier in the
    mce_severity rule checking engine. Following the Intel
    recommendations Action Optional errors are only enabled for known
    situations (encoded in MCACODs). The errors are ignored otherwise,
    because they are action optional.

    v2: Improve comment, disable preemption while processing ring buffer
    (reported by Ying Huang)

    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Andi Kleen
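
The lockless ring buffer mentioned above can be approximated in userspace with C11 atomics. This is a single-producer, single-consumer sketch under assumptions: toy_ring, toy_ring_add(), toy_ring_get() and the size of 16 are illustrative, and the kernel code uses its own barriers rather than stdatomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_RING_SIZE 16   /* one slot stays empty to tell full from empty */

struct toy_ring {
    atomic_uint start;                 /* next slot to read, consumer-owned */
    atomic_uint end;                   /* next slot to write, producer-owned */
    unsigned long pfn[TOY_RING_SIZE];
};

/* Producer side (the exception handler in the description above). */
bool toy_ring_add(struct toy_ring *r, unsigned long pfn)
{
    unsigned int end, next;

    assert(r);
    end = atomic_load_explicit(&r->end, memory_order_relaxed);
    next = (end + 1) % TOY_RING_SIZE;
    if (next == atomic_load_explicit(&r->start, memory_order_acquire))
        return false;                  /* ring is full, drop the entry */
    r->pfn[end] = pfn;
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&r->end, next, memory_order_release);
    return true;
}

/* Consumer side (the process context handler). */
bool toy_ring_get(struct toy_ring *r, unsigned long *pfn)
{
    unsigned int start;

    assert(r && pfn);
    start = atomic_load_explicit(&r->start, memory_order_relaxed);
    if (start == atomic_load_explicit(&r->end, memory_order_acquire))
        return false;                  /* ring is empty */
    *pfn = r->pfn[start];
    atomic_store_explicit(&r->start, (start + 1) % TOY_RING_SIZE,
                          memory_order_release);
    return true;
}
```

Because each index has exactly one writer, neither side needs a lock, which is what allows the producer to run in exception context. With more than one consumer this sketch would not be safe.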
     
  • Rename the mce_notify_user function to mce_notify_irq. The next
    patch will split the wakeup handling of interrupt context
    and of process context and it's better to give it a clearer
    name for this.

    Contains a fix from Ying Huang

    [ Impact: cleanup ]

    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Cc: Huang Ying
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     
    The x86 architecture recently added some new machine check status bits:
    S(ignalled) and AR (Action-Required). Signalled allows checking whether
    a specific event caused an exception or was just logged through CMCI.
    AR allows the kernel to decide if an event needs immediate action
    or can be delayed or ignored.

    Implement support for these new status bits. mce_severity() uses
    the new bits to grade the machine check correctly and decide what
    to do. The exception handler uses AR to decide to kill or not.
    The S bit is used to separate events between the poll/CMCI handler
    and the exception handler.

    Classical UC always leads to panic. That was true before anyways
    because the existing CPUs always passed a PCC with it.

    Also corrects the rules whether to kill in user or kernel context
    and how to handle missing RIPV.

    The machine check handler largely uses the mce-severity grading
    engine now instead of making its own decisions. This means the logic
    is centralized in one place. This is useful because it has to be
    evaluated multiple times.

    v2: Some rule fixes; Add AO events
    Fix RIPV, RIPV|EIPV order (Ying Huang)
    Fix UCNA with AR=1 message (Ying Huang)
    Add comment about panicking in m_c_p.

    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Andi Kleen
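
A much simplified sketch of grading by status bits. The bit positions are assumed from the Intel SDM layout of IA32_MCi_STATUS; the toy_severity levels, the toy_grade() function and its "ser" parameter (whether the CPU supports software error recovery) are hypothetical and do not reproduce the kernel's mce_severity() rule table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from the Intel SDM description of IA32_MCi_STATUS. */
#define MCI_STATUS_VAL  (1ULL << 63)  /* valid error */
#define MCI_STATUS_UC   (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_PCC  (1ULL << 57)  /* processor context corrupt */
#define MCI_STATUS_S    (1ULL << 56)  /* signalled as an exception */
#define MCI_STATUS_AR   (1ULL << 55)  /* action required */

enum toy_severity {
    TOY_SEV_CORRECTED,   /* log only */
    TOY_SEV_UCNA,        /* uncorrected, no action, not signalled */
    TOY_SEV_AO,          /* action optional */
    TOY_SEV_AR,          /* action required */
    TOY_SEV_PANIC,
};

/* Hypothetical grading function; ser = CPU supports error recovery. */
enum toy_severity toy_grade(uint64_t status, bool ser)
{
    assert(status & MCI_STATUS_VAL);
    if (!(status & MCI_STATUS_UC))
        return TOY_SEV_CORRECTED;
    /* Classical UC, or corrupted processor context: no recovery possible. */
    if (!ser || (status & MCI_STATUS_PCC))
        return TOY_SEV_PANIC;
    if (!(status & MCI_STATUS_S))
        return TOY_SEV_UCNA;
    return (status & MCI_STATUS_AR) ? TOY_SEV_AR : TOY_SEV_AO;
}
```

Keeping the decision in one function mirrors the point made above: the grading is evaluated in several places, so it helps to have the logic in a single spot.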
     
    Experience has shown that struct mce, which is used to pass a machine
    check to the user space daemon, currently has a few limitations. Some
    data which is useful to print at panic level is also missing.

    This patch addresses most of them. The same information is also
    printed out together with mce panic.

    struct mce can be painlessly extended in a compatible way, the mcelog
    user space code just ignores additional fields with a warning.

    - It doesn't provide a wall time timestamp. There have been a few
    complaints about that. Fix that by adding a 64bit time_t

    - It doesn't provide the exact CPU identification. This makes
    it awkward for mcelog to decode the event correctly, especially
    when there are variations in the supported MCE codes on different
    CPU models or when mcelog is running on a different host after a panic.
    Previously the administrator had to specify the correct CPU
    when mcelog ran on a different host, but with more variation
    in machine checks now it's better to auto detect that.
    It's also useful for more detailed analysis of CPU events.
    Pass CPUID 1.EAX and the cpu vendor (as encoded in processor.h) instead.

    - Socket ID and initial APIC ID are useful to report because they
    allow identifying the failing CPU in some (not all) cases.
    This is also especially useful for the panic situation.
    This addresses one of the complaints from Thomas Gleixner earlier.

    - The MCG capabilities MSR needs to be reported for some advanced
    error processing in mcelog

    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     
  • The old struct mce had a limitation to 256 CPUs. But x86 Linux supports
    more than that now with x2apic. Add a new field extcpu to report the
    extended number.

    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     
    This makes it easier for tools that want to extract the mcelog out of
    crash images or memory dumps to adapt to changing struct mce size.
    The length field replaces padding, so it's fully compatible.

    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     
  • Keep a count of the machine check polls (or CMCI events) in
    /proc/interrupts.

    Andi needs this for debugging, but it's also useful in general
    to see what's going on in the kernel.

    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     
  • Useful for debugging, but it's also good general policy
    to have a counter for all special interrupts there. This makes it easier
    to diagnose where a CPU is spending its time.

    [ Impact: feature, debugging tool ]

    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Andi Kleen