19 Jan, 2012

1 commit

  • This includes initial support for the recently published ACPI 5.0 spec.
    In particular, support for the "hardware-reduced" bit that eliminates
    the dependency on legacy hardware.

    APEI has patches resulting from testing on real hardware.

    Plus other random fixes.

    * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (52 commits)
    acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec
    intel_idle: Split up and provide per CPU initialization func
    ACPI processor: Remove unneeded variable passed by acpi_processor_hotadd_init V2
    ACPI processor: Remove unneeded cpuidle_unregister_driver call
    intel idle: Make idle driver more robust
    intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle
    ACPI: kernel-parameters.txt : Add intel_idle.max_cstate
    intel_idle: remove redundant local_irq_disable() call
    ACPI processor: Fix error path, also remove sysdev link
    ACPI: processor: fix acpi_get_cpuid for UP processor
    intel_idle: fix API misuse
    ACPI APEI: Convert atomicio routines
    ACPI: Export interfaces for ioremapping/iounmapping ACPI registers
    ACPI: Fix possible alignment issues with GAS 'address' references
    ACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64)
    ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64)
    ACPI: Store SRAT table revision
    ACPI, APEI, Resolve false conflict between ACPI NVS and APEI
    ACPI, Record ACPI NVS regions
    ACPI, APEI, EINJ, Refine the fix of resource conflict
    ...

    Linus Torvalds
     

18 Jan, 2012

2 commits


17 Jan, 2012

9 commits

  • APEI needs memory access in interrupt context. The obvious choice is
    acpi_read(), but originally it couldn't be used in interrupt context
    because it makes temporary mappings with ioremap(). Therefore, we added
    drivers/acpi/atomicio.c, which provides:
    acpi_pre_map_gar() -- ioremap in process context
    acpi_atomic_read() -- memory access in interrupt context
    acpi_post_unmap_gar() -- iounmap

    Later we added acpi_os_map_generic_address() (2971852) and enhanced
    acpi_read() so it works in interrupt context as long as the address has
    been previously mapped (620242a). Now this sequence:
    acpi_os_map_generic_address() -- ioremap in process context
    acpi_read()/apei_read() -- now OK in interrupt context
    acpi_os_unmap_generic_address()
    is equivalent to what atomicio.c provides.

    This patch introduces apei_read() and apei_write(), which currently are
    functional equivalents of acpi_read() and acpi_write(). This is mainly
    proactive, to prevent APEI breakages if acpi_read() and acpi_write()
    are ever augmented to support the 'bit_offset' field of GAS, as APEI's
    __apei_exec_write_register() precludes splitting up functionality
    related to 'bit_offset' and APEI's 'mask' (see its
    APEI_EXEC_PRESERVE_REGISTER block).

    With apei_read() and apei_write() in place, usages of atomicio routines
    are converted to apei_read()/apei_write() and existing calls within
    osl.c and the CA, based on the re-factoring that was done in an earlier
    patch series - http://marc.info/?l=linux-acpi&m=128769263327206&w=2:
    acpi_pre_map_gar() --> acpi_os_map_generic_address()
    acpi_post_unmap_gar() --> acpi_os_unmap_generic_address()
    acpi_atomic_read() --> apei_read()
    acpi_atomic_write() --> apei_write()

    Note that acpi_read() and acpi_write() currently use 'bit_width'
    for accessing GARs which seems incorrect. 'bit_width' is the size of
    the register, while 'access_width' is the size of the access the
    processor must generate on the bus. The 'access_width' may be larger,
    for example, if the hardware only supports 32-bit or 64-bit reads. I
    wanted to minimize any possible impacts with this patch series so I
    did *not* change this behavior.

    Signed-off-by: Myron Stowe
    Signed-off-by: Len Brown

    Myron Stowe
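
The difference between 'bit_width' and 'access_width' noted above can be illustrated with a small sketch. This is not kernel code: the helper names and the `bus_read` callback are hypothetical. It assumes the ACPI GAS encoding of the access size field, where codes 1 to 4 stand for byte, word, dword and qword accesses.

```python
def access_bit_width(access_size: int) -> int:
    """Translate a GAS access size code (1..4) into a width in bits."""
    if not 1 <= access_size <= 4:
        # Code 0 means "undefined" in the GAS encoding; this sketch rejects it.
        raise ValueError(f"unsupported access size code: {access_size}")
    return 1 << (access_size + 2)  # 1 -> 8, 2 -> 16, 3 -> 32, 4 -> 64


def read_register(bus_read, address: int, bit_width: int, access_size: int) -> int:
    """Read a register using the access width, then keep only bit_width bits.

    bus_read(address, width_in_bits) is a hypothetical callback that stands
    in for the actual bus access.
    """
    raw = bus_read(address, access_bit_width(access_size))
    return raw & ((1 << bit_width) - 1)
```

With an 8-bit register behind hardware that only supports 32-bit reads, the access is issued 32 bits wide and the result is masked down to the low 8 bits.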
     
  • Some firmware accesses memory in the ACPI NVS region via APEI. That
    is, instructions in the APEI ERST/EINJ tables read/write the ACPI NVS
    region. The original resource conflict checking in the APEI code
    checks the memory/ioport regions accessed by APEI via the general
    resource management mechanism. But the ACPI NVS region is already
    marked as busy, so a false resource conflict prevents APEI ERST/EINJ
    from working.

    To fix this, this patch excludes ACPI NVS regions when APEI components
    request resources, so that they will not conflict with ACPI NVS
    regions.

    Reported-and-tested-by: Pavel Ivanov
    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Huang Ying
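
Excluding already-busy regions from a set of requested resources amounts to interval subtraction. The following is a minimal sketch of that idea only; the function name and the half-open `(start, end)` representation are assumptions made for illustration and do not mirror the kernel's resource structures.

```python
def exclude_ranges(requested, excluded):
    """Remove every excluded range from every requested range.

    Ranges are half-open (start, end) tuples. A requested range that
    straddles an excluded one is split into the pieces on either side.
    """
    result = []
    for start, end in requested:
        pieces = [(start, end)]
        for ex_start, ex_end in excluded:
            remaining = []
            for s, e in pieces:
                if ex_end <= s or ex_start >= e:
                    remaining.append((s, e))  # no overlap, keep as is
                    continue
                if s < ex_start:
                    remaining.append((s, ex_start))  # part below the exclusion
                if ex_end < e:
                    remaining.append((ex_end, e))  # part above the exclusion
            pieces = remaining
        result.extend(pieces)
    return result
```

A request for 0x1000-0x3000 with an NVS region at 0x1800-0x2000 leaves two pieces to request, one on each side of the NVS region.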
     
  • The current fix for the resource conflict is to remove the address
    region from the trigger resources, which relies heavily on valid user
    input. This patch tries to avoid such potential issues by fetching the
    exact address region from the trigger action table entry.

    Signed-off-by: Xiao, Hui
    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Xiao, Hui
     
  • Some APEI firmware implementations access the injected address
    specified in param1 to trigger the error when injecting a memory error.
    This causes a resource conflict with RAM.

    On one of our test machines, when injecting at memory address
    0x10000000, the following error is reported in dmesg:

    APEI: Can not request iomem region for GARs.

    This patch removes the injected memory address range from the trigger
    table resources to avoid the conflict.

    Signed-off-by: Huang Ying
    Tested-by: Tony Luck
    Signed-off-by: Len Brown

    Huang Ying
     
  • Because printk is not safe inside an NMI handler, the recoverable error
    records received in the NMI handler are queued to be printed in a
    delayed IRQ context via irq_work. If a fatal error occurs after the
    recoverable error and before the irq_work is processed, we lose an
    error report.

    To solve the issue, the queued error records are printed in the NMI
    handler if the system is about to panic.

    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Huang Ying
     
  • In most cases, printk only guarantees that messages from different
    printk calls will not be interleaved with each other. But one APEI
    GHES hardware error report involves multiple printk calls, normally
    one per line. So it is possible that hardware error reports coming
    from different generic hardware error sources will be interleaved.

    In this patch, a sequence number is prefixed to each line of an error
    report, so that even if reports are interleaved, they can still be
    distinguished by the prefixed sequence number.

    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Huang Ying
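
The effect of a per-report sequence prefix can be sketched as follows. The `{N}` prefix format and both function names are assumptions chosen for illustration; the point is only that a reader of the log can regroup interleaved lines by their prefix.

```python
import re
from collections import defaultdict


def tag_report(seqno: int, lines):
    """Prefix every line of one error report with its sequence number."""
    return [f"{{{seqno}}}{line}" for line in lines]


def regroup(log_lines):
    """Rebuild the individual reports from a possibly interleaved log."""
    reports = defaultdict(list)
    for line in log_lines:
        match = re.match(r"\{(\d+)\}(.*)", line)
        if match:
            reports[int(match.group(1))].append(match.group(2))
    return dict(reports)
```

Two reports tagged 1 and 2 whose lines alternate in the log are recovered as two separate, correctly ordered reports.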
     
  • Because APEI tables are optional, these messages may confuse users, for
    example,

    https://bugs.launchpad.net/ubuntu/+source/linux/+bug/599715

    Reported-by: Bjorn Helgaas
    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Huang Ying
     
  • Use the normal %pR-like format for MMIO and I/O port ranges.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • aer_recover_queue() is called when recoverable PCIe AER errors are
    notified by firmware to do the recovery work.

    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Huang Ying
     

13 Jan, 2012

1 commit


18 Nov, 2011

2 commits

  • This allows a backend to filter on the dmesg reason as well as the pstore
    reason. When ramoops is switched to pstore, this is needed since it has
    no interest in storing non-crash dmesg details.

    Drop pstore_write() as it has no users, and handling the "reason" here
    has no obviously correct value.

    Signed-off-by: Kees Cook
    Signed-off-by: Tony Luck

    Kees Cook
     
  • The buf_lock cannot be held while populating the inodes, so make the backend
    pass forward an allocated and filled buffer instead. This solves the following
    backtrace. The effect is that "buf" is only ever used to notify the backends
    that something was written to it, and shouldn't be used in the read path.

    To replace the buf_lock during the read path, isolate the open/read/close
    loop with a separate mutex to maintain serialized access to the backend.

    Note that it is up to the pstore backend to cope if the (*write)() path is
    called in the middle of the read path.

    [ 59.691019] BUG: sleeping function called from invalid context at .../mm/slub.c:847
    [ 59.691019] in_atomic(): 0, irqs_disabled(): 1, pid: 1819, name: mount
    [ 59.691019] Pid: 1819, comm: mount Not tainted 3.0.8 #1
    [ 59.691019] Call Trace:
    [ 59.691019] [] __might_sleep+0xc3/0xca
    [ 59.691019] [] kmem_cache_alloc+0x32/0xf3
    [ 59.691019] [] ? __d_lookup_rcu+0x6f/0xf4
    [ 59.691019] [] alloc_inode+0x2a/0x64
    [ 59.691019] [] new_inode+0x18/0x43
    [ 59.691019] [] pstore_get_inode.isra.1+0x11/0x98
    [ 59.691019] [] pstore_mkfile+0xae/0x26f
    [ 59.691019] [] ? kmem_cache_free+0x19/0xb1
    [ 59.691019] [] ? ida_get_new_above+0x140/0x158
    [ 59.691019] [] ? __init_rwsem+0x1e/0x2c
    [ 59.691019] [] ? inode_init_always+0x111/0x1b0
    [ 59.691019] [] ? should_resched+0xd/0x27
    [ 59.691019] [] ? _cond_resched+0xd/0x21
    [ 59.691019] [] pstore_get_records+0x52/0xa7
    [ 59.691019] [] pstore_fill_super+0x7d/0x91
    [ 59.691019] [] mount_single+0x46/0x82
    [ 59.691019] [] pstore_mount+0x15/0x17
    [ 59.691019] [] ? pstore_get_inode.isra.1+0x98/0x98
    [ 59.691019] [] mount_fs+0x5a/0x12d
    [ 59.691019] [] ? alloc_vfsmnt+0xa4/0x14a
    [ 59.691019] [] vfs_kern_mount+0x4f/0x7d
    [ 59.691019] [] do_kern_mount+0x34/0xb2
    [ 59.691019] [] do_mount+0x5fc/0x64a
    [ 59.691019] [] ? strndup_user+0x2e/0x3f
    [ 59.691019] [] sys_mount+0x66/0x99
    [ 59.691019] [] sysenter_do_call+0x12/0x26

    Signed-off-by: Kees Cook
    Signed-off-by: Tony Luck

    Kees Cook
     

02 Nov, 2011

1 commit


26 Oct, 2011

1 commit

  • * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    llist: Add back llist_add_batch() and llist_del_first() prototypes
    sched: Don't use tasklist_lock for debug prints
    sched: Warn on rt throttling
    sched: Unify the ->cpus_allowed mask copy
    sched: Wrap scheduler p->cpus_allowed access
    sched: Request for idle balance during nohz idle load balance
    sched: Use resched IPI to kick off the nohz idle balance
    sched: Fix idle_cpu()
    llist: Remove cpu_relax() usage in cmpxchg loops
    sched: Convert to struct llist
    llist: Add llist_next()
    irq_work: Use llist in the struct irq_work logic
    llist: Return whether list is empty before adding in llist_add()
    llist: Move cpu_relax() to after the cmpxchg()
    llist: Remove the platform-dependent NMI checks
    llist: Make some llist functions inline
    sched, tracing: Show PREEMPT_ACTIVE state in trace_sched_switch
    sched: Remove redundant test in check_preempt_tick()
    sched: Add documentation for bandwidth control
    sched: Return unused runtime on group dequeue
    ...

    Linus Torvalds
     

13 Oct, 2011

1 commit


10 Oct, 2011

1 commit

  • Just convert all the files that have an nmi handler to the new routines.
    Most of it is a straightforward conversion. A couple of places needed
    some tweaking, like kgdb, which separates the debug notifier from the
    nmi handler, and mce, which removes a call to notify_die.

    [Thanks to Ying for finding out the history behind that mce call

    https://lkml.org/lkml/2010/5/27/114

    And Boris responding that he would like to remove that call because of it

    https://lkml.org/lkml/2011/9/21/163]

    The things that get converted are the registration/unregistration
    routines, and the nmi handler itself has its args changed, along with
    the removal of the code that checks which list it is on (most are on
    one NMI list except for kgdb, which has both an NMI routine and an NMI
    Unknown routine).

    Signed-off-by: Don Zickus
    Signed-off-by: Peter Zijlstra
    Acked-by: Corey Minyard
    Cc: Jason Wessel
    Cc: Andi Kleen
    Cc: Robert Richter
    Cc: Huang Ying
    Cc: Corey Minyard
    Cc: Jack Steiner
    Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com
    Signed-off-by: Ingo Molnar

    Don Zickus
     

04 Oct, 2011

1 commit

  • Because llist code will be used in performance critical scheduler
    code path, make llist_add() and llist_del_all() inline to avoid
    function calling overhead and related 'glue' overhead.

    Signed-off-by: Huang Ying
    Acked-by: Mathieu Desnoyers
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1315461646-1379-2-git-send-email-ying.huang@intel.com
    Signed-off-by: Ingo Molnar

    Huang Ying
     

17 Aug, 2011

1 commit

  • pstore was using mutex locking to protect read/write access to the
    backend plug-ins. This causes problems when pstore is executed in
    an NMI context through panic() -> kmsg_dump().

    This patch changes the mutex to a spin_lock_irqsave then also checks to
    see if we are in an NMI context. If we are in an NMI and can't get the
    lock, just print a message stating that and blow by the locking.

    All this is probably a hack around the bigger locking problem but it
    solves my current situation of trying to sleep in an NMI context.

    Tested by loading the lkdtm module and executing a HARDLOCKUP which
    will cause the machine to panic inside the nmi handler.

    Signed-off-by: Don Zickus
    Acked-by: Matthew Garrett
    Signed-off-by: Tony Luck

    Don Zickus
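
The locking policy described above (take the lock normally, but only try it when in NMI context and carry on if it is busy) can be modelled in user space. This sketch uses `threading.Lock` as a stand-in for the spinlock; the function name, the `in_nmi` flag and the printed message are all hypothetical.

```python
import threading

buf_lock = threading.Lock()


def dump_record(record, backend, in_nmi: bool) -> bool:
    """Append a record to the backend; return True if the lock was held."""
    if in_nmi:
        # In NMI context we must not wait: try once, then proceed regardless.
        locked = buf_lock.acquire(blocking=False)
        if not locked:
            print("lock busy in NMI context, continuing without it")
    else:
        buf_lock.acquire()
        locked = True
    try:
        backend.append(record)
    finally:
        if locked:
            buf_lock.release()
    return locked
```

Proceeding without the lock risks a corrupted record, which is the trade-off the commit message itself calls a hack: a possibly damaged dump is preferred over a deadlock during panic.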
     

12 Aug, 2011

2 commits

  • IRQ_WORK is used by GHES, but it is selected by PERF_EVENT.
    For now PERF_EVENT is selected by x86 by default, but
    conceptually IRQ_WORK should be selected by GHES, not by others.

    Signed-off-by: Chen Gong
    Signed-off-by: Len Brown

    Chen Gong
     
  • Bit 0 of the support parameter to the OSC call should be set in order to
    indicate that the OS supports the WHEA mechanism. Stuart Hayes tracked
    an APEI issue on some Dell platforms down to this.

    Reported-by: Stuart Hayes
    Signed-off-by: Matthew Garrett
    Signed-off-by: Len Brown

    Matthew Garrett
     

03 Aug, 2011

6 commits

  • Some trivial conflicts due to other various merges
    adding to the end of common lists sooner than this one.

    arch/ia64/Kconfig
    arch/powerpc/Kconfig
    arch/x86/Kconfig
    lib/Kconfig
    lib/Makefile

    Signed-off-by: Len Brown

    Len Brown
     
  • EINJ parameter support is only usable with some specific BIOSes.
    Originally, it was expected to be harmless for BIOSes that do not
    support it. But we have now found that it causes issues (memory
    overwriting) with some BIOSes. So param support is disabled by default
    and only enabled when the newly added module parameter
    "param_extension" is explicitly specified.

    Signed-off-by: Huang Ying
    Cc: Matthew Garrett
    Acked-by: Don Zickus
    Acked-by: Tony Luck
    Signed-off-by: Len Brown

    Huang Ying
     
  • drivers/acpi/apei/ghes.c:542: warning: integer overflow in expression
    drivers/acpi/apei/ghes.c:619: warning: integer overflow in expression

    ghes.c:(.text+0x46289): undefined reference to `__udivdi3'
      in function ghes_estatus_cache_add().

    Reported-by: Randy Dunlap
    Signed-off-by: Len Brown

    Len Brown
     
  • memory_failure_queue() is called when recoverable memory errors are
    notified by firmware to do the recovery work.

    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Huang Ying
     
  • printk is used by GHES to report hardware errors. Ratelimiting is
    enforced on the printk to avoid too many hardware error reports in
    the kernel log, because there may be thousands or even millions of
    corrected hardware errors while the system is running.

    Currently, a simple scheme is used. That is, the total number of
    hardware error reports is ratelimited. This may cause some issues
    in practice.

    For example, suppose two kinds of hardware errors occur in the
    system. One is a corrected memory error; because the faulty memory
    address is accessed frequently, there may be hundreds of error
    reports per second. The other is a corrected PCIe AER error, which
    is reported once per second. Because they share one ratelimit
    control structure, it is highly likely that only the memory error is
    reported.

    To avoid the above issue, a throttle algorithm based on error record
    content is implemented in this patch. After the first successful
    report, all identical error records are throttled for some time, to
    give other kinds of error records the opportunity to be reported.

    In the above example, the memory errors will be throttled for some
    time after being printed. Then the PCIe AER error will be printed
    successfully.

    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Huang Ying
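
A content-based throttle of the kind described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the patch's implementation: the class name and the hold-off parameter are hypothetical, and an unbounded dictionary keyed by the record bytes stands in for whatever bounded cache a real implementation would use.

```python
class RecordThrottle:
    """Suppress repeats of an identical record for hold_off seconds."""

    def __init__(self, hold_off: float):
        self.hold_off = hold_off
        self._last_reported = {}  # record content -> time of last report

    def should_report(self, record: bytes, now: float) -> bool:
        last = self._last_reported.get(record)
        if last is not None and now - last < self.hold_off:
            return False  # same content was reported recently: throttle
        self._last_reported[record] = now
        return True
```

Because the decision is keyed on the record content, a flood of identical memory error records no longer consumes the budget of an unrelated PCIe AER record, which is reported the first time it appears.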
     
  • Some APEI GHES recoverable errors are reported via NMI, but printk is
    not safe in NMI context.

    To solve the issue, a lock-less memory allocator is used to allocate
    memory in the NMI handler, the error record is saved into the
    allocated memory, and the record is put on a lock-less list. An
    irq_work is then used to defer the operation from NMI context to
    IRQ context. The irq_work IRQ handler removes nodes from the
    lock-less list, printks the error record, does further processing
    including the recovery operation, and then frees the memory.

    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Huang Ying
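
The two-stage flow described above (copy and queue in the NMI handler, print later from the irq_work handler) can be modelled with a plain queue. This is only a user-space sketch: `collections.deque` stands in for the lock-less list, a Python list stands in for the kernel log, and all names are hypothetical.

```python
from collections import deque

pending = deque()  # stand-in for the lock-less list
log = []           # stand-in for the kernel log


def nmi_handler(raw_record: bytes) -> None:
    """Stage 1: copy the record and queue it; nothing is printed here."""
    pending.append(bytes(raw_record))


def irq_work_handler() -> int:
    """Stage 2: drain the queue, report each record, return the count."""
    handled = 0
    while pending:
        record = pending.popleft()
        log.append(record.decode())  # the printk would happen at this point
        handled += 1
    return handled
```

Nothing reaches the log until the second stage runs, which is why a fatal error arriving between the two stages needs separate handling.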
     

23 Jul, 2011

3 commits


14 Jul, 2011

8 commits

  • APEI firmware first mode must be turned on explicitly on some
    machines, otherwise there may be no GHES hardware error record for
    a hardware error notification. The APEI bit in the generic _OSC call
    can be used to do that, but on some machines a special WHEA _OSC call
    must be used. This patch adds support for that WHEA _OSC call.

    Signed-off-by: Huang Ying
    Reviewed-by: Andi Kleen
    Reviewed-by: Matthew Garrett
    Signed-off-by: Len Brown

    Huang Ying
     
  • Some machines may have broken firmware, so GHES and firmware first
    mode should be disabled on them. This patch adds support for that.

    Signed-off-by: Huang Ying
    Reviewed-by: Andi Kleen
    Reviewed-by: Matthew Garrett
    Signed-off-by: Len Brown

    Huang Ying
     
  • GHES (Generic Hardware Error Source) is used to process hardware error
    notifications in firmware first mode. But because firmware first mode
    can be turned on but cannot be turned off, it is unreasonable to
    unload the GHES module with firmware first mode turned on. To avoid
    confusion, this patch allows GHES to be enabled/disabled at
    configuration time, but not to be built as a module and unloaded at
    run time.

    Signed-off-by: Huang Ying
    Reviewed-by: Andi Kleen
    Reviewed-by: Matthew Garrett
    Signed-off-by: Len Brown

    Huang Ying
     
  • This patch changes APEI EINJ and ERST to use apei_exec_run for
    mandatory actions, and apei_exec_run_optional for optional actions.

    Cc: Thomas Renninger
    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Huang Ying
     
  • Some actions in APEI ERST and EINJ tables are optional, for example,
    ACPI_EINJ_BEGIN_OPERATION action is used to do some preparation for
    error injection, and firmware may choose to do nothing here. While
    some other actions are mandatory, for example, firmware must provide
    ACPI_EINJ_GET_ERROR_TYPE implementation.

    The original implementation treats all actions as optional (that is,
    they can have no instructions), which may cause issues if firmware
    does not provide some mandatory actions. To fix this, this patch adds
    apei_exec_run_optional, which should be used for optional actions.
    The original apei_exec_run should be used for mandatory actions.

    Cc: Thomas Renninger
    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Huang Ying
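
The split between mandatory and optional actions can be sketched as a single runner with an `optional` flag and a thin wrapper. The table layout (a dictionary from action name to a list of callables), the exception type and the function names are hypothetical; the sketch only shows the intended difference in behaviour when an action has no instructions.

```python
class MissingActionError(Exception):
    """Raised when a mandatory action has no instructions in the table."""


def exec_run(table, action, optional=False) -> int:
    """Run all instructions for an action; return how many were executed."""
    instructions = table.get(action, [])
    if not instructions:
        if optional:
            return 0  # firmware may legitimately provide nothing here
        raise MissingActionError(f"mandatory action not provided: {action}")
    for instruction in instructions:
        instruction()
    return len(instructions)


def exec_run_optional(table, action) -> int:
    """Same as exec_run, but a missing action is not an error."""
    return exec_run(table, action, optional=True)
```

A caller would use the optional variant for a preparation step such as a begin-operation action, and the strict variant for an action the firmware is required to implement.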
     
  • printk is used by GHES to report hardware errors. Normally, the
    printk is ratelimited to avoid too many hardware error reports in
    the kernel log, because there may be thousands or even millions of
    corrected hardware errors while the system is running.

    That is different for fatal hardware errors: because the system will
    panic as soon as possible, there will be no more than a few error
    records. These error records are valuable for system fault
    diagnosis, so they should not be ratelimited.

    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Huang Ying
     
  • When we debug the ERST table with erst-dbg, an error record in the
    ERST table that is too long (>4K) can't be read out. So this patch
    increases the buffer size to 16K to ensure such error records can be
    read from the ERST table.

    Signed-off-by: Chen Gong
    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Chen Gong
     
  • The erst_dbg module cannot work when ERST is disabled, so disable
    module loading to give clearer information to the user.

    Signed-off-by: Huang Ying
    Signed-off-by: Len Brown

    Huang Ying