14 May, 2013

20 commits

  • Currently we only set the "to" address in the branch stack when the CPU
    explicitly gives us a value. Unfortunately, it only does this for XL-form
    branches (e.g. blr, bctr, bctar) and not for I-form and B-form branches
    (e.g. b, bc).

    Fortunately, if we read the instruction from memory, we can extract the
    branch offset and calculate the target address.

    This adds a function, power_pmu_bhrb_to(), to calculate the target ("to")
    address of I-form and B-form branches. It handles branches in both user
    and kernel space. It also plumbs this into the perf BHRB reading code.
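
    A minimal C sketch of the decoding step, written from the Power ISA branch
    encodings rather than from power_pmu_bhrb_to() itself: I-form branches
    (primary opcode 18) carry a 24-bit word displacement, B-form branches
    (primary opcode 16) a 14-bit one, and the AA bit selects an absolute rather
    than relative target. The helper name ppc_branch_target() is hypothetical,
    and reading the instruction word safely from user or kernel memory is left
    out.

```c
#include <assert.h>
#include <stdint.h>

/* Primary opcode lives in the top 6 bits of the 32-bit instruction word. */
#define PPC_OP_BC     16   /* B-form: bc, bca, bcl, bcla */
#define PPC_OP_B      18   /* I-form: b, ba, bl, bla */
#define PPC_BRANCH_AA 0x2u /* AA bit: target is absolute, not relative */

/*
 * Hypothetical helper (not the kernel's power_pmu_bhrb_to()): given the
 * instruction word and its address, store the branch target in *to.
 * Returns 1 for I-form and B-form branches, 0 otherwise (e.g. XL-form
 * blr/bctr, whose target is held in a register, not in the encoding).
 */
static int ppc_branch_target(uint32_t instr, uint64_t addr, uint64_t *to)
{
	uint32_t opcode = instr >> 26;
	int64_t offset;

	assert(to);
	if (opcode == PPC_OP_B) {
		/* LI field: word-aligned, 26-bit signed displacement. */
		offset = (int64_t)(instr & 0x03fffffcu);
		if (offset & 0x02000000)
			offset -= 0x04000000; /* sign-extend from bit 25 */
	} else if (opcode == PPC_OP_BC) {
		/* BD field: word-aligned, 16-bit signed displacement. */
		offset = (int64_t)(instr & 0x0000fffcu);
		if (offset & 0x8000)
			offset -= 0x10000; /* sign-extend from bit 15 */
	} else {
		return 0;
	}

	/* Absolute branches use the sign-extended displacement directly. */
	*to = (instr & PPC_BRANCH_AA) ? (uint64_t)offset
				      : addr + (uint64_t)offset;
	return 1;
}
```

    For example, the I-form word 0x4bfffffc ("b .-4") at address 0x1000 yields
    a target of 0xffc, while the XL-form blr (0x4e800020) is rejected because
    its target is only known from the link register.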

    Signed-off-by: Michael Neuling
    Signed-off-by: Benjamin Herrenschmidt

    Michael Neuling
     
  • The current Branch History Rolling Buffer (BHRB) code misinterprets the order
    of entries in the hardware buffer. It assumes that a branch target address
    will be read _after_ its corresponding branch. In reality, the branch target
    comes before its corresponding branch (at a lower mfbhrb entry).

    This is a rewrite of the code to take this into account.
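
    The corrected ordering can be modelled in a few lines of C. This is a
    sketch, not the kernel code: the flag values BHRB_TARGET and BHRB_EA, the
    use of 0 as an end-of-buffer marker, and struct branch_rec are assumptions
    for illustration. Each target entry is remembered and attached to the
    branch entry that follows it; a branch with no preceding target is left
    with to == 0, which is the case the I-form/B-form decoding in the entry
    above is meant to fill in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed raw entry layout: low two bits are flags, the rest is the address. */
#define BHRB_TARGET 0x2ull             /* entry holds a branch target */
#define BHRB_EA     (~(uint64_t)0x3)   /* effective-address bits */

struct branch_rec {
	uint64_t from;
	uint64_t to; /* 0 when no target entry preceded the branch */
};

/*
 * Walk raw entries in mfbhrb order (lowest index first). A target entry
 * belongs to the branch entry that immediately follows it.
 * Returns the number of branch records written to out.
 */
static size_t bhrb_decode(const uint64_t *raw, size_t n,
			  struct branch_rec *out, size_t out_len)
{
	uint64_t pending_to = 0;
	size_t nr = 0;

	assert(out != NULL || out_len == 0);
	for (size_t i = 0; i < n && nr < out_len; i++) {
		uint64_t val = raw[i];

		if (val == 0) /* assumed end-of-buffer marker */
			break;

		if (val & BHRB_TARGET) {
			pending_to = val & BHRB_EA; /* keep for the next branch */
			continue;
		}

		out[nr].from = val & BHRB_EA;
		out[nr].to = pending_to;
		pending_to = 0; /* a target pairs with one branch only */
		nr++;
	}
	return nr;
}
```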

    Signed-off-by: Michael Neuling
    Signed-off-by: Benjamin Herrenschmidt

    Michael Neuling
     
  • The new Branch History Rolling Buffer (BHRB) code is only useful on 64-bit
    processors, so move it into the #ifdef CONFIG_PPC64 region.

    This avoids code bloat on 32-bit systems.

    Signed-off-by: Michael Neuling
    Signed-off-by: Benjamin Herrenschmidt

    Michael Neuling
     
  • Start context tracking support from pSeries.

    Signed-off-by: Li Zhong
    Signed-off-by: Benjamin Herrenschmidt

    Li Zhong
     
  • This patch corresponds to
    [PATCH] x86: Use the new schedule_user API on userspace preemption
    commit 0430499ce9d78691f3985962021b16bf8f8a8048

    Signed-off-by: Li Zhong
    Signed-off-by: Benjamin Herrenschmidt

    Li Zhong
     
  • This patch allows RCU usage in do_notify_resume, e.g. signal handling.
    It corresponds to
    [PATCH] x86: Exit RCU extended QS on notify resume
    commit edf55fda35c7dc7f2d9241c3abaddaf759b457c6

    Signed-off-by: Li Zhong
    Signed-off-by: Benjamin Herrenschmidt

    Li Zhong
     
  • These are the exception hooks for the context tracking subsystem, covering
    the exceptions whose handlers might use RCU: data access, program check,
    single step, instruction breakpoint, machine check, alignment, FP
    unavailable, AltiVec assist and unknown exception.

    This patch corresponds to
    [PATCH] x86: Exception hooks for userspace RCU extended QS
    commit 6ba3c97a38803883c2eee489505796cb0a727122

    However, now that exception handling has moved to generic code, and with
    the changes in the following two commits:
    56dd9470d7c8734f055da2a6bac553caf4a468eb
    context_tracking: Move exception handling to generic code
    6c1e0256fad84a843d915414e4b5973b7443d48d
    context_tracking: Restore correct previous context state on exception exit

    the exception hooks can use the generic code above instead of a redundant
    arch implementation.

    Signed-off-by: Li Zhong
    Signed-off-by: Benjamin Herrenschmidt

    Li Zhong
     
  • These are the syscall slow path hooks for the context tracking subsystem,
    corresponding to
    [PATCH] x86: Syscall hooks for userspace RCU extended QS
    commit bf5a3c13b939813d28ce26c01425054c740d6731

    TIF_MEMDIE is moved to the second 16 bits (with value 17), as there seems
    to be no asm code using it. TIF_NOHZ is added to _TIF_SYSCALL_T_OR_A, so it
    is better for it to be in the same 16 bits as the others in that group:
    andi. takes a 16-bit unsigned immediate, so keeping the whole group in the
    low 16 bits lets the asm code test it with a single andi.
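
    The constraint behind this move can be checked in C. In the sketch below
    the TIF_* bit numbers are made up for illustration (they are not the
    kernel's values); the point is that andi. zero-extends a 16-bit unsigned
    immediate, so a group mask is usable with a single andi. only if it has no
    bits above bit 15.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit numbers, for illustration only. */
#define TIF_SYSCALL_TRACE 0
#define TIF_SYSCALL_AUDIT 7
#define TIF_SECCOMP       10
#define TIF_NOHZ          12
#define TIF_MEMDIE        17 /* moved above bit 15 */

#define FLAG(n) (1ul << (n))

#define SYSCALL_T_OR_A_MASK (FLAG(TIF_SYSCALL_TRACE) | FLAG(TIF_SYSCALL_AUDIT) | \
			     FLAG(TIF_SECCOMP) | FLAG(TIF_NOHZ))

/* Compile-time check: the group must fit in andi.'s 16-bit immediate. */
_Static_assert((SYSCALL_T_OR_A_MASK & ~0xfffful) == 0,
	       "syscall flag group must fit in an andi. immediate");

/* True if the mask has no bits above bit 15. */
static int fits_andi_immediate(uint64_t mask)
{
	return (mask & ~(uint64_t)0xffff) == 0;
}

/* Model of "andi. rD, rS, mask" followed by a branch on CR0[EQ]:
 * nonzero when any flag in the group is set. */
static int andi_test(uint64_t flags, uint64_t mask)
{
	assert(fits_andi_immediate(mask));
	return (flags & mask) != 0;
}
```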

    Signed-off-by: Li Zhong
    Acked-by: Frederic Weisbecker
    Signed-off-by: Benjamin Herrenschmidt

    Li Zhong
     
  • MSR_DE is not cleared on entry to the kernel, and we don't clear it
    explicitly outside of debug code. If we have MSR_DE set in
    prime_debug_regs(), and the new thread has events enabled in DBCR0
    (e.g. ICMP is set in thread->dbcr0, even though it was cleared in the
    real DBCR0 when the thread got scheduled out), we'll end up taking a
    debug exception in the kernel when DBCR0 is loaded. DSRR0 will not
    point to an exception vector, and the kernel ends up hanging at
    kernel_dbg_exc. Fix this by always clearing MSR_DE when we load new
    debug state.

    Another observed source of kernel_dbg_exc hangs is the branch taken
    event. If this event is active but we take a non-debug trap (e.g. a TLB
    miss or an asynchronous interrupt) before the next branch, we end up
    taking a branch-taken debug exception on the initial branch instruction
    of the exception vector. Because the debug exception is DBSR_BT rather
    than DBSR_IC, we branch to kernel_dbg_exc before even checking the DSRR0
    address. Fix this by checking for DBSR_BT as well as DBSR_IC, which is
    what 32-bit does and what the comments suggest was intended in the
    64-bit code as well.
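
    The decision being fixed can be modelled as a small C function. This is a
    hedged model of the assembly logic, not the kernel code; the DBSR_IC and
    DBSR_BT values and the vector range parameters are assumptions. With
    check_bt set to 0 (the old 64-bit behaviour), a branch-taken event whose
    DSRR0 points into the exception vectors goes straight to kernel_dbg_exc;
    with check_bt set to 1 it is recognised as a spurious event on the vector.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed DBSR bit values, for illustration only. */
#define DBSR_IC 0x08000000u /* instruction complete */
#define DBSR_BT 0x04000000u /* branch taken */

enum dbg_action {
	DBG_IGNORE,         /* spurious event on an exception vector */
	DBG_KERNEL_DBG_EXC, /* treated as a genuine kernel debug event */
};

/*
 * Model of the 64-bit debug exception entry decision. The old code only
 * considered DBSR_IC; the fix also treats DBSR_BT as possibly spurious.
 */
static enum dbg_action debug_exc_decision(uint32_t dbsr, uint64_t dsrr0,
					  uint64_t vec_start, uint64_t vec_end,
					  int check_bt)
{
	uint32_t maybe_spurious = DBSR_IC | (check_bt ? DBSR_BT : 0);

	assert(vec_start <= vec_end);
	if ((dbsr & maybe_spurious) && dsrr0 >= vec_start && dsrr0 < vec_end)
		return DBG_IGNORE;
	return DBG_KERNEL_DBG_EXC;
}
```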

    Signed-off-by: Scott Wood
    Signed-off-by: Benjamin Herrenschmidt

    Scott Wood
     
  • Signed-off-by: Alexander Gordeev
    Signed-off-by: Benjamin Herrenschmidt

    Alexander Gordeev
     
  • Some versions of GCC apparently expect this to be provided by libgcc.

    Updates from Mikey to fix the 32-bit version and to add the "r" prefix to
    register names.

    Signed-off-by: David Woodhouse
    Signed-off-by: Michael Neuling
    Signed-off-by: Benjamin Herrenschmidt

    David Woodhouse
     
  • The current code fails to handle kexec on OPALv2. This fixes it
    and adds code to improve the situation on OPALv3 where we can
    query the CPU status from the firmware and decide what to do
    based on that.

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • Future firmware releases will support that new version.

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • We saw this warning again, this time from the ret_from_fork path.

    It seems we could clear the back chain earlier, in copy_thread(), which
    would cover both paths and also fix potential lockdep usage in
    schedule_tail(), or an exception occurring before we clear the back chain.

    Signed-off-by: Li Zhong
    Signed-off-by: Benjamin Herrenschmidt

    Li Zhong
     
  • We are getting build errors with CONFIG_PROC_FS=n:

    arch/powerpc/kernel/rtas_flash.c: In function 'rtas_flash_init':
    arch/powerpc/kernel/rtas_flash.c:745:33: error: unused variable 'f' [-Werror=unused-variable]

    But rtas_flash.c should not be built when CONFIG_PROC_FS=n, because all
    it does is provide a /proc interface to the RTAS flash routines.

    CONFIG_RTAS_FLASH already depends on CONFIG_RTAS_PROC, to indicate that
    it depends on the RTAS proc support, but CONFIG_RTAS_PROC does not
    depend on CONFIG_PROC_FS. So add that dependency.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Benjamin Herrenschmidt

    Michael Ellerman
     
  • This patch brings online all threads which are present but not online
    prior to migration/hibernation. After migration/hibernation those
    threads are taken back offline.

    During migration/hibernation all online CPUs must call H_JOIN, this is
    required by the hypervisor. Without this patch, threads that are offline
    (H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be
    deadlocked (all threads either JOIN'd or CEDE'd).
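
    The bookkeeping described above amounts to remembering exactly which
    threads were brought online, so that only those are taken back offline.
    A sketch with CPU sets as 64-bit masks; struct cpu_state and the helper
    names are hypothetical, and the bit operations merely stand in for the
    real CPU hotplug calls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU sets as bitmasks: bit n set means CPU n is in the set. */
typedef uint64_t cpuset_t;

struct cpu_state {
	cpuset_t present;
	cpuset_t online;
};

/* Bring up every present-but-offline thread; return the ones we onlined
 * so that exactly those can be taken back offline afterwards. */
static cpuset_t online_all_present(struct cpu_state *s)
{
	cpuset_t woken = s->present & ~s->online;

	s->online |= woken; /* stands in for onlining each thread */
	return woken;
}

/* After migration/hibernation: restore the original offline set. */
static void restore_offline(struct cpu_state *s, cpuset_t woken)
{
	s->online &= ~woken; /* stands in for offlining each thread */
}

/* Every present thread must be online to make the H_JOIN call; an offline
 * (H_CEDE'd) thread never joins, so the join would never complete. */
static int all_threads_can_join(const struct cpu_state *s)
{
	return (s->present & ~s->online) == 0;
}
```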

    Cc:
    Signed-off-by: Robert Jennings
    Signed-off-by: Benjamin Herrenschmidt

    Robert Jennings
     
  • The output buffer of the ibm,validate-flash-image RTAS call contains
    150-200 bytes of data on the latest systems. Presently our local output
    buffer is only 64 bytes, and we use sprintf() to copy data from the RTAS
    buffer into it, which overflows the local buffer and causes a kernel oops
    (see the call trace below).

    This patch increases the local buffer size to 256 bytes and uses
    snprintf() instead of sprintf() to copy data from the RTAS buffer.

    Kernel call trace:
    -------------------
    Oops: Kernel access of bad area, sig: 11 [#1]
    SMP NR_CPUS=1024 NUMA pSeries
    Modules linked in: nfs fscache lockd auth_rpcgss nfs_acl sunrpc fuse loop dm_mod ipv6 ipv6_lib usb_storage ehea(X) sr_mod qlge ses cdrom enclosure st be2net sg ext3 jbd mbcache usbhid hid ohci_hcd ehci_hcd usbcore qla2xxx usb_common sd_mod crc_t10dif scsi_dh_hp_sw scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh lpfc scsi_transport_fc scsi_tgt ipr(X) libata scsi_mod
    Supported: Yes
    NIP: 4520323031333130 LR: 4520323031333130 CTR: 0000000000000000
    REGS: c0000001b91779b0 TRAP: 0400 Tainted: G X (3.0.13-0.27-ppc64)
    MSR: 8000000040009032 CR: 44022488 XER: 20000018
    TASK = c0000001bca1aba0[4736] 'cat' THREAD: c0000001b9174000 CPU: 36
    GPR00: 4520323031333130 c0000001b9177c30 c000000000f87c98 000000000000009b
    GPR04: c0000001b9177c4a 000000000000000b 3520323031333130 2032303133313031
    GPR08: 3133313031350a4d 000000000000009b 0000000000000000 c0000000003664a4
    GPR12: 0000000022022448 c000000003ee6c00 0000000000000002 00000000100e8a90
    GPR16: 00000000100cb9d8 0000000010093370 000000001001d310 0000000000000000
    GPR20: 0000000000008000 00000000100fae60 000000000000005e 0000000000000000
    GPR24: 0000000010129350 46573738302e3030 2046573738302e30 300a4d4720323031
    GPR28: 333130313520554e 4b4e4f574e0a4d47 2032303133313031 3520323031333130
    NIP [4520323031333130] 0x4520323031333130
    LR [4520323031333130] 0x4520323031333130
    Call Trace:
    [c0000001b9177c30] [4520323031333130] 0x4520323031333130 (unreliable)
    Instruction dump:
    XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
    XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
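
    A hedged sketch of the bounded copy: copy_validate_output() is a
    hypothetical helper, and VALIDATE_BUF_SIZE mirrors the 256-byte buffer
    size from this patch. Unlike sprintf(), snprintf() writes at most size
    bytes including the terminating NUL and returns the length it would have
    written, so oversized firmware output is truncated instead of overrunning
    the stack (the NIP and LR values in the trace above decode to ASCII text,
    consistent with an overwritten return address).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VALIDATE_BUF_SIZE 256 /* local buffer size from the patch */

/*
 * Hypothetical helper: copy the firmware's NUL-terminated status text into
 * a fixed-size local buffer, truncating if needed.
 * Returns the number of bytes stored, excluding the terminating NUL.
 */
static size_t copy_validate_output(char *dst, size_t size, const char *rtas_buf)
{
	int n;

	assert(size > 0);
	n = snprintf(dst, size, "%s", rtas_buf);
	if (n < 0) {
		dst[0] = '\0';
		return 0;
	}
	/* n is the length snprintf() wanted to write; clamp to what fit. */
	return (size_t)n < size ? (size_t)n : size - 1;
}
```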

    Signed-off-by: Vasant Hegde
    Signed-off-by: Benjamin Herrenschmidt

    Vasant Hegde
     
  • commit b3f271e86e5a (powerpc: POWER7 optimised memcpy using VMX and
    enhanced prefetch) uses VMX when it is safe to do so (i.e. when not in
    interrupt context). It also looks at the task struct to decide if we have
    to save the current task's VMX state.

    kexec calls memcpy() at a point where the task struct may have been
    overwritten by the new kexec segments. If it has been overwritten
    then when memcpy -> enable_altivec looks up current->thread.regs->msr
    we get a cryptic oops or lockup.

    I also noticed we aren't initialising thread_info->cpu, which means
    smp_processor_id() is broken. Fix that too.

    Signed-off-by: Anton Blanchard
    Cc: # 3.6+
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • Our page tables are 2*sizeof(pte_t)*PTRS_PER_PTE bytes, which is
    PTE_FRAG_SIZE. Instead of depending on the fragment size, mask with
    PMD_MASKED_BITS.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     

10 May, 2013

2 commits

  • lockdep.c has this:
    /*
     * So we're supposed to get called after you mask local IRQs,
     * but for some reason the hardware doesn't quite think you did
     * a proper job.
     */
    if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
            return;

    Since irqs_disabled() is based on soft_enabled, that flag (not just the
    hard EE bit) needs to be 0 before we call trace_hardirqs_off().
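
    The ordering requirement can be shown with a toy model in C. The struct
    and function names are hypothetical stand-ins for the paca soft-enable
    flag, MSR[EE] and the lockdep check quoted above; the only point is that
    the check passes only when the soft flag is cleared before the tracer is
    called.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state; names are hypothetical stand-ins. */
struct irq_state {
	int soft_enabled;     /* lazy (software) interrupt enable */
	int hard_ee;          /* hardware MSR[EE] */
	int lockdep_warnings; /* times the lockdep check would have fired */
};

/* Like irqs_disabled() here: based on the soft state only. */
static bool model_irqs_disabled(const struct irq_state *s)
{
	return s->soft_enabled == 0;
}

/* Mimics the DEBUG_LOCKS_WARN_ON(!irqs_disabled()) check. */
static void model_trace_hardirqs_off(struct irq_state *s)
{
	if (!model_irqs_disabled(s))
		s->lockdep_warnings++;
}

/* Buggy order: only EE is clear when the tracer is told. */
static void disable_then_trace_buggy(struct irq_state *s)
{
	s->hard_ee = 0;
	model_trace_hardirqs_off(s);
	s->soft_enabled = 0;
}

/* Fixed order: clear the soft state before calling the tracer. */
static void disable_then_trace_fixed(struct irq_state *s)
{
	s->hard_ee = 0;
	s->soft_enabled = 0;
	model_trace_hardirqs_off(s);
}
```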

    Signed-off-by: Scott Wood

    Scott Wood
     
  • We add a machine_shutdown hook that frees the OPAL interrupts
    (so they get masked at the source and don't fire while kexec'ing)
    and which triggers an IODA reset on all the PCIe host bridges
    which will have the effect of blocking all DMAs and subsequent
    PCI interrupts.

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     

08 May, 2013

2 commits

  • If the firmware returns an error such as "closed" (or hardware
    error), we should drop characters.

    Currently we only do that when a firmware compatible with OPAL v2
    APIs is detected, in the code that calls opal_console_write_buffer_space(),
    which didn't exist with OPAL v1 (or didn't work).

    However, when early debug consoles are enabled, the flag indicating
    that v2 is supported isn't set yet, so on an error or a closed console
    we spin forever.
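
    A sketch of the write loop this calls for, in C. The status codes, the
    fw_write_fn callback type and the fake_fw() test double are hypothetical
    and do not correspond to the real OPAL API; the bounded retry on a busy
    status is also an assumption of this sketch. The key property is that any
    error other than busy drops the remaining characters instead of looping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical firmware status codes (not the real OPAL values). */
enum fw_status { FW_SUCCESS, FW_BUSY, FW_CLOSED, FW_HW_ERROR };

/* Hypothetical firmware write call: reports how many bytes it took. */
typedef enum fw_status (*fw_write_fn)(const char *buf, size_t len,
				      size_t *written, void *ctx);

/*
 * Write len bytes. A busy firmware (or one that accepts nothing) is retried
 * a bounded number of times; any other error drops the remaining characters,
 * so a closed console or hardware error cannot make us spin forever.
 * Returns the number of bytes actually written.
 */
static size_t console_write(fw_write_fn fw, void *ctx, const char *buf,
			    size_t len, unsigned int max_retries)
{
	size_t done = 0;
	unsigned int retries = 0;

	assert(fw != NULL);
	while (done < len) {
		size_t n = 0;
		enum fw_status st = fw(buf + done, len - done, &n, ctx);

		if (st == FW_SUCCESS && n > 0) {
			done += n < len - done ? n : len - done;
			retries = 0;
		} else if ((st == FW_SUCCESS || st == FW_BUSY) &&
			   retries++ < max_retries) {
			continue;
		} else {
			break; /* closed, hardware error, or stuck: drop the rest */
		}
	}
	return done;
}

/* Test double: accepts up to 4 bytes per call from a byte budget in *ctx,
 * then reports the console as closed. */
static enum fw_status fake_fw(const char *buf, size_t len, size_t *written,
			      void *ctx)
{
	size_t *budget = ctx;
	size_t n = len < 4 ? len : 4;

	(void)buf;
	if (*budget == 0)
		return FW_CLOSED;
	if (n > *budget)
		n = *budget;
	*budget -= n;
	*written = n;
	return FW_SUCCESS;
}
```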

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • This patch adds a new udbg early debug console which utilises
    statically defined input and output buffers stored within the kernel
    BSS. It is primarily designed to assist with bring up of new hardware
    which may not have a working console but which has a method of
    reading/writing kernel memory.

    This version incorporates comments made by Ben H (thanks!).

    Changes from v1:
    - Add memory barriers.
    - Ensure updating of read/write positions is atomic.
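
    A minimal model of such a memory console using C11 atomics. The struct
    layout, sizes and function names are assumptions, and release/acquire
    ordering stands in for the kernel's own barrier primitives: each side
    writes the data byte first and then publishes the new position with a
    release store, and the other side reads the position with an acquire
    load before touching the data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MEMCONS_OUT_SIZE 64 /* hypothetical sizes */
#define MEMCONS_IN_SIZE  16

/* Hypothetical layout: statically allocated, so it lands in .bss where an
 * external agent can find it and read/write it directly. */
struct memcons {
	char out[MEMCONS_OUT_SIZE];
	_Atomic size_t out_pos; /* next byte the kernel will write */
	char in[MEMCONS_IN_SIZE];
	_Atomic size_t in_wpos; /* advanced by the external writer */
	_Atomic size_t in_rpos; /* advanced by the kernel as it reads */
};

static struct memcons memcons;

/* Kernel side: store one character (wrapping), then publish the position. */
static void memcons_putc(char c)
{
	size_t pos = atomic_load_explicit(&memcons.out_pos, memory_order_relaxed);

	memcons.out[pos] = c;
	atomic_store_explicit(&memcons.out_pos, (pos + 1) % MEMCONS_OUT_SIZE,
			      memory_order_release);
}

/* Kernel side: return the next input character, or -1 if none is pending. */
static int memcons_getc(void)
{
	size_t r = atomic_load_explicit(&memcons.in_rpos, memory_order_relaxed);
	size_t w = atomic_load_explicit(&memcons.in_wpos, memory_order_acquire);
	int c;

	if (r == w)
		return -1;
	c = (unsigned char)memcons.in[r];
	atomic_store_explicit(&memcons.in_rpos, (r + 1) % MEMCONS_IN_SIZE,
			      memory_order_release);
	return c;
}

/* External side (debugger or test harness): push one input character.
 * Returns 0 if the input buffer is full. */
static int memcons_inject(char c)
{
	size_t w = atomic_load_explicit(&memcons.in_wpos, memory_order_relaxed);
	size_t next = (w + 1) % MEMCONS_IN_SIZE;

	if (next == atomic_load_explicit(&memcons.in_rpos, memory_order_acquire))
		return 0;
	memcons.in[w] = c;
	atomic_store_explicit(&memcons.in_wpos, next, memory_order_release);
	return 1;
}
```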

    Signed-off-by: Alistair Popple
    Signed-off-by: Benjamin Herrenschmidt

    Alistair Popple
     

07 May, 2013

1 commit

  • If hard_irq_disable() is called while interrupts are already soft-disabled
    (which is the most common case), all is already well.

    However, you can (and in some cases want to) call it while everything is
    enabled (to make sure you don't get a lazy event, for example before entry
    into KVM guests), and in that case we need to inform the irq tracer that
    the irqs are going off.

    We have to change the inline into a macro to avoid a circular include
    dependency hell hole.
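
    A toy C model of the behaviour described above; the variable and macro
    names are hypothetical. Written as a macro, the body is only expanded at
    each call site, so the header defining it need not include the tracer's
    declarations itself. The assumption, following the lockdep entry above,
    is that the soft-enable state is cleared before the tracer is told.

```c
#include <assert.h>

/* Toy stand-ins for per-CPU state; hypothetical names. */
static int soft_enabled = 1; /* lazy soft-enable flag */
static int hard_ee = 1;      /* MSR[EE] */
static int tracer_off_calls; /* counts "irqs going off" notifications */

static void tracer_irqs_off(void) { tracer_off_calls++; }

/*
 * Hard-disable, mark the soft state disabled, and notify the tracer only
 * if interrupts were previously soft-enabled (otherwise it already knows).
 */
#define model_hard_irq_disable() do {		\
	int was_enabled_ = soft_enabled;	\
	hard_ee = 0;				\
	soft_enabled = 0;			\
	if (was_enabled_)			\
		tracer_irqs_off();		\
} while (0)
```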

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     

06 May, 2013

15 commits