28 Apr, 2014

13 commits

  • We have two copies of code that creates an OPAL sg list. Consolidate
    these into a common set of helpers and fix the endian issues.

    The flash interface embedded a version number in the num_entries
    field, whereas the dump interface did not. Since versioning
    wasn't added to the flash interface and it is impossible to add
    this in a backwards compatible way, just remove it.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • Fix little endian issues with the OPAL error log code.

    Signed-off-by: Anton Blanchard
    Reviewed-by: Stewart Smith
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • The bitmap in opal_poll_events and opal_handle_interrupt is
    big endian, so we need to byteswap it on little endian builds.
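
    A minimal userspace sketch of the idea, assuming the bitmap arrives as
    eight raw bytes in big-endian order. The helper names and the event bit
    are made up for illustration; in the kernel the conversion would be done
    with be64_to_cpu().

```c
#include <stdint.h>
#include <string.h>

/*
 * Decode a 64-bit value that firmware stored in big-endian byte order.
 * Assembling it byte by byte gives the same result on big- and
 * little-endian hosts.
 */
static uint64_t be64_decode(const void *raw)
{
    unsigned char b[8];
    uint64_t v = 0;

    memcpy(b, raw, sizeof(b));
    for (int i = 0; i < 8; i++)
        v = (v << 8) | b[i];
    return v;
}

/* Hypothetical event bit, used only for this illustration. */
#define EXAMPLE_EVENT_BIT 0x10ULL

/* Test the example bit after converting the bitmap to host order. */
static int example_event_pending(const void *raw_bitmap)
{
    return (be64_decode(raw_bitmap) & EXAMPLE_EVENT_BIT) != 0;
}
```

    Testing the bit on the raw, unconverted value would give the wrong
    answer on a little endian build, because the bit would sit in a
    different byte.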

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • Using size_t in our APIs is asking for trouble, especially
    when some OPAL calls use size_t pointers.

    Signed-off-by: Anton Blanchard
    Reviewed-by: Stewart Smith
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • On the PowerNV platform, we are holding an unnecessary refcount on a pci_dev,
    which prevents the pci_dev from being destroyed when hotplugging a pci device.

    This patch releases the unnecessary refcount.

    Signed-off-by: Wei Yang
    Signed-off-by: Benjamin Herrenschmidt

    Wei Yang
     
  • During the EEH hotplug event, iommu_add_device() will be invoked three times
    and two of them will trigger a warning or an error.

    The three invocations of iommu_add_device() are:

    pci_device_add
    ...
    set_iommu_table_base_and_group kobj->sd is not initialized. The
    dev->kobj->sd is initialized in device_add().
    The third time's warning is triggered by the re-attach of the iommu_group.

    After applying this patch, the error

    iommu_tce: 0003:05:00.0 has not been added, ret=-14

    and the warning

    [ 204.123609] ------------[ cut here ]------------
    [ 204.123645] WARNING: at arch/powerpc/kernel/iommu.c:1125
    [ 204.123680] Modules linked in: xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT bnep bluetooth 6lowpan_iphc rfkill xt_conntrack ebtable_nat ebtable_broute bridge stp llc mlx4_ib ib_sa ib_mad ib_core ib_addr ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnx2x tg3 mlx4_core nfsd ptp mdio ses libcrc32c nfs_acl enclosure be2net pps_core shpchp lockd kvm uinput sunrpc binfmt_misc lpfc scsi_transport_fc ipr scsi_tgt
    [ 204.124356] CPU: 18 PID: 650 Comm: eehd Not tainted 3.14.0-rc5yw+ #102
    [ 204.124400] task: c0000027ed485670 ti: c0000027ed50c000 task.ti: c0000027ed50c000
    [ 204.124453] NIP: c00000000003cf80 LR: c00000000006c648 CTR: c00000000006c5c0
    [ 204.124506] REGS: c0000027ed50f440 TRAP: 0700 Not tainted (3.14.0-rc5yw+)
    [ 204.124558] MSR: 9000000000029032 CR: 88008084 XER: 20000000
    [ 204.124682] CFAR: c00000000006c644 SOFTE: 1
    GPR00: c00000000006c648 c0000027ed50f6c0 c000000001398380 c0000027ec260300
    GPR04: c0000027ea92c000 c00000000006ad00 c0000000016e41b0 0000000000000110
    GPR08: c0000000012cd4c0 0000000000000001 c0000027ec2602ff 0000000000000062
    GPR12: 0000000028008084 c00000000fdca200 c0000000000d1d90 c0000027ec281a80
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
    GPR24: 000000005342697b 0000000000002906 c000001fe6ac9800 c000001fe6ac9800
    GPR28: 0000000000000000 c0000000016e3a80 c0000027ea92c090 c0000027ea92c000
    [ 204.125353] NIP [c00000000003cf80] .iommu_add_device+0x30/0x1f0
    [ 204.125399] LR [c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
    [ 204.125443] Call Trace:
    [ 204.125464] [c0000027ed50f6c0] [c0000027ed50f750] 0xc0000027ed50f750 (unreliable)
    [ 204.125526] [c0000027ed50f750] [c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
    [ 204.125588] [c0000027ed50f7d0] [c000000000069cc8] .pnv_pci_dma_dev_setup+0x78/0x340
    [ 204.125650] [c0000027ed50f870] [c000000000044408] .pcibios_setup_device+0x88/0x2f0
    [ 204.125712] [c0000027ed50f940] [c000000000046040] .pcibios_setup_bus_devices+0x60/0xd0
    [ 204.125774] [c0000027ed50f9c0] [c000000000043acc] .pcibios_add_pci_devices+0xdc/0x1c0
    [ 204.125837] [c0000027ed50fa50] [c00000000086f970] .eeh_reset_device+0x36c/0x4f0
    [ 204.125939] [c0000027ed50fb20] [c00000000003a2d8] .eeh_handle_normal_event+0x448/0x480
    [ 204.126068] [c0000027ed50fbc0] [c00000000003a35c] .eeh_handle_event+0x4c/0x340
    [ 204.126192] [c0000027ed50fc80] [c00000000003a74c] .eeh_event_handler+0xfc/0x1b0
    [ 204.126319] [c0000027ed50fd30] [c0000000000d1ea0] .kthread+0x110/0x130
    [ 204.126430] [c0000027ed50fe30] [c00000000000a460] .ret_from_kernel_thread+0x5c/0x7c
    [ 204.126556] Instruction dump:
    [ 204.126610] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ff71 7c7e1b78 60000000
    [ 204.126787] 60000000 e87e0298 3143ffff 7d2a1910 2fa90000 40de00c8 ebfe0218
    [ 204.126966] ---[ end trace 6e7aefd80add2973 ]---

    are cleared.

    This patch removes iommu_add_device() in pnv_pci_ioda_dma_dev_setup(), which
    reverts part of the change in commit d905c5df (PPC: POWERNV: move
    iommu_add_device earlier).

    Signed-off-by: Wei Yang
    Signed-off-by: Benjamin Herrenschmidt

    Wei Yang
     
  • With this patch I was able to update firmware on an LE kernel.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • We have a subtle race when sending CPUs back to OPAL on kexec.

    We mark them as "in real mode" right before we send them down. Once
    we've booted the new kernel, it might try to call opal_reinit_cpus()
    to change endianness, and that requires all CPUs to be spinning inside
    OPAL.

    However there is no synchronization here and we've observed cases
    where the returning CPUs hadn't established their new state inside
    OPAL before opal_reinit_cpus() is called, causing it to fail.

    The proper fix is to actually wait for them to go down all the way
    from the kexec'ing kernel.
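
    The waiting step can be pictured with a small simulation. This is only a
    sketch: the callback type, the poll budget and the simulated state are
    assumptions made for illustration, not the kernel's or OPAL's interface.

```c
#include <stdbool.h>

/* Hypothetical callback: true once the given CPU is back in firmware. */
typedef bool (*cpu_in_firmware_fn)(int cpu, void *ctx);

/*
 * Poll every CPU except `self` until it reports that it has reached
 * firmware, or until max_polls (>= 1) attempts for that CPU are used up.
 * Returns the number of CPUs that never got there (0 means all are down).
 */
static int wait_for_cpus(int ncpus, int self, cpu_in_firmware_fn query,
                         void *ctx, int max_polls)
{
    int stragglers = 0;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        int polls = 0;

        if (cpu == self)
            continue;
        while (!query(cpu, ctx) && ++polls < max_polls)
            ;   /* a kernel would relax the CPU or delay here */
        if (polls >= max_polls)
            stragglers++;
    }
    return stragglers;
}

/* Simulated state: how many more polls each CPU needs before it is down. */
struct sim_state {
    int remaining[4];
};

static bool sim_query(int cpu, void *ctx)
{
    struct sim_state *s = ctx;

    if (s->remaining[cpu] > 0) {
        s->remaining[cpu]--;
        return false;
    }
    return true;
}
```

    The point is only that the kexec'ing side does not proceed until every
    other CPU has been observed in its final state.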

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • The size of the sysparam sysfs files is determined from the device tree
    at boot. However the buffer is hard coded to 64 bytes. If we encounter a
    parameter that is larger than 64 bytes, or misparse the device tree, the
    buffer will overflow when reading or writing to the parameter.

    Check it at discovery time, and if the parameter is too large, do not
    create a sysfs entry for it.
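
    A sketch of such a discovery-time check, assuming a simple descriptor
    per parameter. The struct, the function and the constant name are
    illustrative; only the 64-byte limit comes from the description above.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PARAM_DATA_LEN 64   /* size of the fixed buffer */

/* Hypothetical descriptor for one parameter read from the device tree. */
struct param_desc {
    uint32_t id;
    uint32_t size;
};

/*
 * Keep only the descriptors whose size fits the buffer, compacting the
 * accepted ones to the front of the array. Returns how many were kept;
 * oversized parameters are skipped, so no entry is created for them.
 */
static size_t filter_params(struct param_desc *p, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (p[i].size > MAX_PARAM_DATA_LEN)
            continue;
        p[kept++] = p[i];
    }
    return kept;
}
```

    Rejecting the parameter up front means the read and write paths never
    see a size larger than the buffer they share.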

    Signed-off-by: Joel Stanley
    Signed-off-by: Benjamin Herrenschmidt

    Joel Stanley
     
  • Signed-off-by: Benjamin Herrenschmidt

    Joel Stanley
     
  • The sysparam code currently uses the userspace supplied number of
    bytes when memcpy()ing in to a local 64-byte buffer.

    Limit the maximum number of bytes by the size of the buffer.
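
    The clamp can be sketched as follows. The buffer and function names are
    hypothetical; this shows the general pattern, not the actual patch.

```c
#include <stddef.h>
#include <string.h>

#define PARAM_BUF_LEN 64

static char param_buf[PARAM_BUF_LEN];

/*
 * Copy at most PARAM_BUF_LEN bytes of caller-supplied data into the
 * fixed buffer and return the number of bytes actually copied.
 */
static size_t store_param(const char *data, size_t count)
{
    if (count > PARAM_BUF_LEN)
        count = PARAM_BUF_LEN;
    memcpy(param_buf, data, count);
    return count;
}
```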

    Signed-off-by: Benjamin Herrenschmidt

    Joel Stanley
     
  • The OPAL calls return int64_t values, which the sysparam code
    stores in an int, and the sysfs callback returns ssize_t. Make the code
    easier to read by consistently using ssize_t.

    Signed-off-by: Joel Stanley
    Signed-off-by: Benjamin Herrenschmidt

    Joel Stanley
     
  • When a sysparam query in OPAL returned a negative value (error code),
    sysfs would spew out a decent chunk of memory; almost 64K more than
    expected. This was traced to a signed/unsigned mix-up in the OPAL sysparam
    sysfs code at sys_param_show.

    The return value of sys_param_show is a ssize_t, calculated using

    return ret ? ret : attr->param_size;

    Alan Modra explains:

    "attr->param_size" is an unsigned int, "ret" an int, so the overall
    expression has type unsigned int. Result is that ret is cast to
    unsigned int before being cast to ssize_t.

    Instead of using the ternary operator, set ret to the param_size if an
    error is not detected. The same bug exists in the sysfs write callback;
    this patch fixes it in the same way.

    A note on debugging this next time: on my system gcc will warn about
    this if compiled with -Wsign-compare, which is not enabled by -Wall,
    only -Wextra.
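
    The conversion can be reproduced in a few lines of userspace C. The
    function names below are invented for the demonstration; the buggy
    expression is the one quoted above.

```c
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Mirrors the buggy expression: the ternary's type is unsigned int. */
static ssize_t show_buggy(int ret, unsigned int param_size)
{
    return ret ? ret : param_size;
}

/* Mirrors the fix: keep the value signed, fill in the size on success. */
static ssize_t show_fixed(int ret, unsigned int param_size)
{
    ssize_t out = ret;

    if (!out)
        out = param_size;
    return out;
}
```

    On a target where int is 32 bits and ssize_t is 64 bits,
    show_buggy(-5, 4) returns 4294967291 rather than -5, so the caller
    sees a huge positive length instead of an error code, while
    show_fixed(-5, 4) returns -5.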

    Signed-off-by: Joel Stanley
    Signed-off-by: Benjamin Herrenschmidt

    Joel Stanley
     

12 Apr, 2014

1 commit

  • Pull more ACPI and power management fixes and updates from Rafael Wysocki:
    "This is PM and ACPI material that has emerged over the last two weeks
    and one fix for a CPU hotplug regression introduced by the recent CPU
    hotplug notifiers registration series.

    Included are intel_idle and turbostat updates from Len Brown (these
    have been in linux-next for quite some time), a new cpufreq driver for
    powernv (that might spend some more time in linux-next, but BenH was
    asking me so nicely to push it for 3.15 that I couldn't resist), some
    cpufreq fixes and cleanups (including fixes for some silly breakage in
    a couple of cpufreq drivers introduced during the 3.14 cycle),
    assorted ACPI cleanups, wakeup framework documentation fixes, a new
    sysfs attribute for cpuidle and a new command line argument for power
    domains diagnostics.

    Specifics:

    - Fix for a recently introduced CPU hotplug regression in ARM KVM
    from Ming Lei.

    - Fixes for breakage in the at32ap, loongson2_cpufreq, and unicore32
    cpufreq drivers introduced during the 3.14 cycle (-stable material)
    from Chen Gang and Viresh Kumar.

    - New powernv cpufreq driver from Vaidyanathan Srinivasan, with bits
    from Gautham R Shenoy and Srivatsa S Bhat.

    - Exynos cpufreq driver fix preventing it from being included into
    multiplatform builds that aren't supported by it from Sachin Kamat.

    - cpufreq cleanups related to the usage of the driver_data field in
    struct cpufreq_frequency_table from Viresh Kumar.

    - cpufreq ppc driver cleanup from Sachin Kamat.

    - Intel BayTrail support for intel_idle and ACPI idle from Len Brown.

    - Intel CPU model 54 (Atom N2000 series) support for intel_idle from
    Jan Kiszka.

    - intel_idle fix for Intel Ivy Town residency targets from Len Brown.

    - turbostat updates (Intel Broadwell support and output cleanups)
    from Len Brown.

    - New cpuidle sysfs attribute for exporting C-states' target
    residency information to user space from Daniel Lezcano.

    - New kernel command line argument to prevent power domains enabled
    by the bootloader from being turned off even if they are not in use
    (for diagnostics purposes) from Tushar Behera.

    - Fixes for wakeup sysfs attributes documentation from Geert
    Uytterhoeven.

    - New ACPI video blacklist entry for ThinkPad Helix from Stephen
    Chandler Paul.

    - Assorted ACPI cleanups and a Kconfig help update from Jonghwan
    Choi, Zhihui Zhang, Hanjun Guo"

    * tag 'pm+acpi-3.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (28 commits)
    ACPI: Update the ACPI spec information in Kconfig
    arm, kvm: fix double lock on cpu_add_remove_lock
    cpuidle: sysfs: Export target residency information
    cpufreq: ppc: Remove duplicate inclusion of fsl_soc.h
    cpufreq: create another field .flags in cpufreq_frequency_table
    cpufreq: use kzalloc() to allocate memory for cpufreq_frequency_table
    cpufreq: don't print value of .driver_data from core
    cpufreq: ia64: don't set .driver_data to index
    cpufreq: powernv: Select CPUFreq related Kconfig options for powernv
    cpufreq: powernv: Use cpufreq_frequency_table.driver_data to store pstate ids
    cpufreq: powernv: cpufreq driver for powernv platform
    cpufreq: at32ap: don't declare local variable as static
    cpufreq: loongson2_cpufreq: don't declare local variable as static
    cpufreq: unicore32: fix typo issue for 'clk'
    cpufreq: exynos: Disable on multiplatform build
    PM / wakeup: Correct presence vs. emptiness of wakeup_* attributes
    PM / domains: Add pd_ignore_unused to keep power domains enabled
    ACPI / dock: Drop dock_device_ids[] table
    ACPI / video: Favor native backlight interface for ThinkPad Helix
    ACPI / thermal: Fix wrong variable usage in debug statement
    ...

    Linus Torvalds
     

09 Apr, 2014

5 commits


07 Apr, 2014

3 commits


03 Apr, 2014

2 commits

  • Pull powerpc non-virtualized cpuidle from Ben Herrenschmidt:
    "This is the branch I mentioned in my other pull request which contains
    our improved cpuidle support for the "powernv" platform
    (non-virtualized).

    It adds support for the "fast sleep" feature of the processor which
    provides higher power savings than our usual "nap" mode but at the
    cost of losing the timers while asleep, and thus exploits the new
    timer broadcast framework to work around that limitation.

    It's based on a tip timer tree that you seem to have already merged"

    * 'powernv-cpuidle' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    cpuidle/powernv: Parse device tree to setup idle states
    cpuidle/powernv: Add "Fast-Sleep" CPU idle state
    powerpc/powernv: Add OPAL call to resync timebase on wakeup
    powerpc/powernv: Add context management for Fast Sleep
    powerpc: Split timer_interrupt() into timer handling and interrupt handling routines
    powerpc: Implement tick broadcast IPI as a fixed IPI message
    powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message

    Linus Torvalds
     
  • Pull main powerpc updates from Ben Herrenschmidt:
    "This time around, the powerpc merges are going to be a little bit more
    complicated than usual.

    This is the main pull request with most of the work for this merge
    window. I will describe it a bit more further down.

    There is some additional cpuidle driver work, however I haven't
    included it in this tree as it depends on some work in tip/timer-core
    which Thomas accidentally forgot to put in a topic branch. Since I
    didn't want to carry all of that tip timer stuff in powerpc -next, I
    setup a separate branch on top of Thomas tree with just that cpuidle
    driver in it, and Stephen has been carrying that in next separately
    for a while now. I'll send a separate pull request for it.

    Additionally, two new pieces in this tree add users for a sysfs API
    that Tejun and Greg have been deprecating in drivers-core-next.
    Thankfully Greg reverted the patch that removes the old API so this
    merge can happen cleanly, but once merged, I will send a patch
    adjusting our new code to the new API so that Greg can send you the
    removal patch.

    Now as for the content of this branch, we have a lot of perf work for
    power8 new counters including support for our new "nest" counters
    (also called 24x7) under pHyp (not natively yet).

    We have new functionality when running under the OPAL firmware
    (non-virtualized or KVM host), such as access to the firmware error
    logs and service processor dumps, system parameters and sensors, along
    with a hwmon driver for the latter.

    There's also a bunch of bug fixes across the board, some LE fixes,
    and a nice set of selftests for validating our various types of copy
    loops.

    On the Freescale side, we see mostly new chip/board revisions, some
    clock updates, better support for machine checks and debug exceptions,
    etc..."

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (70 commits)
    powerpc/book3s: Fix CFAR clobbering issue in machine check handler.
    powerpc/compat: 32-bit little endian machine name is ppcle, not ppc
    powerpc/le: Big endian arguments for ppc_rtas()
    powerpc: Use default set of netfilter modules (CONFIG_NETFILTER_ADVANCED=n)
    powerpc/defconfigs: Enable THP in pseries defconfig
    powerpc/mm: Make sure a local_irq_disable prevent a parallel THP split
    powerpc: Rate-limit users spamming kernel log buffer
    powerpc/perf: Fix handling of L3 events with bank == 1
    powerpc/perf/hv_{gpci, 24x7}: Add documentation of device attributes
    powerpc/perf: Add kconfig option for hypervisor provided counters
    powerpc/perf: Add support for the hv 24x7 interface
    powerpc/perf: Add support for the hv gpci (get performance counter info) interface
    powerpc/perf: Add macros for defining event fields & formats
    powerpc/perf: Add a shared interface to get gpci version and capabilities
    powerpc/perf: Add 24x7 interface headers
    powerpc/perf: Add hv_gpci interface header
    powerpc: Add hvcalls for 24x7 and gpci (Get Performance Counter Info)
    sysfs: create bin_attributes under the requested group
    powerpc/perf: Enable BHRB access for EBB events
    powerpc/perf: Add BHRB constraint and IFM MMCRA handling for EBB
    ...

    Linus Torvalds
     

24 Mar, 2014

3 commits


11 Mar, 2014

1 commit


07 Mar, 2014

3 commits

  • This enables support for userspace to fetch and initiate FSP and
    Platform dumps from the service processor (via firmware) through sysfs.

    Based on original patch from Vasant Hegde

    Flow:
    - We register for OPAL notification events.
    - OPAL sends new dump available notification.
    - We make information on dump available via sysfs
    - Userspace requests dump contents
    - We retrieve the dump via OPAL interface
    - User copies the dump data
    - userspace sends ack for dump
    - We send ACK to OPAL.

    sysfs files:
    - We add the /sys/firmware/opal/dump directory
    - echoing 1 (well, anything, but in future we may support
    different dump types) to /sys/firmware/opal/dump/initiate_dump
    will initiate a dump.
    - Each dump that we've been notified of gets a directory
    in /sys/firmware/opal/dump/ with a name of the dump type and ID (in hex,
    as this is what's used elsewhere to identify the dump).
    - Each dump has files: id, type, dump and acknowledge
    dump is binary and is the dump itself.
    echoing 'ack' to acknowledge (currently any string will do) will
    acknowledge the dump and it will soon after disappear from sysfs.

    OPAL APIs:
    - opal_dump_init()
    - opal_dump_info()
    - opal_dump_read()
    - opal_dump_ack()
    - opal_dump_resend_notification()

    Currently we are only ever notified for one dump at a time (until
    the user explicitly acks the current dump, then we get a notification
    of the next dump), but this kernel code should "just work" when OPAL
    starts notifying us of all the dumps present.

    Signed-off-by: Stewart Smith
    Signed-off-by: Benjamin Herrenschmidt

    Stewart Smith
     
  • Based on a patch by: Mahesh Salgaonkar

    This patch adds support to read error logs from OPAL and export
    them to userspace through a sysfs interface.

    We export each log entry as a directory in /sys/firmware/opal/elog/

    Currently, OPAL will buffer up to 128 error log records; we don't
    need to have any knowledge of this limit on the Linux side as that
    is actually largely transparent to us.

    Each error log entry has the following files: id, type, acknowledge, raw.
    Currently we just export the raw binary error log in the 'raw' attribute.
    In a future patch, we may parse more of the error log to make it a bit
    easier for userspace (e.g. to be able to display a brief summary in
    petitboot without having to have a full parser).

    If we have >128 logs from OPAL, we'll only be notified of 128 until
    userspace starts acknowledging them. This limitation may be lifted in
    the future and with this patch, that should "just work" from the linux side.

    A userspace daemon should:
    - wait for error log entries using normal mechanisms (we announce creation)
    - read error log entry
    - save error log entry safely to disk
    - acknowledge the error log entry
    - rinse, repeat.
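
    One iteration of that loop might look like the sketch below. It assumes
    the entry directory contains the 'raw' and 'acknowledge' files listed
    above and that writing any short string acknowledges the entry; the
    function name, the "ack" string and the save path are assumptions.

```c
#include <stdio.h>

/*
 * Handle one error log entry: copy <entry_dir>/raw to save_path and,
 * only if that succeeded, write to <entry_dir>/acknowledge.
 * Returns 0 on success, -1 on any failure (the entry is then not acked).
 * A real daemon would also fsync() the saved file before acknowledging.
 */
static int save_and_ack(const char *entry_dir, const char *save_path)
{
    char path[512], buf[4096];
    FILE *in, *out, *ack;
    size_t n;
    int ok = 1;

    snprintf(path, sizeof(path), "%s/raw", entry_dir);
    in = fopen(path, "rb");
    if (!in)
        return -1;
    out = fopen(save_path, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;
    if (!ok)
        return -1;

    snprintf(path, sizeof(path), "%s/acknowledge", entry_dir);
    ack = fopen(path, "w");
    if (!ack)
        return -1;
    ok = fputs("ack", ack) >= 0;
    if (fclose(ack) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

    Acknowledging only after the copy has succeeded matters because an
    acknowledged entry is expected to go away.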

    On the Linux side, we read the error log when we're notified of it. This
    possibly isn't ideal as it would be better to only read them on-demand.
    However, this doesn't really work with current OPAL interface, so we
    read the error log immediately when notified at the moment.

    I've tested this pretty extensively and am rather confident that the
    linux side of things works rather well. There is currently an issue with
    the service processor side of things for >128 error logs though.

    Signed-off-by: Stewart Smith
    Signed-off-by: Benjamin Herrenschmidt

    Stewart Smith
     
  • Detect and recover from a machine check taken inside OPAL on special
    SCOM load instructions. On specific SCOM reads via MMIO we may get a machine
    check exception with SRR0 pointing inside OPAL. To recover from the MC
    in this scenario, get a recovery instruction address and return to it from
    the MC.

    OPAL will export the machine check recoverable ranges through
    device tree node mcheck-recoverable-ranges under ibm,opal:

    # hexdump /proc/device-tree/ibm,opal/mcheck-recoverable-ranges
    0000000 0000 0000 3000 2804 0000 000c 0000 0000
    0000010 3000 2814 0000 0000 3000 27f0 0000 000c
    0000020 0000 0000 3000 2814 xxxx xxxx xxxx xxxx
    0000030 llll llll yyyy yyyy yyyy yyyy
    ...
    ...
    #

    where:
    xxxx xxxx xxxx xxxx = Starting instruction address
    llll llll = Length of the address range.
    yyyy yyyy yyyy yyyy = recovery address

    Each recoverable address range entry is (start address, len,
    recovery address), 2 cells each for start and recovery address, 1 cell for
    len, totalling 5 cells per entry. During kernel boot time, build up the
    recovery table with the list of recovery ranges from device-tree node which
    will be used during machine check exception to recover from MMIO SCOM UE.
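
    The 5-cell layout can be decoded as in the following sketch. The struct
    and function names are hypothetical, and treating the end of a range as
    exclusive is an assumption; the cell layout is the one described above.

```c
#include <stddef.h>
#include <stdint.h>

struct mc_range {
    uint64_t start;     /* first instruction address of the range */
    uint64_t end;       /* start + len (assumed exclusive) */
    uint64_t recover;   /* address to resume at after the machine check */
};

/* Read one big-endian 32-bit cell. */
static uint32_t cell(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a 64-bit value stored as two consecutive cells, high cell first. */
static uint64_t cells2(const unsigned char *p)
{
    return ((uint64_t)cell(p) << 32) | cell(p + 4);
}

#define CELLS_PER_ENTRY 5
#define ENTRY_BYTES (CELLS_PER_ENTRY * 4)

/* Decode up to max entries from the property; returns how many were read. */
static size_t parse_ranges(const unsigned char *prop, size_t len,
                           struct mc_range *out, size_t max)
{
    size_t n = len / ENTRY_BYTES;

    if (n > max)
        n = max;
    for (size_t i = 0; i < n; i++) {
        const unsigned char *e = prop + i * ENTRY_BYTES;

        out[i].start = cells2(e);
        out[i].end = out[i].start + cell(e + 8);
        out[i].recover = cells2(e + 12);
    }
    return n;
}

/* Return the recovery address for pc, or 0 if pc is in no range. */
static uint64_t find_recovery(const struct mc_range *r, size_t n, uint64_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (pc >= r[i].start && pc < r[i].end)
            return r[i].recover;
    return 0;
}
```

    Fed the first 20 bytes of the hexdump above, this yields a range
    starting at 0x30002804 with length 0xc and recovery address 0x30002814.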

    Signed-off-by: Mahesh Salgaonkar
    Signed-off-by: Benjamin Herrenschmidt

    Mahesh Salgaonkar
     

05 Mar, 2014

1 commit


28 Feb, 2014

3 commits

  • We need to unmangle the full address, not just the register
    number, and we also need to support the real indirect bit
    being set for in-kernel uses.

    Signed-off-by: Benjamin Herrenschmidt
    CC: [v3.13]

    Benjamin Herrenschmidt
     
  • As Ben suggested, the patch prints PHB diag-data with multiple
    fields in one line and omits the line if the fields of that
    line are all zero.

    With the patch applied, the PHB3 diag-data dump looks like:

    PHB3 PHB#3 Diag-data (Version: 1)

    brdgCtl: 00000002
    RootSts: 0000000f 00400000 b0830008 00100147 00002000
    nFir: 0000000000000000 0030006e00000000 0000000000000000
    PhbSts: 0000001c00000000 0000000000000000
    Lem: 0000000000100000 42498e327f502eae 0000000000000000
    InAErr: 8000000000000000 8000000000000000 0402030000000000 0000000000000000
    PE[ 8] A/B: 8480002b00000000 8000000000000000

    [ The current diag data is so big that it overflows the printk
    buffer pretty quickly in cases when we get a handful of errors
    at once which can happen. --BenH
    ]
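
    The "skip the line when every field is zero" rule can be sketched in
    userspace C as below. The function name and buffer handling are
    assumptions for illustration; the kernel code prints through its own
    logging helpers.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "label: v0 v1 ..." into buf, each field as 16 hex digits.
 * Returns 0 (and leaves buf empty) when every field is zero, so the
 * caller can skip the line; otherwise returns an snprintf-style length.
 */
static int format_diag_line(char *buf, size_t size, const char *label,
                            const uint64_t *fields, size_t n)
{
    uint64_t any = 0;
    int len;

    if (size == 0)
        return 0;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++)
        any |= fields[i];
    if (!any)
        return 0;

    len = snprintf(buf, size, "%s:", label);
    for (size_t i = 0; i < n && len > 0 && (size_t)len < size; i++)
        len += snprintf(buf + len, size - len, " %016" PRIx64, fields[i]);
    return len;
}
```

    Putting several fields on one line and dropping all-zero lines is what
    keeps the dump short enough for the log buffer.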

    Signed-off-by: Gavin Shan
    CC:
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The PHB diag-data is important for locating the root cause of
    EEH errors such as frozen PE or fenced PHB. However, the EEH core
    enables IO path by clearing part of HW registers before collecting
    this data causing it to be corrupted.

    This patch fixes this by dumping the PHB diag-data immediately when
    frozen/fenced state on PE or PHB is detected for the first time in
    eeh_ops::get_state() or next_error() backend.

    Signed-off-by: Gavin Shan
    CC:
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     

23 Feb, 2014

1 commit

  • The core idle loop now takes care of it. We need to add the runlatch
    function calls to the idle routines which was earlier taken care of by
    the arch specific idle routine.

    Signed-off-by: Nicolas Pitre
    Signed-off-by: Preeti U Murthy
    Reviewed-by: Deepthi Dharwar
    Signed-off-by: Peter Zijlstra
    Cc: Paul Burton
    Cc: "Rafael J. Wysocki"
    Cc: Daniel Lezcano
    Cc: linux-pm@vger.kernel.org
    Cc: linaro-kernel@lists.linaro.org
    Link: http://lkml.kernel.org/n/tip-nr4mtbkkzf2oomaj85m24o7c@git.kernel.org
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Nicolas Pitre
     

17 Feb, 2014

3 commits

  • We may detect EEH errors during reboot, particularly in the kexec
    path, but it's impossible for device drivers and the EEH core to handle
    or recover from them properly.

    The patch registers a reboot notifier for EEH and disables the EEH
    subsystem during reboot. That means the EEH errors are going to be
    cleared by hardware reset or by the second kernel during the early
    stage of PCI probe.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The patch cleans up the variable eeh_subsystem_enabled so that we needn't
    refer to the variable directly from outside. Instead, we will use the
    functions eeh_enabled() and eeh_set_enable() to operate on the variable.
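
    The accessor pattern looks roughly like this. The function and variable
    names come from the description above; the types and bodies are
    assumptions, not the kernel's definitions.

```c
#include <stdbool.h>

/* State that callers no longer touch directly. */
static bool eeh_subsystem_enabled;

/* Query whether the subsystem is enabled. */
static inline bool eeh_enabled(void)
{
    return eeh_subsystem_enabled;
}

/* Enable or disable the subsystem. */
static inline void eeh_set_enable(bool mode)
{
    eeh_subsystem_enabled = mode;
}
```

    Routing all access through two functions leaves a single place to
    change if the flag ever needs extra conditions.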

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • When doing reset in order to recover the affected PE, we issue
    hot reset on PE primary bus if it's not root bus. Otherwise, we
    issue hot or fundamental reset on root port or PHB accordingly.
    For the latter case, we didn't cover the situation where the PE only
    includes the root port, which potentially causes a kernel crash upon
    an EEH error to the PE.

    The patch reworks the logic of EEH reset to improve the code
    readability and also avoid the kernel crash.

    Cc: stable@vger.kernel.org
    Reported-by: Thadeu Lima de Souza Cascardo
    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     

11 Feb, 2014

1 commit

  • This patch adds support for creating a direct iommu "bypass"
    window on IODA2 bridges (such as Power8), allowing 64-bit DMA capable
    devices to bypass iommu page translation completely, thus
    significantly improving DMA performance.

    Additionally, this adds a hook to the struct iommu_table so that
    the IOMMU API / VFIO can disable the bypass when external ownership
    is requested, since in that case, the device will be used by an
    environment such as userspace or a KVM guest which must not be
    allowed to bypass translations.
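
    The decision described above can be modelled as follows. Everything in
    this sketch (the struct, its fields, and the rule that the device mask
    must reach the top of the window) is an assumption made for
    illustration, not the kernel's data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state for this illustration. */
struct dma_dev {
    uint64_t dma_mask;        /* highest bus address the device can emit */
    bool bypass_available;    /* platform created a bypass window */
    bool externally_owned;    /* handed to userspace or a guest */
    uint64_t bypass_base;     /* bus address where the window starts */
};

/*
 * Use the bypass window only if it exists, the device is still owned
 * by the kernel, and the device can address the whole window covering
 * ram_size bytes of memory.
 */
static bool use_bypass(const struct dma_dev *d, uint64_t ram_size)
{
    uint64_t top;

    if (!d->bypass_available || d->externally_owned || ram_size == 0)
        return false;
    top = d->bypass_base + ram_size - 1;
    return d->dma_mask >= top;
}
```

    A device with a 32-bit DMA mask, or one handed to an external owner,
    falls back to translated DMA in this model.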

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt