Eric Lee / smarc-fsl-linux-kernel

16 Aug, 2014

2 commits

49899007b Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux ... Browse Code »

Pull idle update from Len Brown:
"Two Intel-platform-specific updates to intel_idle, and a cosmetic
tweak to the turbostat utility"

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: tweak whitespace in output format
intel_idle: Broadwell support
intel_idle: Disable Baytrail Core and Module C6 auto-demotion

Linus Torvalds
2014-08-16 23:25:34 +0800
8c058d53f intel_idle: Disable Baytrail Core and Module C6 auto-demotion ... Browse Code »

Power efficiency improves on Baytrail (Intel Atom Processor E3000)
when Linux disables C6 auto-demotion.

Based on work by Srinidhi Kasagar .

Signed-off-by: Len Brown
Cc: x86@kernel.org

Len Brown
2014-08-16 05:06:14 +0800

15 Aug, 2014

4 commits

a11c5c9ef Merge tag 'pci-v3.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci ... Browse Code »

Pull DEFINE_PCI_DEVICE_TABLE removal from Bjorn Helgaas:
"Part two of the PCI changes for v3.17:

- Remove DEFINE_PCI_DEVICE_TABLE macro use (Benoit Taine)

It's a mechanical change that removes uses of the
DEFINE_PCI_DEVICE_TABLE macro. I waited until later in the merge
window to reduce conflicts, but it's possible you'll still see a few"

* tag 'pci-v3.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use

Linus Torvalds
2014-08-15 08:10:33 +0800
179c0ac67 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc ... Browse Code »

Pull Sparc fixes from David Miller:
"Hook up the memfd syscall, and properly claim all PCI resources
discovered when building the PCI device tree"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Hook up memfd_create system call.
sparc64: Properly claim resources as each PCI bus is probed.
sparc64: Skip bogus PCI bridge ranges.
sparc64: Expand PCI bridge probing debug logging.

Linus Torvalds
2014-08-15 07:28:08 +0800
3b7b3e6ec Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild ... Browse Code »

Pull kbuild updates from Michal Marek:
- make clean also considers $(extra-m) and $(extra-) to be consistent
- cleanup and fixes in scripts/Makefile.host
- allow to override the name of the Python 2 executable with make
PYTHON=... (only needed for ia64 in practice)
- option to split debugingo into *.dwo files to save disk space if the
compiler supports it (CONFIG_DEBUG_INFO_SPLIT)
- option to use dwarf4 debuginfo if the compiler supports it
(CONFIG_DEBUG_INFO_DWARF4)
- fix for disabling certain warnings with clang
- fix for unneeded rebuild with dash when a command contains
backslashes

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: Fix handling of backslashes in *.cmd files
kbuild, LLVMLinux: Supress warnings unless W=1-3
Kbuild: Add a option to enable dwarf4 v2
kbuild: Support split debug info v4
kbuild: allow to override Python command name
kbuild: clean-up and bug fix of scripts/Makefile.host
kbuild: clean up scripts/Makefile.host
kbuild: drop shared library support from Makefile.host
kbuild: fix a bug of C++ host program handling
kbuild: fix a typo in scripts/Makefile.host
scripts/Makefile.clean: clean also $(extra-m) and $(extra-)

Linus Torvalds
2014-08-15 01:12:46 +0800
1d508f8ac Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc ... Browse Code »

Pull more powerpc updates from Ben Herrenschmidt:
"Here are some more powerpc bits for 3.17, essentially fixes.

The biggest series, also aimed at -stable, is from Aneesh and is the
result of weeks and weeks of debugging to find out why the heck or THP
implementation was occasionally triggering multi-hit errors in our
level 1 TLB. It ended up being a combination of issues including
subtleties as to how we should invalidate those special 'MPSS' pages
we use to allow the use of 16M pages inside 4K/64K "base page size"
segments (you really have to love our MMU !)

Another interesting one in the "OMG" category is the series from
Michael adding memory barriers to spin_is_locked(). That's also the
result of many days of debugging to figure out why the semaphore code
would occasionally crash in ways that made no sense. It ended up
being some creative lock stacking that was defeated by the fact that
our locks allow a load inside the locked section to be re-ordered with
the load of the lock value itself (I'm still of two mind about whether
to kill that once and for all by putting a heavier barrier back into
our lock implementation...). The fixes come with a long explanation
in the cset comments, feel free to read it if you feel like having a
headache today"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
powerpc/thp: Add tracepoints to track hugepage invalidate
powerpc/mm: Use read barrier when creating real_pte
powerpc/thp: Use ACCESS_ONCE when loading pmdp
powerpc/thp: Invalidate with vpn in loop
powerpc/thp: Handle combo pages in invalidate
powerpc/thp: Invalidate old 64K based hash page mapping before insert of 4k pte
powerpc/thp: Don't recompute vsid and ssize in loop on invalidate
powerpc/thp: Add write barrier after updating the valid bit
powerpc: reorder per-cpu NUMA information's initialization
powerpc/perf/hv-24x7: Use kmem_cache_free
powerpc/pseries/hvcserver: Fix endian issue in hvcs_get_partner_info
powerpc: Hard disable interrupts in xmon
powerpc: remove duplicate definition of TEXASR_FS
powerpc/pseries: Avoid deadlock on removing ddw
powerpc/pseries: Failure on removing device node
powerpc/boot: Use correct zlib types for comparison
powerpc/powernv: Interface to register/unregister opal dump region
printk: Add function to return log buffer address and size
powerpc: Add POWER8 features to CPU_FTRS_POSSIBLE/ALWAYS
powerpc/ppc476: Disable BTAC
...

Linus Torvalds
2014-08-15 00:14:07 +0800

14 Aug, 2014

12 commits

ae36e95cf Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux ... Browse Code »

Pull device tree updates from Grant Likely:
"The branch contains the following device tree changes the v3.17 merge
window:

Group changes to the device tree. In preparation for adding device
tree overlay support, OF_DYNAMIC is reworked so that a set of device
tree changes can be prepared and applied to the tree all at once.
OF_RECONFIG notifiers see the most significant change here so that
users always get a consistent view of the tree. Notifiers generation
is moved from before a change to after it, and notifiers for a group
of changes are emitted after the entire block of changes have been
applied

Automatic console selection from DT. Console drivers can now use
of_console_check() to see if the device node is specified as a console
device. If so then it gets added as a preferred console. UART
devices get this support automatically when uart_add_one_port() is
called.

DT unit tests no longer depend on pre-loaded data in the device tree.
Data is loaded dynamically at the start of unit tests, and then
unloaded again when the tests have completed.

Also contains a few bugfixes for reserved regions and early memory
setup"

* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux: (21 commits)
of: Fixing OF Selftest build error
drivers: of: add automated assignment of reserved regions to client devices
of: Use proper types for checking memory overflow
of: typo fix in __of_prop_dup()
Adding selftest testdata dynamically into live tree
of: Add todo tasklist for Devicetree
of: Transactional DT support.
of: Reorder device tree changes and notifiers
of: Move dynamic node fixups out of powerpc and into common code
of: Make sure attached nodes don't carry along extra children
of: Make devicetree sysfs update functions consistent.
of: Create unlocked versions of node and property add/remove functions
OF: Utility helper functions for dynamic nodes
of: Move CONFIG_OF_DYNAMIC code into a separate file
of: rename of_aliases_mutex to just of_mutex
of/platform: Fix of_platform_device_destroy iteration of devices
of: Migrate of_find_node_by_name() users to for_each_node_by_name()
tty: Update hypervisor tty drivers to use core stdout parsing code.
arm/versatile: Add the uart as the stdout device.
of: Enable console on serial ports specified by /chosen/stdout-path
...

Linus Torvalds
2014-08-14 23:53:39 +0800
4dacb91c7 Merge tag 'stable/for-linus-3.17-b-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip ... Browse Code »

Pull Xen bugfixes from David Vrabel:
- fix ARM build
- fix boot crash with PVH guests
- improve reliability of resume/migration

* tag 'stable/for-linus-3.17-b-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: use vmap() to map grant table pages in PVH guests
x86/xen: resume timer irqs early
arm/xen: remove duplicate arch_gnttab_init() function

Linus Torvalds
2014-08-14 23:40:44 +0800
10cf15e1d sparc: Hook up memfd_create system call. ... Browse Code »

Signed-off-by: David S. Miller

David S. Miller
2014-08-14 13:00:09 +0800
f1d25d37d sparc64: Properly claim resources as each PCI bus is probed. ... Browse Code »

Perform a pci_claim_resource() on all valid resources discovered
during the OF device tree scan.

Based almost entirely upon the PCI OF bus probing code which does
the same thing there.

Signed-off-by: David S. Miller

David S. Miller
2014-08-14 12:17:49 +0800
4afba24e5 sparc64: Skip bogus PCI bridge ranges. ... Browse Code »

It seems that when a PCI Express bridge is not in use and has no devices
behind it, the ranges property is bogus. Specifically the size property
is of the form [0xffffffff:...], and if you add this size to the resource
start address the 64-bit calculation will overflow.

Just check specifically for this size value signature and skip them.

Signed-off-by: David S. Miller

David S. Miller
2014-08-14 12:17:49 +0800
93a6423bd sparc64: Expand PCI bridge probing debug logging. ... Browse Code »

Dump the various aspects of the PCI bridge probed at boot time, most
importantly the bridge number ranges, and the ranges property.

This helps diagnose PCI resource issues and other problems by giving
ofpci_debug=1 on the boot command line.

Signed-off-by: David S. Miller

David S. Miller
2014-08-14 12:17:49 +0800
f0094b28f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:
"Several networking final fixes and tidies for the merge window:

1) Changes during the merge window unintentionally took away the
ability to build bluetooth modular, fix from Geert Uytterhoeven.

2) Several phy_node reference count bug fixes from Uwe Kleine-König.

3) Fix ucc_geth build failures, also from Uwe Kleine-König.

4) Fix klog false positivies when netlink messages go to network
taps, by properly resetting the network header. Fix from Daniel
Borkmann.

5) Sizing estimate of VF netlink messages is too small, from Jiri
Benc.

6) New APM X-Gene SoC ethernet driver, from Iyappan Subramanian.

7) VLAN untagging is erroneously dependent upon whether the VLAN
module is loaded or not, but there are generic dependencies that
matter wrt what can be expected as the SKB enters the stack.
Make the basic untagging generic code, and do it unconditionally.
From Vlad Yasevich.

8) xen-netfront only has so many slots in it's transmit queue so
linearize packets that have too many frags. From Zoltan Kiss.

9) Fix suspend/resume PHY handling in bcmgenet driver, from Florian
Fainelli"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (55 commits)
net: bcmgenet: correctly resume adapter from Wake-on-LAN
net: bcmgenet: update UMAC_CMD only when link is detected
net: bcmgenet: correctly suspend and resume PHY device
net: bcmgenet: request and enable main clock earlier
net: ethernet: myricom: myri10ge: myri10ge.c: Cleaning up missing null-terminate after strncpy call
xen-netfront: Fix handling packets on compound pages with skb_linearize
net: fec: Support phys probed from devicetree and fixed-link
smsc: replace WARN_ON() with WARN_ON_SMP()
xen-netback: Don't deschedule NAPI when carrier off
net: ethernet: qlogic: qlcnic: Remove duplicate object file from Makefile
wan: wanxl: Remove typedefs from struct names
m68k/atari: EtherNEC - ethernet support (ne)
net: ethernet: ti: cpmac.c: Cleaning up missing null-terminate after strncpy call
hdlc: Remove typedefs from struct names
airo_cs: Remove typedef local_info_t
atmel: Remove typedef atmel_priv_ioctl
com20020_cs: Remove typedef com20020_dev_t
ethernet: amd: Remove typedef local_info_t
net: Always untag vlan-tagged traffic on input.
drivers: net: Add APM X-Gene SoC ethernet driver support.
...

Linus Torvalds
2014-08-14 08:27:40 +0800
13b102bf4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc ... Browse Code »

Pull Sparc fixes from David Miller:
"Sparc bug fixes, one of which was preventing successful SMP boots with
mainline"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix pcr_ops initialization and usage bugs.
sparc64: Do not disable interrupts in nmi_cpu_busy()
sparc: Hook up seccomp and getrandom system calls.
sparc: fix decimal printf format specifiers prefixed with 0x

Linus Torvalds
2014-08-14 08:26:50 +0800
81c02a21b Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86/apic updates from Thomas Gleixner:
"This is a major overhaul to the x86 apic subsystem consisting of the
following parts:

- Remove obsolete APIC driver abstractions (David Rientjes)

- Use the irqdomain facilities to dynamically allocate IRQs for
IOAPICs. This is a prerequisite to enable IOAPIC hotplug support,
and it also frees up wasted vectors (Jiang Liu)

- Misc fixlets.

Despite the hickup in Ingos previous pull request - caused by the
missing fixup for the suspend/resume issue reported by Borislav - I
strongly recommend that this update finds its way into 3.17. Some
history for you:

This is preparatory work for physical IOAPIC hotplug. The first
attempt to support this was done by Yinghai and I shot it down because
it just added another layer of obscurity and complexity to the already
existing mess without tackling the underlying shortcomings of the
current implementation.

After quite some on- and offlist discussions, I requested that the
design of this functionality must use generic infrastructure, i.e.
irq domains, which provide all the mechanisms to dynamically map linux
interrupt numbers to physical interrupts.

Jiang picked up the idea and did a great job of consolidating the
existing interfaces to manage the x86 (IOAPIC) interrupt system by
utilizing irq domains.

The testing in tip, Linux-next and inside of Intel on various machines
did not unearth any oddities until Borislav exposed it to one of his
oddball machines. The issue was resolved quickly, but unfortunately
the fix fell through the cracks and did not hit the tip tree before
Ingo sent the pull request. Not entirely Ingos fault, I also assumed
that the fix was already merged when Ingo asked me whether he could
send it.

Nevertheless this work has a proper design, has undergone several
rounds of review and the final fallout after applying it to tip and
integrating it into Linux-next has been more than moderate. It's the
ground work not only for IOAPIC hotplug, it will also allow us to move
the lowlevel vector allocation into the irqdomain hierarchy, which
will benefit other architectures as well. Patches are posted already,
but they are on hold for two weeks, see below.

I really appreciate the competence and responsiveness Jiang has shown
in course of this endavour. So I'm sure that any fallout of this will
be addressed in a timely manner.

FYI, I'm vanishing for 2 weeks into my annual kids summer camp kitchen
duty^Wvacation, while you folks are drooling at KS/LinuxCon :) But HPA
will have a look at the hopefully zero fallout until I'm back"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation
x86/apic/vsmp: Make is_vsmp_box() static
x86, apic: Remove enable_apic_mode callback
x86, apic: Remove setup_portio_remap callback
x86, apic: Remove multi_timer_check callback
x86, apic: Replace noop_check_apicid_used
x86, apic: Remove check_apicid_present callback
x86, apic: Remove mps_oem_check callback
x86, apic: Remove smp_callin_clear_local_apic callback
x86, apic: Replace trampoline physical addresses with defaults
x86, apic: Remove x86_32_numa_cpu_node callback
x86: intel-mid: Use the new io_apic interfaces
x86, vsmp: Remove is_vsmp_box() from apic_is_clustered_box()
x86, irq: Clean up irqdomain transition code
x86, irq, devicetree: Release IOAPIC pin when PCI device is disabled
x86, irq, SFI: Release IOAPIC pin when PCI device is disabled
x86, irq, mpparse: Release IOAPIC pin when PCI device is disabled
x86, irq, ACPI: Release IOAPIC pin when PCI device is disabled
x86, irq: Introduce helper functions to release IOAPIC pin
x86, irq: Simplify the way to handle ISA IRQ
...

Linus Torvalds
2014-08-14 08:23:32 +0800
d27c0d901 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86/efix fixes from Peter Anvin:
"Two EFI-related Kconfig changes, which happen to touch immediately
adjacent lines in Kconfig and thus collapse to a single patch"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub
x86/efi: Fix 3DNow optimization build failure in EFI stub

Linus Torvalds
2014-08-14 08:21:35 +0800
7453f33b2 Merge branch 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86/xsave changes from Peter Anvin:
"This is a patchset to support the XSAVES instruction required to
support context switch of supervisor-only features in upcoming
silicon.

This patchset missed the 3.16 merge window, which is why it is based
on 3.15-rc7"

* 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, xsave: Add forgotten inline annotation
x86/xsaves: Clean up code in xstate offsets computation in xsave area
x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi)
Define kernel API to get address of each state in xsave area
x86/xsaves: Enable xsaves/xrstors
x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf
x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time
x86/xsaves: Add xsaves and xrstors support for booting time
x86/xsaves: Clear reserved bits in xsave header
x86/xsaves: Use xsave/xrstor for saving and restoring user space context
x86/xsaves: Use xsaves/xrstors for context switch
x86/xsaves: Use xsaves/xrstors to save and restore xsave area
x86/xsaves: Define a macro for handling xsave/xrstor instruction fault
x86/xsaves: Define macros for xsave instructions
x86/xsaves: Change compacted format xsave area header
x86/alternative: Add alternative_input_2 to support alternative with two features and input
x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors

Linus Torvalds
2014-08-14 08:20:04 +0800
fd1cf9058 Merge tag 'metag-for-v3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag ... Browse Code »

Pull metag architecture updates from James Hogan:
"Just a couple of minor static analysis fixes, removal of a NULL check
that should never happen, and fix an error check where an unsigned
value was being checked to see if it was negative"

* tag 'metag-for-v3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
metag: cachepart: Fix failure check
metag: hugetlbpage: Remove null pointer checks that could never happen

Linus Torvalds
2014-08-14 08:18:09 +0800

13 Aug, 2014

22 commits

9e813308a powerpc/thp: Add tracepoints to track hugepage invalidate ... Browse Code »

Add tracepoint to track hugepage invalidate. This help us
in debugging difficult to track bugs.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Benjamin Herrenschmidt

Aneesh Kumar K.V
2014-08-13 16:20:42 +0800
85c1fafd7 powerpc/mm: Use read barrier when creating real_pte ... Browse Code »

On ppc64 we support 4K hash pte with 64K page size. That requires
us to track the hash pte slot information on a per 4k basis. We do that
by storing the slot details in the second half of pte page. The pte bit
_PAGE_COMBO is used to indicate whether the second half need to be
looked while building real_pte. We need to use read memory barrier while
doing that so that load of hidx is not reordered w.r.t _PAGE_COMBO
check. On the store side we already do a lwsync in __hash_page_4K

CC:
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Benjamin Herrenschmidt

Aneesh Kumar K.V
2014-08-13 16:20:41 +0800
7e467245b powerpc/thp: Use ACCESS_ONCE when loading pmdp ... Browse Code »

We would get wrong results in compiler recomputed old_pmd. Avoid
that by using ACCESS_ONCE

CC:
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Benjamin Herrenschmidt

Aneesh Kumar K.V
2014-08-13 16:20:41 +0800
969b7b208 powerpc/thp: Invalidate with vpn in loop ... Browse Code »

As per ISA, for 4k base page size we compare 14..65 bits of VA specified
with the entry_VA in tlb. That implies we need to make sure we do a
tlbie with all the possible 4k va we used to access the 16MB hugepage.
With 64k base page size we compare 14..57 bits of VA. Hence we cannot
ignore the lower 24 bits of va while tlbie .We also cannot tlb
invalidate a 16MB entry with just one tlbie instruction because
we don't track which va was used to instantiate the tlb entry.

CC:
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Benjamin Herrenschmidt

Aneesh Kumar K.V
2014-08-13 16:20:40 +0800
fc0479557 powerpc/thp: Handle combo pages in invalidate ... Browse Code »

If we changed base page size of the segment, either via sub_page_protect
or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash
table entries. We do a lazy hash page table flush for all mapped pages
in the demoted segment. This happens when we handle hash page fault for
these pages.

We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a
pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte,
that implies that we could possibly have older 64K hash pte entries in
the hash page table and we need to invalidate those entries.

Use _PAGE_COMBO to determine the page size with which we should
invalidate the hash table entries on unmap.

CC:
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Benjamin Herrenschmidt

Aneesh Kumar K.V
2014-08-13 16:20:39 +0800
629149fae powerpc/thp: Invalidate old 64K based hash page mapping before insert of 4k pte ... Browse Code »

If we changed base page size of the segment, either via sub_page_protect
or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash
table entries. We do a lazy hash page table flush for all mapped pages
in the demoted segment. This happens when we handle hash page fault
for these pages.

We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a
pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte,
that implies that we could possibly have older 64K hash pte entries in
the hash page table and we need to invalidate those entries.

Handle this correctly for 16M pages

CC:
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Benjamin Herrenschmidt

Aneesh Kumar K.V
2014-08-13 16:20:39 +0800
fa1f8ae80 powerpc/thp: Don't recompute vsid and ssize in loop on invalidate ... Browse Code »

The segment identifier and segment size will remain the same in
the loop, So we can compute it outside. We also change the
hugepage_invalidate interface so that we can use it the later patch

CC:
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Benjamin Herrenschmidt

Aneesh Kumar K.V
2014-08-13 16:20:38 +0800
b0aa44a3d powerpc/thp: Add write barrier after updating the valid bit ... Browse Code »

With hugepages, we store the hpte valid information in the pte page
whose address is stored in the second half of the PMD. Use a
write barrier to make sure clearing pmd busy bit and updating
hpte valid info are ordered properly.

CC:
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Benjamin Herrenschmidt

Aneesh Kumar K.V
2014-08-13 16:20:37 +0800
2fabf084b powerpc: reorder per-cpu NUMA information's initialization ... Browse Code »

There is an issue currently where NUMA information is used on powerpc
(and possibly ia64) before it has been read from the device-tree, which
leads to large slab consumption with CONFIG_SLUB and memoryless nodes.

NUMA powerpc non-boot CPU's cpu_to_node/cpu_to_mem is only accurate
after start_secondary(), similar to ia64, which is invoked via
smp_init().

Commit 6ee0578b4daae ("workqueue: mark init_workqueues() as
early_initcall()") made init_workqueues() be invoked via
do_pre_smp_initcalls(), which is obviously before the secondary
processors are online.

Additionally, the following commits changed init_workqueues() to use
cpu_to_node to determine the node to use for kthread_create_on_node:

bce903809ab3f ("workqueue: add wq_numa_tbl_len and
wq_numa_possible_cpumask[]")
f3f90ad469342 ("workqueue: determine NUMA node of workers accourding to
the allowed cpumask")

Therefore, when init_workqueues() runs, it sees all CPUs as being on
Node 0. On LPARs or KVM guests where Node 0 is memoryless, this leads to
a high number of slab deactivations
(http://www.spinics.net/lists/linux-mm/msg67489.html).

Fix this by initializing the powerpc-specific CPUnode/local memory
node mapping as early as possible, which on powerpc is
do_init_bootmem(). Currently that function initializes the mapping for
the boot CPU, but we extend it to setup the mapping for all possible
CPUs. Then, in smp_prepare_cpus(), we can correspondingly set the
per-cpu values for all possible CPUs. That ensures that before the
early_initcalls run (and really as early as possible), the per-cpu NUMA
mapping is accurate.

While testing memoryless nodes on PowerKVM guests with a fix to the
workqueue logic to use cpu_to_mem() instead of cpu_to_node(), with a
guest topology of:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
node 1 size: 16336 MB
node 1 free: 15329 MB
node distances:
node 0 1
0: 10 40
1: 40 10

the slab consumption decreases from

Slab: 932416 kB
SUnreclaim: 902336 kB

to

Slab: 395264 kB
SUnreclaim: 359424 kB

And we a corresponding increase in the slab efficiency from

slab mem objs slabs
used active active
------------------------------------------------------------
kmalloc-16384 337 MB 11.28% 100.00%
task_struct 288 MB 9.93% 100.00%

to

slab mem objs slabs
used active active
------------------------------------------------------------
kmalloc-16384 37 MB 100.00% 100.00%
task_struct 31 MB 100.00% 100.00%

Powerpc didn't support memoryless nodes until recently (64bb80d87f01
"powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES" and 8c272261194d
"powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"). Those commits also
helped improve memory consumption with these kind of environments.

Signed-off-by: Nishanth Aravamudan
Signed-off-by: Benjamin Herrenschmidt

Nishanth Aravamudan
2014-08-13 13:14:05 +0800
d65897228 powerpc/perf/hv-24x7: Use kmem_cache_free ... Browse Code »

Free memory allocated using kmem_cache_zalloc using kmem_cache_free
rather than kfree.

The Coccinelle semantic patch that makes this change is as follows:

//
@@
expression x,E,c;
@@

x = $kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node$(c,...)
... when != x = E
when != &x
?-kfree(x)
+kmem_cache_free(c,x)
//

Signed-off-by: Himangi Saraogi
Acked-by: Julia Lawall
Signed-off-by: Benjamin Herrenschmidt

Himangi Saraogi
2014-08-13 13:14:04 +0800
587870e86 powerpc/pseries/hvcserver: Fix endian issue in hvcs_get_partner_info ... Browse Code »

A buffer returned by H_VTERM_PARTNER_INFO contains device information
in big endian format, causing problems for little endian architectures.
This patch ensures that they are in cpu endian.

Signed-off-by: Thomas Falcon
Signed-off-by: Benjamin Herrenschmidt

Thomas Falcon
2014-08-13 13:14:04 +0800
a71d64b4d powerpc: Hard disable interrupts in xmon ... Browse Code »

xmon only soft disables interrupts. This seems like a bad idea - we
certainly don't want decrementer and PMU exceptions going off when
we are debugging something inside xmon.

This issue was uncovered when the hard lockup detector went off
inside xmon. To ensure we wont get a spurious hard lockup warning,
I also call touch_nmi_watchdog() when exiting xmon.

Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt

Anton Blanchard
2014-08-13 13:13:48 +0800
56758e3c3 powerpc: remove duplicate definition of TEXASR_FS ... Browse Code »

It appears that commits 7f06f21d40a6 ("powerpc/tm: Add checking to
treclaim/trechkpt") and e4e38121507a ("KVM: PPC: Book3S HV: Add
transactional memory support") both added definitions of TEXASR_FS.
Remove one of them. At the same time, fix the alignment of the remaining
definition (should be tab-separated like the rest of the #defines).

Signed-off-by: Nishanth Aravamudan
Signed-off-by: Benjamin Herrenschmidt

Nishanth Aravamudan
2014-08-13 13:13:47 +0800
5efbabe09 powerpc/pseries: Avoid deadlock on removing ddw ... Browse Code »

Function remove_ddw() could be called in of_reconfig_notifier and
we potentially remove the dynamic DMA window property, which invokes
of_reconfig_notifier again. Eventually, it leads to the deadlock as
following backtrace shows.

The patch fixes the above issue by deferring releasing the dynamic
DMA window property while releasing the device node.

=============================================
[ INFO: possible recursive locking detected ]
3.16.0+ #428 Tainted: G W
---------------------------------------------
drmgr/2273 is trying to acquire lock:
((of_reconfig_chain).rwsem){.+.+..}, at: [] \
.__blocking_notifier_call_chain+0x40/0x78

but task is already holding lock:
((of_reconfig_chain).rwsem){.+.+..}, at: [] \
.__blocking_notifier_call_chain+0x40/0x78

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock((of_reconfig_chain).rwsem);
lock((of_reconfig_chain).rwsem);
*** DEADLOCK ***

May be due to missing lock nesting notation

2 locks held by drmgr/2273:
#0: (sb_writers#4){.+.+.+}, at: [] \
.vfs_write+0xb0/0x1f8
#1: ((of_reconfig_chain).rwsem){.+.+..}, at: [] \
.__blocking_notifier_call_chain+0x40/0x78

stack backtrace:
CPU: 17 PID: 2273 Comm: drmgr Tainted: G W 3.16.0+ #428
Call Trace:
[c0000000137e7000] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
[c0000000137e70b0] [c00000000083cd34] .dump_stack+0x7c/0x9c
[c0000000137e7130] [c0000000000b8afc] .__lock_acquire+0x128c/0x1c68
[c0000000137e7280] [c0000000000b9a4c] .lock_acquire+0xe8/0x104
[c0000000137e7350] [c00000000083588c] .down_read+0x4c/0x90
[c0000000137e73e0] [c000000000091890] .__blocking_notifier_call_chain+0x40/0x78
[c0000000137e7490] [c000000000091900] .blocking_notifier_call_chain+0x38/0x48
[c0000000137e7520] [c000000000682a28] .of_reconfig_notify+0x34/0x5c
[c0000000137e75b0] [c000000000682a9c] .of_property_notify+0x4c/0x54
[c0000000137e7650] [c000000000682bf0] .of_remove_property+0x30/0xd4
[c0000000137e76f0] [c000000000052a44] .remove_ddw+0x144/0x168
[c0000000137e7790] [c000000000053204] .iommu_reconfig_notifier+0x30/0xe0
[c0000000137e7820] [c00000000009137c] .notifier_call_chain+0x6c/0xb4
[c0000000137e78c0] [c0000000000918ac] .__blocking_notifier_call_chain+0x5c/0x78
[c0000000137e7970] [c000000000091900] .blocking_notifier_call_chain+0x38/0x48
[c0000000137e7a00] [c000000000682a28] .of_reconfig_notify+0x34/0x5c
[c0000000137e7a90] [c000000000682e14] .of_detach_node+0x44/0x1fc
[c0000000137e7b40] [c0000000000518e4] .ofdt_write+0x3ac/0x688
[c0000000137e7c20] [c000000000238430] .proc_reg_write+0xb8/0xd4
[c0000000137e7cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8
[c0000000137e7d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0
[c0000000137e7e30] [c00000000000a064] syscall_exit+0x0/0x98

Cc: stable@vger.kernel.org
Signed-off-by: Gavin Shan
Signed-off-by: Benjamin Herrenschmidt

Gavin Shan
2014-08-13 13:13:47 +0800
f1b3929c2 powerpc/pseries: Failure on removing device node ... Browse Code »

While running command "drmgr -c phb -r -s 'PHB 528'", following
backtrace jumped out because the target device node isn't marked
with OF_DETACHED by of_detach_node(), which caused by error
returned from memory hotplug related reconfig notifier when
disabling CONFIG_MEMORY_HOTREMOVE. The patch fixes it.

ERROR: Bad of_node_put() on /pci@800000020000210/ethernet@0
CPU: 14 PID: 2252 Comm: drmgr Tainted: G W 3.16.0+ #427
Call Trace:
[c000000012a776a0] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
[c000000012a77750] [c00000000083cd34] .dump_stack+0x7c/0x9c
[c000000012a777d0] [c0000000006807c4] .of_node_release+0x58/0xe0
[c000000012a77860] [c00000000038a7d0] .kobject_release+0x174/0x1b8
[c000000012a77900] [c00000000038a884] .kobject_put+0x70/0x78
[c000000012a77980] [c000000000681680] .of_node_put+0x28/0x34
[c000000012a77a00] [c000000000681ea8] .__of_get_next_child+0x64/0x70
[c000000012a77a90] [c000000000682138] .of_find_node_by_path+0x1b8/0x20c
[c000000012a77b40] [c000000000051840] .ofdt_write+0x308/0x688
[c000000012a77c20] [c000000000238430] .proc_reg_write+0xb8/0xd4
[c000000012a77cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8
[c000000012a77d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0
[c000000012a77e30] [c00000000000a064] syscall_exit+0x0/0x98

Cc: stable@vger.kernel.org
Signed-off-by: Gavin Shan
Signed-off-by: Benjamin Herrenschmidt

Gavin Shan
2014-08-13 13:13:46 +0800
ce8f150a1 powerpc/boot: Use correct zlib types for comparison ... Browse Code »

Avoids this warning:

arch/powerpc/boot/gunzip_util.c:118:9: warning: comparison of distinct pointer types lacks a cast

Signed-off-by: Benjamin Herrenschmidt

Benjamin Herrenschmidt
2014-08-13 13:13:45 +0800
b09c2ec40 powerpc/powernv: Interface to register/unregister opal dump region ... Browse Code »

PowerNV platform is capable of capturing host memory region when system
crashes (because of host/firmware). We have new OPAL API to register/
unregister memory region to be captured when system crashes.

This patch adds support for new API. Also during boot time we register
kernel log buffer and unregister before doing kexec.

Signed-off-by: Vasant Hegde
Signed-off-by: Benjamin Herrenschmidt

Vasant Hegde
2014-08-13 13:13:45 +0800
3609e09fd powerpc: Add POWER8 features to CPU_FTRS_POSSIBLE/ALWAYS ... Browse Code »

We have been a bit slack about updating the CPU_FTRS_POSSIBLE and
CPU_FTRS_ALWAYS masks. When we added POWER8, and also POWER8E we forgot
to update the ALWAYS mask. And when we added POWER8_DD1 we forgot to
update both the POSSIBLE and ALWAYS masks.

Luckily this hasn't caused any actual bugs AFAICS. Failing to update the
ALWAYS mask just forgoes a potential optimisation opportunity. Failing
to update the POSSIBLE mask for POWER8_DD1 is also OK because it only
removes a bit rather than adding any.

Regardless they should all be in both masks so as to avoid any future
bugs when the set of ALWAYS/POSSIBLE bits changes, or the masks
themselves change.

Signed-off-by: Michael Ellerman
Acked-by: Michael Neuling
Acked-by: Joel Stanley
Signed-off-by: Benjamin Herrenschmidt

Michael Ellerman
2014-08-13 13:13:43 +0800
97b3be1e9 powerpc/ppc476: Disable BTAC ... Browse Code »

This patch disables the branch target address CAM which under specific
circumstances may cause the processor to skip execution of 1-4
instructions. This fixes IBM Erratum #47.

Signed-off-by: Alistair Popple
Signed-off-by: Benjamin Herrenschmidt

Alistair Popple
2014-08-13 13:13:42 +0800
763fe0add powerpc/powernv: Fix IOMMU group lost ... Browse Code »

When we take full hotplug to recover from EEH errors, PCI buses
could be involved. For the case, the child devices of involved
PCI buses can't be attached to IOMMU group properly, which is
caused by commit 3f28c5a ("powerpc/powernv: Reduce multi-hit of
iommu_add_device()").

When adding the PCI devices of the newly created PCI buses to
the system, the IOMMU group is expected to be added in (C).
(A) fails to bind the IOMMU group because bus->is_added is
false. (B) fails because the device doesn't have binding IOMMU
table yet. bus->is_added is set to true at end of (C) and
pdev->is_added is set to true at (D).

pcibios_add_pci_devices()
pci_scan_bridge()
pci_scan_child_bus()
pci_scan_slot()
pci_scan_single_device()
pci_scan_device()
pci_device_add()
pcibios_add_device() A: Ignore
device_add() B: Ignore
pcibios_fixup_bus()
pcibios_setup_bus_devices()
pcibios_setup_device() C: Hit
pcibios_finish_adding_to_bus()
pci_bus_add_devices()
pci_bus_add_device() D: Add device

If the parent PCI bus isn't involved in hotplug, the IOMMU
group is expected to be bound in (B). (A) should fail as the
sysfs entries aren't populated.

The patch fixes the issue by reverting commit 3f28c5a and remove
WARN_ON() in iommu_add_device() to allow calling the function
even the specified device already has associated IOMMU group.

Cc: # 3.16+
Reported-by: Thadeu Lima de Souza Cascardo
Signed-off-by: Gavin Shan
Acked-by: Wei Yang
Signed-off-by: Benjamin Herrenschmidt

Gavin Shan
2014-08-13 13:13:42 +0800
78e05b142 powerpc: Add smp_mb()s to arch_spin_unlock_wait() ... Browse Code »

Similar to the previous commit which described why we need to add a
barrier to arch_spin_is_locked(), we have a similar problem with
spin_unlock_wait().

We need a barrier on entry to ensure any spinlock we have previously
taken is visibly locked prior to the load of lock->slock.

It's also not clear if spin_unlock_wait() is intended to have ACQUIRE
semantics. For now be conservative and add a barrier on exit to give it
ACQUIRE semantics.

Signed-off-by: Michael Ellerman
Signed-off-by: Benjamin Herrenschmidt

Michael Ellerman
2014-08-13 13:13:27 +0800
51d7d5205 powerpc: Add smp_mb() to arch_spin_is_locked() ... Browse Code »

The kernel defines the function spin_is_locked(), which can be used to
check if a spinlock is currently locked.

Using spin_is_locked() on a lock you don't hold is obviously racy. That
is, even though you may observe that the lock is unlocked, it may become
locked at any time.

There is (at least) one exception to that, which is if two locks are
used as a pair, and the holder of each checks the status of the other
before doing any update.

Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic
value:

The first CPU does:

spin_lock(*A)

if spin_is_locked(*B)
# nothing
else
smp_mb()
LOAD r = *COUNTER
r++
STORE *COUNTER = r

spin_unlock(*A)

And the second CPU does:

spin_lock(*B)

if spin_is_locked(*A)
# nothing
else
smp_mb()
LOAD r = *COUNTER
r++
STORE *COUNTER = r

spin_unlock(*B)

Although this is a strange locking construct, it should work.

It seems to be understood, but not documented, that spin_is_locked() is
not a memory barrier, so in the examples above and below the caller
inserts its own memory barrier before acting on the result of
spin_is_locked().

For now we assume spin_is_locked() is implemented as below, and we break
it out in our examples:

bool spin_is_locked(*LOCK) {
LOAD l = *LOCK
return l.locked
}

Our intuition is that there should be no problem even if the two code
sequences run simultaneously such as:

CPU 0 CPU 1
==================================================
spin_lock(*A) spin_lock(*B)
LOAD b = *B LOAD a = *A
if b.locked # true if a.locked # true
# nothing # nothing
spin_unlock(*A) spin_unlock(*B)

If one CPU gets the lock before the other then it will do the update and
the other CPU will back off:

CPU 0 CPU 1
==================================================
spin_lock(*A)
LOAD b = *B
spin_lock(*B)
if b.locked # false LOAD a = *A
else if a.locked # true
smp_mb() # nothing
LOAD r1 = *COUNTER spin_unlock(*B)
r1++
STORE *COUNTER = r1
spin_unlock(*A)

However in reality spin_lock() itself is not indivisible. On powerpc we
implement it as a load-and-reserve and store-conditional.

Ignoring the retry logic for the lost reservation case, it boils down to:
spin_lock(*LOCK) {
LOAD l = *LOCK
l.locked = true
STORE *LOCK = l
ACQUIRE_BARRIER
}

The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as
defined in memory-barriers.txt:

This acts as a one-way permeable barrier. It guarantees that all
memory operations after the ACQUIRE operation will appear to happen
after the ACQUIRE operation with respect to the other components of
the system.

On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is
also know as "lightweight sync", or "sync 1".

As described in Power ISA v2.07 section B.2.1.1, in this scenario the
lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to
act as the barrier, preventing any loads or stores in the locked region
from occurring prior to the load of *LOCK.

Whether this behaviour is in accordance with the definition of ACQUIRE
semantics in memory-barriers.txt is open to discussion, we may switch to
a different barrier in future.

What this means in practice is that the following can occur:

CPU 0 CPU 1
==================================================
LOAD a = *A LOAD b = *B
a.locked = true b.locked = true
LOAD b = *B LOAD a = *A
STORE *A = a STORE *B = b
if b.locked # false if a.locked # false
else else
smp_mb() smp_mb()
LOAD r1 = *COUNTER LOAD r2 = *COUNTER
r1++ r2++
STORE *COUNTER = r1
STORE *COUNTER = r2 # Lost update
spin_unlock(*A) spin_unlock(*B)

That is, the load of *B can occur prior to the store that makes *A
visibly locked. And similarly for CPU 1. The result is both CPUs hold
their lock and believe the other lock is unlocked.

The easiest fix for this is to add a full memory barrier to the start of
spin_is_locked(), so adding to our previous definition would give us:

bool spin_is_locked(*LOCK) {
smp_mb()
LOAD l = *LOCK
return l.locked
}

The new barrier orders the store to the lock we are locking vs the load
of the other lock:

CPU 0 CPU 1
==================================================
LOAD a = *A LOAD b = *B
a.locked = true b.locked = true
STORE *A = a STORE *B = b
smp_mb() smp_mb()
LOAD b = *B LOAD a = *A
if b.locked # true if a.locked # true
# nothing # nothing
spin_unlock(*A) spin_unlock(*B)

Although the above example is theoretical, there is code similar to this
example in sem_lock() in ipc/sem.c. This commit in addition to the next
commit appears to be a fix for crashes we are seeing in that code where
we believe this race happens in practice.

Signed-off-by: Michael Ellerman
Signed-off-by: Benjamin Herrenschmidt

Michael Ellerman
2014-08-13 13:13:26 +0800