10 Aug, 2011
1 commit
-
Brown paper bag day: the previous commit wouldn't work very well with modules
enabled. Move the exports into the ifdef.
Signed-off-by: Benjamin Herrenschmidt
05 Aug, 2011
22 commits
-
Commit fea80311a939a746533a6d7e7c3183729d6a3faf
"iomap: make IOPORT/PCI mapping functions conditional"
broke the powerpc build without CONFIG_PCI, as we would still define
pci_iomap(), which overlaps with the new empty inline in the headers.

Make our implementation conditional on CONFIG_PCI.
Signed-off-by: Benjamin Herrenschmidt
-
Commit 112d1fe9f7715db423ffeec5ac1beccff6093dc4
"powerpc/4xx: Add check_link to struct ppc4xx_pciex_hwops" inadvertently
broke 405 builds due to some functions being over-protected by an
ifdef CONFIG_44x.

Move them back out.
Signed-off-by: Benjamin Herrenschmidt
-
The VPA, SLB shadow and DTL deregistration functions do not need an
address, so simplify things and remove it.

Also clean up pseries_kexec_cpu_down a bit by storing the cpu IDs
in local variables.
Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt
-
Make the VPA, SLB shadow and DTL registration and deregistration
functions print consistent messages on error. I needed the firmware
error code while chasing a kexec bug but we weren't printing it.
Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt
-
Recent versions of firmware will fail to unmap the virtual processor
area if we have a dispatch trace log registered. This causes kexec
to fail.

If a trace log is registered, this patch unregisters it before the
SLB shadow and virtual processor areas, fixing the problem.

The address argument is ignored by firmware on unregister so we
may as well remove it.
Signed-off-by: Anton Blanchard
Cc:
Signed-off-by: Benjamin Herrenschmidt
-
Grant intends to hand over maintainership of mpc5xxx
to me. Change MPC5XXX entry in MAINTAINERS accordingly.
Signed-off-by: Anatolij Gustschin
Signed-off-by: Benjamin Herrenschmidt
-
KVM_GUEST adds a 1 MB array to the kernel (kvm_tmp) which grew
my kernel enough to cause it to fail to boot.

Dynamically allocating or reducing the size of this array is a
good idea, but in the meantime I think it makes sense to make
KVM_GUEST default to n in order to minimise surprises.
Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt
-
On a box with gcc 4.3.2, I see errors like:
arch/powerpc/kvm/book3s_hv_rmhandlers.S:1254: Error: Unrecognized opcode: stxvd2x
arch/powerpc/kvm/book3s_hv_rmhandlers.S:1316: Error: Unrecognized opcode: lxvd2x
Signed-off-by: Nishanth Aravamudan
Signed-off-by: Benjamin Herrenschmidt
-
The ibm,io-events code is a bit verbose with its error messages.
Reverse the reporting so we only print when we successfully enable
I/O event interrupts.
Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt
-
We are seeing boot failures on some very large boxes even with
commit b5416ca9f824 (powerpc: Move kdump default base address to
64MB on 64bit).

This patch halves the RMO so both kernels get about the same
amount of RMO memory. On large machines this region will be
at least 256MB, so each kernel will get 128MB.

We cap it at 256MB (small SLB size) since some early allocations need
to be in the bolted SLB region. We could relax this on machines with
1TB SLBs in a future patch.
Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt
-
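The RMO sizing rule from the kdump commit above can be sketched in a few lines of Python. This is a model of the arithmetic only, not the kernel code: the function name `split_rmo` is hypothetical, and it assumes the 256MB cap is applied to the RMO before halving, which is the reading consistent with the 128MB-per-kernel figure in the message.

```python
MB = 1 << 20
SMALL_SLB_SIZE = 256 * MB  # cap named in the commit message


def split_rmo(rmo_size):
    """Return (first_kernel_bytes, kdump_kernel_bytes) of usable RMO.

    Assumption: the RMO is capped at the small SLB size, then halved.
    """
    usable = min(rmo_size, SMALL_SLB_SIZE)
    half = usable // 2
    return half, usable - half


if __name__ == "__main__":
    for size_mb in (128, 256, 1024):
        first, second = split_rmo(size_mb * MB)
        print(size_mb, "MB RMO:", first // MB, "MB +", second // MB, "MB")
```

Under that assumption, any machine with an RMO of 256MB or more ends up with the same 128MB/128MB split.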
Panic observed on an older kernel when collecting call chains for
the context-switch software event:

[]rb_erase+0x1b4/0x3e8
[]__dequeue_entity+0x50/0xe8
[]set_next_entity+0x178/0x1bc
[]pick_next_task_fair+0xb0/0x118
[]schedule+0x500/0x614
[]rwsem_down_failed_common+0xf0/0x264
[]rwsem_down_read_failed+0x34/0x54
[]down_read+0x3c/0x54
[]do_page_fault+0x114/0x5e8
[]handle_page_fault+0xc/0x80
[]perf_callchain+0x224/0x31c
[]perf_prepare_sample+0x240/0x2fc
[]__perf_event_overflow+0x280/0x398
[]perf_swevent_overflow+0x9c/0x10c
[]perf_swevent_ctx_event+0x1d0/0x230
[]do_perf_sw_event+0x84/0xe4
[]perf_sw_event_context_switch+0x150/0x1b4
[]perf_event_task_sched_out+0x44/0x2d4
[]schedule+0x2c0/0x614
[]__cond_resched+0x34/0x90
[]_cond_resched+0x4c/0x68
[]move_page_tables+0xb0/0x418
[]setup_arg_pages+0x184/0x2a0
[]load_elf_binary+0x394/0x1208
[]search_binary_handler+0xe0/0x2c4
[]do_execve+0x1bc/0x268
[]sys_execve+0x84/0xc8
[]ret_from_syscall+0x0/0x3c

A page fault occurred walking the callchain while creating a perf
sample for the context-switch event. To handle the page fault the
mmap_sem is needed, but it is currently held by setup_arg_pages.
(setup_arg_pages calls shift_arg_pages with the mmap_sem held.
shift_arg_pages then calls move_page_tables which has a cond_resched
at the top of its for loop - hitting that cond_resched is what caused
the context switch.)

This is an extension of Anton's proposed patch:
https://lkml.org/lkml/2011/7/24/151
adding the case for 32-bit ppc.

Tested on the system that first generated the panic and then again
with latest kernel using a PPC VM. I am not able to test the 64-bit
path - I do not have H/W for it and 64-bit PPC VMs (qemu on Intel)
are horribly slow.
Signed-off-by: David Ahern
Signed-off-by: Benjamin Herrenschmidt
-
One definition of PV_POWER7 seems enough to me.
Signed-off-by: Peter Zijlstra
Signed-off-by: Benjamin Herrenschmidt
-
On a box with 8TB of RAM the MMU hashtable is 64GB in size. That
means we have 4G PTEs. pSeries_lpar_hptab_clear was using a signed
int to store the index which will overflow at 2G.
Signed-off-by: Anton Blanchard
Cc:
Acked-by: Michael Neuling
Signed-off-by: Benjamin Herrenschmidt
-
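As a rough check of the numbers in the hashtable commit above (64GB hash table, 4G PTEs), the arithmetic can be reproduced in Python. The 16-byte entry size is inferred from those two figures; the helper names are hypothetical, and `ctypes.c_int32` is used only to model what a 32-bit signed index holds after wrapping (signed overflow is formally undefined in C, so the wrap is a typical outcome rather than a guarantee).

```python
import ctypes

GB = 1 << 30
HPTE_SIZE = 16          # bytes per hash PTE, implied by 64GB / 4G entries
INT_MAX = 2**31 - 1     # largest value a 32-bit signed int can hold


def pte_count(hash_table_bytes):
    """Number of PTE slots in a hash table of the given size."""
    return hash_table_bytes // HPTE_SIZE


def as_c_int(value):
    """Model storing value in a 32-bit signed int (two's complement wrap)."""
    return ctypes.c_int32(value).value


if __name__ == "__main__":
    entries = pte_count(64 * GB)
    print("entries:", entries, "fits in int:", entries <= INT_MAX)
    print("index after INT_MAX:", as_c_int(INT_MAX + 1))
```

A wider unsigned index type avoids the problem because 4G entries exceed the 2G range of a signed int by a factor of two.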
I hit an oops at boot on the first instruction of timer_cpu_notify:
NIP [c000000000722f88] .timer_cpu_notify+0x0/0x388
The code should look like:
c000000000722f78: eb e9 00 30 ld r31,48(r9)
c000000000722f7c: 2f bf 00 00 cmpdi cr7,r31,0
c000000000722f80: 40 9e ff 44 bne+ cr7,c000000000722ec4
c000000000722f84: 4b ff ff 74 b c000000000722ef8

c000000000722f88 :
c000000000722f88: 7c 08 02 a6 mflr r0
c000000000722f8c: 2f a4 00 07 cmpdi cr7,r4,7
c000000000722f90: fb c1 ff f0 std r30,-16(r1)
c000000000722f94: fb 61 ff d8 std r27,-40(r1)

But the oops output shows:
eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 7c0803a6 ebe1fff8 4e800020
00000000 ebe90030 c0000000 00ad0a28 00000000 2fa40007 fbc1fff0 fb61ffd8

So we scribbled over our instructions with c000000000ad0a28, which
is an address inside the jump_table ELF section.

It turns out the jump_table section is only aligned to 8 bytes but
we are aligning our entries within the section to 16 bytes. This
means our entries are offset from the table:

c000000000acd4a8 :
...
c000000000ad0a10: c0 00 00 00 lfs f0,0(0)
c000000000ad0a14: 00 70 cd 5c .long 0x70cd5c
c000000000ad0a18: c0 00 00 00 lfs f0,0(0)
c000000000ad0a1c: 00 70 cd 90 .long 0x70cd90
c000000000ad0a20: c0 00 00 00 lfs f0,0(0)
c000000000ad0a24: 00 ac a4 20 .long 0xaca420

And the jump table sort code gets very confused and writes into the
wrong spot. Remove the alignment, and also remove the padding since
it saves some space and we shouldn't need it.
Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt
-
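The alignment mismatch in the jump_table commit above comes down to simple address arithmetic, sketched here in Python. It assumes that c000000000acd4a8, the address at the top of the dump, is the start of the table (the symbol name is missing from the dump), and `align_up` is a hypothetical helper, not a kernel function.

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


# Address shown at the top of the dump; assumed to be the table start.
SECTION_START = 0xC000000000ACD4A8

# The section only guarantees 8-byte alignment, so the start is unchanged.
assert align_up(SECTION_START, 8) == SECTION_START

# Entries were aligned to 16 bytes, so the first one lands further in.
first_entry = align_up(SECTION_START, 16)
gap = first_entry - SECTION_START

if __name__ == "__main__":
    print(hex(first_entry), "gap:", gap, "bytes")
```

Code that treats the section start as entry 0 is then 8 bytes out of step with where the entries really are, which is consistent with the sort code writing to the wrong place.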
Add a newline to the panic messages in make_room. Also fix a
comment that suggested our chunk size is 4Mb. It's 1MB.
Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt
-
I have a box that fails in OF during boot with:
DEFAULT CATCH!, exception-handler=fff00400
at %SRR0: 49424d2c4c6f6768 %SRR1: 800000004000b002

i.e. "IBM,Logh". OF got corrupted with a device tree string.
Looking at make_room and alloc_up, we claim the first chunk (1 MB)
but we never claim any more. mem_end is always set to alloc_top
which is the top of our available address space, guaranteeing we will
never call alloc_up and claim more memory.

Also alloc_up wasn't setting alloc_bottom to the bottom of the
available address space.

This doesn't help the box to boot, but we at least fail with
an obvious error. We could relocate the device tree in a future
patch.
Signed-off-by: Anton Blanchard
Cc:
Signed-off-by: Benjamin Herrenschmidt
-
Commit af9eef3c7b1ed004c378c89b87642f4937337d50 caused cpu_setup to see
the_cpu_spec, rather than the source struct. However, on 32-bit, the
return value of identify_cpu was being used for feature fixups, and
identify_cpu was returning the source struct. So if cpu_setup patches
the feature bits, the update won't affect the fixups.
Signed-off-by: Scott Wood
Signed-off-by: Benjamin Herrenschmidt
-
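The copy-versus-source problem in the identify_cpu commit above can be illustrated with a small Python model. Everything here is hypothetical (the `CpuSpec` class, the feature bit values, and both variants of `identify_cpu`); it only shows why fixups driven by the source struct miss a feature bit that cpu_setup patched into the copy, and returning the patched copy is one plausible fix rather than a statement of what the patch does.

```python
import copy

FEATURE_BASE = 0x1      # hypothetical feature bit present in the table entry
FEATURE_PATCHED = 0x4   # hypothetical bit added by cpu_setup at boot


class CpuSpec:
    def __init__(self, name, features):
        self.name = name
        self.features = features


def cpu_setup(spec):
    """Patch the feature bits of whichever struct it is handed."""
    spec.features |= FEATURE_PATCHED


def identify_cpu_buggy(source):
    the_cpu_spec = copy.copy(source)  # working copy of the table entry
    cpu_setup(the_cpu_spec)           # patches the copy only
    return source                     # fixups then see the unpatched bits


def identify_cpu_fixed(source):
    the_cpu_spec = copy.copy(source)
    cpu_setup(the_cpu_spec)
    return the_cpu_spec               # fixups see the patched bits


if __name__ == "__main__":
    table_entry = CpuSpec("example", FEATURE_BASE)
    print(hex(identify_cpu_buggy(table_entry).features))
    print(hex(identify_cpu_fixed(table_entry).features))
```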
Add a cast in case the caller passes in a different type, as it would
if mtspr/mtmsr were functions.

Previously, if a 64-bit type was passed in on 32-bit, GCC would bind the
constraint to a pair of registers, and would substitute the first register
in the pair in the asm code. This corresponds to the upper half of the
64-bit value, which is generally not the desired behavior.
Signed-off-by: Scott Wood
Signed-off-by: Benjamin Herrenschmidt
-
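The effect of the cast in the mtspr/mtmsr commit above can be modelled with plain integer arithmetic in Python. The function names are hypothetical; the model assumes a 32-bit target where a 64-bit value occupies a register pair whose first register holds the upper 32 bits (as the message describes), and where a cast to a 32-bit type keeps the lower 32 bits.

```python
MASK32 = 0xFFFFFFFF


def first_register_of_pair(value64):
    """What the asm saw before the cast: the upper half of the value."""
    return (value64 >> 32) & MASK32


def cast_to_32bit(value64):
    """What the asm sees with the cast: the lower 32 bits."""
    return value64 & MASK32


if __name__ == "__main__":
    value = 0x0000000012345678  # a small value held in a 64-bit type
    print(hex(first_register_of_pair(value)))
    print(hex(cast_to_32bit(value)))
```

For a value that fits in 32 bits, the uncast form writes zero to the register while the cast form writes the intended value.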
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
Revert "dt: add of_alias_scan and of_alias_get_id"
dt: remove of_alias_get_id() reference
-
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
[PARISC] wire up sendmmsg syscall
[PARISC] fix return type of __atomic64_add_return
[PARISC] Fix futex support
-
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] signal: use set_restore_sigmask() helper
[S390] smp: remove pointless comments in startup_secondary()
[S390] qdio: Use kstrtoul_from_user
[S390] sclp_async: Use kstrtoul_from_user
[S390] exec: remove redundant set_fs(USER_DS)
[S390] cpu hotplug: on cpu start wait until being marked active
[S390] signal: convert to use set_current_blocked()
[S390] asm offsets: fix coding style
[S390] Add support for IBM zEnterprise 114
[S390] dasd: check if raw track access is supported
[S390] Use diagnose 308 for system reset
[S390] Export store_status() function
[S390] dasd: use vmalloc for statistics input buffer
[S390] Add PSW restart shutdown trigger
[S390] missing return in page_table_alloc_pgste
[S390] qdio: 2nd stage retry on SIGA-W busy conditions
-
While `pci_eisa_driver' still refers to `pci_eisa_init', the .probe() function
should not be called after init memory release, as pointed out by commit
74b9a297. The structure is still referenced in the drivers subsystem, and can
be accessed through sysfs, so the modpost warning is a false positive. Mark
it as such.

At the same time, the warning referenced in 005bdad7b80 only mentioned
`pci_eisa_driver', not `pci_eisa_pci_tbl', so remove its marking.
Broken-by: Arnaud Lacombe (in 005bdad7b80)
Reported-by: Tetsuo Handa
Signed-off-by: Arnaud Lacombe
Signed-off-by: Linus Torvalds
04 Aug, 2011
17 commits
-
This reverts commit 750f463a749e28464151ad26938d11b07b1c43cb.
of_alias_* still needs work to be generalized for 'promtree' dt
platforms, and to not implicitly create entries for available ids.
Signed-off-by: Grant Likely
-
of_alias_get_id() is broken and being reverted. Remove the reference
to it and replace with a single incrementing id number.

There is no risk of regression here on the imx driver since the imx
change to use of_alias_get_id() is commit 22698aa2, "serial/imx: add
device tree probe support" which is new for v3.1, and it won't get
used unless CONFIG_OF is enabled and the board is booted using a
device tree. A single incrementing integer is sufficient for now.
Signed-off-by: Grant Likely
Acked-by: Shawn Guo
-
The core device layer sends tons of uevent notifications for each device
it finds, and if the kernel has been built with a non-empty
CONFIG_UEVENT_HELPER_PATH that will make us try to execute the usermode
helper binary for all these events very early in the boot.

Not only won't the root filesystem even be mounted at that point, we
literally won't have necessarily even initialized all the process
handling data structures at that point, which causes no end of silly
problems even when the usermode helper doesn't actually succeed in
executing.

So just use our existing infrastructure to disable the usermodehelpers
to make the kernel start out with them disabled. We enable them when
we've at least initialized stuff a bit.

Problems related to an uninitialized
init_ipc_ns.ids[IPC_SHM_IDS].rw_mutex
reported by various people.
Reported-by: Manuel Lauss
Reported-by: Richard Weinberger
Reported-by: Marc Zyngier
Acked-by: Kay Sievers
Cc: Andrew Morton
Cc: Vasiliy Kulikov
Cc: Greg KH
Signed-off-by: Linus Torvalds
-
Dmitry Kasatkin reports:
"kernel-devel package with kernel headers have no
directory if XEN is disabled. Modules which include asm/io.h won't
compile.

XEN related content is behind the CONFIG_XEN flag in the io.h. And
should be also behind CONFIG_XEN flag."

So move the include of down into the section that is
conditional on CONFIG_XEN.
Reported-by: Dmitry Kasatkin
Signed-off-by: Linus Torvalds
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: ad7879 - fix deficient device disable
Input: gpio_keys - fix two typos in devicetree documentation
Input: mma8450 - add device tree probe support
Input: gpio_keys - return proper error code if memory allocation fails
Input: lm8323 - add missing device_remove_file for dev_attr_time
Input: tegra-kbc - fix computation of polling time
Input: kxtj9 - explicitly include module.h
Input: psmouse - hgpk.c needs module.h
-
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
cpuidle: stop depending on pm_idle
x86 idle: move mwait_idle_with_hints() to where it is used
cpuidle: replace xen access to x86 pm_idle and default_idle
cpuidle: create bootparam "cpuidle.off=1"
mrst_pmu: driver for Intel Moorestown Power Management Unit
-
* 'apei-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI, APEI, EINJ Param support is disabled by default
APEI GHES: 32-bit buildfix
ACPI: APEI build fix
ACPI, APEI, GHES: Add hardware memory error recovery support
HWPoison: add memory_failure_queue()
ACPI, APEI, GHES, Error records content based throttle
ACPI, APEI, GHES, printk support for recoverable error via NMI
lib, Make gen_pool memory allocator lockless
lib, Add lock-less NULL terminated single list
Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
ACPI, APEI, Add WHEA _OSC support
ACPI, APEI, Add APEI bit support in generic _OSC call
ACPI, APEI, GHES, Support disable GHES at boot time
ACPI, APEI, GHES, Prevent GHES to be built as module
ACPI, APEI, Use apei_exec_run_optional in APEI EINJ and ERST
ACPI, APEI, Add apei_exec_run_optional
ACPI, APEI, GHES, Do not ratelimit fatal error printk before panic
ACPI, APEI, ERST, Fix erst-dbg long record reading issue
ACPI, APEI, ERST, Prevent erst_dbg from loading if ERST is disabled
-
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
tcm_fc: Handle DDP/SW fc_frame_payload_get failures in ft_recv_write_data
target: Fix bug for transport_generic_wait_for_tasks with direct operation
target: iscsi_target depends on NET
target: Fix WRITE_SAME_16 lba assignment breakage
MAINTAINERS: Add target-devel list for drivers/target/
iscsi-target: Fix CONFIG_SMP=n and CONFIG_MODULES=n build failure
iscsi-target: Fix snprintf usage with MAX_PORTAL_LEN
iscsi-target: Fix uninitialized usage of cmd->pad_bytes
iscsi-target: strlen() doesn't count the terminator
iscsi-target: Fix NULL dereference on allocation failure
-
* 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6:
dt: add of_alias_scan and of_alias_get_id
-
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: use kzalloc in ext4_kzalloc()
-
We may optimistically check .in_use == 0 without holding the rw_mutex:
it's the common case, and if it's zero, there certainly won't be any
segments associated with us.

After taking the lock, the idr_for_each() will do the right thing, so we
could now drop the re-check inside the lock without any real cost. But
it won't hurt.
Signed-off-by: Vasiliy Kulikov
Signed-off-by: Linus Torvalds
-
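The locking pattern described in the .in_use commit above (an unlocked fast-path check, then a re-check under the lock) can be sketched in Python. This is an illustration of the pattern only: the `ShmIds` class and `visit_segments` function are hypothetical stand-ins, a `threading.Lock` replaces the kernel's rw_mutex, and a dict replaces the idr.

```python
import threading


class ShmIds:
    """Hypothetical stand-in for the per-namespace shm id table."""

    def __init__(self):
        self.rw_mutex = threading.Lock()
        self.segments = {}

    @property
    def in_use(self):
        return len(self.segments)


def visit_segments(ids, visit):
    """Call visit() on every segment and return how many were visited."""
    # Optimistic check without the lock: the common case is an empty table.
    if ids.in_use == 0:
        return 0
    with ids.rw_mutex:
        # Re-check under the lock. Iterating an empty table would also be
        # harmless, so this check is optional, as the message notes.
        if ids.in_use == 0:
            return 0
        count = 0
        for segment in list(ids.segments.values()):
            visit(segment)
            count += 1
        return count


if __name__ == "__main__":
    ids = ShmIds()
    print(visit_segments(ids, print))
    ids.segments[1] = "segment-1"
    print(visit_segments(ids, print))
```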
Commit 4c677e2eefdb ("shm: optimize locking and ipc_namespace getting")
introduced a copy-paste bug. Due to the bug, cycle optimizations were
disabled.
Signed-off-by: Vasiliy Kulikov
Signed-off-by: Linus Torvalds
-
Expand the fs/Kconfig "help" info to clarify why it's a bad idea to
deselect the TMPFS_POSIX_ACL config variable.
Signed-off-by: Robert P. J. Day
Acked-by: Randy Dunlap
Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Make the radix_tree exceptional cases, mostly in filemap.c, clearer.
It's hard to devise a suitable snappy name that illuminates the use by
shmem/tmpfs for swap, while keeping filemap/pagecache/radix_tree
generality. And akpm points out that /* radix_tree_deref_retry(page) */
comments look like calls that have been commented out for unknown
reason.

Skirt the naming difficulty by rearranging these blocks to handle the
transient radix_tree_deref_retry(page) case first; then just explain the
remaining shmem/tmpfs swap case in a comment.
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
We have already acknowledged that swapoff of a tmpfs file is slower than
it was before conversion to the generic radix_tree: a little slower
there will be acceptable, if the hotter paths are faster.

But it was a shock to find swapoff of a 500MB file 20 times slower on my
laptop, taking 10 minutes; and at that rate it significantly slows down
my testing.

Now, most of that turned out to be overhead from PROVE_LOCKING and
PROVE_RCU: without those it was only 4 times slower than before; and
more realistic tests on other machines don't fare as badly.

I've tried a number of things to improve it, including tagging the swap
entries, then doing lookup by tag: I'd expected that to halve the time,
but in practice it's erratic, and often counter-productive.

The only change I've so far found to make a consistent improvement, is
to short-circuit the way we go back and forth, gang lookup packing
entries into the array supplied, then shmem scanning that array for the
target entry. Scanning in place doubles the speed, so it's now only
twice as slow as before (or three times slower when the PROVEs are on).

So, add radix_tree_locate_item() as an expedient, once-off,
single-caller hack to do the lookup directly in place. #ifdef it on
CONFIG_SHMEM and CONFIG_SWAP, as much to document its limited
applicability as save space in other configurations. And, sadly,
#include sched.h for cond_resched().
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Remove PageSwapBacked (!page_is_file_cache) cases from
add_to_page_cache_locked() and add_to_page_cache_lru(): those pages now
go through shmem_add_to_page_cache().

Remove a comment on maximum tmpfs size from fsstack_copy_inode_size(),
and add a comment on swap entries to invalidate_mapping_pages().

And mincore_page() uses find_get_page() on what might be shmem or a
tmpfs file: allow for a radix_tree_exceptional_entry(), and proceed to
find_get_page() on swapper_space if so (oh, swapper_space needs #ifdef).
Signed-off-by: Hugh Dickins
Acked-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
But we've not yet removed the old swp_entry_t i_direct[16] from
shmem_inode_info. That's because it was still being shared with the
inline symlink. Remove it now (saving 64 or 128 bytes from shmem inode
size), and use kmemdup() for short symlinks, say, those up to 128 bytes.

I wonder why mpol_free_shared_policy() is done in shmem_destroy_inode()
rather than shmem_evict_inode(), where we usually do such freeing? I
guess it doesn't matter, and I'm not into NUMA mpol testing right now.
Signed-off-by: Hugh Dickins
Acked-by: Rik van Riel
Reviewed-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds