10 Aug, 2011

1 commit


05 Aug, 2011

22 commits

  • Commit fea80311a939a746533a6d7e7c3183729d6a3faf
    "iomap: make IOPORT/PCI mapping functions conditional"

    Broke powerpc build without CONFIG_PCI as we would still define
    pci_iomap(), which overlaps with the new empty inline in the headers.

    Make our implementation conditional on CONFIG_PCI

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
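
The clash described above can be sketched as follows. This is only an illustration: `pci_iomap_sketch` and its `void *` device argument are hypothetical stand-ins, not the kernel's real `pci_iomap()` signature. The point is that the out-of-line definition needs the same CONFIG_PCI guard as the header's empty inline, otherwise both definitions are visible at once.

```c
#include <stddef.h>

/* Header side (sketch): without CONFIG_PCI, supply an empty inline stub. */
#ifdef CONFIG_PCI
void *pci_iomap_sketch(void *dev, int bar, unsigned long maxlen);
#else
static inline void *pci_iomap_sketch(void *dev, int bar, unsigned long maxlen)
{
    (void)dev;
    (void)bar;
    (void)maxlen;
    return NULL; /* nothing to map when PCI support is compiled out */
}
#endif

/* Arch side (sketch): the real body must sit under the same guard,
 * otherwise it collides with the inline stub above. */
#ifdef CONFIG_PCI
void *pci_iomap_sketch(void *dev, int bar, unsigned long maxlen)
{
    static char fake_bar[16]; /* hypothetical placeholder for a mapped BAR */

    (void)dev;
    (void)bar;
    (void)maxlen;
    return fake_bar;
}
#endif
```

Built without CONFIG_PCI defined, only the stub exists and it returns NULL.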
     
  • Commit 112d1fe9f7715db423ffeec5ac1beccff6093dc4
    "powerpc/4xx: Add check_link to struct ppc4xx_pciex_hwops" inadvertently
    broke 405 builds due to some functions being over protected by an
    ifdef CONFIG_44x.

    Move them back out.

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • The VPA, SLB shadow and DTL deregistration functions do not need an
    address, so simplify things and remove it.

    Also cleanup pseries_kexec_cpu_down a bit by storing the cpu IDs
    in local variables.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • Make the VPA, SLB shadow and DTL registration and deregistration
    functions print consistent messages on error. I needed the firmware
    error code while chasing a kexec bug but we weren't printing it.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • Recent versions of firmware will fail to unmap the virtual processor
    area if we have a dispatch trace log registered. This causes kexec
    to fail.

    If a trace log is registered this patch unregisters it before the
    SLB shadow and virtual processor areas, fixing the problem.

    The address argument is ignored by firmware on unregister so we
    may as well remove it.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • Grant intends to hand over maintainership of mpc5xxx
    to me. Change MPC5XXX entry in MAINTAINERS accordingly.

    Signed-off-by: Anatolij Gustschin
    Signed-off-by: Benjamin Herrenschmidt

    Anatolij Gustschin
     
  • KVM_GUEST adds a 1 MB array to the kernel (kvm_tmp) which grew
    my kernel enough to cause it to fail to boot.

    Dynamically allocating or reducing the size of this array is a
    good idea, but in the meantime I think it makes sense to make
    KVM_GUEST default to n in order to minimise surprises.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • On a box with gcc 4.3.2, I see errors like:

    arch/powerpc/kvm/book3s_hv_rmhandlers.S:1254: Error: Unrecognized opcode: stxvd2x
    arch/powerpc/kvm/book3s_hv_rmhandlers.S:1316: Error: Unrecognized opcode: lxvd2x

    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Benjamin Herrenschmidt

    Nishanth Aravamudan
     
  • The ibm,io-events code is a bit verbose with its error messages.
    Reverse the reporting so we only print when we successfully enable
    I/O event interrupts.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • We are seeing boot failures on some very large boxes even with
    commit b5416ca9f824 (powerpc: Move kdump default base address to
    64MB on 64bit).

    This patch halves the RMO so both kernels get about the same
    amount of RMO memory. On large machines this region will be
    at least 256MB, so each kernel will get 128MB.

    We cap it at 256MB (small SLB size) since some early allocations need
    to be in the bolted SLB region. We could relax this on machines with
    1TB SLBs in a future patch.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
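
One reading of the sizing rule above, as a rough sketch: the kdump base is half the RMO, capped at 256MB. The function name `kdump_base_sketch` and the exact form of the cap are assumptions drawn from the description, not from the patch itself.

```c
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* Assumed cap: the small (256MB) SLB segment size mentioned above. */
#define KDUMP_BASE_CAP (256ULL * MB)

/* Hypothetical helper: split the RMO in half, but never go past the cap. */
static uint64_t kdump_base_sketch(uint64_t rmo_bytes)
{
    uint64_t half = rmo_bytes / 2;

    return half < KDUMP_BASE_CAP ? half : KDUMP_BASE_CAP;
}
```

With a 256MB RMO this gives a 128MB base, matching the figure quoted in the message.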
     
  • Panic observed on an older kernel when collecting call chains for
    the context-switch software event:

    []rb_erase+0x1b4/0x3e8
    []__dequeue_entity+0x50/0xe8
    []set_next_entity+0x178/0x1bc
    []pick_next_task_fair+0xb0/0x118
    []schedule+0x500/0x614
    []rwsem_down_failed_common+0xf0/0x264
    []rwsem_down_read_failed+0x34/0x54
    []down_read+0x3c/0x54
    []do_page_fault+0x114/0x5e8
    []handle_page_fault+0xc/0x80
    []perf_callchain+0x224/0x31c
    []perf_prepare_sample+0x240/0x2fc
    []__perf_event_overflow+0x280/0x398
    []perf_swevent_overflow+0x9c/0x10c
    []perf_swevent_ctx_event+0x1d0/0x230
    []do_perf_sw_event+0x84/0xe4
    []perf_sw_event_context_switch+0x150/0x1b4
    []perf_event_task_sched_out+0x44/0x2d4
    []schedule+0x2c0/0x614
    []__cond_resched+0x34/0x90
    []_cond_resched+0x4c/0x68
    []move_page_tables+0xb0/0x418
    []setup_arg_pages+0x184/0x2a0
    []load_elf_binary+0x394/0x1208
    []search_binary_handler+0xe0/0x2c4
    []do_execve+0x1bc/0x268
    []sys_execve+0x84/0xc8
    []ret_from_syscall+0x0/0x3c

    A page fault occurred walking the callchain while creating a perf
    sample for the context-switch event. To handle the page fault the
    mmap_sem is needed, but it is currently held by setup_arg_pages.
    (setup_arg_pages calls shift_arg_pages with the mmap_sem held.
    shift_arg_pages then calls move_page_tables which has a cond_resched
    at the top of its for loop - hitting that cond_resched is what caused
    the context switch.)

    This is an extension of Anton's proposed patch:
    https://lkml.org/lkml/2011/7/24/151
    adding a case for 32-bit ppc.

    Tested on the system that first generated the panic and then again
    with the latest kernel using a PPC VM. I am not able to test the 64-bit
    path - I do not have H/W for it and 64-bit PPC VMs (qemu on Intel)
    are horribly slow.

    Signed-off-by: David Ahern
    Signed-off-by: Benjamin Herrenschmidt

    David Ahern
     
  • One definition of PV_POWER7 seems enough to me.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Benjamin Herrenschmidt

    Peter Zijlstra
     
  • On a box with 8TB of RAM the MMU hashtable is 64GB in size. That
    means we have 4G PTEs. pSeries_lpar_hptab_clear was using a signed
    int to store the index which will overflow at 2G.

    Signed-off-by: Anton Blanchard
    Acked-by: Michael Neuling
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
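
The arithmetic behind this overflow can be checked with a small sketch. It assumes 16 bytes per hash PTE, which is what the 64GB and 4G figures above imply. The helper names are illustrative only.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption: each hashed page table entry occupies 16 bytes. */
#define HPTE_SIZE_BYTES 16ULL

/* Number of PTEs in a hash table of the given size in bytes. */
static uint64_t hpt_entry_count(uint64_t hpt_bytes)
{
    return hpt_bytes / HPTE_SIZE_BYTES;
}

/* Returns 1 if every index 0..count-1 fits in a signed int, else 0. */
static int index_fits_in_int(uint64_t count)
{
    return count == 0 || (count - 1) <= (uint64_t)INT_MAX;
}
```

A 64GB table holds 2^32 entries, so the largest index is well beyond what a 32-bit signed int can represent.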
     
  • I hit an oops at boot on the first instruction of timer_cpu_notify:

    NIP [c000000000722f88] .timer_cpu_notify+0x0/0x388

    The code should look like:

    c000000000722f78: eb e9 00 30 ld r31,48(r9)
    c000000000722f7c: 2f bf 00 00 cmpdi cr7,r31,0
    c000000000722f80: 40 9e ff 44 bne+ cr7,c000000000722ec4
    c000000000722f84: 4b ff ff 74 b c000000000722ef8

    c000000000722f88 :
    c000000000722f88: 7c 08 02 a6 mflr r0
    c000000000722f8c: 2f a4 00 07 cmpdi cr7,r4,7
    c000000000722f90: fb c1 ff f0 std r30,-16(r1)
    c000000000722f94: fb 61 ff d8 std r27,-40(r1)

    But the oops output shows:

    eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 7c0803a6 ebe1fff8 4e800020
    00000000 ebe90030 c0000000 00ad0a28 00000000 2fa40007 fbc1fff0 fb61ffd8

    So we scribbled over our instructions with c000000000ad0a28, which
    is an address inside the jump_table ELF section.

    It turns out the jump_table section is only aligned to 8 bytes but
    we are aligning our entries within the section to 16 bytes. This
    means our entries are offset from the table:

    c000000000acd4a8 :
    ...
    c000000000ad0a10: c0 00 00 00 lfs f0,0(0)
    c000000000ad0a14: 00 70 cd 5c .long 0x70cd5c
    c000000000ad0a18: c0 00 00 00 lfs f0,0(0)
    c000000000ad0a1c: 00 70 cd 90 .long 0x70cd90
    c000000000ad0a20: c0 00 00 00 lfs f0,0(0)
    c000000000ad0a24: 00 ac a4 20 .long 0xaca420

    And the jump table sort code gets very confused and writes into the
    wrong spot. Remove the alignment, and also remove the padding since
    it saves some space and we shouldn't need it.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
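
The skew between the section start and the first entry can be worked out with a short sketch. It assumes the address c000000000acd4a8 in the dump above is the start of the jump_table section (the symbol name is not shown). The helper names are illustrative.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Bytes between the section start and the first entry aligned to entry_align. */
static uint64_t first_entry_skew(uint64_t section_start, uint64_t entry_align)
{
    return align_up(section_start, entry_align) - section_start;
}
```

With 16-byte entry alignment the first entry lands 8 bytes past that start address. With 8-byte alignment the skew is zero, so code walking the table from the section start sees real entries.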
     
  • Add a newline to the panic messages in make_room. Also fix a
    comment that suggested our chunk size is 4Mb. It's 1MB.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • I have a box that fails in OF during boot with:

    DEFAULT CATCH!, exception-handler=fff00400
    at %SRR0: 49424d2c4c6f6768 %SRR1: 800000004000b002

    ie "IBM,Logh". OF got corrupted with a device tree string.

    Looking at make_room and alloc_up, we claim the first chunk (1 MB)
    but we never claim any more. mem_end is always set to alloc_top
    which is the top of our available address space, guaranteeing we will
    never call alloc_up and claim more memory.

    Also alloc_up wasn't setting alloc_bottom to the bottom of the
    available address space.

    This doesn't help the box to boot, but we at least fail with
    an obvious error. We could relocate the device tree in a future
    patch.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • Commit af9eef3c7b1ed004c378c89b87642f4937337d50 caused cpu_setup to see
    the_cpu_spec, rather than the source struct. However, on 32-bit, the
    return value of identify_cpu was being used for feature fixups, and
    identify_cpu was returning the source struct. So if cpu_setup patches
    the feature bits, the update won't affect the fixups.

    Signed-off-by: Scott Wood
    Signed-off-by: Benjamin Herrenschmidt

    Scott Wood
     
  • Add a cast in case the caller passes in a different type, as it would
    if mtspr/mtmsr were functions.

    Previously, if a 64-bit type was passed in on 32-bit, GCC would bind the
    constraint to a pair of registers, and would substitute the first register
    in the pair in the asm code. This corresponds to the upper half of the
    64-bit register, which is generally not the desired behavior.

    Signed-off-by: Scott Wood
    Signed-off-by: Benjamin Herrenschmidt

    Scott Wood
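
A rough illustration of the difference the cast makes, using `uint32_t` as a stand-in for a 32-bit `unsigned long`. The helper names are hypothetical, and the register-pair behaviour is modelled as described in the message rather than taken from real inline asm.

```c
#include <stdint.h>

/* Stand-in for unsigned long on a 32-bit target. */
typedef uint32_t reg_t;

/* With the cast: the value is narrowed first, keeping the low 32 bits. */
static reg_t value_with_cast(uint64_t v)
{
    return (reg_t)v;
}

/* Without the cast (modelled): the first register of the pair holds the
 * upper 32 bits, and that is what the asm would have used. */
static reg_t first_reg_of_pair(uint64_t v)
{
    return (reg_t)(v >> 32);
}
```

For a value such as 0x0000000100008000, the cast yields 0x00008000, while the first register of the pair would hold 1.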
     
  • * 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
    Revert "dt: add of_alias_scan and of_alias_get_id"
    dt: remove of_alias_get_id() reference

    Linus Torvalds
     
  • * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
    [PARISC] wire up sendmmsg syscall
    [PARISC] fix return type of __atomic64_add_return
    [PARISC] Fix futex support

    Linus Torvalds
     
  • * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
    [S390] signal: use set_restore_sigmask() helper
    [S390] smp: remove pointless comments in startup_secondary()
    [S390] qdio: Use kstrtoul_from_user
    [S390] sclp_async: Use kstrtoul_from_user
    [S390] exec: remove redundant set_fs(USER_DS)
    [S390] cpu hotplug: on cpu start wait until being marked active
    [S390] signal: convert to use set_current_blocked()
    [S390] asm offsets: fix coding style
    [S390] Add support for IBM zEnterprise 114
    [S390] dasd: check if raw track access is supported
    [S390] Use diagnose 308 for system reset
    [S390] Export store_status() function
    [S390] dasd: use vmalloc for statistics input buffer
    [S390] Add PSW restart shutdown trigger
    [S390] missing return in page_table_alloc_pgste
    [S390] qdio: 2nd stage retry on SIGA-W busy conditions

    Linus Torvalds
     
  • While `pci_eisa_driver' still refers to `pci_eisa_init', the .probe() function
    should not be called after init memory release, as pointed out by commit
    74b9a297. The structure is still referenced in the drivers subsystem, and can
    be accessed through sysfs, so the modpost warning is a false positive. Mark
    it as such.

    At the same time, the warning referenced in 005bdad7b80 only mentioned
    `pci_eisa_driver', not `pci_eisa_pci_tbl', so remove its marking.

    Broken-by: Arnaud Lacombe (in 005bdad7b80)
    Reported-by: Tetsuo Handa
    Signed-off-by: Arnaud Lacombe
    Signed-off-by: Linus Torvalds

    Arnaud Lacombe
     

04 Aug, 2011

17 commits

  • This reverts commit 750f463a749e28464151ad26938d11b07b1c43cb.

    of_alias_* still needs work to be generalized for 'promtree' dt
    platforms, and to not implicitly create entries for available ids.

    Signed-off-by: Grant Likely

    Grant Likely
     
  • of_alias_get_id() is broken and being reverted. Remove the reference
    to it and replace with a single incrementing id number.

    There is no risk of regression here on the imx driver since the imx
    change to use of_alias_get_id() is commit 22698aa2, "serial/imx: add
    device tree probe support" which is new for v3.1, and it won't get
    used unless CONFIG_OF is enabled and the board is booted using a
    device tree. A single incrementing integer is sufficient for now.

    Signed-off-by: Grant Likely
    Acked-by: Shawn Guo

    Grant Likely
     
  • The core device layer sends tons of uevent notifications for each device
    it finds, and if the kernel has been built with a non-empty
    CONFIG_UEVENT_HELPER_PATH that will make us try to execute the usermode
    helper binary for all these events very early in the boot.

    Not only won't the root filesystem even be mounted at that point, we
    literally won't have necessarily even initialized all the process
    handling data structures at that point, which causes no end of silly
    problems even when the usermode helper doesn't actually succeed in
    executing.

    So just use our existing infrastructure to disable the usermodehelpers
    to make the kernel start out with them disabled. We enable them when
    we've at least initialized stuff a bit.

    Problems related to an uninitialized

    init_ipc_ns.ids[IPC_SHM_IDS].rw_mutex

    reported by various people.

    Reported-by: Manuel Lauss
    Reported-by: Richard Weinberger
    Reported-by: Marc Zyngier
    Acked-by: Kay Sievers
    Cc: Andrew Morton
    Cc: Vasiliy Kulikov
    Cc: Greg KH
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Dmitry Kasatkin reports:
    "kernel-devel package with kernel headers have no
    directory if XEN is disabled. Modules which include asm/io.h won't
    compile.

    XEN related content is behind the CONFIG_XEN flag in the io.h. And
    should be also behind CONFIG_XEN flag."

    So move the include of down into the section that is
    conditional on CONFIG_XEN.

    Reported-by: Dmitry Kasatkin
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: ad7879 - fix deficient device disable
    Input: gpio_keys - fix two typos in devicetree documentation
    Input: mma8450 - add device tree probe support
    Input: gpio_keys - return proper error code if memory allocation fails
    Input: lm8323 - add missing device_remove_file for dev_attr_time
    Input: tegra-kbc - fix computation of polling time
    Input: kxtj9 - explicitly include module.h
    Input: psmouse - hgpk.c needs module.h

    Linus Torvalds
     
  • * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
    cpuidle: stop depending on pm_idle
    x86 idle: move mwait_idle_with_hints() to where it is used
    cpuidle: replace xen access to x86 pm_idle and default_idle
    cpuidle: create bootparam "cpuidle.off=1"
    mrst_pmu: driver for Intel Moorestown Power Management Unit

    Linus Torvalds
     
  • * 'apei-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
    ACPI, APEI, EINJ Param support is disabled by default
    APEI GHES: 32-bit buildfix
    ACPI: APEI build fix
    ACPI, APEI, GHES: Add hardware memory error recovery support
    HWPoison: add memory_failure_queue()
    ACPI, APEI, GHES, Error records content based throttle
    ACPI, APEI, GHES, printk support for recoverable error via NMI
    lib, Make gen_pool memory allocator lockless
    lib, Add lock-less NULL terminated single list
    Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
    ACPI, APEI, Add WHEA _OSC support
    ACPI, APEI, Add APEI bit support in generic _OSC call
    ACPI, APEI, GHES, Support disable GHES at boot time
    ACPI, APEI, GHES, Prevent GHES to be built as module
    ACPI, APEI, Use apei_exec_run_optional in APEI EINJ and ERST
    ACPI, APEI, Add apei_exec_run_optional
    ACPI, APEI, GHES, Do not ratelimit fatal error printk before panic
    ACPI, APEI, ERST, Fix erst-dbg long record reading issue
    ACPI, APEI, ERST, Prevent erst_dbg from loading if ERST is disabled

    Linus Torvalds
     
  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
    tcm_fc: Handle DDP/SW fc_frame_payload_get failures in ft_recv_write_data
    target: Fix bug for transport_generic_wait_for_tasks with direct operation
    target: iscsi_target depends on NET
    target: Fix WRITE_SAME_16 lba assignment breakage
    MAINTAINERS: Add target-devel list for drivers/target/
    iscsi-target: Fix CONFIG_SMP=n and CONFIG_MODULES=n build failure
    iscsi-target: Fix snprintf usage with MAX_PORTAL_LEN
    iscsi-target: Fix uninitialized usage of cmd->pad_bytes
    iscsi-target: strlen() doesn't count the terminator
    iscsi-target: Fix NULL dereference on allocation failure

    Linus Torvalds
     
  • * 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6:
    dt: add of_alias_scan and of_alias_get_id

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: use kzalloc in ext4_kzalloc()

    Linus Torvalds
     
  • We may optimistically check .in_use == 0 without holding the rw_mutex:
    it's the common case, and if it's zero, there certainly won't be any
    segments associated with us.

    After taking the lock, the idr_for_each() will do the right thing, so we
    could now drop the re-check inside the lock without any real cost. But
    it won't hurt.

    Signed-off-by: Vasiliy Kulikov
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
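
The check-then-lock pattern described above, as a userspace sketch. `struct ns_sketch` and `exit_sketch` are hypothetical names. A pthread mutex stands in for the rw_mutex and an atomic counter for .in_use. The real code walks an idr instead of just reading a count.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-namespace shm state. */
struct ns_sketch {
    pthread_mutex_t lock;
    atomic_int in_use;   /* number of live segments */
    int lock_taken;      /* counts slow-path entries, for illustration */
};

/* Returns the number of segments handled during cleanup. */
static int exit_sketch(struct ns_sketch *ns)
{
    int handled;

    /* Optimistic unlocked check: the common case has nothing to clean up. */
    if (atomic_load(&ns->in_use) == 0)
        return 0;

    pthread_mutex_lock(&ns->lock);
    ns->lock_taken++;
    /* Read again under the lock; redundant for correctness but harmless. */
    handled = atomic_load(&ns->in_use);
    atomic_store(&ns->in_use, 0);
    pthread_mutex_unlock(&ns->lock);

    return handled;
}
```

When the counter is zero the function returns without ever touching the lock, which is the saving the message describes.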
     
  • Commit 4c677e2eefdb ("shm: optimize locking and ipc_namespace getting")
    introduced a copy-paste bug. Due to the bug, cycle optimizations were
    disabled.

    Signed-off-by: Vasiliy Kulikov
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     
  • Expand the fs/Kconfig "help" info to clarify why it's a bad idea to
    deselect the TMPFS_POSIX_ACL config variable.

    Signed-off-by: Robert P. J. Day
    Acked-by: Randy Dunlap
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • Make the radix_tree exceptional cases, mostly in filemap.c, clearer.

    It's hard to devise a suitable snappy name that illuminates the use by
    shmem/tmpfs for swap, while keeping filemap/pagecache/radix_tree
    generality. And akpm points out that /* radix_tree_deref_retry(page) */
    comments look like calls that have been commented out for unknown
    reason.

    Skirt the naming difficulty by rearranging these blocks to handle the
    transient radix_tree_deref_retry(page) case first; then just explain the
    remaining shmem/tmpfs swap case in a comment.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • We have already acknowledged that swapoff of a tmpfs file is slower than
    it was before conversion to the generic radix_tree: a little slower
    there will be acceptable, if the hotter paths are faster.

    But it was a shock to find swapoff of a 500MB file 20 times slower on my
    laptop, taking 10 minutes; and at that rate it significantly slows down
    my testing.

    Now, most of that turned out to be overhead from PROVE_LOCKING and
    PROVE_RCU: without those it was only 4 times slower than before; and
    more realistic tests on other machines don't fare as badly.

    I've tried a number of things to improve it, including tagging the swap
    entries, then doing lookup by tag: I'd expected that to halve the time,
    but in practice it's erratic, and often counter-productive.

    The only change I've so far found to make a consistent improvement, is
    to short-circuit the way we go back and forth, gang lookup packing
    entries into the array supplied, then shmem scanning that array for the
    target entry. Scanning in place doubles the speed, so it's now only
    twice as slow as before (or three times slower when the PROVEs are on).

    So, add radix_tree_locate_item() as an expedient, once-off,
    single-caller hack to do the lookup directly in place. #ifdef it on
    CONFIG_SHMEM and CONFIG_SWAP, as much to document its limited
    applicability as save space in other configurations. And, sadly,
    #include sched.h for cond_resched().

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove PageSwapBacked (!page_is_file_cache) cases from
    add_to_page_cache_locked() and add_to_page_cache_lru(): those pages now
    go through shmem_add_to_page_cache().

    Remove a comment on maximum tmpfs size from fsstack_copy_inode_size(),
    and add a comment on swap entries to invalidate_mapping_pages().

    And mincore_page() uses find_get_page() on what might be shmem or a
    tmpfs file: allow for a radix_tree_exceptional_entry(), and proceed to
    find_get_page() on swapper_space if so (oh, swapper_space needs #ifdef).

    Signed-off-by: Hugh Dickins
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • But we've not yet removed the old swp_entry_t i_direct[16] from
    shmem_inode_info. That's because it was still being shared with the
    inline symlink. Remove it now (saving 64 or 128 bytes from shmem inode
    size), and use kmemdup() for short symlinks, say, those up to 128 bytes.

    I wonder why mpol_free_shared_policy() is done in shmem_destroy_inode()
    rather than shmem_evict_inode(), where we usually do such freeing? I
    guess it doesn't matter, and I'm not into NUMA mpol testing right now.

    Signed-off-by: Hugh Dickins
    Acked-by: Rik van Riel
    Reviewed-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
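
The "64 or 128 bytes" figure follows from the array length and the word size, since swp_entry_t wraps a single unsigned long. A minimal sketch of that arithmetic, with an illustrative helper name:

```c
#include <stddef.h>

/* Length of the i_direct[] array named in the message above. */
#define NR_DIRECT_SKETCH 16

/* Bytes used by the array for a given unsigned long size in bytes. */
static size_t i_direct_bytes(size_t ulong_bytes)
{
    return NR_DIRECT_SKETCH * ulong_bytes;
}
```

A 4-byte unsigned long (32-bit) gives 64 bytes, and an 8-byte one (64-bit) gives 128 bytes.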