10 Apr, 2015

1 commit

  • We have a powerpc-specific global called mem_init_done which is "set on
    boot once kmalloc can be called".

    But that's not *quite* true. We set it at the bottom of mem_init(), and
    rely on the fact that mm_init() calls kmem_cache_init() immediately
    after that, and nothing is running in parallel.

    So replace it with the generic and 100% correct slab_is_available().

    Signed-off-by: Michael Ellerman

    Michael Ellerman
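
    A minimal sketch of the pattern this change enables, assuming a
    hypothetical helper that can be called both before and after the slab
    allocator is up (the helper name is illustrative; slab_is_available()
    and memblock_virt_alloc() are the real interfaces):

    #include <linux/slab.h>
    #include <linux/bootmem.h>

    /* Pick the right allocator for the current boot phase. */
    static void *early_or_late_alloc(size_t size)
    {
            if (slab_is_available())
                    return kzalloc(size, GFP_KERNEL);    /* slab is up */
            return memblock_virt_alloc(size, 0);         /* pre-slab boot */
    }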
     

26 Mar, 2015

1 commit

  • Export __spin_yield so that the arch_spin_unlock() function can
    be invoked from a module. This will be required for modules where
    we want to take a lock that is also acquired in hypervisor
    real mode. Because we want to avoid running any lockdep code
    (which may not be safe in real mode), this lock needs to be
    an arch_spinlock_t instead of a normal spinlock.

    Signed-off-by: Suresh Warrier
    Acked-by: Paul Mackerras
    Signed-off-by: Benjamin Herrenschmidt

    Suresh Warrier
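
    A minimal sketch of the use case described, assuming a module that
    must take a lock which is also acquired in hypervisor real mode (the
    lock and function names are illustrative; arch_spinlock_t and the
    arch_spin_* helpers are the real interfaces):

    #include <linux/spinlock.h>

    static arch_spinlock_t rm_lock = __ARCH_SPIN_LOCK_UNLOCKED;

    static void rm_critical_section(void)
    {
            /* may spin via the now-exported __spin_yield() on shared LPARs */
            arch_spin_lock(&rm_lock);
            /* ... work that must also be safe in real mode ... */
            arch_spin_unlock(&rm_lock);
    }
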
25 Mar, 2015

1 commit

  • If CONFIG_SMP=n, <linux/smp.h> does not include <asm/smp.h>, causing:

    drivers/cpufreq/ppc-corenet-cpufreq.c: In function 'corenet_cpufreq_cpu_init':
    drivers/cpufreq/ppc-corenet-cpufreq.c:173:3: error: implicit declaration of function 'get_hard_smp_processor_id' [-Werror=implicit-function-declaration]

    Signed-off-by: Geert Uytterhoeven

    Geert Uytterhoeven
     

17 Mar, 2015

1 commit

  • These functions are only used from one place each. If the cacheable_*
    versions really are more efficient, then those changes should be
    migrated into the common code instead.

    NOTE: The old routines are just flat buggy on kernels that support
    hardware with different cacheline sizes.

    Signed-off-by: Kyle Moffett
    Signed-off-by: Benjamin Herrenschmidt

    Kyle Moffett
     

16 Mar, 2015

3 commits

  • The kfree() function tests whether its argument is NULL and then returns
    immediately. Thus the test around the call is not needed.

    This issue was detected by using the Coccinelle software.

    Signed-off-by: Markus Elfring
    Signed-off-by: Michael Ellerman

    Markus Elfring
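
    The detected pattern, as a before/after sketch:

    /* before: redundant guard around kfree() */
    if (ptr)
            kfree(ptr);

    /* after: kfree() already ignores a NULL argument */
    kfree(ptr);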
     
  • As our various loops (copy, string, crypto etc) get more complicated,
    we want to share implementations between userspace (eg glibc) and
    the kernel. We also want to write userspace test harnesses to put
    in tools/testing/selftests.

    One gratuitous difference between userspace and the kernel is the
    VSX register definitions - the kernel uses vsrX whereas gcc uses
    vsX.

    Change the kernel to match userspace.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Michael Ellerman

    Anton Blanchard
     
  • As our various loops (copy, string, crypto etc) get more complicated,
    we want to share implementations between userspace (eg glibc) and
    the kernel. We also want to write userspace test harnesses to put
    in tools/testing/selftests.

    One gratuitous difference between userspace and the kernel is the
    VMX register definitions - the kernel uses vrX whereas both gcc and
    glibc use vX.

    Change the kernel to match userspace.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Michael Ellerman

    Anton Blanchard
     

28 Jan, 2015

2 commits


23 Jan, 2015

1 commit

  • I noticed ksm spending quite a lot of time in memcmp on a large
    KVM box. The current memcmp loop is very unoptimised - byte at a
    time compares with no loop unrolling. We can do much much better.

    Optimise the loop in a few ways:

    - Unroll the byte at a time loop

    - For large (at least 32 byte) comparisons that are also 8 byte
    aligned, use an unrolled modulo scheduled loop using 8 byte
    loads. This is similar to our glibc memcmp.

    A simple microbenchmark testing 10000000 iterations of an 8192 byte
    memcmp was used to measure the performance:

    baseline: 29.93 s

    modified: 1.70 s

    Just over 17x faster.

    v2: Incorporated some suggestions from Segher:

    - Use andi. instead of rldicl.

    - Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
    and was a relic from a previous version.

    - Don't use cr5, we have plans to use that CR field for fast local
    atomics.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Michael Ellerman

    Anton Blanchard
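
    A plain-C sketch of the approach (the real patch is hand-scheduled
    powerpc assembly; this only shows the word-at-a-time idea, with a
    bytewise tail that preserves memcmp semantics):

    #include <stdint.h>
    #include <string.h>
    #include <stddef.h>

    static int memcmp_words(const unsigned char *a, const unsigned char *b,
                            size_t n)
    {
            while (n >= 8) {
                    uint64_t x, y;

                    memcpy(&x, a, 8);
                    memcpy(&y, b, 8);
                    if (x != y)
                            break;  /* difference is somewhere in this chunk */
                    a += 8;
                    b += 8;
                    n -= 8;
            }
            while (n--) {           /* tail, and resolving a differing chunk */
                    if (*a != *b)
                            return *a < *b ? -1 : 1;
                    a++;
                    b++;
            }
            return 0;
    }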
     

29 Dec, 2014

1 commit

  • In the Makefile, string.o (which is generated from string.S) is
    included into the list of objects being built unconditionally
    (obj-y) in line 12.

    Additionally, if CONFIG_PPC64 is set, it is included again in
    line 17.

    This patch removes the latter unnecessary inclusion.

    Signed-off-by: Andreas Ruprecht
    Signed-off-by: Michael Ellerman

    Andreas Ruprecht
     

13 Dec, 2014

1 commit

  • Pull trivial tree update from Jiri Kosina:
    "Usual stuff: documentation updates, printk() fixes, etc"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
    intel_ips: fix a type in error message
    cpufreq: cpufreq-dt: Move newline to end of error message
    ps3rom: fix error return code
    treewide: fix typo in printk and Kconfig
    ARM: dts: bcm63138: change "interupts" to "interrupts"
    Replace mentions of "list_struct" to "list_head"
    kernel: trace: fix printk message
    scsi: mpt2sas: fix ioctl in comment
    zbud, zswap: change module author email
    clocksource: Fix 'clcoksource' typo in comment
    arm: fix wording of "Crotex" in CONFIG_ARCH_EXYNOS3 help
    gpio: msm-v1: make boolean argument more obvious
    usb: Fix typo in usb-serial-simple.c
    PCI: Fix comment typo 'COMFIG_PM_OPS'
    powerpc: Fix comment typo 'CONIFG_8xx'
    powerpc: Fix comment typos 'CONFiG_ALTIVEC'
    clk: st: Spelling s/stucture/structure/
    isci: Spelling s/stucture/structure/
    usb: gadget: zero: Spelling s/infrastucture/infrastructure/
    treewide: Fix company name in module descriptions
    ...

    Linus Torvalds
     

20 Nov, 2014

1 commit


19 Nov, 2014

1 commit

  • Although we are now selecting NO_BOOTMEM, we still have some traces of
    bootmem lying around. That is because even with NO_BOOTMEM there is
    still a shim that converts bootmem calls into memblock calls, but
    ultimately we want to remove all traces of bootmem.

    Most of the patch is conversions from alloc_bootmem() to
    memblock_virt_alloc(). In general a call such as:

    p = (struct foo *)alloc_bootmem(x);

    Becomes:

    p = memblock_virt_alloc(x, 0);

    We don't need the cast because memblock_virt_alloc() returns a void *.
    The alignment value of zero tells memblock to use the default alignment,
    which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses.

    We remove a number of NULL checks on the result of
    memblock_virt_alloc(). That is because memblock_virt_alloc() will panic
    if it can't allocate, in exactly the same way as alloc_bootmem(), so the
    NULL checks are and always have been redundant.

    The memory returned by memblock_virt_alloc() is already zeroed, so we
    remove several memsets of the result of memblock_virt_alloc().

    Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS)
    to just plain memblock_virt_alloc(). We don't use memblock_alloc_base()
    because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation
    to that is pointless; 16EB ought to be enough for anyone.

    Signed-off-by: Michael Ellerman

    Michael Ellerman
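
    The NULL-check and memset removals the message describes, as a
    before/after sketch:

    /* before: checks that can never trigger */
    p = alloc_bootmem(size);
    if (!p)
            panic("out of memory");
    memset(p, 0, size);

    /* after: memblock_virt_alloc() panics itself on failure and returns
     * zeroed memory, so both the check and the memset go away */
    p = memblock_virt_alloc(size, 0);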
     

12 Nov, 2014

1 commit

  • Commit be96f63375a1 ("powerpc: Split out instruction analysis
    part of emulate_step()") added some calls to do_fp_load()
    and do_fp_store(), which fail to compile on configs with
    CONFIG_PPC_FPU=n and CONFIG_PPC_EMULATE_SSTEP=y. This fixes
    the compile by adding #ifdef CONFIG_PPC_FPU around the code
    that calls these functions.

    Signed-off-by: Paul Mackerras
    Signed-off-by: Michael Ellerman

    Paul Mackerras
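
    A sketch of the shape of the fix; the argument list and the #else
    branch are illustrative, the point being that the do_fp_load() and
    do_fp_store() calls only compile when the kernel has FPU support:

    #ifdef CONFIG_PPC_FPU
            err = do_fp_load(op.reg, ea, size, regs);
    #else
            err = -EINVAL;  /* FP emulation unavailable without an FPU */
    #endif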
     

10 Nov, 2014

1 commit


29 Oct, 2014

1 commit


25 Sep, 2014

5 commits


13 Aug, 2014

1 commit

  • Similar to the previous commit which described why we need to add a
    barrier to arch_spin_is_locked(), we have a similar problem with
    spin_unlock_wait().

    We need a barrier on entry to ensure any spinlock we have previously
    taken is visibly locked prior to the load of lock->slock.

    It's also not clear if spin_unlock_wait() is intended to have ACQUIRE
    semantics. For now be conservative and add a barrier on exit to give it
    ACQUIRE semantics.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Benjamin Herrenschmidt

    Michael Ellerman
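
    A sketch of the barrier placement described above (the real powerpc
    version may also drop SMT priority while spinning; that detail is
    omitted here):

    static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
    {
            smp_mb();       /* prior lock acquisitions are visible */
            while (arch_spin_is_locked(lock))
                    cpu_relax();
            smp_mb();       /* conservative ACQUIRE semantics on exit */
    }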
     

28 Jul, 2014

1 commit


22 Jul, 2014

2 commits

  • memmove may be called from module code (copy_pages() in btrfs), and it
    may call memcpy, which may call back into C code, so memmove needs to
    use _GLOBAL_TOC to set up r2 correctly.

    This fixes the following error seen when booting a little-endian guest:

    Vector: 300 (Data Access) at [c000000073f97210]
    pc: c000000000015004: enable_kernel_altivec+0x24/0x80
    lr: c000000000058fbc: enter_vmx_copy+0x3c/0x60
    sp: c000000073f97490
    msr: 8000000002009033
    dar: d000000001d50170
    dsisr: 40000000
    current = 0xc0000000734c0000
    paca = 0xc00000000fff0000 softe: 0 irq_happened: 0x01
    pid = 815, comm = mktemp
    enter ? for help
    [c000000073f974f0] c000000000058fbc enter_vmx_copy+0x3c/0x60
    [c000000073f97510] c000000000057d34 memcpy_power7+0x274/0x840
    [c000000073f97610] d000000001c3179c copy_pages+0xfc/0x110 [btrfs]
    [c000000073f97660] d000000001c3c248 memcpy_extent_buffer+0xe8/0x160 [btrfs]
    [c000000073f97700] d000000001be4be8 setup_items_for_insert+0x208/0x4a0 [btrfs]
    [c000000073f97820] d000000001be50b4 btrfs_insert_empty_items+0xf4/0x140 [btrfs]
    [c000000073f97890] d000000001bfed30 insert_with_overflow+0x70/0x180 [btrfs]
    [c000000073f97900] d000000001bff174 btrfs_insert_dir_item+0x114/0x2f0 [btrfs]
    [c000000073f979a0] d000000001c1f92c btrfs_add_link+0x10c/0x370 [btrfs]
    [c000000073f97a40] d000000001c20e94 btrfs_create+0x204/0x270 [btrfs]
    [c000000073f97b00] c00000000026d438 vfs_create+0x178/0x210
    [c000000073f97b50] c000000000270a70 do_last+0x9f0/0xe90
    [c000000073f97c20] c000000000271010 path_openat+0x100/0x810
    [c000000073f97ce0] c000000000272ea8 do_filp_open+0x58/0xd0
    [c000000073f97dc0] c00000000025ade8 do_sys_open+0x1b8/0x300
    [c000000073f97e30] c00000000000a008 syscall_exit+0x0/0x7c

    Signed-off-by: Benjamin Herrenschmidt

    Li Zhong
     
  • This fixes some bugs in emulate_step(). First, the setting of the carry
    bit for the arithmetic right-shift instructions was not correct on 64-bit
    machines because we were masking with a mask of type int rather than
    unsigned long. Secondly, the sld (shift left doubleword) instruction was
    using the wrong instruction field for the register containing the shift
    count.

    Signed-off-by: Paul Mackerras
    Signed-off-by: Benjamin Herrenschmidt

    Paul Mackerras
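
    An illustration of the first bug's mechanics, assuming a helper that
    computes the carry for an arithmetic right shift (names are
    illustrative; the point is the unsigned long typed mask):

    /* CA is set when a negative value has 1-bits shifted out */
    static int shift_sets_carry(unsigned long value, unsigned int sh)
    {
            unsigned long mask = (1UL << sh) - 1;   /* must be 64-bit wide */

            /* with an int-typed mask, bits 32-63 never participate */
            return ((long)value < 0) && (value & mask) != 0;
    }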
     

11 Jun, 2014

1 commit

  • Commit cd64d1697cf0 ("powerpc: mtmsrd not defined") added a check for
    CONFIG_PPC_CPU where a check for CONFIG_PPC_FPU was clearly intended.

    Fixes: cd64d1697cf0 ("powerpc: mtmsrd not defined")
    Signed-off-by: Paul Bolle
    Signed-off-by: Benjamin Herrenschmidt

    Paul Bolle
     

05 Jun, 2014

1 commit


05 May, 2014

1 commit


30 Apr, 2014

1 commit

  • Unaligned stores take alignment exceptions on POWER7 running in little-endian.
    This is a dumb little-endian base memcpy that prevents unaligned stores.
    Once booted the feature fixup code switches over to the VMX copy loops
    (which are already endian safe).

    The question is what we do before that switch over. The base 64bit
    memcpy takes alignment exceptions on POWER7 so we can't use it as is.
    Fixing the causes of alignment exception would slow it down, because
    we'd need to ensure all loads and stores are aligned either through
    rotate tricks or bytewise loads and stores. Either would be bad for
    all other 64bit platforms.

    [ I simplified the loop a bit - Anton ]

    Signed-off-by: Philippe Bergheaud
    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Philippe Bergheaud
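
    A C rendering of what "dumb" means here (the real routine is powerpc
    assembly): byte loads and stores are always aligned, so the copy can
    never take an alignment exception.

    #include <stddef.h>

    static void *memcpy_bytewise(void *dest, const void *src, size_t n)
    {
            char *d = dest;
            const char *s = src;

            while (n--)
                    *d++ = *s++;
            return dest;
    }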
     

23 Apr, 2014

4 commits


07 Mar, 2014

1 commit

  • Turn Anton's memcpy / copy_tofrom_user test into something that can
    live in tools/testing/selftests.

    It requires one turd in arch/powerpc/lib/memcpy_64.S, but it's pretty
    harmless IMHO.

    We are sailing very close to the wind with the feature macros. We define
    them to nothing, which currently means we get a few extra nops and
    include the unaligned calls.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Michael Ellerman
    Signed-off-by: Benjamin Herrenschmidt

    Michael Ellerman
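
    A sketch of the "define them to nothing" trick, assuming the selftest
    carries stub headers that shadow the kernel's feature-fixup macros
    (the stubbed names below are examples, not the complete set):

    /* in the selftest's stub ppc_asm.h: make feature sections vanish so
     * memcpy_64.S assembles in userspace */
    #define BEGIN_FTR_SECTION
    #define END_FTR_SECTION_IFSET(msk)
    #define FTR_SECTION_ELSE
    #define ALT_FTR_SECTION_END_IFCLR(msk)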
     

15 Jan, 2014

1 commit


30 Dec, 2013

2 commits

  • Merge a pile of fixes that went into the "merge" branch (3.13-rc's), such
    as Anton's little-endian fixes.

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • The powerpc 64-bit __copy_tofrom_user() function uses shifts to handle
    unaligned invocations. However, these shifts were designed for
    big-endian systems: On little-endian systems, they must shift in the
    opposite direction.

    This commit relies on the C preprocessor to insert the correct shifts
    into the assembly code.

    [ This is a rare but nasty LE issue. Most of the time we use the POWER7
    optimised __copy_tofrom_user_power7 loop, but when it hits an exception
    we fall back to the base __copy_tofrom_user loop. - Anton ]

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Paul E. McKenney
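
    The preprocessor selection the message describes might look like this
    at the top of the copy loop (macro names are illustrative): a shift
    that moves bytes towards lower addresses on big-endian must become
    the opposite shift on little-endian.

    #ifdef __BIG_ENDIAN__
    #define sLd sld         /* shift towards low-numbered address */
    #define sHd srd         /* shift towards high-numbered address */
    #else
    #define sLd srd
    #define sHd sld
    #endif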
     

02 Dec, 2013

1 commit


30 Oct, 2013

1 commit

  • Add a VMX optimised xor, used primarily for RAID5. On a POWER7 blade
    this is a decent win:

    32regs : 17932.800 MB/sec
    altivec : 19724.800 MB/sec

    The bigger gain is when the same test is run in SMT4 mode, as it
    would be if there were a lot of work going on:

    8regs : 8377.600 MB/sec
    altivec : 15801.600 MB/sec

    I tested this against an array created without the patch, and also
    verified it worked as expected on a little endian kernel.

    [ Fix !CONFIG_ALTIVEC build -- BenH ]

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
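
    A sketch of how such routines plug into the generic RAID xor
    framework (the xor_altivec_* names follow the commit; the prototypes
    and template layout match the generic xor interface of that era):

    #include <linux/raid/xor.h>

    void xor_altivec_2(unsigned long bytes, unsigned long *p1,
                       unsigned long *p2);
    void xor_altivec_3(unsigned long bytes, unsigned long *p1,
                       unsigned long *p2, unsigned long *p3);
    void xor_altivec_4(unsigned long bytes, unsigned long *p1,
                       unsigned long *p2, unsigned long *p3,
                       unsigned long *p4);
    void xor_altivec_5(unsigned long bytes, unsigned long *p1,
                       unsigned long *p2, unsigned long *p3,
                       unsigned long *p4, unsigned long *p5);

    static struct xor_block_template xor_block_altivec = {
            .name = "altivec",
            .do_2 = xor_altivec_2,
            .do_3 = xor_altivec_3,
            .do_4 = xor_altivec_4,
            .do_5 = xor_altivec_5,
    };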