17 Aug, 2015

1 commit

  • [ Upstream commit 44922150d87cef616fd183220d43d8fde4d41390 ]

    If we have a series of events from userspace, with %fprs=FPRS_FEF,
    as follows:

    ETRAP
    ETRAP
    VIS_ENTRY(fprs=0x4)
    VIS_EXIT
    RTRAP (kernel FPU restore with fpu_saved=0x4)
    RTRAP

    We will not restore the user registers that were clobbered by the
    FPU-using kernel code in the inner-most trap.

    Traps allocate FPU save slots in the thread struct, and FPU-using
    sequences save only the "dirty" FPU registers.

    This works at the initial trap level because all of the registers
    get recorded into the top-level FPU save area, and we'll return
    to userspace with the FPU disabled so that any FPU use by the user
    will take an FPU disabled trap wherein we'll load the registers
    back up properly.

    But this is not how trap returns from kernel to kernel operate.

    The simplest fix for this bug is to always save all FPU register state
    for anything other than the top-most FPU save area.

    Getting rid of the optimized inner-slot FPU saving code ends up
    making VISEntryHalf degenerate into plain VISEntry.

    Longer term we need to do something smarter to reinstate the partial
    save optimizations. Perhaps the fundamental error is having trap entry
    and exit allocate FPU save slots and restore register state. Instead,
    the VISEntry et al. calls should be doing that work.

    This bug is about two decades old.

    Reported-by: James Y Knight
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David S. Miller
     

24 Mar, 2015

1 commit


08 Nov, 2014

1 commit


15 Oct, 2014

1 commit

  • The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the
    key material is preloaded into the FPU registers, and then we loop
    over and over doing the crypt operation, reusing those pre-cooked key
    registers.

    There are intervening blkcipher*() calls between the crypt operation
    calls. And those might perform memcpy() and thus also try to use the
    FPU.

    The sparc64 kernel FPU usage mechanism is designed to allow such
    recursive uses, but with a catch.

    There has to be a trap between the two FPU using threads of control.

    The mechanism works by, when the FPU is already in use by the kernel,
    allocating a slot for FPU saving at trap time. Then if, within the
    trap handler, we try to use the FPU registers, the pre-trap FPU
    register state is saved into the slot. Then at trap return time we
    notice this and restore the pre-trap FPU state.

    Over the long term there are various more involved ways we can make
    this work, but for a quick fix let's take advantage of the fact that
    the situation where this happens is very limited.

    All sparc64 chips that support the crypto instructions are also using
    the Niagara4 memcpy routine, and that routine only uses the FPU for
    large copies where we can't get the source aligned properly to a
    multiple of 8 bytes.

    We look to see if the FPU is already in use in this context, and if so
    we use the non-large copy path which only uses integer registers.

    Furthermore, we also limit this special logic to when we are doing a
    kernel copy, rather than a user copy.

    Signed-off-by: David S. Miller

    David S. Miller
     

13 Oct, 2014

1 commit

  • Pull arch atomic cleanups from Ingo Molnar:
    "This is a series kept separate from the main locking tree, which
    cleans up and improves various details in the atomics type handling:

    - Remove the unused atomic_or_long() method

    - Consolidate and compress atomic ops implementations between
    architectures, to reduce linecount and to make it easier to add new
    ops.

    - Rewrite generic atomic support to only require cmpxchg() from an
    architecture - generate all other methods from that"

    * 'locking-arch-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read()
    locking, mips: Fix atomics
    locking, sparc64: Fix atomics
    locking,arch: Rewrite generic atomic support
    locking,arch,xtensa: Fold atomic_ops
    locking,arch,sparc: Fold atomic_ops
    locking,arch,sh: Fold atomic_ops
    locking,arch,powerpc: Fold atomic_ops
    locking,arch,parisc: Fold atomic_ops
    locking,arch,mn10300: Fold atomic_ops
    locking,arch,mips: Fold atomic_ops
    locking,arch,metag: Fold atomic_ops
    locking,arch,m68k: Fold atomic_ops
    locking,arch,m32r: Fold atomic_ops
    locking,arch,ia64: Fold atomic_ops
    locking,arch,hexagon: Fold atomic_ops
    locking,arch,cris: Fold atomic_ops
    locking,arch,avr32: Fold atomic_ops
    locking,arch,arm64: Fold atomic_ops
    locking,arch,arm: Fold atomic_ops
    ...

    Linus Torvalds
     

10 Sep, 2014

2 commits

  • The patch folding the atomic ops had a silly fail in the _return primitives.

    Fixes: 4f3316c2b5fe ("locking,arch,sparc: Fold atomic_ops")
    Reported-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: "David S. Miller"
    Cc: Stephen Rothwell
    Cc: David S. Miller
    Cc: Linus Torvalds
    Cc: sparclinux@vger.kernel.org
    Link: http://lkml.kernel.org/r/20140902094016.GD31157@worktop.ger.corp.intel.com
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    This makes memset follow the standard (instead of returning 0 on success).
    This is needed because certain versions of gcc optimize around memset calls
    and assume that the address argument is preserved in %o0.

    Signed-off-by: Andreas Larsson
    Signed-off-by: David S. Miller

    Andreas Larsson
     

14 Aug, 2014

1 commit

  • Many of the atomic op implementations are the same except for one
    instruction; fold the lot into a few CPP macros and reduce LoC.

    This also prepares for easy addition of new ops.

    Signed-off-by: Peter Zijlstra
    Acked-by: David S. Miller
    Cc: Bjorn Helgaas
    Cc: Kirill Tkhai
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Sam Ravnborg
    Cc: sparclinux@vger.kernel.org
    Link: http://lkml.kernel.org/r/20140508135852.825281379@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Aug, 2014

1 commit

  • Pull sparc updates from David Miller:

    1) Add sparc RAM output to /proc/iomem, from Bob Picco.

    2) Allow seeks on /dev/mdesc, from Khalid Aziz.

    3) Cleanup sparc64 I/O accessors, from Sam Ravnborg.

    4) If update_mmu_cache{,_pmd}() is called with an invalid mapping, do
    not insert it into the TLB miss hash tables, otherwise we'll
    livelock. Based upon work by Christopher Alexander Tobias Schulze.

    5) Fix BREAK detection in sunsab driver when no actual characters are
    pending, from Christopher Alexander Tobias Schulze.

    6) Because we have modules --> openfirmware --> vmalloc ordering of
    virtual memory, the lazy VMAP TLB flusher can cons up an invocation
    of flush_tlb_kernel_range() that covers the openfirmware address
    range. Unfortunately this will flush out the firmware's locked TLB
    mapping which causes all kinds of trouble. Just split up the flush
    request if this happens, but in the long term the lazy VMAP flusher
    should probably be made a little bit smarter.

    Based upon work by Christopher Alexander Tobias Schulze.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
    sparc64: Fix up merge thinko.
    sparc: Add "install" target
    arch/sparc/math-emu/math_32.c: drop stray break operator
    sparc64: ldc_connect() should not return EINVAL when handshake is in progress.
    sparc64: Guard against flushing openfirmware mappings.
    sunsab: Fix detection of BREAK on sunsab serial console
    bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000
    sparc64: Do not insert non-valid PTEs into the TSB hash table.
    sparc64: avoid code duplication in io_64.h
    sparc64: reorder functions in io_64.h
    sparc64: drop unused SLOW_DOWN_IO definitions
    sparc64: remove macro indirection in io_64.h
    sparc64: update IO access functions in PeeCeeI
    sparcspkr: use sbus_*() primitives for IO
    sparc: Add support for seek and shorter read to /dev/mdesc
    sparc: use %s for unaligned panic
    drivers/sbus/char: Micro-optimization in display7seg.c
    display7seg: Introduce the use of the managed version of kzalloc
    sparc64 - add mem to iomem resource

    Linus Torvalds
     

22 Jul, 2014

1 commit

    The PeeCeeI.c code used in*() + out*() for IO access.
    But these are little-endian, and the native (big) endian
    result was required, which resulted in some bit-shifting.
    Shift the code over to use the __raw_*() variants throughout.

    This simplifies the code, as we can drop the calls
    to le16_to_cpu() and le32_to_cpu().
    And it should be a little faster too.

    With this change we now use the same type of IO access functions
    throughout the file.

    Signed-off-by: Sam Ravnborg
    Signed-off-by: David S. Miller

    Sam Ravnborg
     

19 Jul, 2014

1 commit


20 Jun, 2014

1 commit

  • Pull sparc fixes from David Miller:
    "Sparc sparse fixes from Sam Ravnborg"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (67 commits)
    sparc64: fix sparse warnings in int_64.c
    sparc64: fix sparse warning in ftrace.c
    sparc64: fix sparse warning in kprobes.c
    sparc64: fix sparse warning in kgdb_64.c
    sparc64: fix sparse warnings in compat_audit.c
    sparc64: fix sparse warnings in init_64.c
    sparc64: fix sparse warnings in aes_glue.c
    sparc: fix sparse warnings in smp_32.c + smp_64.c
    sparc64: fix sparse warnings in perf_event.c
    sparc64: fix sparse warnings in kprobes.c
    sparc64: fix sparse warning in tsb.c
    sparc64: clean up compat_sigset_t.seta handling
    sparc64: fix sparse "Should it be static?" warnings in signal32.c
    sparc64: fix sparse warnings in sys_sparc32.c
    sparc64: fix sparse warning in pci.c
    sparc64: fix sparse warnings in smp_64.c
    sparc64: fix sparse warning in prom_64.c
    sparc64: fix sparse warning in btext.c
    sparc64: fix sparse warnings in sys_sparc_64.c + unaligned_64.c
    sparc64: fix sparse warning in process_64.c
    ...

    Conflicts:
    arch/sparc/include/asm/pgtable_64.h

    Linus Torvalds
     

18 May, 2014

1 commit


02 May, 2014

1 commit

    Use asm-generic/io.h definitions where applicable.
    The inxx() and outxx() methods which were duplicated in pcic.c +
    leon_pci.c are replaced by a set of static inlines from asm-generic/io.h.

    iomap.c is replaced by the generic version, but is still
    present to support sparc64.

    Signed-off-by: Sam Ravnborg
    Cc: Daniel Hellstrom
    Signed-off-by: David S. Miller

    Sam Ravnborg
     

13 Nov, 2013

1 commit

  • Choose PAGE_OFFSET dynamically based upon cpu type.

    Original UltraSPARC-I (spitfire) chips only supported a 44-bit
    virtual address space.

    Newer chips (T4 and later) support 52-bit virtual addresses
    and up to 47-bits of physical memory space.

    Therefore we have to adjust PAGE_OFFSET dynamically based upon
    the capabilities of the chip.

    Note that this change alone does not allow us to support > 43-bit
    physical memory, to do that we need to re-arrange our page table
    support. The current encodings of the pmd_t and pgd_t pointers
    restricts us to "32 + 11" == 43 bits.

    This change can waste quite a bit of memory for the various tables.
    In particular, a future change should work to size and allocate
    kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically.
    This isn't easy as we really cannot take a TLB miss when accessing
    kern_linear_bitmap[]. We'd have to lock it into the TLB or similar.

    Signed-off-by: David S. Miller
    Acked-by: Bob Picco

    David S. Miller
     

06 Sep, 2013

1 commit

  • The functions

    __down_read
    __down_read_trylock
    __down_write
    __down_write_trylock
    __up_read
    __up_write
    __downgrade_write

    are implemented inline, so remove corresponding EXPORT_SYMBOLs
    (They lead to compile errors on RT kernel).

    Signed-off-by: Kirill Tkhai
    CC: David Miller
    Signed-off-by: David S. Miller

    Kirill Tkhai
     

01 May, 2013

1 commit

    The help text for this config is duplicated across the x86, parisc, and
    s390 Kconfig.debug files. Arnd Bergmann noted that the help text was
    slightly misleading and should be fixed to state that enabling this
    option isn't a problem when using pre-4.4 gcc.

    To simplify the rewording, consolidate the text into lib/Kconfig.debug
    and modify it there to be more explicit about when you should say N to
    this config.

    Also, make the text a bit more generic by stating that this option
    enables compile time checks so we can cover architectures which emit
    warnings vs. ones which emit errors. The details of how an
    architecture decided to implement the checks isn't as important as the
    concept of compile time checking of copy_from_user() calls.

    While we're doing this, remove all the copy_from_user_overflow() code
    that's duplicated many times and place it into lib/ so that any
    architecture supporting this option can get the function for free.

    Signed-off-by: Stephen Boyd
    Acked-by: Arnd Bergmann
    Acked-by: Ingo Molnar
    Acked-by: H. Peter Anvin
    Cc: Arjan van de Ven
    Acked-by: Helge Deller
    Cc: Heiko Carstens
    Cc: Stephen Rothwell
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     

01 Apr, 2013

1 commit

    srmmu_nocache_bitmap is cleared by bit_map_init(). But bit_map_init()
    attempts to clear by memset(), so it can't clear the trailing edge of
    the bitmap properly on big-endian architectures if the number of bits
    is not a multiple of BITS_PER_LONG.

    Actually, the number of bits in srmmu_nocache_bitmap is not always
    a multiple of BITS_PER_LONG. It is calculated as below:

    bitmap_bits = srmmu_nocache_size >> SRMMU_NOCACHE_BITMAP_SHIFT;

    srmmu_nocache_size is decided proportionally by the amount of system RAM
    and it is rounded to a multiple of PAGE_SIZE. SRMMU_NOCACHE_BITMAP_SHIFT
    is defined as (PAGE_SHIFT - 4). So it can only be said that bitmap_bits
    is a multiple of 16.

    This fixes the problem by using bitmap_clear() instead of memset()
    in bit_map_init() and this also uses BITS_TO_LONGS() to calculate correct
    size at bitmap allocation time.

    Signed-off-by: Akinobu Mita
    Cc: "David S. Miller"
    Cc: sparclinux@vger.kernel.org
    Signed-off-by: David S. Miller

    Akinobu Mita
     

10 Nov, 2012

1 commit

  • Sparc32 already supported it, as a consequence of using the
    generic atomic64 implementation. And the sparc64 implementation
    is rather trivial.

    This allows us to set ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE for all
    of sparc, and avoid the annoying warning from lib/atomic64_test.c

    Signed-off-by: David S. Miller

    David S. Miller
     

06 Oct, 2012

1 commit

  • This adds optimized memset/bzero/page-clear routines for Niagara-4.

    We basically can do what powerpc has been able to do for a decade (via
    the "dcbz" instruction), which is use cache line clearing stores for
    bzero and memsets with a 'c' argument of zero.

    As long as we make the cache initializing store to each 32-byte
    subblock of the L2 cache line, it works.

    As with other Niagara-4 optimized routines, the key is to make sure to
    avoid any usage of the %asi register, as reads and writes to it cost
    at least 50 cycles.

    For the user clear cases, we don't use these new routines, we use the
    Niagara-1 variants instead. Those have to use %asi in an unavoidable
    way.

    A Niagara-4 8K page clear costs just under 600 cycles.

    Add definitions of the MRU variants of the cache initializing store
    ASIs. By default, cache initializing stores install the line as Least
    Recently Used. If we know we're going to use the data immediately
    (which is true for page copies and clears) we can use the Most
    Recently Used variant, to decrease the likelihood of the lines being
    evicted before they get used.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Oct, 2012

1 commit


29 Sep, 2012

1 commit


28 Sep, 2012

1 commit


27 Sep, 2012

2 commits


21 Aug, 2012

1 commit


27 Jun, 2012

1 commit


27 May, 2012

1 commit


25 May, 2012

3 commits


24 May, 2012

1 commit

  • Compute a mask that will only have 0x80 in the bytes which
    had a zero in them. The formula is:

    ~(((x & 0x7f7f7f7f) + 0x7f7f7f7f) | x | 0x7f7f7f7f)

    In the inner word iteration, we have to compute the "x | 0x7f7f7f7f"
    part, so we can reuse that in the above calculation.

    Once we have this mask, we perform divide and conquer to find the
    highest 0x80 location.

    Signed-off-by: David S. Miller

    David S. Miller
     

23 May, 2012

1 commit

  • Linus removed the end-of-address-space hackery from
    fs/namei.c:do_getname() so we really have to validate these edge
    conditions and cannot cheat any more (as x86 used to as well).

    Move to a common C implementation like x86 did. And if both
    src and dst are sufficiently aligned we'll do word at a time
    copies and checks as well.

    Signed-off-by: David S. Miller

    David S. Miller
     

20 May, 2012

2 commits

  • Otherwise if no references exist in the static kernel image,
    we won't export the symbol properly to modules.

    Signed-off-by: David S. Miller

    David S. Miller
     
    Based on a copy from microblaze, add a ucmpdi2 implementation.
    This fixes build of niu driver which failed with:

    drivers/built-in.o: In function `niu_get_nfc':
    niu.c:(.text+0x91494): undefined reference to `__ucmpdi2'

    This driver will never be used on a sparc32 system,
    but the patch is added to fix build breakage with all*config builds.

    Signed-off-by: Sam Ravnborg
    Signed-off-by: David S. Miller

    Sam Ravnborg
     

16 May, 2012

1 commit

  • For the explicit calls to .udiv/.umul in assembler, I made a
    mechanical (read as: safe) transformation. I didn't attempt
    to make any simplifications.

    In particular, __ndelay and __udelay can be simplified significantly.
    Some of the %y reads are unnecessary and these routines have no need
    any longer for allocating a register window, they can be leaf
    functions.

    Signed-off-by: David S. Miller

    David S. Miller
     

14 May, 2012

1 commit


12 May, 2012

3 commits