15 Oct, 2020

1 commit

  • A recent change to the checksum code removed the use of some extra
    arguments, along with the stack storage for them, so the stack
    pointer no longer needed to be adjusted in the function prologue.

    But a leftover subtraction was not removed from the function epilogue,
    causing the function to return with the stack pointer 16 bytes away
    from where it should have been. This corrupted local state and led to
    weird crashes.

    This simply removes the leftover instruction from the epilogue.

    Fixes: 70d65cd555c5 ("ppc: propagate the calling conventions change down to csum_partial_copy_generic()")
    Cc: Al Viro
    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Linus Torvalds

    Jason A. Donenfeld
     

21 Aug, 2020

1 commit


31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

03 Jun, 2018

2 commits

  • The generic csum_ipv6_magic() generates a pretty bad result:

    00000000 <csum_ipv6_magic>: (PPC32)
    0: 81 23 00 00 lwz r9,0(r3)
    4: 81 03 00 04 lwz r8,4(r3)
    8: 7c e7 4a 14 add r7,r7,r9
    c: 7d 29 38 10 subfc r9,r9,r7
    10: 7d 4a 51 10 subfe r10,r10,r10
    14: 7d 27 42 14 add r9,r7,r8
    18: 7d 2a 48 50 subf r9,r10,r9
    1c: 80 e3 00 08 lwz r7,8(r3)
    20: 7d 08 48 10 subfc r8,r8,r9
    24: 7d 4a 51 10 subfe r10,r10,r10
    28: 7d 29 3a 14 add r9,r9,r7
    2c: 81 03 00 0c lwz r8,12(r3)
    30: 7d 2a 48 50 subf r9,r10,r9
    34: 7c e7 48 10 subfc r7,r7,r9
    38: 7d 4a 51 10 subfe r10,r10,r10
    3c: 7d 29 42 14 add r9,r9,r8
    40: 7d 2a 48 50 subf r9,r10,r9
    44: 80 e4 00 00 lwz r7,0(r4)
    48: 7d 08 48 10 subfc r8,r8,r9
    4c: 7d 4a 51 10 subfe r10,r10,r10
    50: 7d 29 3a 14 add r9,r9,r7
    54: 7d 2a 48 50 subf r9,r10,r9
    58: 81 04 00 04 lwz r8,4(r4)
    5c: 7c e7 48 10 subfc r7,r7,r9
    60: 7d 4a 51 10 subfe r10,r10,r10
    64: 7d 29 42 14 add r9,r9,r8
    68: 7d 2a 48 50 subf r9,r10,r9
    6c: 80 e4 00 08 lwz r7,8(r4)
    70: 7d 08 48 10 subfc r8,r8,r9
    74: 7d 4a 51 10 subfe r10,r10,r10
    78: 7d 29 3a 14 add r9,r9,r7
    7c: 7d 2a 48 50 subf r9,r10,r9
    80: 81 04 00 0c lwz r8,12(r4)
    84: 7c e7 48 10 subfc r7,r7,r9
    88: 7d 4a 51 10 subfe r10,r10,r10
    8c: 7d 29 42 14 add r9,r9,r8
    90: 7d 2a 48 50 subf r9,r10,r9
    94: 7d 08 48 10 subfc r8,r8,r9
    98: 7d 4a 51 10 subfe r10,r10,r10
    9c: 7d 29 2a 14 add r9,r9,r5
    a0: 7d 2a 48 50 subf r9,r10,r9
    a4: 7c a5 48 10 subfc r5,r5,r9
    a8: 7c 63 19 10 subfe r3,r3,r3
    ac: 7d 29 32 14 add r9,r9,r6
    b0: 7d 23 48 50 subf r9,r3,r9
    b4: 7c c6 48 10 subfc r6,r6,r9
    b8: 7c 63 19 10 subfe r3,r3,r3
    bc: 7c 63 48 50 subf r3,r3,r9
    c0: 54 6a 80 3e rotlwi r10,r3,16
    c4: 7c 63 52 14 add r3,r3,r10
    c8: 7c 63 18 f8 not r3,r3
    cc: 54 63 84 3e rlwinm r3,r3,16,16,31
    d0: 4e 80 00 20 blr

    0000000000000000 <csum_ipv6_magic>: (PPC64)
    0: 81 23 00 00 lwz r9,0(r3)
    4: 80 03 00 04 lwz r0,4(r3)
    8: 81 63 00 08 lwz r11,8(r3)
    c: 7c e7 4a 14 add r7,r7,r9
    10: 7f 89 38 40 cmplw cr7,r9,r7
    14: 7d 47 02 14 add r10,r7,r0
    18: 7d 30 10 26 mfocrf r9,1
    1c: 55 29 f7 fe rlwinm r9,r9,30,31,31
    20: 7d 4a 4a 14 add r10,r10,r9
    24: 7f 80 50 40 cmplw cr7,r0,r10
    28: 7d 2a 5a 14 add r9,r10,r11
    2c: 80 03 00 0c lwz r0,12(r3)
    30: 81 44 00 00 lwz r10,0(r4)
    34: 7d 10 10 26 mfocrf r8,1
    38: 55 08 f7 fe rlwinm r8,r8,30,31,31
    3c: 7d 29 42 14 add r9,r9,r8
    40: 81 04 00 04 lwz r8,4(r4)
    44: 7f 8b 48 40 cmplw cr7,r11,r9
    48: 7d 29 02 14 add r9,r9,r0
    4c: 7d 70 10 26 mfocrf r11,1
    50: 55 6b f7 fe rlwinm r11,r11,30,31,31
    54: 7d 29 5a 14 add r9,r9,r11
    58: 7f 80 48 40 cmplw cr7,r0,r9
    5c: 7d 29 52 14 add r9,r9,r10
    60: 7c 10 10 26 mfocrf r0,1
    64: 54 00 f7 fe rlwinm r0,r0,30,31,31
    68: 7d 69 02 14 add r11,r9,r0
    6c: 7f 8a 58 40 cmplw cr7,r10,r11
    70: 7c 0b 42 14 add r0,r11,r8
    74: 81 44 00 08 lwz r10,8(r4)
    78: 7c f0 10 26 mfocrf r7,1
    7c: 54 e7 f7 fe rlwinm r7,r7,30,31,31
    80: 7c 00 3a 14 add r0,r0,r7
    84: 7f 88 00 40 cmplw cr7,r8,r0
    88: 7d 20 52 14 add r9,r0,r10
    8c: 80 04 00 0c lwz r0,12(r4)
    90: 7d 70 10 26 mfocrf r11,1
    94: 55 6b f7 fe rlwinm r11,r11,30,31,31
    98: 7d 29 5a 14 add r9,r9,r11
    9c: 7f 8a 48 40 cmplw cr7,r10,r9
    a0: 7d 29 02 14 add r9,r9,r0
    a4: 7d 70 10 26 mfocrf r11,1
    a8: 55 6b f7 fe rlwinm r11,r11,30,31,31
    ac: 7d 29 5a 14 add r9,r9,r11
    b0: 7f 80 48 40 cmplw cr7,r0,r9
    b4: 7d 29 2a 14 add r9,r9,r5
    b8: 7c 10 10 26 mfocrf r0,1
    bc: 54 00 f7 fe rlwinm r0,r0,30,31,31
    c0: 7d 29 02 14 add r9,r9,r0
    c4: 7f 85 48 40 cmplw cr7,r5,r9
    c8: 7c 09 32 14 add r0,r9,r6
    cc: 7d 50 10 26 mfocrf r10,1
    d0: 55 4a f7 fe rlwinm r10,r10,30,31,31
    d4: 7c 00 52 14 add r0,r0,r10
    d8: 7f 80 30 40 cmplw cr7,r0,r6
    dc: 7d 30 10 26 mfocrf r9,1
    e0: 55 29 ef fe rlwinm r9,r9,29,31,31
    e4: 7c 09 02 14 add r0,r9,r0
    e8: 54 03 80 3e rotlwi r3,r0,16
    ec: 7c 03 02 14 add r0,r3,r0
    f0: 7c 03 00 f8 not r3,r0
    f4: 78 63 84 22 rldicl r3,r3,48,48
    f8: 4e 80 00 20 blr

    This patch implements it in assembly for both PPC32 and PPC64.
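
    For reference, here is a rough standalone C model of the arithmetic
    csum_ipv6_magic() performs (illustrative only, not the kernel code;
    fold64() and ipv6_magic_model() are made-up names): the two 16-byte
    addresses, the length, the protocol and the incoming partial sum are
    accumulated with one's-complement (end-around carry) addition, then
    the result is folded to 16 bits and inverted.

    #include <stdint.h>

    /* Fold a wide accumulator down to a 16-bit one's-complement sum. */
    static uint16_t fold64(uint64_t sum)
    {
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);   /* end-around carry */
        return (uint16_t)sum;
    }

    /* Rough model of csum_ipv6_magic(): len, proto and the initial sum are
     * taken as 32-bit values in the byte order the caller adds them. */
    static uint16_t ipv6_magic_model(const uint8_t saddr[16],
                                     const uint8_t daddr[16],
                                     uint32_t len, uint32_t proto,
                                     uint32_t initial)
    {
        uint64_t sum = (uint64_t)len + proto + initial;
        int i;

        for (i = 0; i < 16; i += 4) {
            sum += ((uint32_t)saddr[i] << 24) | ((uint32_t)saddr[i + 1] << 16) |
                   ((uint32_t)saddr[i + 2] << 8) | saddr[i + 3];
            sum += ((uint32_t)daddr[i] << 24) | ((uint32_t)daddr[i + 1] << 16) |
                   ((uint32_t)daddr[i + 2] << 8) | daddr[i + 3];
        }
        return (uint16_t)~fold64(sum);
    }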

    Link: https://github.com/linuxppc/linux/issues/9
    Signed-off-by: Christophe Leroy
    Reviewed-by: Segher Boessenkool
    Signed-off-by: Michael Ellerman

    Christophe Leroy
     
  • Improve __csum_partial by interleaving loads and adds.

    On an 8xx, it brings neither improvement nor degradation.
    On an 83xx, it brings a 25% improvement.

    Signed-off-by: Christophe Leroy
    Reviewed-by: Segher Boessenkool
    Signed-off-by: Michael Ellerman

    Christophe Leroy
     

14 Nov, 2016

1 commit

  • This macro is taken from s390, and allows more flexibility in
    changing the exception table format.

    mpe: Put it in ppc_asm.h and only define one version using
    stringify_in_c(). Add some empty definitions and headers to keep the
    selftests happy.
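
    The idea, roughly (this is a simplified sketch, not the exact macro
    that was merged): a single EX_TABLE() definition, wrapped with
    stringify_in_c() so the same text works from both C inline asm and .S
    files, emits one __ex_table entry pairing a potentially faulting
    instruction with its fixup target. Changing the table format then only
    means changing this one macro.

    /* Simplified, illustrative sketch only -- not the exact kernel macro.
     * stringify_in_c() turns its arguments into a quoted string when
     * compiled as C, and leaves them as-is when included from a .S file,
     * so one definition serves both worlds. */
    #ifdef __ASSEMBLY__
    #define stringify_in_c(...)     __VA_ARGS__
    #else
    #define __stringify_in_c(...)   #__VA_ARGS__
    #define stringify_in_c(...)     __stringify_in_c(__VA_ARGS__) " "
    #endif

    /* One __ex_table entry: "if the instruction at _fault traps, resume
     * at _target" (the sketch assumes 32-bit absolute addresses). */
    #define EX_TABLE(_fault, _target)                \
            stringify_in_c(.section __ex_table,"a";) \
            stringify_in_c(.balign 4;)               \
            stringify_in_c(.long (_fault);)          \
            stringify_in_c(.long (_target);)         \
            stringify_in_c(.previous)

    A use site then just pairs a possibly faulting instruction label with
    its fixup label, e.g. EX_TABLE(1b, fixup_label).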

    Signed-off-by: Nicholas Piggin
    Signed-off-by: Michael Ellerman

    Nicholas Piggin
     

15 Oct, 2016

1 commit

  • Pull kbuild updates from Michal Marek:

    - EXPORT_SYMBOL for asm source by Al Viro.

    This does bring a regression, because genksyms no longer generates
    checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
    working on a patch to fix this.

    Plus, we are talking about functions like strcpy(), which rarely
    change prototypes.

    - Fixes for PPC fallout of the above by Stephen Rothwell and Nick
    Piggin

    - fixdep speedup by Alexey Dobriyan.

    - preparatory work by Nick Piggin to allow architectures to build with
    -ffunction-sections, -fdata-sections and --gc-sections

    - CONFIG_THIN_ARCHIVES support by Stephen Rothwell

    - fix for filenames with colons in the initramfs source by me.

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
    initramfs: Escape colons in depfile
    ppc: there is no clear_pages to export
    powerpc/64: whitelist unresolved modversions CRCs
    kbuild: -ffunction-sections fix for archs with conflicting sections
    kbuild: add arch specific post-link Makefile
    kbuild: allow archs to select link dead code/data elimination
    kbuild: allow architectures to use thin archives instead of ld -r
    kbuild: Regenerate genksyms lexer
    kbuild: genksyms fix for typeof handling
    fixdep: faster CONFIG_ search
    ia64: move exports to definitions
    sparc32: debride memcpy.S a bit
    [sparc] unify 32bit and 64bit string.h
    sparc: move exports to definitions
    ppc: move exports to definitions
    arm: move exports to definitions
    s390: move exports to definitions
    m68k: move exports to definitions
    alpha: move exports to actual definitions
    x86: move exports to actual definitions
    ...

    Linus Torvalds
     

08 Sep, 2016

1 commit

  • Commit 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic()
    based on copy_tofrom_user()") introduced a bug when the destination
    address is odd and len is smaller than the cacheline size.

    In that case the resulting csum value doesn't have to be rotated one
    byte, because the cache-aligned copy part is skipped and so no
    alignment is performed.
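
    The one-byte rotation exists in the first place because of a property
    of the 16-bit one's-complement sum: summing the same bytes shifted by
    one byte position yields the byte-swapped result. A small standalone C
    demonstration of that property (illustrative only, not the kernel
    routine; csum16() is a made-up helper):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* 16-bit one's-complement sum over big-endian 16-bit words. */
    static uint16_t csum16(const uint8_t *buf, size_t len)
    {
        uint32_t sum = 0;
        size_t i;

        for (i = 0; i + 1 < len; i += 2)
            sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
        if (len & 1)
            sum += (uint32_t)buf[len - 1] << 8;   /* pad the odd byte */
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);   /* end-around carry */
        return (uint16_t)sum;
    }

    int main(void)
    {
        uint8_t data[32], shifted[33] = { 0 };
        uint16_t even, odd;
        int i;

        for (i = 0; i < 32; i++)
            data[i] = (uint8_t)(3 * i + 1);
        memcpy(shifted + 1, data, 32);   /* same bytes, odd offset */

        even = csum16(data, 32);
        odd  = csum16(shifted, 33);
        /* The odd-offset sum is the byte-swapped even-offset sum, so a
         * checksum computed at an odd destination must be rotated one
         * byte -- unless the odd-alignment path was never taken, which
         * is the case this commit fixes. */
        printf("%04x %04x %04x\n", even, odd,
               (uint16_t)((odd >> 8) | (odd << 8)));
        return 0;
    }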

    Fixes: 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()")
    Cc: stable@vger.kernel.org # v4.6+
    Reported-by: Alessio Igor Bogani
    Signed-off-by: Christophe Leroy
    Tested-by: Alessio Igor Bogani
    Signed-off-by: Michael Ellerman

    Christophe Leroy
     

10 Aug, 2016

1 commit

  • Commit 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic()
    based on copy_tofrom_user()") introduced a bug when the destination
    address is odd and the initial csum is not null.

    In that (rare) case the initial csum value has to be rotated one byte,
    just as the resulting value is.

    This patch also fixes the related comments.

    Fixes: 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()")
    Signed-off-by: Christophe Leroy
    Signed-off-by: Michael Ellerman

    Christophe Leroy
     

08 Aug, 2016

1 commit


10 Mar, 2016

1 commit

  • csum_partial() is often called for small fixed-length packets, for
    which the generic csum_partial() implementation is suboptimal.

    For instance, in my configuration, I got:
    * One place calling it with constant len 4
    * Seven places calling it with constant len 8
    * Three places calling it with constant len 14
    * One place calling it with constant len 20
    * One place calling it with constant len 24
    * One place calling it with constant len 32

    This patch renames csum_partial() to __csum_partial() and implements
    csum_partial() as an inline wrapper function which:
    * uses csum_add() for small constant lengths that are a multiple of 16 bits
    * uses ip_fast_csum() for other constant lengths that are a multiple of 32 bits
    * uses __csum_partial() in all other cases
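
    A simplified sketch of the dispatch idea (illustrative only; the
    model_ names are stand-ins for the kernel's csum_add() and
    __csum_partial(), and the real inline wrapper handles more constant
    lengths, including the ip_fast_csum() case): __builtin_constant_p()
    lets the compiler resolve the cheap path at compile time, so only
    non-constant or irregular lengths reach the out-of-line routine.

    #include <stdint.h>
    #include <string.h>

    typedef uint32_t wsum_t;

    /* 32-bit one's-complement addition (model of csum_add()). */
    static inline wsum_t model_csum_add(wsum_t a, wsum_t b)
    {
        uint64_t s = (uint64_t)a + b;

        return (wsum_t)(s + (s >> 32));   /* fold the carry back in */
    }

    /* Generic out-of-line path (the real __csum_partial() is assembly). */
    static wsum_t model_csum_partial_slow(const void *buf, int len, wsum_t sum)
    {
        const unsigned char *p = buf;
        uint32_t w;

        for (; len >= 4; len -= 4, p += 4) {
            memcpy(&w, p, 4);
            sum = model_csum_add(sum, w);
        }
        w = 0;
        memcpy(&w, p, len > 0 ? len : 0);   /* 0-3 trailing bytes */
        return model_csum_add(sum, w);
    }

    /* The wrapper: small constant lengths are summed inline word by word
     * (the loop below is fully unrolled by the compiler); everything else
     * falls back to the generic routine. */
    static inline wsum_t model_csum_partial(const void *buf, int len, wsum_t sum)
    {
        if (__builtin_constant_p(len) && len > 0 && len <= 32 && !(len & 3)) {
            const unsigned char *p = buf;
            int i;

            for (i = 0; i < len; i += 4) {
                uint32_t w;

                memcpy(&w, p + i, 4);
                sum = model_csum_add(sum, w);
            }
            return sum;
        }
        return model_csum_partial_slow(buf, len, sum);
    }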

    Signed-off-by: Christophe Leroy
    Signed-off-by: Scott Wood

    Christophe Leroy
     

05 Mar, 2016

4 commits

  • On the 8xx, load latency is 2 cycles and taking branches also takes
    2 cycles. So let's unroll the loop.

    This patch improves csum_partial() speed by around 10% on both:
    * 8xx (single issue processor with parallel execution)
    * 83xx (superscalar 6xx processor with dual instruction fetch
    and parallel execution)
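
    The change itself is in the checksum assembly, but the idea is plain
    loop unrolling; a rough C equivalent (illustrative only) of turning a
    one-word-per-iteration sum into a four-words-per-iteration one:

    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative only: summing four words per iteration lets new loads
     * issue while earlier additions complete, and the loop branch is
     * taken a quarter as often. */
    static uint64_t sum_words_unrolled(const uint32_t *p, size_t nwords)
    {
        uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        size_t i;

        for (i = 0; i + 4 <= nwords; i += 4) {
            s0 += p[i];
            s1 += p[i + 1];
            s2 += p[i + 2];
            s3 += p[i + 3];
        }
        for (; i < nwords; i++)     /* 0-3 leftover words */
            s0 += p[i];
        return s0 + s1 + s2 + s3;
    }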

    Signed-off-by: Christophe Leroy
    Signed-off-by: Scott Wood

    Christophe Leroy
     
  • r5 already contains the value to be updated, so let's use r5 for that
    all the way through. It makes the code more readable.

    To avoid confusion, it is better to use adde instead of addc.

    The first addition is useless; its only purpose is to clear the carry.
    As r4 is a signed int that is always positive, this can be done by
    using srawi instead of srwi.

    Let's also remove the comment about bdnz having no overhead, as it is
    not correct on all powerpc, at least not on the MPC8xx.

    In the last part, the remaining number of bytes to be processed is
    between 0 and 3, so we can base that part on the values of bit 31 and
    bit 30 of r4 instead of ANDing r4 with 3 and then proceeding with
    comparisons and subtractions.

    Signed-off-by: Christophe Leroy
    Signed-off-by: Scott Wood

    Christophe Leroy
     
  • csum_partial_copy_generic() does the same as copy_tofrom_user() and
    also calculates the checksum during the copy. Unlike
    copy_tofrom_user(), the existing version of csum_partial_copy_generic()
    doesn't take advantage of the cache.

    This patch is a rewrite of csum_partial_copy_generic() based on
    copy_tofrom_user(). The previous version of csum_partial_copy_generic()
    handled errors itself; now that we have checksum wrapper functions to
    handle the error case, as on powerpc64, the error case can be kept
    simple: just return -EFAULT. copy_tofrom_user() only has r12 available,
    so we use it for the checksum; r7 and r8, which contain the pointers
    used for error feedback, are saved on the stack.

    On a TCP benchmark using socklib on the loopback interface, with
    checksum offload and scatter/gather deactivated, we get about a 20%
    performance increase.

    Signed-off-by: Christophe Leroy
    Signed-off-by: Scott Wood

    Christophe Leroy
     
  • On several architectures, ip_fast_csum() is inlined. There are
    functions like ip_send_check() which do little more than call
    ip_fast_csum(). Inlining ip_fast_csum() allows the compiler to
    optimise better.
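
    A minimal sketch of the pattern (generic C with made-up model names,
    not the powerpc implementation this commit inlines): once
    ip_fast_csum() is visible as an inline, a caller like ip_send_check(),
    which does little more than clear the field and recompute it, can be
    optimised as a single unit.

    #include <stdint.h>

    /* Generic IPv4 header checksum: one's-complement sum of 'ihl' 32-bit
     * words, folded to 16 bits and inverted. */
    static inline uint16_t ip_fast_csum_model(const void *iph, unsigned int ihl)
    {
        const uint16_t *p = iph;
        uint32_t sum = 0;
        unsigned int i;

        for (i = 0; i < ihl * 2; i++)
            sum += p[i];
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)~sum;
    }

    /* Modelled on ip_send_check(): little more than one call, which is
     * why inlining the checksum pays off. */
    struct iphdr_model {
        uint8_t  fields[10];    /* version .. ttl/protocol */
        uint16_t check;         /* header checksum */
        uint8_t  addrs[8];      /* saddr, daddr */
    };

    static inline void ip_send_check_model(struct iphdr_model *iph)
    {
        iph->check = 0;
        iph->check = ip_fast_csum_model(iph, sizeof(*iph) / 4);
    }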

    Suggested-by: Eric Dumazet
    Signed-off-by: Christophe Leroy
    [scottwood: whitespace and cast fixes]
    Signed-off-by: Scott Wood

    Christophe Leroy
     

08 Aug, 2015

1 commit

  • csum_tcpudp_magic() is only a few instructions and modifies very few
    registers, so it is not worth having it as a separate function and
    paying for the branch to it and the saving of volatile registers.

    This patch makes it inline, using the already existing
    csum_tcpudp_nofold() function.
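
    The shape of the change, as a sketch (following the generic kernel
    pattern of csum_fold(csum_tcpudp_nofold(...)) rather than the exact
    powerpc header; the model_ names are stand-ins): csum_tcpudp_magic()
    becomes a trivial inline on top of csum_tcpudp_nofold().

    #include <stdint.h>

    /* Model of csum_fold(): reduce a 32-bit partial sum to 16 bits and
     * invert it. */
    static inline uint16_t csum_fold_model(uint32_t csum)
    {
        csum = (csum & 0xffff) + (csum >> 16);
        csum = (csum & 0xffff) + (csum >> 16);
        return (uint16_t)~csum;
    }

    /* Model of csum_tcpudp_nofold(): one's-complement sum of the
     * pseudo-header fields, taken here as plain 32-bit values. */
    static inline uint32_t csum_tcpudp_nofold_model(uint32_t saddr, uint32_t daddr,
                                                    uint32_t len, uint8_t proto,
                                                    uint32_t sum)
    {
        uint64_t s = (uint64_t)saddr + daddr + len + proto + sum;

        while (s >> 32)
            s = (s & 0xffffffffu) + (s >> 32);   /* end-around carry */
        return (uint32_t)s;
    }

    /* The point of the commit: csum_tcpudp_magic() no longer needs to be
     * a separate assembly routine -- it is just nofold + fold, inlined. */
    static inline uint16_t csum_tcpudp_magic_model(uint32_t saddr, uint32_t daddr,
                                                   uint32_t len, uint8_t proto,
                                                   uint32_t sum)
    {
        return csum_fold_model(csum_tcpudp_nofold_model(saddr, daddr,
                                                        len, proto, sum));
    }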

    Signed-off-by: Christophe Leroy
    Signed-off-by: Scott Wood

    LEROY Christophe
     

10 Oct, 2005

1 commit