01 May, 2014

1 commit


28 Apr, 2014

1 commit

  • The asm-generic, big-endian version of zero_bytemask creates a mask of
    bytes preceding the first zero-byte by left shifting ~0ul based on the
    position of the first zero byte.

    Unfortunately, if the first (top) byte is zero, the output of
    prep_zero_mask has only the top bit set, resulting in undefined C
    behaviour as we shift left by an amount equal to the width of the type.
    As it happens, GCC doesn't manage to spot this through the call to fls(),
    but the issue remains if architectures choose to implement their shift
    instructions differently.

    An example would be arch/arm/ (AArch32), where LSL Rd, Rn, #32 results
    in Rd == 0x0, whilst on arch/arm64 (AArch64) LSL Xd, Xn, #64 results in
    Xd == Xn.

    Rather than check explicitly for the problematic shift, this patch adds
    an extra shift by 1, replacing fls with __fls. Since zero_bytemask is
    never called with a zero argument (has_zero() is used to check the data
    first), we don't need to worry about calling __fls(0), which is
    undefined.
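
    A minimal C sketch of the fixed computation, under stated assumptions:
    the word is fixed at 64 bits, msb_index() is a hypothetical stand-in
    for __fls(), and zero_bytemask_sketch() is an illustrative name, not
    the kernel's definition.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for __fls(): 0-based index of the most
 * significant set bit. Like __fls(), it must not be called with 0.
 */
static unsigned int msb_index(uint64_t x)
{
    unsigned int i = 0;

    while (x >>= 1)
        i++;
    return i;
}

/*
 * Big-endian bytemask of the bytes preceding the first zero byte.
 * 'mask' is assumed to have 0x80 set in every zero byte position and
 * to be non-zero. The shift is split in two (at most 63, then 1), so
 * a zero top byte never produces a single shift by the full type width.
 */
static uint64_t zero_bytemask_sketch(uint64_t mask)
{
    return ~(uint64_t)0 << msb_index(mask) << 1;
}
```

    With only the top bit set in the mask, the two shifts are by 63 and
    then by 1, and the result is 0: no bytes precede the first zero byte.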

    Cc:
    Cc: Victor Kamensky
    Signed-off-by: Will Deacon
    Signed-off-by: Linus Torvalds

    Will Deacon
     

13 Dec, 2013

1 commit


27 May, 2012

1 commit

  • This changes the word-at-a-time interfaces to be a bit more
    complicated, but a lot more generic.

    In particular, it allows us to really do the operations efficiently on
    both little-endian and big-endian machines, pretty much regardless of
    machine details. For example, if you can rely on a fast population
    count instruction on your architecture, this will allow you to write
    your optimized architecture-specific version using that.

    NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is
    not truly generic: it only works on big-endian. Why? Because on
    little-endian the generic algorithms are wasteful, since you can
    inevitably do better. The x86 implementation is an example of that.

    (The only truly non-generic part of the asm-generic implementation is
    the "find_zero()" function, and you could make a little-endian version
    of it. And if the Kbuild infrastructure allowed us to pick a particular
    header file, that would be lovely)
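
    To illustrate why little-endian can do better, here is a hedged sketch
    of a little-endian first-zero-byte search that uses a population count.
    It is an assumption-laden illustration, not code quoted from any
    architecture header: load_le64(), popcount64() and first_zero_le() are
    hypothetical names, and the word is fixed at 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_BITS  0x0101010101010101ull
#define HIGH_BITS 0x8080808080808080ull

/* Assemble a word as a little-endian load would, on any host. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t w = 0;

    for (int i = 0; i < 8; i++)
        w |= (uint64_t)p[i] << (8 * i);
    return w;
}

/* Portable population count; a real port would use an instruction. */
static unsigned int popcount64(uint64_t x)
{
    unsigned int n = 0;

    while (x) {
        x &= x - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Index of the first zero byte in memory order, or -1 if there is none. */
static int first_zero_le(uint64_t a)
{
    /*
     * Borrows only travel upwards, so any false positives sit above
     * the first zero byte and the lowest set bit is always exact.
     */
    uint64_t bits = (a - ONE_BITS) & ~a & HIGH_BITS;

    if (!bits)
        return -1;
    bits = (bits - 1) & ~bits;  /* ones strictly below the lowest set bit */
    return (int)(popcount64(bits) >> 3);
}
```

    Because the first byte in memory is the least significant one, the
    quick test already pins down the first zero byte, and no separate
    exact-mask pass is needed in this sketch.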

    The functions are as follows:

    - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm
      uses.

    - has_zero(): take a word, and determine if it has a zero byte in it.
      It gets the word, the pointer to the constant pool, and a pointer to
      an intermediate "data" field it can set.

      This is the "quick-and-dirty" zero tester: it's what is run inside
      the hot loops.

    - "prep_zero_mask()": take the word, the data that has_zero() produced,
      and the constant pool, and generate an *exact* mask of which byte had
      the first zero. This is run directly *outside* the loop, and allows
      the "has_zero()" function to answer the "is there a zero byte"
      question without necessarily getting exactly *which* byte is the
      first one to contain a zero.

      If you do multiple byte lookups concurrently (eg "hash_name()", which
      looks for both NUL and '/' bytes), after you've done the prep_zero_mask()
      phase, the result of those can be or'ed together to get the "either
      or" case.

    - The result from "prep_zero_mask()" can then be fed into "find_zero()"
      (to find the byte offset of the first byte that was zero) or into
      "zero_bytemask()" (to find the bytemask of the bytes preceding the
      zero byte).

      The existence of zero_bytemask() is optional, and is not necessary
      for the normal string routines. But dentry name hashing needs it, so
      if you enable DENTRY_WORD_AT_A_TIME you need to expose it.
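
    The functions above can be sketched in portable C as follows. This is
    an illustration under stated assumptions, not the kernel source: the
    word is fixed at 64 bits, bytes are assembled in big-endian order by a
    hypothetical load_be64(), struct wa_consts stands in for the constant
    pool, and strlen_words() and component_len() are made-up callers that
    require the buffer to be readable in whole 8-byte chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPEAT_BYTE(x) (~(uint64_t)0 / 0xff * (x))

/* Hypothetical constant pool, modelled on WORD_AT_A_TIME_CONSTANTS. */
struct wa_consts {
    uint64_t high_bits, low_bits;
};
static const struct wa_consts wa = { REPEAT_BYTE(0xfe) + 1, REPEAT_BYTE(0x7f) };

/* Assemble a word as a big-endian load would, on any host. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t w = 0;

    for (int i = 0; i < 8; i++)
        w = (w << 8) | p[i];
    return w;
}

/* Quick test for the hot loop: non-zero iff some byte of val is zero. */
static uint64_t has_zero(uint64_t val, uint64_t *data, const struct wa_consts *c)
{
    uint64_t rhs = val | c->low_bits;

    *data = rhs;
    return (val + c->high_bits) & ~rhs;
}

/* Exact mask, run outside the loop: 0x80 in each zero byte, 0 elsewhere. */
static uint64_t prep_zero_mask(uint64_t val, uint64_t rhs, const struct wa_consts *c)
{
    uint64_t mask = (val & c->low_bits) + c->low_bits;

    return ~(mask | rhs);
}

/* Byte offset of the first (most significant) marked byte; mask != 0. */
static size_t find_zero(uint64_t mask)
{
    unsigned int msb = 0;

    while (mask >>= 1)
        msb++;
    return (63 - msb) >> 3;
}

/* strlen()-like caller: a single lookup for NUL. */
static size_t strlen_words(const unsigned char *s)
{
    for (size_t len = 0;; len += 8) {
        uint64_t data, w = load_be64(s + len);

        if (has_zero(w, &data, &wa))
            return len + find_zero(prep_zero_mask(w, data, &wa));
    }
}

/* Two concurrent lookups (NUL and '/'), with the exact masks or'ed. */
static size_t component_len(const unsigned char *s)
{
    for (size_t len = 0;; len += 8) {
        uint64_t adata, bdata;
        uint64_t a = load_be64(s + len), b = a ^ REPEAT_BYTE('/');

        if (has_zero(a, &adata, &wa) | has_zero(b, &bdata, &wa))
            return len + find_zero(prep_zero_mask(a, adata, &wa) |
                                   prep_zero_mask(b, bdata, &wa));
    }
}
```

    XOR-ing the word with a repeated '/' turns every '/' byte into a zero
    byte, so the same zero-byte machinery serves both lookups.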

    This changes the generic strncpy_from_user() function and the dentry
    hashing functions to use these modified word-at-a-time interfaces. This
    gets us back to the optimized state of the x86 strncpy that we lost in
    the previous commit when moving over to the generic version.

    Signed-off-by: Linus Torvalds

    Linus Torvalds