05 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license 2 as published
    by the free software foundation this program is distributed in the
    hope that it will be useful but without any warranty without even
    the implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation this program is distributed in the hope
    that it [would] be useful but without any warranty without even the
    implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 9 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.804956444@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

12 May, 2019

1 commit


26 Apr, 2019

3 commits

  • This patch integrates the utf8n patches with a higher-level API for
    UTF-8 string comparison, normalization, and casefolding. It implements
    a variation of NFD; casefolding is done by applying a full casefold on
    top of NFD. These algorithms are based on the core implemented by Olaf
    Weber from SGI.

    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Theodore Ts'o

    Gabriel Krisman Bertazi
     
  • Remove the Hangul decompositions from the utf8data trie, and do
    algorithmic decomposition to calculate them on the fly. To store the
    decomposition the caller of utf8lookup()/utf8nlookup() must provide a
    12-byte buffer, which is used to synthesize a leaf with the
    decomposition. This significantly reduces the size of the utf8data[]
    array.
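
The algorithmic decomposition mentioned above can be sketched as follows. This is a minimal illustration based on the Hangul syllable arithmetic from the Unicode Standard, not the kernel's code: the names `hangul_decompose` and `put_utf8_3` are hypothetical, and the sketch writes plain NUL-terminated UTF-8 (at most 10 bytes) rather than the 12-byte synthesized leaf the commit describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Constants from the Unicode Standard (Hangul syllable decomposition). */
#define HANGUL_SBASE  0xAC00u
#define HANGUL_LBASE  0x1100u
#define HANGUL_VBASE  0x1161u
#define HANGUL_TBASE  0x11A7u
#define HANGUL_LCOUNT 19u
#define HANGUL_VCOUNT 21u
#define HANGUL_TCOUNT 28u
#define HANGUL_NCOUNT (HANGUL_VCOUNT * HANGUL_TCOUNT) /* 588 */
#define HANGUL_SCOUNT (HANGUL_LCOUNT * HANGUL_NCOUNT) /* 11172 */

/* Hypothetical helper: encode a code point in U+0800..U+FFFF as 3 bytes. */
static size_t put_utf8_3(unsigned char *out, uint32_t cp)
{
	out[0] = (unsigned char)(0xE0 | (cp >> 12));
	out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
	out[2] = (unsigned char)(0x80 | (cp & 0x3F));
	return 3;
}

/*
 * Hypothetical helper: decompose a precomposed Hangul syllable into its
 * jamo, written to buf as NUL-terminated UTF-8. buf must hold at least
 * 10 bytes (3 jamo * 3 bytes + NUL). Returns the number of bytes written
 * (excluding the NUL), or 0 if cp is not a Hangul syllable.
 */
static size_t hangul_decompose(uint32_t cp, unsigned char *buf)
{
	uint32_t si, l, v, t;
	size_t n = 0;

	if (cp < HANGUL_SBASE || cp >= HANGUL_SBASE + HANGUL_SCOUNT)
		return 0;

	si = cp - HANGUL_SBASE;
	l = HANGUL_LBASE + si / HANGUL_NCOUNT;                   /* leading consonant */
	v = HANGUL_VBASE + (si % HANGUL_NCOUNT) / HANGUL_TCOUNT; /* vowel */
	t = HANGUL_TBASE + si % HANGUL_TCOUNT;                   /* trailing consonant */

	n += put_utf8_3(buf + n, l);
	n += put_utf8_3(buf + n, v);
	if (t != HANGUL_TBASE)	/* TBASE itself means "no trailing consonant" */
		n += put_utf8_3(buf + n, t);
	buf[n] = '\0';
	return n;
}
```

Because all 11172 syllables follow this arithmetic, none of their decompositions needs a table entry, which is consistent with the size reduction the commit reports.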

    Changes made by Gabriel:
    - Rebase to mainline
    - Fix checkpatch errors
    - Extract robustness fixes and merge back to original mkutf8data.c patch
    - Regenerate utf8data.h

    Signed-off-by: Olaf Weber
    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Theodore Ts'o

    Olaf Weber
     
  • Supporting functions for UTF-8 normalization are in utf8norm.c with the
    header utf8norm.h. Two normalization forms are supported: nfdi and
    nfdicf.

    nfdi:
    - Apply Unicode normalization form NFD.
    - Remove any Default_Ignorable_Code_Point.

    nfdicf:
    - Apply Unicode normalization form NFD.
    - Remove any Default_Ignorable_Code_Point.
    - Apply a full casefold (C + F).

    For the purposes of the code, a string is valid UTF-8 if:

    - The values encoded are 0x1..0x10FFFF.
    - The surrogate codepoints 0xD800..0xDFFF are not encoded.
    - The shortest possible encoding is used for all values.
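
The three validity rules above can be checked in a single pass. The following is a minimal sketch, not the code in utf8norm.c: the function name `demo_utf8n_is_valid` is hypothetical, and it takes an explicit length so that an embedded NUL byte (value 0x0, outside 0x1..0x10FFFF) can be rejected.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return 1 if s[0..len) is valid UTF-8 under the
 * rules listed above, 0 otherwise.
 */
static int demo_utf8n_is_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char b = s[i];
		uint32_t cp, min;
		size_t need, k;

		/* Classify the lead byte: payload bits, continuation count,
		 * and the smallest value this length may legally encode. */
		if (b < 0x80) {
			cp = b; need = 0; min = 0x1;
		} else if ((b & 0xE0) == 0xC0) {
			cp = b & 0x1F; need = 1; min = 0x80;
		} else if ((b & 0xF0) == 0xE0) {
			cp = b & 0x0F; need = 2; min = 0x800;
		} else if ((b & 0xF8) == 0xF0) {
			cp = b & 0x07; need = 3; min = 0x10000;
		} else {
			return 0;	/* stray continuation or invalid lead byte */
		}

		if (len - i - 1 < need)
			return 0;	/* sequence truncated */

		for (k = 1; k <= need; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return 0;	/* not a continuation byte */
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}

		if (cp < min)
			return 0;	/* NUL, or not the shortest encoding */
		if (cp > 0x10FFFF)
			return 0;	/* beyond the Unicode range */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;	/* surrogate code point */

		i += need + 1;
	}
	return 1;
}
```

For instance, the two-byte sequence C0 80 is rejected as an overlong encoding, and ED A0 80 is rejected because it encodes the surrogate 0xD800.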

    The supporting functions work on null-terminated strings (utf8 prefix)
    and on length-limited strings (utf8n prefix).
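
One common way to provide both flavors is to write the length-limited function once and have the null-terminated variant call it with the largest possible length. The sketch below assumes that approach and uses hypothetical names (`demo_utf8n_count`, `demo_utf8_count`); it counts code points in presumed-valid input and is not one of the functions this patch adds.

```c
#include <stddef.h>

/*
 * Hypothetical length-limited variant: count code points in at most len
 * bytes of s, stopping early at a NUL byte. Input is assumed to be valid
 * UTF-8, so every byte that is not a continuation byte starts a code point.
 */
static size_t demo_utf8n_count(const char *s, size_t len)
{
	size_t n = 0;

	while (len-- > 0 && *s != '\0') {
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
		s++;
	}
	return n;
}

/* Hypothetical null-terminated variant: same logic, no length limit. */
static size_t demo_utf8_count(const char *s)
{
	return demo_utf8n_count(s, (size_t)-1);
}
```

With this layout the logic lives in one place, and callers that hold a buffer without a terminating NUL simply use the utf8n-style entry point.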

    Relative to the original SGI patch, and for conformity with coding
    standards, the utf8data_t typedef was dropped, since it only masked the
    struct keyword. The utf8leaf_t and utf8trie_t typedefs are different: I
    decided to keep them, since they are simple pointers to memory buffers,
    and using uchars here wouldn't provide any more meaningful information.

    Relative to the original submission, we also converted from the
    compatibility form (NFKD) to the canonical form (NFD).

    Changes made by Gabriel:
    - Rebase to mainline
    - Fix up checkpatch.pl warnings
    - Drop typedefs
    - Move out of libxfs
    - Convert from NFKD to NFD

    Signed-off-by: Olaf Weber
    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Theodore Ts'o

    Olaf Weber