28 Sep, 2020

1 commit

  • Get the generic casefolding code in sync with the patches that are
    queued in f2fs.git#dev for 5.10.

    Equivalently, this reverts the patch
    "ANDROID-fs-adjust-casefolding-support-to-match-android-mainline.patch"
    from the android-mainline quilt series, with the following conflicts:

    Conflicts:
    fs/ext4/hash.c # due to "ANDROID: ext4: Handle casefolding with encryption"
    fs/ext4/namei.c # due to "ANDROID: ext4: Handle casefolding with encryption"
    fs/f2fs/dir.c # due to "ANDROID: f2fs: Handle casefolding with Encryption"

    Bug: 161184936
    Cc: Daniel Rosenberg
    Cc: Paul Lawrence
    Cc: Jaegeuk Kim
    Change-Id: I0ae169f0f5f413fb21e4be7a163213aef3fa6756
    Signed-off-by: Eric Biggers

    Eric Biggers
     

08 Apr, 2020

1 commit


25 Mar, 2020

1 commit


21 Feb, 2020

1 commit

  • This adds a case insensitive hash function to allow taking the hash
    without needing to allocate a casefolded copy of the string.
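
The idea can be sketched as follows. This is a minimal illustration, not the code from the patch: it assumes ASCII-only folding and an FNV-1a hash, and the names `fold_ascii` and `ci_hash` are hypothetical. Each byte is folded as it is fed to the hash, so no folded copy of the name is ever allocated.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold one ASCII byte to lower case. */
static unsigned char fold_ascii(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/*
 * Hypothetical case-insensitive hash: FNV-1a computed over the folded
 * bytes. The fold happens per byte, inside the hashing loop, so the
 * caller never needs a casefolded copy of the string.
 */
uint32_t ci_hash(const char *name, size_t len)
{
	uint32_t h = 2166136261u;	/* FNV-1a 32-bit offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= fold_ascii((unsigned char)name[i]);
		h *= 16777619u;		/* FNV-1a 32-bit prime */
	}
	return h;
}
```

With this sketch, `ci_hash("README", 6)` and `ci_hash("readme", 6)` return the same value. A real implementation would fold full UTF-8 sequences rather than single ASCII bytes.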

    Signed-off-by: Daniel Rosenberg
    Test: Boots, /data/media is case insensitive
    Bug: 138322712
    Link: https://lore.kernel.org/linux-f2fs-devel/20200208013552.241832-1-drosen@google.com/T/#t
    Change-Id: I43c7d38a8e22f4479397f35e6343bd326901cdba

    Daniel Rosenberg
     

04 Feb, 2020

1 commit

  • In the old days, the "host-progs" syntax was used for specifying host
    programs. It was renamed to the current "hostprogs-y" in 2004.

    It is typically useful in scripts/Makefile because it allows Kbuild to
    selectively compile host programs based on the kernel configuration.

    This commit renames them as follows:

    always -> always-y
    hostprogs-y -> hostprogs

    So, scripts/Makefile will look like this:

    always-$(CONFIG_BUILD_BIN2C) += ...
    always-$(CONFIG_KALLSYMS) += ...
    ...
    hostprogs := $(always-y) $(always-m)

    I think this makes more sense because a host program is always a host
    program, irrespective of the kernel configuration. We want to specify
    which ones to compile by CONFIG options, so always-y will be handier.

    The "always", "hostprogs-y", and "hostprogs-m" syntax will be kept for
    backward compatibility for a while.

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

17 Sep, 2019

2 commits

  • Don't populate the array 'token' on the stack but instead make it
    static const. Makes the object code smaller by 234 bytes.

    Before:
       text    data     bss     dec     hex filename
       5371     272       0    5643    160b fs/unicode/utf8-core.o

    After:
       text    data     bss     dec     hex filename
       5041     368       0    5409    1521 fs/unicode/utf8-core.o

    (gcc version 9.2.1, amd64)
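
A minimal sketch of the pattern, under stated assumptions: the function `find_token` and the table contents are hypothetical and are not the code touched by this commit. Declaring the table `static const` places it in read-only data once, instead of having the compiler emit code to build it on the stack at every call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: returns the index of s in the table, or -1. */
int find_token(const char *s)
{
	/*
	 * static const: the table is emitted once into read-only data.
	 * Without "static", the array is an automatic variable and is
	 * populated on the stack each time the function runs.
	 */
	static const char * const token[] = { "major", "minor", "rev" };

	for (size_t i = 0; i < sizeof(token) / sizeof(token[0]); i++)
		if (strcmp(s, token[i]) == 0)
			return (int)i;
	return -1;
}
```

The trade-off is visible in the size output above: text shrinks because the initialization code goes away, while data grows by the size of the table.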

    Signed-off-by: Colin Ian King
    Reviewed-by: Theodore Ts'o
    Signed-off-by: Gabriel Krisman Bertazi

    Colin Ian King
     
  • Move the static keyword to the front of declarations of nfdi_test_data
    and nfdicf_test_data, and resolve the following compiler warnings that
    can be seen when building with warnings enabled (W=1):

    fs/unicode/utf8-selftest.c:38:1: warning:
    ‘static’ is not at beginning of declaration [-Wold-style-declaration]

    fs/unicode/utf8-selftest.c:92:1: warning:
    ‘static’ is not at beginning of declaration [-Wold-style-declaration]
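
For illustration, the two declaration orders look like this. The table contents and the name `test_data` are hypothetical; only the position of `static` matters here.

```c
#include <stddef.h>

/*
 * Form that triggers -Wold-style-declaration, because the storage
 * class specifier does not come first:
 *
 *     const static struct { ... } test_data[] = { ... };
 *
 * Preferred form, with static at the beginning of the declaration:
 */
static const struct {
	const char *str;	/* illustrative input string */
	int len;		/* illustrative expected length */
} test_data[] = {
	{ "abc", 3 },
	{ "de", 2 },
};

/* Number of entries in the table, so the sketch has a user. */
size_t test_data_count(void)
{
	return sizeof(test_data) / sizeof(test_data[0]);
}
```

Both orders declare the same object; the change only affects the warning.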

    Signed-off-by: Krzysztof Wilczynski
    Signed-off-by: Gabriel Krisman Bertazi

    Krzysztof Wilczynski
     

11 Jul, 2019

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Many bug fixes and cleanups, and an optimization for case-insensitive
    lookups"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: fix coverity warning on error path of filename setup
    ext4: replace ktype default_attrs with default_groups
    ext4: rename htree_inline_dir_to_tree() to ext4_inlinedir_to_tree()
    ext4: refactor initialize_dirent_tail()
    ext4: rename "dirent_csum" functions to use "dirblock"
    ext4: allow directory holes
    jbd2: drop declaration of journal_sync_buffer()
    ext4: use jbd2_inode dirty range scoping
    jbd2: introduce jbd2_inode dirty range scoping
    mm: add filemap_fdatawait_range_keep_errors()
    ext4: remove redundant assignment to node
    ext4: optimize case-insensitive lookups
    ext4: make __ext4_get_inode_loc plug
    ext4: clean up kerneldoc warnigns when building with W=1
    ext4: only set project inherit bit for directory
    ext4: enforce the immutable flag on open files
    ext4: don't allow any modifications to an immutable file
    jbd2: fix typo in comment of journal_submit_inode_data_buffers
    jbd2: fix some print format mistakes
    ext4: gracefully handle ext4_break_layouts() failure during truncate

    Linus Torvalds
     

20 Jun, 2019

1 commit

  • Temporarily cache a casefolded version of the file name under lookup in
    ext4_filename, to avoid repeatedly casefolding it. I got up to 30%
    speedup on lookups of large directories (>100k entries), depending on
    the length of the string under lookup.
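
The caching idea can be sketched as follows. This is an illustration under assumptions, not the ext4 code: `struct lookup_name` is a hypothetical stand-in for ext4_filename, folding is ASCII-only, and all function names are made up. The searched name is folded once, then every directory entry is compared against the cached result.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_NAME_MAX 255

/* Hypothetical lookup state; not the real ext4_filename layout. */
struct lookup_name {
	const char *usr;		/* name as passed by the caller */
	char cf[SKETCH_NAME_MAX + 1];	/* casefolded copy, filled once */
	size_t cf_len;
};

static char fold_ascii(char c)
{
	return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Fold the searched name once, before the directory scan starts. */
int lookup_name_init(struct lookup_name *n, const char *usr)
{
	size_t len = strlen(usr);

	if (len > SKETCH_NAME_MAX)
		return -1;
	n->usr = usr;
	for (size_t i = 0; i < len; i++)
		n->cf[i] = fold_ascii(usr[i]);
	n->cf[len] = '\0';
	n->cf_len = len;
	return 0;
}

/* Per-entry comparison: only the entry is folded, never the searched name. */
int entry_matches(const struct lookup_name *n, const char *entry)
{
	size_t len = strlen(entry);

	if (len != n->cf_len)
		return 0;
	for (size_t i = 0; i < len; i++)
		if (fold_ascii(entry[i]) != n->cf[i])
			return 0;
	return 1;
}
```

In a directory with N entries, the searched name is folded once instead of N times, which is where the saving on large directories comes from.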

    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Theodore Ts'o

    Gabriel Krisman Bertazi
     

05 Jun, 2019

2 commits

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license 2 as published
    by the free software foundation this program is distributed in the
    hope that it will be useful but without any warranty without even
    the implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation this program is distributed in the hope
    that it [would] be useful but without any warranty without even the
    implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 9 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.804956444@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 1 normalized pattern(s):

    this software is licensed under the terms of the gnu general public
    license version 2 as published by the free software foundation and
    may be copied distributed and modified under those terms this
    program is distributed in the hope that it will be useful but
    without any warranty without even the implied warranty of
    merchantability or fitness for a particular purpose see the gnu
    general public license for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 285 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141900.642774971@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


13 May, 2019

1 commit


12 May, 2019

1 commit


29 Apr, 2019

1 commit

  • scripts/mkutf8data is used only when regenerating utf8data.h,
    which never happens in the normal kernel build. However, it is built
    unconditionally whenever CONFIG_UNICODE is enabled.

    Moreover, there is no good reason for it to reside in the scripts/
    directory since it is only used in fs/unicode/.

    Hence, move it from scripts/ to fs/unicode/.

    In some cases, we bypass build artifacts in the normal build. The
    conventional way to do so is to surround the code with ifdef REGENERATE_*.

    For example,

    - 7373f4f83c71 ("kbuild: add implicit rules for parser generation")
    - 6aaf49b495b4 ("crypto: arm,arm64 - Fix random regeneration of S_shipped")

    I rewrote the rule in a more kbuild'ish style.

    In the normal build, utf8data.h is just shipped from the check-in file.

    $ make
    [ snip ]
    SHIPPED fs/unicode/utf8data.h
    CC fs/unicode/utf8-norm.o
    CC fs/unicode/utf8-core.o
    CC fs/unicode/utf8-selftest.o
    AR fs/unicode/built-in.a

    If you want to generate utf8data.h based on UCD, put *.txt files into
    fs/unicode/, then pass REGENERATE_UTF8DATA=1 from the command line.
    The mkutf8data tool will be automatically compiled to generate the
    utf8data.h from the *.txt files.

    $ make REGENERATE_UTF8DATA=1
    [ snip ]
    HOSTCC fs/unicode/mkutf8data
    GEN fs/unicode/utf8data.h
    CC fs/unicode/utf8-norm.o
    CC fs/unicode/utf8-core.o
    CC fs/unicode/utf8-selftest.o
    AR fs/unicode/built-in.a

    I renamed the check-in utf8data.h to utf8data.h_shipped so that this
    will work for the out-of-tree build.

    You can update it based on the latest UCD like this:

    $ make REGENERATE_UTF8DATA=1 fs/unicode/
    $ cp fs/unicode/utf8data.h fs/unicode/utf8data.h_shipped

    Also, I added entries to .gitignore and dontdiff.

    Signed-off-by: Masahiro Yamada
    Signed-off-by: Theodore Ts'o

    Masahiro Yamada
     

26 Apr, 2019

6 commits

  • Regenerate utf8data.h based on the latest UCD files and run tests
    against the latest version.

    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Theodore Ts'o

    Gabriel Krisman Bertazi
     
  • This implements an in-kernel sanity test module for the utf8
    normalization core. At probe time, it will run basic sequences through
    the utf8n core, to identify problems with equivalent sequences and
    normalization/casefold code. This is supposed to be useful for
    regression testing when adding support for a new version of utf8 to
    linux.

    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Theodore Ts'o

    Gabriel Krisman Bertazi
     
  • This patch integrates the utf8n patches with some higher level API to
    perform UTF-8 string comparison, normalization and casefolding
    operations. Implemented is a variation of NFD, and casefold is
    performed by doing full casefold on top of NFD. These algorithms are
    based on the core implemented by Olaf Weber from SGI.

    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Theodore Ts'o

    Gabriel Krisman Bertazi
     
  • Remove the Hangul decompositions from the utf8data trie, and do
    algorithmic decomposition to calculate them on the fly. To store the
    decomposition the caller of utf8lookup()/utf8nlookup() must provide a
    12-byte buffer, which is used to synthesize a leaf with the
    decomposition. This significantly reduces the size of the utf8data[]
    array.
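
The arithmetic behind algorithmic Hangul decomposition is defined by the Unicode Standard and can be sketched as follows. This is not the kernel code: the function name `hangul_decompose` is hypothetical, and the sketch works on code points, leaving out the UTF-8 encoding and the synthesized leaf layout that the 12-byte buffer is used for.

```c
#include <stdint.h>

/* Constants from the Unicode Standard's Hangul syllable decomposition. */
enum {
	SBASE  = 0xAC00,		/* first precomposed syllable */
	LBASE  = 0x1100,		/* first leading consonant */
	VBASE  = 0x1161,		/* first vowel */
	TBASE  = 0x11A7,		/* one before the first trailing consonant */
	LCOUNT = 19,
	VCOUNT = 21,
	TCOUNT = 28,
	NCOUNT = VCOUNT * TCOUNT,	/* 588 */
	SCOUNT = LCOUNT * NCOUNT	/* 11172 */
};

/*
 * Decompose a precomposed Hangul syllable into its jamo.
 * Returns the number of code points written to out (2 or 3),
 * or 0 if cp is not a precomposed Hangul syllable.
 */
int hangul_decompose(uint32_t cp, uint32_t out[3])
{
	uint32_t si, ti;

	if (cp < SBASE || cp >= SBASE + SCOUNT)
		return 0;

	si = cp - SBASE;
	out[0] = LBASE + si / NCOUNT;
	out[1] = VBASE + (si % NCOUNT) / TCOUNT;
	ti = si % TCOUNT;
	if (ti == 0)
		return 2;		/* no trailing consonant */
	out[2] = TBASE + ti;
	return 3;
}
```

For example, U+D55C decomposes to U+1112, U+1161, U+11AB. Because all 11172 syllables follow from this arithmetic, none of them needs an entry in the generated table.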

    Changes made by Gabriel:
    Rebase to mainline
    Fix checkpatch errors
    Extract robustness fixes and merge back to original mkutf8data.c patch
    Regenerate utf8data.h

    Signed-off-by: Olaf Weber
    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Theodore Ts'o

    Olaf Weber
     
  • Supporting functions for UTF-8 normalization are in utf8norm.c with the
    header utf8norm.h. Two normalization forms are supported: nfdi and
    nfdicf.

    nfdi:
    - Apply unicode normalization form NFD.
    - Remove any Default_Ignorable_Code_Point.

    nfdicf:
    - Apply unicode normalization form NFD.
    - Remove any Default_Ignorable_Code_Point.
    - Apply a full casefold (C + F).

    For the purposes of the code, a string is valid UTF-8 if:

    - The values encoded are 0x1..0x10FFFF.
    - The surrogate codepoints 0xD800..0xDFFF are not encoded.
    - The shortest possible encoding is used for all values.
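
The three validity rules can be sketched as a checker over a length-limited buffer. This is an illustration only, not the code in this patch; the name `sketch_utf8_valid` is hypothetical, and the surrogate range is taken as 0xD800..0xDFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the len bytes at s satisfy all three rules, else 0. */
int sketch_utf8_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		uint32_t cp, min;
		size_t n;

		/* Lead byte: sequence length and smallest value allowed for it. */
		if (c < 0x80) {
			cp = c;        n = 1; min = 0x1;	/* 0x0 is excluded */
		} else if ((c & 0xE0) == 0xC0) {
			cp = c & 0x1F; n = 2; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			cp = c & 0x0F; n = 3; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			cp = c & 0x07; n = 4; min = 0x10000;
		} else {
			return 0;	/* stray continuation or invalid lead byte */
		}

		if (n > len - i)
			return 0;	/* truncated sequence */

		for (size_t k = 1; k < n; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return 0;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}

		if (cp < min || cp > 0x10FFFF)
			return 0;	/* not shortest form, or out of range */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;	/* surrogate code point */

		i += n;
	}
	return 1;
}
```

Checking `cp < min` per sequence length is what enforces the shortest-encoding rule: the two-byte sequence C0 AF decodes to 0x2F, which fits in one byte, so it is rejected.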

    The supporting functions work on null-terminated strings (utf8 prefix)
    and on length-limited strings (utf8n prefix).

    From the original SGI patch and for conformity with coding standards,
    the utf8data_t typedef was dropped, since it was just masking the struct
    keyword. On other occasions, namely utf8leaf_t and utf8trie_t, I
    decided to keep it, since they are simple pointers to memory buffers,
    and using uchars here wouldn't provide any more meaningful information.

    From the original submission, we also converted from the compatibility
    form to canonical.

    Changes made by Gabriel:
    Rebase to Mainline
    Fix up checkpatch.pl warnings
    Drop typedefs
    move out of libxfs
    Convert from NFKD to NFD

    Signed-off-by: Olaf Weber
    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Theodore Ts'o

    Olaf Weber
     
  • The decomposition and casefolding of UTF-8 characters are described in a
    prefix tree in utf8data.h, which is generated from the Unicode
    Character Database (UCD), published by the Unicode Consortium, and
    should not be edited by hand. The structures in utf8data.h are meant to
    be used for lookup operations by the unicode subsystem, when decoding a
    utf-8 string.

    mkutf8data.c is the source for a program that generates utf8data.h. It
    was written by Olaf Weber from SGI and originally proposed to be merged
    into Linux in 2014. The original proposal performed the compatibility
    decomposition, NFKD, but the current version was modified by me to do
    canonical decomposition, NFD, as suggested by the community. The
    changes from the original submission are:

    * Rebase to mainline.
    * Fix out-of-tree-build.
    * Update makefile to build 11.0.0 ucd files.
    * drop references to xfs.
    * Convert NFKD to NFD.
    * Merge back robustness fixes from original patch. Requested by
    Dave Chinner.

    The original submission is archived at:

    The utf8data.h file can be regenerated using the instructions in
    fs/unicode/README.utf8data.

    - Notes on the update from 8.0.0 to 11.0.0:

    The structure of the ucd files and special cases have not experienced
    any changes between versions 8.0.0 and 11.0.0. 8.0.0 saw the addition
    of Cherokee LC characters, which is an interesting case for
    case-folding. The update is accompanied by new tests on the test_ucd
    module to catch specific cases. No changes to the mkutf8data script
    were required for the updates.

    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Theodore Ts'o

    Gabriel Krisman Bertazi