29 Jan, 2020

2 commits

  • commit 2c6b7bcd747201441923a0d3062577a8d1fbd8f8 upstream.

    Commit 8a23eb804ca4 ("Make filldir[64]() verify the directory entry
    filename is valid") added some minimal validity checks on the directory
    entries passed to filldir[64](). But they really were pretty minimal.

    This fleshes out at least the name length check: we used to disallow
    zero-length names, but really, negative lengths or oevr-long names
    aren't ok either. Both could happen if there is some filesystem
    corruption going on.

    Now, most filesystems tend to use just an "unsigned char" or similar for
    the length of a directory entry name, so even with a corrupt filesystem
    you should never see anything odd like that. But since we then use the
    name length to create the directory entry record length, let's make sure
    it actually is half-way sensible.

    Note how POSIX states that the size of a path component is limited by
    NAME_MAX, but we actually use PATH_MAX for the check here. That's
    because while NAME_MAX is generally the correct maximum name length
    (it's 255, for the same old "name length is usually just a byte on
    disk"), there's nothing in the VFS layer that really cares.

    So the real limitation at a VFS layer is the total pathname length you
    can pass as a filename: PATH_MAX.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit 3c2659bd1db81ed6a264a9fc6262d51667d655ad upstream.

    In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
    unsafe_put_user()") I changed filldir to not do individual __put_user()
    accesses, but instead use unsafe_put_user() surrounded by the proper
    user_access_begin/end() pair.

    That make them enormously faster on modern x86, where the STAC/CLAC
    games make individual user accesses fairly heavy-weight.

    However, the user_access_begin() range was not really the exact right
    one, since filldir() has the unfortunate problem that it needs to not
    only fill out the new directory entry, it also needs to fix up the
    previous one to contain the proper file offset.

    It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
    file offset of the directory entry itself - it's the offset of the next
    one. So we end up backfilling the offset in the previous entry as we
    walk along.

    But since x86 didn't really care about the exact range, and used to be
    the only architecture that did anything fancy in user_access_begin() to
    begin with, the filldir[64]() changes did something lazy, and even
    commented on it:

    /*
    * Note! This range-checks 'previous' (which may be NULL).
    * The real range was checked in getdents
    */
    if (!user_access_begin(dirent, sizeof(*dirent)))
    goto efault;

    and it all worked fine.

    But now 32-bit ppc is starting to also implement user_access_begin(),
    and the fact that we faked the range to only be the (possibly not even
    valid) previous directory entry becomes a problem, because ppc32 will
    actually be using the range that is passed in for more than just "check
    that it's user space".

    This is a complete rewrite of Christophe's original patch.

    By saving off the record length of the previous entry instead of a
    pointer to it in the filldir data structures, we can simplify the range
    check and the writing of the previous entry d_off field. No need for
    any conditionals in the user accesses themselves, although we retain the
    conditional EINTR checking for the "was this the first directory entry"
    signal handling latency logic.

    Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
    Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
    Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
    Reported-and-tested-by: Christophe Leroy
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     

19 Oct, 2019

1 commit

  • This was always meant to be a temporary thing, just for testing and to
    see if it actually ever triggered.

    The only thing that reported it was syzbot doing disk image fuzzing, and
    then that warning is expected. So let's just remove it before -rc4,
    because the extra sanity testing should probably go to -stable, but we
    don't want the warning to do so.

    Reported-by: syzbot+3031f712c7ad5dd4d926@syzkaller.appspotmail.com
    Fixes: 8a23eb804ca4 ("Make filldir[64]() verify the directory entry filename is valid")
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Oct, 2019

1 commit

  • In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
    unsafe_put_user()") I made filldir() use unsafe_put_user(), which
    improves code generation on x86 enormously.

    But because we didn't have a "unsafe_copy_to_user()", the dirent name
    copy was also done by hand with unsafe_put_user() in a loop, and it
    turns out that a lot of other architectures didn't like that, because
    unlike x86, they have various alignment issues.

    Most non-x86 architectures trap and fix it up, and some (like xtensa)
    will just fail unaligned put_user() accesses unconditionally. Which
    makes that "copy using put_user() in a loop" not work for them at all.

    I could make that code do explicit alignment etc, but the architectures
    that don't like unaligned accesses also don't really use the fancy
    "user_access_begin/end()" model, so they might just use the regular old
    __copy_to_user() interface.

    So this commit takes that looping implementation, turns it into the x86
    version of "unsafe_copy_to_user()", and makes other architectures
    implement the unsafe copy version as __copy_to_user() (the same way they
    do for the other unsafe_xyz() accessor functions).

    Note that it only does this for the copying _to_ user space, and we
    still don't have a unsafe version of copy_from_user().

    That's partly because we have no current users of it, but also partly
    because the copy_from_user() case is slightly different and cannot
    efficiently be implemented in terms of a unsafe_get_user() loop (because
    gcc can't do asm goto with outputs).

    It would be trivial to do this using "rep movsb", which would work
    really nicely on newer x86 cores, but really badly on some older ones.

    Al Viro is looking at cleaning up all our user copy routines to make
    this all a non-issue, but for now we have this simple-but-stupid version
    for x86 that works fine for the dirent name copy case because those
    names are short strings and we simply don't need anything fancier.

    Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
    Reported-by: Guenter Roeck
    Reported-and-tested-by: Tony Luck
    Cc: Al Viro
    Cc: Max Filippov
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

06 Oct, 2019

2 commits

  • This has been discussed several times, and now filesystem people are
    talking about doing it individually at the filesystem layer, so head
    that off at the pass and just do it in getdents{64}().

    This is partially based on a patch by Jann Horn, but checks for NUL
    bytes as well, and somewhat simplified.

    There's also commentary about how it might be better if invalid names
    due to filesystem corruption don't cause an immediate failure, but only
    an error at the end of the readdir(), so that people can still see the
    filenames that are ok.

    There's also been discussion about just how much POSIX strictly speaking
    requires this since it's about filesystem corruption. It's really more
    "protect user space from bad behavior" as pointed out by Jann. But
    since Eric Biederman looked up the POSIX wording, here it is for context:

    "From readdir:

    The readdir() function shall return a pointer to a structure
    representing the directory entry at the current position in the
    directory stream specified by the argument dirp, and position the
    directory stream at the next entry. It shall return a null pointer
    upon reaching the end of the directory stream. The structure dirent
    defined in the header describes a directory entry.

    From definitions:

    3.129 Directory Entry (or Link)

    An object that associates a filename with a file. Several directory
    entries can associate names with the same file.

    ...

    3.169 Filename

    A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
    characters composing the name may be selected from the set of all
    character values excluding the slash character and the null byte. The
    filenames dot and dot-dot have special meaning. A filename is
    sometimes referred to as a 'pathname component'."

    Note that I didn't bother adding the checks to any legacy interfaces
    that nobody uses.

    Also note that if this ends up being noticeable as a performance
    regression, we can fix that to do a much more optimized model that
    checks for both NUL and '/' at the same time one word at a time.

    We haven't really tended to optimize 'memchr()', and it only checks for
    one pattern at a time anyway, and we really _should_ check for NUL too
    (but see the comment about "soft errors" in the code about why it
    currently only checks for '/')

    See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
    lookup code looks for pathname terminating characters in parallel.

    Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
    Cc: Alexander Viro
    Cc: Jann Horn
    Cc: Eric W. Biederman
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • We really should avoid the "__{get,put}_user()" functions entirely,
    because they can easily be mis-used and the original intent of being
    used for simple direct user accesses no longer holds in a post-SMAP/PAN
    world.

    Manually optimizing away the user access range check makes no sense any
    more, when the range check is generally much cheaper than the "enable
    user accesses" code that the __{get,put}_user() functions still need.

    So instead of __put_user(), use the unsafe_put_user() interface with
    user_access_{begin,end}() that really does generate better code these
    days, and which is generally a nicer interface. Under some loads, the
    multiple user writes that filldir() does are actually quite noticeable.

    This also makes the dirent name copy use unsafe_put_user() with a couple
    of macros. We do not want to make function calls with SMAP/PAN
    disabled, and the code this generates is quite good when the
    architecture uses "asm goto" for unsafe_put_user() like x86 does.

    Note that this doesn't bother with the legacy cases. Nobody should use
    them anyway, so performance doesn't really matter there.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

04 Jan, 2019

1 commit

  • Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
    of the user address range verification function since we got rid of the
    old racy i386-only code to walk page tables by hand.

    It existed because the original 80386 would not honor the write protect
    bit when in kernel mode, so you had to do COW by hand before doing any
    user access. But we haven't supported that in a long time, and these
    days the 'type' argument is a purely historical artifact.

    A discussion about extending 'user_access_begin()' to do the range
    checking resulted this patch, because there is no way we're going to
    move the old VERIFY_xyz interface to that model. And it's best done at
    the end of the merge window when I've done most of my merges, so let's
    just get this done once and for all.

    This patch was mostly done with a sed-script, with manual fix-ups for
    the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

    There were a couple of notable cases:

    - csky still had the old "verify_area()" name as an alias.

    - the iter_iov code had magical hardcoded knowledge of the actual
    values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
    really used it)

    - microblaze used the type argument for a debug printout

    but other than those oddities this should be a total no-op patch.

    I tried to fix up all architectures, did fairly extensive grepping for
    access_ok() uses, and the changes are trivial, but I may have missed
    something. Any missed conversion should be trivially fixable, though.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

03 Apr, 2018

1 commit

  • Using this helper allows us to avoid the in-kernel calls to the
    sys_getdents64() syscall. The ksys_ prefix denotes that this function
    is meant as a drop-in replacement for the syscall. In particular, it
    uses the same calling convention as sys_getdents64().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Alexander Viro
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     

07 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

10 Oct, 2017

1 commit

  • There was mutex_lock_interruptible() initially, and it was changed
    to rwsem, but there were not killable rwsem primitives that time.
    >From commit 9902af79c01a:

    "The main issue is the lack of down_write_killable(), so the places
    like readdir.c switched to plain inode_lock(); once killable
    variants of rwsem primitives appear, that'll be dealt with"

    Use down_read_killable() same as down_write_killable() in !shared
    case, as concurrent inode_lock() may take much time, that may be
    wanted to be interrupted by user.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: arnd@arndb.de
    Cc: avagin@virtuozzo.com
    Cc: davem@davemloft.net
    Cc: fenghua.yu@intel.com
    Cc: gorcunov@virtuozzo.com
    Cc: heiko.carstens@de.ibm.com
    Cc: hpa@zytor.com
    Cc: ink@jurassic.park.msu.ru
    Cc: mattst88@gmail.com
    Cc: rientjes@google.com
    Cc: rth@twiddle.net
    Cc: schwidefsky@de.ibm.com
    Cc: tony.luck@intel.com
    Cc: viro@zeniv.linux.org.uk
    Link: http://lkml.kernel.org/r/150670120820.23930.5455667921545937220.stgit@localhost.localdomain
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
     

18 Apr, 2017

1 commit


25 Dec, 2016

1 commit


26 May, 2016

1 commit


25 May, 2016

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Fix a number of bugs, most notably a potential stale data exposure
    after a crash and a potential BUG_ON crash if a file has the data
    journalling flag enabled while it has dirty delayed allocation blocks
    that haven't been written yet. Also fix a potential crash in the new
    project quota code and a maliciously corrupted file system.

    In addition, fix some DAX-specific bugs, including when there is a
    transient ENOSPC situation and races between writes via direct I/O and
    an mmap'ed segment that could lead to lost I/O.

    Finally the usual set of miscellaneous cleanups"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
    ext4: pre-zero allocated blocks for DAX IO
    ext4: refactor direct IO code
    ext4: fix race in transient ENOSPC detection
    ext4: handle transient ENOSPC properly for DAX
    dax: call get_blocks() with create == 1 for write faults to unwritten extents
    ext4: remove unmeetable inconsisteny check from ext4_find_extent()
    jbd2: remove excess descriptions for handle_s
    ext4: remove unnecessary bio get/put
    ext4: silence UBSAN in ext4_mb_init()
    ext4: address UBSAN warning in mb_find_order_for_block()
    ext4: fix oops on corrupted filesystem
    ext4: fix check of dqget() return value in ext4_ioctl_setproject()
    ext4: clean up error handling when orphan list is corrupted
    ext4: fix hang when processing corrupted orphaned inode list
    ext4: remove trailing \n from ext4_warning/ext4_error calls
    ext4: fix races between changing inode journal mode and ext4_writepages
    ext4: handle unwritten or delalloc buffers before enabling data journaling
    ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart()
    ext4: do not ask jbd2 to write data for delalloc buffers
    jbd2: add support for avoiding data writes during transaction commits
    ...

    Linus Torvalds
     

03 May, 2016

3 commits


24 Apr, 2016

1 commit

  • If a directory has a large number of empty blocks, iterating over all
    of them can take a long time, leading to scheduler warnings and users
    getting irritated when they can't kill a process in the middle of one
    of these long-running readdir operations. Fix this by adding checks to
    ext4_readdir() and ext4_htree_fill_tree().

    This was reverted earlier due to a typo in the original commit where I
    experimented with using signal_pending() instead of
    fatal_signal_pending(). The test was in the wrong place if we were
    going to return signal_pending() since we would end up returning
    duplicant entries. See 9f2394c9be47 for a more detailed explanation.

    Added fix as suggested by Linus to check for signal_pending() in
    in the filldir() functions.

    Reported-by: Benjamin LaHaise
    Google-Bug-Id: 27880676
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

01 Nov, 2014

1 commit


05 Jun, 2014

1 commit

  • Before the patch, read creates FAN_ACCESS_PERM and FAN_ACCESS events,
    readdir creates only FAN_ACCESS_PERM events.

    This is inconsistent.

    After the patch, readdir creates FAN_ACCESS_PERM and FAN_ACCESS events.

    Signed-off-by: Heinrich Schuchardt
    Reviewed-by: Jan Kara
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinrich Schuchardt
     

25 Oct, 2013

1 commit


29 Jun, 2013

4 commits


23 Feb, 2013

1 commit


27 Sep, 2012

1 commit


30 May, 2012

1 commit


29 Feb, 2012

1 commit


10 Aug, 2010

1 commit

  • Using:

    gcc (GCC) 4.5.0 20100610 (prerelease)

    The following warnings appear:

    fs/readdir.c: In function `filldir64':
    fs/readdir.c:240:15: warning: `dirent' is used uninitialized in this function
    fs/readdir.c: In function `filldir':
    fs/readdir.c:155:15: warning: `dirent' is used uninitialized in this function
    fs/compat.c: In function `compat_filldir64':
    fs/compat.c:1071:11: warning: `dirent' is used uninitialized in this function
    fs/compat.c: In function `compat_filldir':
    fs/compat.c:984:15: warning: `dirent' is used uninitialized in this function

    The warnings are related to the use of the NAME_OFFSET() macro. Luckily,
    it appears as though the standard offsetof() macro is what is being
    implemented by NAME_OFFSET(), thus we can fix the warning and use a more
    standard code construct at the same time.

    Signed-off-by: Kevin Winchester
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kevin Winchester
     

14 Jan, 2009

3 commits


23 Oct, 2008

1 commit


25 Aug, 2008

1 commit


07 Dec, 2007

1 commit


09 May, 2007

2 commits