02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

02 Mar, 2017

1 commit


11 Oct, 2016

1 commit

  • Pull more vfs updates from Al Viro:
    ">rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     

08 Oct, 2016

1 commit


28 Sep, 2016

1 commit

  • current_fs_time() uses struct super_block* as an argument.
    As per Linus's suggestion, this is changed to take struct
    inode* as a parameter instead. This is because the function
    is primarily meant for vfs inode timestamps.
    Also the function was renamed as per Arnd's suggestion.

    Change all calls to current_fs_time() to use the new
    current_time() function instead. current_fs_time() will be
    deleted.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Al Viro

    Deepa Dinamani
     

22 Sep, 2016

2 commits

  • Currently, notify_change() clears capabilities or IMA attributes by
    calling security_inode_killpriv() before calling into ->setattr. Thus it
    happens before any other permission checks in inode_change_ok() and user
    is thus allowed to trigger clearing of capabilities or IMA attributes
    for any file he can look up e.g. by calling chown for that file. This is
    unexpected and can lead to user DoSing a system.

    Fix the problem by calling security_inode_killpriv() at the end of
    inode_change_ok() instead of from notify_change(). At that moment we are
    sure user has permissions to do the requested change.

    References: CVE-2015-1350
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     
  • inode_change_ok() will be resposible for clearing capabilities and IMA
    extended attributes and as such will need dentry. Give it as an argument
    to inode_change_ok() instead of an inode. Also rename inode_change_ok()
    to setattr_prepare() to better relect that it does also some
    modifications in addition to checks.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     

16 Sep, 2016

1 commit

  • This fixes a bug where the permission was not properly checked in
    overlayfs. The testcase is ltp/utimensat01.

    It is also cleaner and safer to do the permission checking in the vfs
    helper instead of the caller.

    This patch introduces an additional ia_valid flag ATTR_TOUCH (since
    touch(1) is the most obvious user of utimes(NULL)) that is passed into
    notify_change whenever the conditions for this special permission checking
    mode are met.

    Reported-by: Aihua Zhang
    Signed-off-by: Miklos Szeredi
    Tested-by: Aihua Zhang
    Cc: # v3.18+

    Miklos Szeredi
     

06 Jul, 2016

1 commit

  • When a filesystem outside of init_user_ns is mounted it could have
    uids and gids stored in it that do not map to init_user_ns.

    The plan is to allow those filesystems to set i_uid to INVALID_UID and
    i_gid to INVALID_GID for unmapped uids and gids and then to handle
    that strange case in the vfs to ensure there is consistent robust
    handling of the weirdness.

    Upon a careful review of the vfs and filesystems about the only case
    where there is any possibility of confusion or trouble is when the
    inode is written back to disk. In that case filesystems typically
    read the inode->i_uid and inode->i_gid and write them to disk even
    when just an inode timestamp is being updated.

    Which leads to a rule that is very simple to implement and understand
    inodes whose i_uid or i_gid is not valid may not be written.

    In dealing with access times this means treat those inodes as if the
    inode flag S_NOATIME was set. Reads of the inodes appear safe and
    useful, but any write or modification is disallowed. The only inode
    write that is allowed is a chown that sets the uid and gid on the
    inode to valid values. After such a chown the inode is normal and may
    be treated as such.

    Denying all writes to inodes with uids or gids unknown to the vfs also
    prevents several oddball cases where corruption would have occurred
    because the vfs does not have complete information.

    One problem case that is prevented is attempting to use the gid of a
    directory for new inodes where the directories sgid bit is set but the
    directories gid is not mapped.

    Another problem case avoided is attempting to update the evm hash
    after setxattr, removexattr, and setattr. As the evm hash includeds
    the inode->i_uid or inode->i_gid not knowning the uid or gid prevents
    a correct evm hash from being computed. evm hash verification also
    fails when i_uid or i_gid is unknown but that is essentially harmless
    as it does not cause filesystem corruption.

    Acked-by: Seth Forshee
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

28 Jun, 2016

1 commit

  • Add checks to notify_change to verify that uid and gid changes
    will map into the superblock's user namespace. If they do not
    fail with -EOVERFLOW.

    This is mandatory so that fileystems don't have to even think
    of dealing with ia_uid and ia_gid that

    --EWB Moved the test from inode_change_ok to notify_change

    Signed-off-by: Seth Forshee
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Seth Forshee
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

11 Jun, 2014

1 commit

  • The kernel has no concept of capabilities with respect to inodes; inodes
    exist independently of namespaces. For example, inode_capable(inode,
    CAP_LINUX_IMMUTABLE) would be nonsense.

    This patch changes inode_capable to check for uid and gid mappings and
    renames it to capable_wrt_inode_uidgid, which should make it more
    obvious what it does.

    Fixes CVE-2014-4014.

    Cc: Theodore Ts'o
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Chinner
    Cc: stable@vger.kernel.org
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

06 Dec, 2013

1 commit

  • Currently notify_change directly updates i_version for size updates,
    which not only is counter to how all other fields are updated through
    struct iattr, but also breaks XFS, which need inode updates to happen
    under its own lock, and synchronized to the structure that gets written
    to the log.

    Remove the update in the common code, and it to btrfs and ext4,
    XFS already does a proper updaste internally and currently gets a
    double update with the existing code.

    IMHO this is 3.13 and -stable material and should go in through the XFS
    tree.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Andreas Dilger
    Acked-by: Jan Kara
    Reviewed-by: Dave Chinner
    Signed-off-by: Chris Mason
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

09 Nov, 2013

1 commit


20 Nov, 2012

1 commit

  • - Allow chown if CAP_CHOWN is present in the current user namespace
    and the uid of the inode maps into the current user namespace, and
    the destination uid or gid maps into the current user namespace.

    - Allow perserving setgid when changing an inode if CAP_FSETID is
    present in the current user namespace and the owner of the file has
    a mapping into the current user namespace.

    Acked-by: Serge E. Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

08 Sep, 2012

1 commit

  • Changing an inode's metadata may result in our not needing to appraise
    the file. In such cases, we must remove 'security.ima'.

    Changelog v1:
    - use ima_inode_post_setattr() stub function, if IMA_APPRAISE not configured

    Signed-off-by: Mimi Zohar
    Acked-by: Serge Hallyn
    Acked-by: Dmitry Kasatkin

    Mimi Zohar
     

14 Jul, 2012

1 commit


31 May, 2012

1 commit

  • When a file is truncated with truncate()/ftruncate() and then closed,
    iversion is not updated. This patch uses ATTR_SIZE flag as an indication
    to increment iversion.

    Mimi said:

    On fput(), i_version is used to detect and flag files that have changed
    and need to be re-measured in the IMA measurement policy. When a file
    is truncated with truncate()/ftruncate() and then closed, i_version is
    not updated. As a result, although the file has changed, it will not be
    re-measured and added to the IMA measurement list on subsequent access.

    Signed-off-by: Dmitry Kasatkin
    Acked-by: Mimi Zohar
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dmitry Kasatkin
     

03 May, 2012

1 commit


29 Feb, 2012

1 commit


04 Jan, 2012

1 commit


09 Aug, 2011

1 commit


21 Jul, 2011

2 commits

  • Let filesystems handle waiting for direct I/O requests themselves instead
    of doing it beforehand. This means filesystem-specific locks to prevent
    new dio referenes from appearing can be held. This is important to allow
    generalizing i_dio_count to non-DIO_LOCKING filesystems.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • i_alloc_sem is a rather special rw_semaphore. It's the last one that may
    be released by a non-owner, and it's write side is always mirrored by
    real exclusion. It's intended use it to wait for all pending direct I/O
    requests to finish before starting a truncate.

    Replace it with a hand-grown construct:

    - exclusion for truncates is already guaranteed by i_mutex, so it can
    simply fall way
    - the reader side is replaced by an i_dio_count member in struct inode
    that counts the number of pending direct I/O requests. Truncate can't
    proceed as long as it's non-zero
    - when i_dio_count reaches non-zero we wake up a pending truncate using
    wake_up_bit on a new bit in i_flags
    - new references to i_dio_count can't appear while we are waiting for
    it to read zero because the direct I/O count always needs i_mutex
    (or an equivalent like XFS's i_iolock) for starting a new operation.

    This scheme is much simpler, and saves the space of a spinlock_t and a
    struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
    system).

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

19 Jul, 2011

1 commit


29 May, 2011

1 commit

  • Some recent benchmarking on btrfs showed that a major scaling bottleneck
    on large systems on btrfs is currently the xattr lookup on every write.

    Why xattr lookup on every write I hear you ask?

    write wants to drop suid and security related xattrs that could set o
    capabilities for executables. To do that it currently looks up
    security.capability on EVERY write (even for non executables) to decide
    whether to drop it or not.

    In btrfs this causes an additional tree walk, hitting some per file system
    locks and quite bad scalability. In a simple read workload on a 8S
    system I saw over 90% CPU time in spinlocks related to that.

    Chris Mason tells me this is also a problem in ext4, where it hits
    the global mbcache lock.

    This patch adds a simple per inode to avoid this problem. We only
    do the lookup once per file and then if there is no xattr cache
    the decision. All xattr changes clear the flag.

    I also used the same flag to avoid the suid check, although
    that one is pretty cheap.

    A file system can also set this flag when it creates the inode,
    if it has a cheap way to do so. This is done for some common file systems
    in followon patches.

    With this patch a major part of the lock contention disappears
    for btrfs. Some testing on smaller systems didn't show significant
    performance changes, but at least it helps the larger systems
    and is generally more efficient.

    v2: Rename is_sgid. add file system helper.
    Cc: chris.mason@oracle.com
    Cc: josef@redhat.com
    Cc: viro@zeniv.linux.org.uk
    Cc: agruen@linbit.com
    Cc: Serge E. Hallyn
    Signed-off-by: Andi Kleen
    Signed-off-by: Al Viro

    Andi Kleen
     

31 Mar, 2011

1 commit


24 Mar, 2011

1 commit


10 Aug, 2010

4 commits

  • Make sure we check the truncate constraints early on in ->setattr by adding
    those checks to inode_change_ok. Also clean up and document inode_change_ok
    to make this obvious.

    As a fallout we don't have to call inode_newsize_ok from simple_setsize and
    simplify it down to a truncate_setsize which doesn't return an error. This
    simplifies a lot of setattr implementations and means we use truncate_setsize
    almost everywhere. Get rid of fat_setsize now that it's trivial and mark
    ext2_setsize static to make the calling convention obvious.

    Keep the inode_newsize_ok in vmtruncate for now as all callers need an
    audit for its removal anyway.

    Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
    needs a deeper audit, but that is left for later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Replace inode_setattr with opencoded variants of it in all callers. This
    moves the remaining call to vmtruncate into the filesystem methods where it
    can be replaced with the proper truncate sequence.

    In a few cases it was obvious that we would never end up calling vmtruncate
    so it was left out in the opencoded variant:

    spufs: explicitly checks for ATTR_SIZE earlier
    btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
    ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above

    In addition to that ncpfs called inode_setattr with handcrafted iattrs,
    which allowed to trim down the opencoded variant.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • With the new truncate sequence every filesystem that wants to support file
    size changes on disk needs to implement its own ->setattr. So instead
    of calling inode_setattr which supports size changes call into a simple
    method that doesn't support this. simple_setattr is almost what we
    want except that it does not mark the inode dirty after changes. Given
    that marking the inode dirty is a no-op for the simple in-memory filesystems
    that use simple_setattr currently just add the mark_inode_dirty call.

    Also add a WARN_ON for the presence of a truncate method to simple_setattr
    to catch new instances of it during the transition period.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Despite its name it's now a generic implementation of ->setattr, but
    rather a helper to copy attributes from a struct iattr to the inode.
    Rename it to setattr_copy to reflect this fact.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

28 May, 2010

1 commit

  • Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
    setattr > vmtruncate > truncate, have filesystems call their truncate sequence
    from ->setattr if filesystem specific operations are required. vmtruncate is
    deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
    previously should be used.

    simple_setattr is introduced for simple in-ram filesystems to implement
    the new truncate sequence. Eventually all filesystems should be converted
    to implement a setattr, and the default code in notify_change should go
    away.

    simple_setsize is also introduced to perform just the ATTR_SIZE portion
    of simple_setattr (ie. changing i_size and trimming pagecache).

    To implement the new truncate sequence:
    - filesystem specific manipulations (eg freeing blocks) must be done in
    the setattr method rather than ->truncate.
    - vmtruncate can not be used by core code to trim blocks past i_size in
    the event of write failure after allocation, so this must be performed
    in the fs code.
    - convert usage of helpers block_write_begin, nobh_write_begin,
    cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
    variants. These avoid calling vmtruncate to trim blocks (see previous).
    - inode_setattr should not be used. generic_setattr is a new function
    to be used to copy simple attributes into the generic inode.
    - make use of the better opportunity to handle errors with the new sequence.

    Big problem with the previous calling sequence: the filesystem is not called
    until i_size has already changed. This means it is not allowed to fail the
    call, and also it does not know what the previous i_size was. Also, generic
    code calling vmtruncate to truncate allocated blocks in case of error had
    no good way to return a meaningful error (or, for example, atomically handle
    block deallocation).

    Cc: Christoph Hellwig
    Acked-by: Jan Kara
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     

07 Mar, 2010

1 commit

  • Make sure compiler won't do weird things with limits. E.g. fetching them
    twice may return 2 different values after writable limits are implemented.

    I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource:
    add helpers for fetching rlimits") or ACCESS_ONCE if not applicable.

    Signed-off-by: Jiri Slaby
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

05 Mar, 2010

1 commit

  • Currently notify_change calls vfs_dq_transfer directly. This means
    we tie the quota code into the VFS. Get rid of that and make the
    filesystem responsible for the transfer. Most filesystems already
    do this, only ufs and udf need the code added, and for jfs it needs to
    be enabled unconditionally instead of only when ACLs are enabled.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

24 Sep, 2009

1 commit

  • Introduce new truncate helpers truncate_pagecache and inode_newsize_ok.
    vmtruncate is also consolidated from mm/memory.c and mm/nommu.c and
    into mm/truncate.c.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     

26 Mar, 2009

1 commit


14 Nov, 2008

1 commit

  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Al Viro
    Signed-off-by: James Morris

    David Howells
     

23 Oct, 2008

1 commit


27 Jul, 2008

1 commit

  • Move the immutable and append-only checks from chmod, chown and utimes
    into notify_change(). Checks for immutable and append-only files are
    always performed by the VFS and not by the filesystem (see
    permission() and may_...() in namei.c), so these belong in
    notify_change(), and not in inode_change_ok().

    This should be completely equivalent.

    CC: Ulrich Drepper
    CC: Michael Kerrisk
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi