11 Oct, 2016
1 commit
-
Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning
08 Oct, 2016
1 commit
28 Sep, 2016
1 commit
-
current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.Signed-off-by: Deepa Dinamani
Signed-off-by: Al Viro
22 Sep, 2016
2 commits
-
Currently, notify_change() clears capabilities or IMA attributes by
calling security_inode_killpriv() before calling into ->setattr. Thus it
happens before any other permission checks in inode_change_ok() and user
is thus allowed to trigger clearing of capabilities or IMA attributes
for any file he can look up e.g. by calling chown for that file. This is
unexpected and can lead to user DoSing a system.Fix the problem by calling security_inode_killpriv() at the end of
inode_change_ok() instead of from notify_change(). At that moment we are
sure user has permissions to do the requested change.References: CVE-2015-1350
Reviewed-by: Christoph Hellwig
Signed-off-by: Jan Kara -
inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.Reviewed-by: Christoph Hellwig
Signed-off-by: Jan Kara
16 Sep, 2016
1 commit
-
This fixes a bug where the permission was not properly checked in
overlayfs. The testcase is ltp/utimensat01.It is also cleaner and safer to do the permission checking in the vfs
helper instead of the caller.This patch introduces an additional ia_valid flag ATTR_TOUCH (since
touch(1) is the most obvious user of utimes(NULL)) that is passed into
notify_change whenever the conditions for this special permission checking
mode are met.Reported-by: Aihua Zhang
Signed-off-by: Miklos Szeredi
Tested-by: Aihua Zhang
Cc: # v3.18+
06 Jul, 2016
1 commit
-
When a filesystem outside of init_user_ns is mounted it could have
uids and gids stored in it that do not map to init_user_ns.The plan is to allow those filesystems to set i_uid to INVALID_UID and
i_gid to INVALID_GID for unmapped uids and gids and then to handle
that strange case in the vfs to ensure there is consistent robust
handling of the weirdness.Upon a careful review of the vfs and filesystems about the only case
where there is any possibility of confusion or trouble is when the
inode is written back to disk. In that case filesystems typically
read the inode->i_uid and inode->i_gid and write them to disk even
when just an inode timestamp is being updated.Which leads to a rule that is very simple to implement and understand
inodes whose i_uid or i_gid is not valid may not be written.In dealing with access times this means treat those inodes as if the
inode flag S_NOATIME was set. Reads of the inodes appear safe and
useful, but any write or modification is disallowed. The only inode
write that is allowed is a chown that sets the uid and gid on the
inode to valid values. After such a chown the inode is normal and may
be treated as such.Denying all writes to inodes with uids or gids unknown to the vfs also
prevents several oddball cases where corruption would have occurred
because the vfs does not have complete information.One problem case that is prevented is attempting to use the gid of a
directory for new inodes where the directories sgid bit is set but the
directories gid is not mapped.Another problem case avoided is attempting to update the evm hash
after setxattr, removexattr, and setattr. As the evm hash includeds
the inode->i_uid or inode->i_gid not knowning the uid or gid prevents
a correct evm hash from being computed. evm hash verification also
fails when i_uid or i_gid is unknown but that is essentially harmless
as it does not cause filesystem corruption.Acked-by: Seth Forshee
Signed-off-by: "Eric W. Biederman"
28 Jun, 2016
1 commit
-
Add checks to notify_change to verify that uid and gid changes
will map into the superblock's user namespace. If they do not
fail with -EOVERFLOW.This is mandatory so that fileystems don't have to even think
of dealing with ia_uid and ia_gid that--EWB Moved the test from inode_change_ok to notify_change
Signed-off-by: Seth Forshee
Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman
23 Jan, 2016
1 commit
-
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.Signed-off-by: Al Viro
11 Jun, 2014
1 commit
-
The kernel has no concept of capabilities with respect to inodes; inodes
exist independently of namespaces. For example, inode_capable(inode,
CAP_LINUX_IMMUTABLE) would be nonsense.This patch changes inode_capable to check for uid and gid mappings and
renames it to capable_wrt_inode_uidgid, which should make it more
obvious what it does.Fixes CVE-2014-4014.
Cc: Theodore Ts'o
Cc: Serge Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Chinner
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski
Signed-off-by: Linus Torvalds
06 Dec, 2013
1 commit
-
Currently notify_change directly updates i_version for size updates,
which not only is counter to how all other fields are updated through
struct iattr, but also breaks XFS, which need inode updates to happen
under its own lock, and synchronized to the structure that gets written
to the log.Remove the update in the common code, and it to btrfs and ext4,
XFS already does a proper updaste internally and currently gets a
double update with the existing code.IMHO this is 3.13 and -stable material and should go in through the XFS
tree.Signed-off-by: Christoph Hellwig
Reviewed-by: Andreas Dilger
Acked-by: Jan Kara
Reviewed-by: Dave Chinner
Signed-off-by: Chris Mason
Signed-off-by: Ben Myers
09 Nov, 2013
1 commit
-
NFSv4 uses leases to guarantee that clients can cache metadata as well
as data.Cc: Mikulas Patocka
Cc: David Howells
Cc: Tyler Hicks
Cc: Dustin Kirkland
Acked-by: Jeff Layton
Signed-off-by: J. Bruce Fields
Signed-off-by: Al Viro
20 Nov, 2012
1 commit
-
- Allow chown if CAP_CHOWN is present in the current user namespace
and the uid of the inode maps into the current user namespace, and
the destination uid or gid maps into the current user namespace.- Allow perserving setgid when changing an inode if CAP_FSETID is
present in the current user namespace and the owner of the file has
a mapping into the current user namespace.Acked-by: Serge E. Hallyn
Signed-off-by: "Eric W. Biederman"
08 Sep, 2012
1 commit
-
Changing an inode's metadata may result in our not needing to appraise
the file. In such cases, we must remove 'security.ima'.Changelog v1:
- use ima_inode_post_setattr() stub function, if IMA_APPRAISE not configuredSigned-off-by: Mimi Zohar
Acked-by: Serge Hallyn
Acked-by: Dmitry Kasatkin
14 Jul, 2012
1 commit
-
Cc: Djalal Harouni
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro
31 May, 2012
1 commit
-
When a file is truncated with truncate()/ftruncate() and then closed,
iversion is not updated. This patch uses ATTR_SIZE flag as an indication
to increment iversion.Mimi said:
On fput(), i_version is used to detect and flag files that have changed
and need to be re-measured in the IMA measurement policy. When a file
is truncated with truncate()/ftruncate() and then closed, i_version is
not updated. As a result, although the file has changed, it will not be
re-measured and added to the IMA measurement list on subsequent access.Signed-off-by: Dmitry Kasatkin
Acked-by: Mimi Zohar
Cc: Al Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro
03 May, 2012
1 commit
-
Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman
29 Feb, 2012
1 commit
-
For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include. Fix up any implicit
include dependencies that were being masked by module.h along
the way.Signed-off-by: Paul Gortmaker
04 Jan, 2012
1 commit
-
Signed-off-by: Al Viro
09 Aug, 2011
1 commit
-
Conflicts:
fs/attr.cResolve conflict manually.
Signed-off-by: James Morris
21 Jul, 2011
2 commits
-
Let filesystems handle waiting for direct I/O requests themselves instead
of doing it beforehand. This means filesystem-specific locks to prevent
new dio referenes from appearing can be held. This is important to allow
generalizing i_dio_count to non-DIO_LOCKING filesystems.Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro -
i_alloc_sem is a rather special rw_semaphore. It's the last one that may
be released by a non-owner, and it's write side is always mirrored by
real exclusion. It's intended use it to wait for all pending direct I/O
requests to finish before starting a truncate.Replace it with a hand-grown construct:
- exclusion for truncates is already guaranteed by i_mutex, so it can
simply fall way
- the reader side is replaced by an i_dio_count member in struct inode
that counts the number of pending direct I/O requests. Truncate can't
proceed as long as it's non-zero
- when i_dio_count reaches non-zero we wake up a pending truncate using
wake_up_bit on a new bit in i_flags
- new references to i_dio_count can't appear while we are waiting for
it to read zero because the direct I/O count always needs i_mutex
(or an equivalent like XFS's i_iolock) for starting a new operation.This scheme is much simpler, and saves the space of a spinlock_t and a
struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
system).Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
19 Jul, 2011
1 commit
-
Changing the inode's metadata may require the 'security.evm' extended
attribute to be re-calculated and updated.Signed-off-by: Mimi Zohar
Acked-by: Serge Hallyn
29 May, 2011
1 commit
-
Some recent benchmarking on btrfs showed that a major scaling bottleneck
on large systems on btrfs is currently the xattr lookup on every write.Why xattr lookup on every write I hear you ask?
write wants to drop suid and security related xattrs that could set o
capabilities for executables. To do that it currently looks up
security.capability on EVERY write (even for non executables) to decide
whether to drop it or not.In btrfs this causes an additional tree walk, hitting some per file system
locks and quite bad scalability. In a simple read workload on a 8S
system I saw over 90% CPU time in spinlocks related to that.Chris Mason tells me this is also a problem in ext4, where it hits
the global mbcache lock.This patch adds a simple per inode to avoid this problem. We only
do the lookup once per file and then if there is no xattr cache
the decision. All xattr changes clear the flag.I also used the same flag to avoid the suid check, although
that one is pretty cheap.A file system can also set this flag when it creates the inode,
if it has a cheap way to do so. This is done for some common file systems
in followon patches.With this patch a major part of the lock contention disappears
for btrfs. Some testing on smaller systems didn't show significant
performance changes, but at least it helps the larger systems
and is generally more efficient.v2: Rename is_sgid. add file system helper.
Cc: chris.mason@oracle.com
Cc: josef@redhat.com
Cc: viro@zeniv.linux.org.uk
Cc: agruen@linbit.com
Cc: Serge E. Hallyn
Signed-off-by: Andi Kleen
Signed-off-by: Al Viro
31 Mar, 2011
1 commit
-
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi
24 Mar, 2011
1 commit
-
And give it a kernel-doc comment.
[akpm@linux-foundation.org: btrfs changed in linux-next]
Signed-off-by: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Aug, 2010
4 commits
-
Make sure we check the truncate constraints early on in ->setattr by adding
those checks to inode_change_ok. Also clean up and document inode_change_ok
to make this obvious.As a fallout we don't have to call inode_newsize_ok from simple_setsize and
simplify it down to a truncate_setsize which doesn't return an error. This
simplifies a lot of setattr implementations and means we use truncate_setsize
almost everywhere. Get rid of fat_setsize now that it's trivial and mark
ext2_setsize static to make the calling convention obvious.Keep the inode_newsize_ok in vmtruncate for now as all callers need an
audit for its removal anyway.Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
needs a deeper audit, but that is left for later.Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro -
Replace inode_setattr with opencoded variants of it in all callers. This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:spufs: explicitly checks for ATTR_SIZE earlier
btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
ufs: contains an opencoded simple_seattr + truncate that sets the filesize just aboveIn addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed to trim down the opencoded variant.Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro -
With the new truncate sequence every filesystem that wants to support file
size changes on disk needs to implement its own ->setattr. So instead
of calling inode_setattr which supports size changes call into a simple
method that doesn't support this. simple_setattr is almost what we
want except that it does not mark the inode dirty after changes. Given
that marking the inode dirty is a no-op for the simple in-memory filesystems
that use simple_setattr currently just add the mark_inode_dirty call.Also add a WARN_ON for the presence of a truncate method to simple_setattr
to catch new instances of it during the transition period.Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro -
Despite its name it's now a generic implementation of ->setattr, but
rather a helper to copy attributes from a struct iattr to the inode.
Rename it to setattr_copy to reflect this fact.Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
28 May, 2010
1 commit
-
Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
setattr > vmtruncate > truncate, have filesystems call their truncate sequence
from ->setattr if filesystem specific operations are required. vmtruncate is
deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
previously should be used.simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (ie. changing i_size and trimming pagecache).To implement the new truncate sequence:
- filesystem specific manipulations (eg freeing blocks) must be done in
the setattr method rather than ->truncate.
- vmtruncate can not be used by core code to trim blocks past i_size in
the event of write failure after allocation, so this must be performed
in the fs code.
- convert usage of helpers block_write_begin, nobh_write_begin,
cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
to be used to copy simple attributes into the generic inode.
- make use of the better opportunity to handle errors with the new sequence.Big problem with the previous calling sequence: the filesystem is not called
until i_size has already changed. This means it is not allowed to fail the
call, and also it does not know what the previous i_size was. Also, generic
code calling vmtruncate to truncate allocated blocks in case of error had
no good way to return a meaningful error (or, for example, atomically handle
block deallocation).Cc: Christoph Hellwig
Acked-by: Jan Kara
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
07 Mar, 2010
1 commit
-
Make sure compiler won't do weird things with limits. E.g. fetching them
twice may return 2 different values after writable limits are implemented.I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource:
add helpers for fetching rlimits") or ACCESS_ONCE if not applicable.Signed-off-by: Jiri Slaby
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Mar, 2010
1 commit
-
Currently notify_change calls vfs_dq_transfer directly. This means
we tie the quota code into the VFS. Get rid of that and make the
filesystem responsible for the transfer. Most filesystems already
do this, only ufs and udf need the code added, and for jfs it needs to
be enabled unconditionally instead of only when ACLs are enabled.Signed-off-by: Christoph Hellwig
Signed-off-by: Jan Kara
24 Sep, 2009
1 commit
-
Introduce new truncate helpers truncate_pagecache and inode_newsize_ok.
vmtruncate is also consolidated from mm/memory.c and mm/nommu.c and
into mm/truncate.c.Reviewed-by: Christoph Hellwig
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
26 Mar, 2009
1 commit
-
Use lowercase names of quota functions instead of old uppercase ones.
Signed-off-by: Jan Kara
CC: Alexander Viro
14 Nov, 2008
1 commit
-
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.Signed-off-by: David Howells
Reviewed-by: James Morris
Acked-by: Serge Hallyn
Cc: Al Viro
Signed-off-by: James Morris
23 Oct, 2008
1 commit
-
Call security_inode_setattr() consistetly before inode_change_ok().
It doesn't make sense to try to "optimize" the i_op->setattr == NULL
case, as most filesystem do define their own setattr function.Signed-off-by: Miklos Szeredi
27 Jul, 2008
2 commits
-
Move the immutable and append-only checks from chmod, chown and utimes
into notify_change(). Checks for immutable and append-only files are
always performed by the VFS and not by the filesystem (see
permission() and may_...() in namei.c), so these belong in
notify_change(), and not in inode_change_ok().This should be completely equivalent.
CC: Ulrich Drepper
CC: Michael Kerrisk
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro -
Add a new ia_valid flag: ATTR_TIMES_SET, to handle the
UTIMES_OMIT/UTIMES_NOW and UTIMES_NOW/UTIMES_OMIT cases. In these
cases neither ATTR_MTIME_SET nor ATTR_ATIME_SET is in the flags, yet
the POSIX draft specifies that permission checking is performed the
same way as if one or both of the times was explicitly set to a
timestamp.See the path "vfs: utimensat(): fix error checking for
{UTIME_NOW,UTIME_OMIT} case" by Michael Kerrisk for the patch
introducing this behavior.This is a cleanup, as well as allowing filesystems (NFS/fuse/...) to
perform their own permission checking instead of the default.CC: Ulrich Drepper
CC: Michael Kerrisk
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro
19 Oct, 2007
1 commit
-
When an unprivileged process attempts to modify a file that has the setuid or
setgid bits set, the VFS will attempt to clear these bits. The VFS will set
the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid mask, and then call
notify_change to clear these bits and set the mode accordingly.With a networked filesystem (NFS and CIFS in particular but likely others),
the client machine or process may not have credentials that allow for setting
the mode. In some situations, this can lead to file corruption, an operation
failing outright because the setattr fails, or to races that lead to a mode
change being reverted.In this situation, we'd like to just leave the handling of this to the server
and ignore these bits. The problem is that by the time the setattr op is
called, the VFS has already reinterpreted the ATTR_KILL_* bits into a mode
change. The setattr operation has no way to know its intent.The following patch fixes this by making notify_change no longer clear the
ATTR_KILL_SUID and ATTR_KILL_SGID bits in the ia_valid before handing it off
to the setattr inode op. setattr can then check for the presence of these
bits, and if they're set it can assume that the mode change was only for the
purposes of clearing these bits.This means that we now have an implicit assumption that notify_change is never
called with ATTR_MODE and either ATTR_KILL_S*ID bit set. Nothing currently
enforces that, so this patch also adds a BUG() if that occurs.Signed-off-by: Jeff Layton
Cc: Michael Halcrow
Cc: Christoph Hellwig
Cc: Neil Brown
Cc: "J. Bruce Fields"
Cc: Chris Mason
Cc: Jeff Mahoney
Cc: "Vladimir V. Saveliev"
Cc: Josef 'Jeff' Sipek
Cc: Trond Myklebust
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds