23 Jan, 2016
1 commit
-
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro
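A minimal userspace sketch of the wrapper pattern described above. The struct mutex here is a toy spin flag, not the kernel type, and mutex_init is a helper added only for the sketch; the inode_* names follow the inode_foo(inode) == mutex_foo(&inode->i_mutex) rule stated in the message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's struct mutex (an assumption, not the
 * real type): one flag that is spun on instead of slept on. */
struct mutex {
	atomic_bool held;
};

static inline void mutex_init(struct mutex *m)
{
	atomic_init(&m->held, false);
}

static inline void mutex_lock(struct mutex *m)
{
	/* atomic_exchange returns the old value; loop until it was false. */
	while (atomic_exchange(&m->held, true))
		;
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
static inline int mutex_trylock(struct mutex *m)
{
	return !atomic_exchange(&m->held, true);
}

static inline bool mutex_is_locked(struct mutex *m)
{
	return atomic_load(&m->held);
}

static inline void mutex_unlock(struct mutex *m)
{
	assert(mutex_is_locked(m)); /* unlocking a free mutex is a bug */
	atomic_store(&m->held, false);
}

struct inode {
	struct mutex i_mutex;
};

/* The wrappers: inode_foo(inode) is mutex_foo(&inode->i_mutex). */
static inline void inode_lock(struct inode *inode)
{
	mutex_lock(&inode->i_mutex);
}

static inline void inode_unlock(struct inode *inode)
{
	mutex_unlock(&inode->i_mutex);
}

static inline int inode_trylock(struct inode *inode)
{
	return mutex_trylock(&inode->i_mutex);
}

static inline bool inode_is_locked(struct inode *inode)
{
	return mutex_is_locked(&inode->i_mutex);
}
```

Callers that only use the inode_* helpers never name the lock type, so changing what sits behind ->i_mutex means editing the wrappers rather than every call site.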
11 Jun, 2014
1 commit
-
The kernel has no concept of capabilities with respect to inodes; inodes
exist independently of namespaces. For example, inode_capable(inode,
CAP_LINUX_IMMUTABLE) would be nonsense.

This patch changes inode_capable to check for uid and gid mappings and
renames it to capable_wrt_inode_uidgid, which should make it more
obvious what it does.

Fixes CVE-2014-4014.
Cc: Theodore Ts'o
Cc: Serge Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Chinner
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski
Signed-off-by: Linus Torvalds
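A rough userspace model of the check described above: the capability must be held in the namespace, and both the inode's uid and gid must have a mapping there. The struct layouts, the single-range id_range maps, and the explicit ns argument are assumptions for illustration; the kernel function takes the namespace from the current task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP_FOWNER 3 /* illustrative; matches the kernel's numbering */

/* Assumption: a namespace maps one contiguous range of ids. */
struct id_range {
	uint32_t first;
	uint32_t count;
};

struct user_ns {
	uint64_t caps;           /* bitmask of capabilities held in this ns */
	struct id_range uid_map;
	struct id_range gid_map;
};

struct inode {
	uint32_t i_uid;
	uint32_t i_gid;
};

static bool id_has_mapping(const struct id_range *map, uint32_t id)
{
	return id >= map->first && id - map->first < map->count;
}

static bool ns_capable(const struct user_ns *ns, int cap)
{
	assert(cap >= 0 && cap < 64);
	return (ns->caps >> cap) & 1;
}

/* True only if the capability is held AND the inode's owner and group
 * are both representable in the namespace. */
static bool capable_wrt_inode_uidgid(const struct user_ns *ns,
				     const struct inode *inode, int cap)
{
	return ns_capable(ns, cap) &&
	       id_has_mapping(&ns->uid_map, inode->i_uid) &&
	       id_has_mapping(&ns->gid_map, inode->i_gid);
}
```

In this model, an inode whose owner lies outside the namespace's uid range is rejected even when the capability bit is set, because the uid mapping is checked along with the gid mapping.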
06 Dec, 2013
1 commit
-
Currently notify_change directly updates i_version for size updates,
which not only is counter to how all other fields are updated through
struct iattr, but also breaks XFS, which needs inode updates to happen
under its own lock, and synchronized to the structure that gets written
to the log.

Remove the update in the common code, and add it to btrfs and ext4.
XFS already does a proper update internally and currently gets a
double update with the existing code.

IMHO this is 3.13 and -stable material and should go in through the XFS
tree.

Signed-off-by: Christoph Hellwig
Reviewed-by: Andreas Dilger
Acked-by: Jan Kara
Reviewed-by: Dave Chinner
Signed-off-by: Chris Mason
Signed-off-by: Ben Myers
09 Nov, 2013
1 commit
-
NFSv4 uses leases to guarantee that clients can cache metadata as well
as data.

Cc: Mikulas Patocka
Cc: David Howells
Cc: Tyler Hicks
Cc: Dustin Kirkland
Acked-by: Jeff Layton
Signed-off-by: J. Bruce Fields
Signed-off-by: Al Viro
20 Nov, 2012
1 commit
-
- Allow chown if CAP_CHOWN is present in the current user namespace
and the uid of the inode maps into the current user namespace, and
the destination uid or gid maps into the current user namespace.

- Allow preserving setgid when changing an inode if CAP_FSETID is
present in the current user namespace and the owner of the file has
a mapping into the current user namespace.

Acked-by: Serge E. Hallyn
Signed-off-by: "Eric W. Biederman"
08 Sep, 2012
1 commit
-
Changing an inode's metadata may result in our not needing to appraise
the file. In such cases, we must remove 'security.ima'.

Changelog v1:
- use ima_inode_post_setattr() stub function, if IMA_APPRAISE not configured

Signed-off-by: Mimi Zohar
Acked-by: Serge Hallyn
Acked-by: Dmitry Kasatkin
14 Jul, 2012
1 commit
-
Cc: Djalal Harouni
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro
31 May, 2012
1 commit
-
When a file is truncated with truncate()/ftruncate() and then closed,
iversion is not updated. This patch uses ATTR_SIZE flag as an indication
to increment iversion.

Mimi said:

On fput(), i_version is used to detect and flag files that have changed
and need to be re-measured in the IMA measurement policy. When a file
is truncated with truncate()/ftruncate() and then closed, i_version is
not updated. As a result, although the file has changed, it will not be
re-measured and added to the IMA measurement list on subsequent access.

Signed-off-by: Dmitry Kasatkin
Acked-by: Mimi Zohar
Cc: Al Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro
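A small sketch of the behaviour described above, assuming simplified inode and iattr structs. toy_notify_change and needs_remeasure are hypothetical names: the first stands in for the common attribute-change path, the second for an IMA-style comparison of i_version against the value seen at the last measurement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values are illustrative. */
#define ATTR_MODE (1U << 0)
#define ATTR_SIZE (1U << 3)

struct inode {
	uint64_t i_version;
	long long i_size;
	unsigned int i_mode;
};

struct iattr {
	unsigned int ia_valid;
	long long ia_size;
	unsigned int ia_mode;
};

/* Hypothetical stand-in for the common attribute-change path. */
static void toy_notify_change(struct inode *inode, const struct iattr *attr)
{
	assert(inode && attr);
	if (attr->ia_valid & ATTR_SIZE) {
		inode->i_size = attr->ia_size;
		inode->i_version++; /* ATTR_SIZE is the cue to bump the version */
	}
	if (attr->ia_valid & ATTR_MODE)
		inode->i_mode = attr->ia_mode;
}

/* Hypothetical close-time check: has the file changed since it was
 * last measured? */
static bool needs_remeasure(const struct inode *inode, uint64_t measured)
{
	return inode->i_version != measured;
}
```

With the increment in place, a truncate followed by a close leaves i_version different from the measured value, so the file is flagged; a mode-only change does not touch i_version in this sketch.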
03 May, 2012
1 commit
-
Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman
29 Feb, 2012
1 commit
-
For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include. Fix up any implicit
include dependencies that were being masked by module.h along
the way.

Signed-off-by: Paul Gortmaker
04 Jan, 2012
1 commit
-
Signed-off-by: Al Viro
09 Aug, 2011
1 commit
-
Conflicts:
fs/attr.c

Resolve conflict manually.
Signed-off-by: James Morris
21 Jul, 2011
2 commits
-
Let filesystems handle waiting for direct I/O requests themselves instead
of doing it beforehand. This means filesystem-specific locks to prevent
new dio references from appearing can be held. This is important to allow
generalizing i_dio_count to non-DIO_LOCKING filesystems.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
i_alloc_sem is a rather special rw_semaphore. It's the last one that may
be released by a non-owner, and its write side is always mirrored by
real exclusion. Its intended use is to wait for all pending direct I/O
requests to finish before starting a truncate.

Replace it with a hand-grown construct:

- exclusion for truncates is already guaranteed by i_mutex, so it can
simply fall away
- the reader side is replaced by an i_dio_count member in struct inode
that counts the number of pending direct I/O requests. Truncate can't
proceed as long as it's non-zero
- when i_dio_count reaches zero we wake up a pending truncate using
wake_up_bit on a new bit in i_flags
- new references to i_dio_count can't appear while we are waiting for
it to read zero because the direct I/O count always needs i_mutex
(or an equivalent like XFS's i_iolock) for starting a new operation.

This scheme is much simpler, and saves the space of a spinlock_t and a
struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
system).

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
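The counting scheme in the list above can be sketched in userspace roughly as follows. The helper names (inode_init, dio_begin, dio_done, truncate_may_proceed) are hypothetical, and the wakeups counter merely stands in for wake_up_bit; no real sleeping or waking happens here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct inode {
	atomic_int i_dio_count; /* pending direct I/O requests */
	int wakeups;            /* stand-in for wake_up_bit(): counts wakeups */
};

static void inode_init(struct inode *inode)
{
	atomic_init(&inode->i_dio_count, 0);
	inode->wakeups = 0;
}

/* Called when a direct I/O request starts (in the kernel this would
 * happen under i_mutex or an equivalent lock). */
static void dio_begin(struct inode *inode)
{
	atomic_fetch_add(&inode->i_dio_count, 1);
}

/* Called when a direct I/O request completes. */
static void dio_done(struct inode *inode)
{
	int prev = atomic_fetch_sub(&inode->i_dio_count, 1);

	assert(prev > 0);
	if (prev == 1)          /* count just dropped to zero */
		inode->wakeups++; /* wake a truncate that may be waiting */
}

/* Truncate may only go ahead once nothing is pending. */
static bool truncate_may_proceed(struct inode *inode)
{
	return atomic_load(&inode->i_dio_count) == 0;
}
```

Only the transition to zero triggers a wakeup, which is why a single counter plus one wait bit can replace the reader side of the semaphore.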
19 Jul, 2011
1 commit
-
Changing the inode's metadata may require the 'security.evm' extended
attribute to be re-calculated and updated.

Signed-off-by: Mimi Zohar
Acked-by: Serge Hallyn
29 May, 2011
1 commit
-
Some recent benchmarking on btrfs showed that a major scaling bottleneck
on large systems on btrfs is currently the xattr lookup on every write.

Why xattr lookup on every write I hear you ask?

write wants to drop suid and security related xattrs that could set
capabilities for executables. To do that it currently looks up
security.capability on EVERY write (even for non executables) to decide
whether to drop it or not.

In btrfs this causes an additional tree walk, hitting some per file system
locks and quite bad scalability. In a simple read workload on an 8S
system I saw over 90% CPU time in spinlocks related to that.

Chris Mason tells me this is also a problem in ext4, where it hits
the global mbcache lock.

This patch adds a simple per inode flag to avoid this problem. We only
do the lookup once per file and then, if there is no xattr, cache
the decision. All xattr changes clear the flag.

I also used the same flag to avoid the suid check, although
that one is pretty cheap.

A file system can also set this flag when it creates the inode,
if it has a cheap way to do so. This is done for some common file systems
in followon patches.

With this patch a major part of the lock contention disappears
for btrfs. Some testing on smaller systems didn't show significant
performance changes, but at least it helps the larger systems
and is generally more efficient.

v2: Rename is_sgid. Add file system helper.
Cc: chris.mason@oracle.com
Cc: josef@redhat.com
Cc: viro@zeniv.linux.org.uk
Cc: agruen@linbit.com
Cc: Serge E. Hallyn
Signed-off-by: Andi Kleen
Signed-off-by: Al Viro
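A sketch of the caching idea, under simplifying assumptions: the "expensive" lookup is a counter-incrementing function, the struct fields has_security_xattr and xattr_lookups exist only for the demonstration, and the flag is called S_NOSEC with an illustrative value. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define S_NOSEC 4096U /* illustrative flag value */

struct inode {
	unsigned int i_flags;
	bool has_security_xattr; /* what the expensive lookup would find */
	int xattr_lookups;       /* how often the expensive path ran */
};

/* Stand-in for the costly security.capability lookup. */
static bool lookup_security_xattr(struct inode *inode)
{
	inode->xattr_lookups++;
	return inode->has_security_xattr;
}

/* Called on every write: must anything be dropped before writing? */
static bool write_needs_priv_removal(struct inode *inode)
{
	assert(inode);
	if (inode->i_flags & S_NOSEC)
		return false;          /* cached "nothing to drop" */
	if (lookup_security_xattr(inode))
		return true;
	inode->i_flags |= S_NOSEC;     /* cache the negative result */
	return false;
}

/* Any xattr change must invalidate the cached decision. */
static void set_security_xattr(struct inode *inode)
{
	inode->has_security_xattr = true;
	inode->i_flags &= ~S_NOSEC;
}
```

Only the negative result is cached in this sketch, so repeated writes to a file without the xattr pay for one lookup in total.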
31 Mar, 2011
1 commit
-
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi
24 Mar, 2011
1 commit
-
And give it a kernel-doc comment.
[akpm@linux-foundation.org: btrfs changed in linux-next]
Signed-off-by: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Aug, 2010
4 commits
-
Make sure we check the truncate constraints early on in ->setattr by adding
those checks to inode_change_ok. Also clean up and document inode_change_ok
to make this obvious.

As a fallout we don't have to call inode_newsize_ok from simple_setsize and
simplify it down to a truncate_setsize which doesn't return an error. This
simplifies a lot of setattr implementations and means we use truncate_setsize
almost everywhere. Get rid of fat_setsize now that it's trivial and mark
ext2_setsize static to make the calling convention obvious.

Keep the inode_newsize_ok in vmtruncate for now as all callers need an
audit for its removal anyway.

Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
needs a deeper audit, but that is left for later.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
Replace inode_setattr with opencoded variants of it in all callers. This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

spufs: explicitly checks for ATTR_SIZE earlier
btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above

In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed trimming down the opencoded variant.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
With the new truncate sequence every filesystem that wants to support file
size changes on disk needs to implement its own ->setattr. So instead
of calling inode_setattr which supports size changes call into a simple
method that doesn't support this. simple_setattr is almost what we
want except that it does not mark the inode dirty after changes. Given
that marking the inode dirty is a no-op for the simple in-memory filesystems
that use simple_setattr currently, just add the mark_inode_dirty call.

Also add a WARN_ON for the presence of a truncate method to simple_setattr
to catch new instances of it during the transition period.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
Despite its name, generic_setattr is not a generic implementation of
->setattr, but rather a helper to copy attributes from a struct iattr to
the inode. Rename it to setattr_copy to reflect this fact.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
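A reduced sketch of what "a helper to copy attributes from a struct iattr to the inode" means in practice. The structs are cut down to a few fields and the flag values are illustrative; timestamp handling and the setgid rules are left out here.

```c
#include <assert.h>

/* Flag values are illustrative. */
#define ATTR_MODE (1U << 0)
#define ATTR_UID  (1U << 1)
#define ATTR_GID  (1U << 2)
#define ATTR_SIZE (1U << 3)

struct inode {
	unsigned int i_mode;
	unsigned int i_uid;
	unsigned int i_gid;
	long long i_size;
};

struct iattr {
	unsigned int ia_valid; /* which ia_* fields are meaningful */
	unsigned int ia_mode;
	unsigned int ia_uid;
	unsigned int ia_gid;
	long long ia_size;
};

/* Copy the simple attributes selected by ia_valid into the inode.
 * Size changes are deliberately NOT handled: they belong to the
 * filesystem's own truncate path. */
static void setattr_copy(struct inode *inode, const struct iattr *attr)
{
	unsigned int ia_valid;

	assert(inode && attr);
	ia_valid = attr->ia_valid;
	if (ia_valid & ATTR_UID)
		inode->i_uid = attr->ia_uid;
	if (ia_valid & ATTR_GID)
		inode->i_gid = attr->ia_gid;
	if (ia_valid & ATTR_MODE)
		inode->i_mode = attr->ia_mode;
}
```

Because the helper ignores ATTR_SIZE, calling it alone never changes i_size, which is the reason it is not a complete ->setattr.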
28 May, 2010
1 commit
-
Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
setattr > vmtruncate > truncate, have filesystems call their truncate sequence
from ->setattr if filesystem specific operations are required. vmtruncate is
deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
previously should be used.

simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (ie. changing i_size and trimming pagecache).

To implement the new truncate sequence:
- filesystem specific manipulations (eg freeing blocks) must be done in
the setattr method rather than ->truncate.
- vmtruncate can not be used by core code to trim blocks past i_size in
the event of write failure after allocation, so this must be performed
in the fs code.
- convert usage of helpers block_write_begin, nobh_write_begin,
cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
to be used to copy simple attributes into the generic inode.
- make use of the better opportunity to handle errors with the new sequence.

Big problem with the previous calling sequence: the filesystem is not called
until i_size has already changed. This means it is not allowed to fail the
call, and also it does not know what the previous i_size was. Also, generic
code calling vmtruncate to truncate allocated blocks in case of error had
no good way to return a meaningful error (or, for example, atomically handle
block deallocation).

Cc: Christoph Hellwig
Acked-by: Jan Kara
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
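The ordering argument above can be illustrated with a toy filesystem ->setattr. Everything here is hypothetical (toy_newsize_ok, toy_fs_setattr, the max_bytes and blocks_freed fields, the 4096-byte block size); the point is only that the size check and the filesystem-specific work both run before i_size changes, so the call can fail cleanly and still knows the old size.

```c
#include <assert.h>
#include <errno.h>

#define ATTR_SIZE      (1U << 3)  /* illustrative flag value */
#define TOY_BLOCK_SIZE 4096LL     /* assumed block size for the sketch */

struct inode {
	long long i_size;
	long long max_bytes;    /* assumed per-filesystem size limit */
	long long blocks_freed; /* records the fs-specific work done */
};

struct iattr {
	unsigned int ia_valid;
	long long ia_size;
};

static long long blocks_for(long long size)
{
	return (size + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE;
}

/* Toy size check: reject sizes the filesystem cannot represent. */
static int toy_newsize_ok(const struct inode *inode, long long newsize)
{
	if (newsize < 0)
		return -EINVAL;
	if (newsize > inode->max_bytes)
		return -EFBIG;
	return 0;
}

/* New-style sequence: validate, do fs-specific work, then set i_size. */
static int toy_fs_setattr(struct inode *inode, const struct iattr *attr)
{
	assert(inode && attr);
	if (attr->ia_valid & ATTR_SIZE) {
		long long oldsize = inode->i_size;
		int err = toy_newsize_ok(inode, attr->ia_size);

		if (err)
			return err; /* i_size has not been touched yet */
		if (attr->ia_size < oldsize)
			inode->blocks_freed += blocks_for(oldsize) -
					       blocks_for(attr->ia_size);
		inode->i_size = attr->ia_size;
	}
	return 0;
}
```

A rejected size change leaves the inode exactly as it was, which the old sequence could not guarantee because i_size had already been updated by the time the filesystem was called.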
07 Mar, 2010
1 commit
-
Make sure compiler won't do weird things with limits. E.g. fetching them
twice may return 2 different values after writable limits are implemented.

I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource:
add helpers for fetching rlimits") or ACCESS_ONCE if not applicable.

Signed-off-by: Jiri Slaby
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
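A minimal sketch of the "fetch once, then use the local copy" pattern. The ACCESS_ONCE definition below is a userspace imitation that relies on the GCC/Clang __typeof__ extension, and TOY_RLIM_INFINITY and size_within_limit are hypothetical names for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Force exactly one load through a volatile-qualified lvalue
 * (imitation for this sketch; needs the __typeof__ extension). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

#define TOY_RLIM_INFINITY (~0UL)

/* The limit may be rewritten concurrently, so read it a single time and
 * make both comparisons against the same snapshot. */
static bool size_within_limit(unsigned long *limit_p, unsigned long newsize)
{
	unsigned long limit;

	assert(limit_p);
	limit = ACCESS_ONCE(*limit_p);
	return limit == TOY_RLIM_INFINITY || newsize <= limit;
}
```

Without the snapshot, the compiler would be free to load the limit separately for each comparison, and the two loads could observe different values.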
05 Mar, 2010
1 commit
-
Currently notify_change calls vfs_dq_transfer directly. This means
we tie the quota code into the VFS. Get rid of that and make the
filesystem responsible for the transfer. Most filesystems already
do this, only ufs and udf need the code added, and for jfs it needs to
be enabled unconditionally instead of only when ACLs are enabled.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jan Kara
24 Sep, 2009
1 commit
-
Introduce new truncate helpers truncate_pagecache and inode_newsize_ok.
vmtruncate is also consolidated from mm/memory.c and mm/nommu.c
into mm/truncate.c.

Reviewed-by: Christoph Hellwig
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
26 Mar, 2009
1 commit
-
Use lowercase names of quota functions instead of old uppercase ones.
Signed-off-by: Jan Kara
CC: Alexander Viro
14 Nov, 2008
1 commit
-
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells
Reviewed-by: James Morris
Acked-by: Serge Hallyn
Cc: Al Viro
Signed-off-by: James Morris
23 Oct, 2008
1 commit
-
Call security_inode_setattr() consistently before inode_change_ok().
It doesn't make sense to try to "optimize" the i_op->setattr == NULL
case, as most filesystems do define their own setattr function.

Signed-off-by: Miklos Szeredi
27 Jul, 2008
2 commits
-
Move the immutable and append-only checks from chmod, chown and utimes
into notify_change(). Checks for immutable and append-only files are
always performed by the VFS and not by the filesystem (see
permission() and may_...() in namei.c), so these belong in
notify_change(), and not in inode_change_ok().

This should be completely equivalent.

CC: Ulrich Drepper
CC: Michael Kerrisk
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro
-
Add a new ia_valid flag: ATTR_TIMES_SET, to handle the
UTIME_OMIT/UTIME_NOW and UTIME_NOW/UTIME_OMIT cases. In these
cases neither ATTR_MTIME_SET nor ATTR_ATIME_SET is in the flags, yet
the POSIX draft specifies that permission checking is performed the
same way as if one or both of the times was explicitly set to a
timestamp.

See "vfs: utimensat(): fix error checking for
{UTIME_NOW,UTIME_OMIT} case" by Michael Kerrisk for the patch
introducing this behavior.

This is a cleanup, as well as allowing filesystems (NFS/fuse/...) to
perform their own permission checking instead of the default.

CC: Ulrich Drepper
CC: Michael Kerrisk
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro
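A sketch of how the flag closes the gap described above. The enum toy_time_arg and both function names are hypothetical, and the flag values are illustrative; the logic assumes that a NOW/NOW request is treated like a plain "touch" and anything else sets ATTR_TIMES_SET.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Flag values are illustrative. */
#define ATTR_ATIME     (1U << 4)
#define ATTR_MTIME     (1U << 5)
#define ATTR_CTIME     (1U << 6)
#define ATTR_ATIME_SET (1U << 7)
#define ATTR_MTIME_SET (1U << 8)
#define ATTR_TIMES_SET (1U << 16)

/* How the caller specified each timestamp. */
enum toy_time_arg { TOY_UTIME_NOW, TOY_UTIME_OMIT, TOY_UTIME_EXPLICIT };

/* Build ia_valid for a utimensat-style request. */
static unsigned int utimes_ia_valid(enum toy_time_arg atime,
				    enum toy_time_arg mtime)
{
	unsigned int ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;

	assert(atime <= TOY_UTIME_EXPLICIT && mtime <= TOY_UTIME_EXPLICIT);
	if (atime == TOY_UTIME_NOW && mtime == TOY_UTIME_NOW)
		return ia_valid; /* plain "touch": no *_SET flags */

	if (atime == TOY_UTIME_OMIT)
		ia_valid &= ~ATTR_ATIME;
	else if (atime == TOY_UTIME_EXPLICIT)
		ia_valid |= ATTR_ATIME_SET;

	if (mtime == TOY_UTIME_OMIT)
		ia_valid &= ~ATTR_MTIME;
	else if (mtime == TOY_UTIME_EXPLICIT)
		ia_valid |= ATTR_MTIME_SET;

	return ia_valid | ATTR_TIMES_SET;
}

/* Owner check: required whenever any of the three flags is present. */
static int times_owner_check(unsigned int ia_valid, bool is_owner_or_cap)
{
	if ((ia_valid & (ATTR_ATIME_SET | ATTR_MTIME_SET | ATTR_TIMES_SET)) &&
	    !is_owner_or_cap)
		return -EPERM;
	return 0;
}
```

For a NOW/OMIT request neither ATTR_ATIME_SET nor ATTR_MTIME_SET is set, so without ATTR_TIMES_SET the owner check in this sketch would be skipped.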