07 Jan, 2012
1 commit
-
If there are any inodes on the super block that have been unlinked
(i_nlink == 0) but have not yet been deleted then prevent the
remounting the super block read-only.Reported-by: Toshiyuki Okajima
Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Al Viro
27 Jul, 2011
1 commit
-
This allows us to move duplicated code in
(atomic_inc_not_zero() for now) toSigned-off-by: Arun Sharma
Reviewed-by: Eric Dumazet
Cc: Ingo Molnar
Cc: David Miller
Cc: Eric Dumazet
Acked-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Mar, 2011
3 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fix cdev leak on O_PATH final fput() -
__fput doesn't need a cdev_put() for O_PATH handles.
Signed-off-by: mszeredi@suse.cz
Signed-off-by: Al Viro -
…s/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits)
AppArmor: kill unused macros in lsm.c
AppArmor: cleanup generated files correctly
KEYS: Add an iovec version of KEYCTL_INSTANTIATE
KEYS: Add a new keyctl op to reject a key with a specified error code
KEYS: Add a key type op to permit the key description to be vetted
KEYS: Add an RCU payload dereference macro
AppArmor: Cleanup make file to remove cruft and make it easier to read
SELinux: implement the new sb_remount LSM hook
LSM: Pass -o remount options to the LSM
SELinux: Compute SID for the newly created socket
SELinux: Socket retains creator role and MLS attribute
SELinux: Auto-generate security_is_socket_class
TOMOYO: Fix memory leak upon file open.
Revert "selinux: simplify ioctl checking"
selinux: drop unused packet flow permissions
selinux: Fix packet forwarding checks on postrouting
selinux: Fix wrong checks for selinux_policycap_netpeer
selinux: Fix check for xfrm selinux context algorithm
ima: remove unnecessary call to ima_must_measure
IMA: remove IMA imbalance checking
...
15 Mar, 2011
2 commits
-
Just need to make sure that AF_UNIX garbage collector won't
confuse O_PATHed socket on filesystem for real AF_UNIX opened
socket.Signed-off-by: Al Viro
-
New flag for open(2) - O_PATH. Semantics:
* pathname is resolved, but the file itself is _NOT_ opened
as far as filesystem is concerned.
* almost all operations on the resulting descriptors shall
fail with -EBADF. Exceptions are:
1) operations on descriptors themselves (i.e.
close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD),
fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD),
fcntl(fd, F_SETFD, ...))
2) fcntl(fd, F_GETFL), for a common non-destructive way to
check if descriptor is open
3) "dfd" arguments of ...at(2) syscalls, i.e. the starting
points of pathname resolution
* closing such descriptor does *NOT* affect dnotify or
posix locks.
* permissions are checked as usual along the way to file;
no permission checks are applied to the file itself. Of course,
giving such thing to syscall will result in permission checks (at
the moment it means checking that starting point of ....at() is
a directory and caller has exec permissions on it).fget() and fget_light() return NULL on such descriptors; use of
fget_raw() and fget_raw_light() is needed to get them. That protects
existing code from dealing with those things.There are two things still missing (they come in the next commits):
one is handling of symlinks (right now we refuse to open them that
way; see the next commit for semantics related to those) and another
is descriptor passing via SCM_RIGHTS datagrams.Signed-off-by: Al Viro
08 Mar, 2011
1 commit
10 Feb, 2011
1 commit
-
ima_counts_get() updated the readcount and invalidated the PCR,
as necessary. Only update the i_readcount in the VFS layer.
Move the PCR invalidation checks to ima_file_check(), where it
belongs.Maintaining the i_readcount in the VFS layer, will allow other
subsystems to use i_readcount.Signed-off-by: Mimi Zohar
Acked-by: Eric Paris
05 Feb, 2011
1 commit
-
In get_empty_filp() since 2.6.29, file_free(f) is called with f->f_cred == NULL
when security_file_alloc() returned an error. As a result, kernel will panic()
due to put_cred(NULL) call within RCU callback.Fix this bug by assigning f->f_cred before calling security_file_alloc().
Signed-off-by: Tetsuo Handa
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds
17 Jan, 2011
1 commit
-
There's an unlikely() in fget_light() that assumes the file ref count
will be 1. Running the annotate branch profiler on a desktop that is
performing daily tasks (running firefox, evolution, xchat and is also part
of a distcc farm), it shows that the ref count is not 1 that often.correct incorrect % Function File Line
------- --------- - -------- ---- ----
1035099358 6209599193 85 fget_light file_table.c 315Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Steven Rostedt
Signed-off-by: Al Viro
27 Oct, 2010
1 commit
-
Robin Holt tried to boot a 16TB system and found af_unix was overflowing
a 32bit value :We were seeing a failure which prevented boot. The kernel was incapable
of creating either a named pipe or unix domain socket. This comes down
to a common kernel function called unix_create1() which does:atomic_inc(&unix_nr_socks);
if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
goto out;The function get_max_files() is a simple return of files_stat.max_files.
files_stat.max_files is a signed integer and is computed in
fs/file_table.c's files_init().n = (mempages * (PAGE_SIZE / 1024)) / 10;
files_stat.max_files = n;In our case, mempages (total_ram_pages) is approx 3,758,096,384
(0xe0000000). That leaves max_files at approximately 1,503,238,553.
This causes 2 * get_max_files() to integer overflow.Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
integers, and change af_unix to use an atomic_long_t instead of atomic_t.get_max_files() is changed to return an unsigned long. get_nr_files() is
changed to return a long.unix_nr_socks is changed from atomic_t to atomic_long_t, while not
strictly needed to address Robin problem.Before patch (on a 64bit kernel) :
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
-18446744071562067968After patch:
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
2147483648
# cat /proc/sys/fs/file-nr
704 0 2147483648Reported-by: Robin Holt
Signed-off-by: Eric Dumazet
Acked-by: David Miller
Reviewed-by: Robin Holt
Tested-by: Robin Holt
Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Aug, 2010
2 commits
-
fs: scale files_lock
Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variale in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from different cpu's list.However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU are allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.Testing results:
On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.Booting: locks= 25049 cpu-hits= 23174 (92.5%) node-hits= 23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64 locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.
throughput
2.6.34-rc2 24.5
+patch 24.9us sys idle IO wait (in %)
2.6.34-rc2 51.25 28.25 17.25 3.25
+patch 53.75 18.5 19 8.75So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.Single threaded performance difference was within the noise of microbenchmarks.
That is not to say penalty does not exist, the code is larger and more memory
accesses required so it will be slightly slower.Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen
Cc: Andi Kleen
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro -
fs: cleanup files_lock locking
Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig
Cc: Alan Cox
Acked-by: Andi Kleen
Acked-by: Greg Kroah-Hartman
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
13 Aug, 2010
1 commit
-
This reverts commit 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e (and the
accompanying commit c1e5c954020e "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.Fix up various conflicts due to later fsnotify work.
Signed-off-by: Linus Torvalds
11 Aug, 2010
1 commit
-
Improve the description of fget_light(), which is currently incorrect
about needing a prior refcnt (judging by the way it is actually used).Signed-off-by: Tony Battersby
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Jul, 2010
1 commit
-
fanotify almost works like so:
user context calls fsnotify_* function with a struct file.
fsnotify takes a reference on the struct path
user context goes about it's buissinessat some later point in time the fsnotify listener gets the struct path
fanotify listener calls dentry_open() to create a file which userspace can deal with
listener drops the reference on the struct path
at some later point the listener calls close() on it's new fileWith the switch from struct path to struct file this presents a problem for
fput() and fsnotify_close(). fsnotify_close() is called when the filp has
already reached 0 and __fput() wants to do it's cleanup.The solution presented here is a bit odd. If an event is created from a
struct file we take a reference on the file. We check however if the f_count
was already 0 and if so we take an EXTRA reference EVEN THOUGH IT WAS ZERO.
In __fput() (where we know the f_count hit 0 once) we check if the f_count is
non-zero and if so we drop that 'extra' ref and return without destroying the
file.Signed-off-by: Eric Paris
28 May, 2010
1 commit
-
__aio_put_req() plays sick games with file refcount. What
it wants is fput() from atomic context; it's almost always
done with f_count > 1, so they only have to deal with delayed
work in rare cases when their reference happens to be the
last one. Current code decrements f_count and if it hasn't
hit 0, everything is fine. Otherwise it keeps a pointer
to struct file (with zero f_count!) around and has delayed
work do __fput() on it.Better way to do it: use atomic_long_add_unless( , -1, 1)
instead of !atomic_long_dec_and_test(). IOW, decrement it
only if it's not the last reference, leave refcount alone
if it was. And use normal fput() in delayed work.I've made that atomic_long_add_unless call a new helper -
fput_atomic(). Drops a reference to file if it's safe to
do in atomic (i.e. if that's not the last one), tells if
it had been able to do that. aio.c converted to it, __fput()
use is gone. req->ki_file *always* contributes to refcount
now. And __fput() became static.Signed-off-by: Al Viro
07 Mar, 2010
1 commit
-
We'll introduce FMODE_RANDOM which will be runtime modified. So protect
all runtime modification to f_mode with f_lock to avoid races.Signed-off-by: Wu Fengguang
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Trond Myklebust
Cc: Chuck Lever
Cc: [2.6.33.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Feb, 2010
1 commit
-
Hooks: Just Say No.
Signed-off-by: Al Viro
23 Dec, 2009
1 commit
-
When alloc_file() and init_file() were combined, the error handling of
mnt_clone_write() was taken into alloc_file() in a somewhat obfuscated
way. Since we don't use the error code for anything except warning,
we might as well warn directly without an extra variable.Signed-off-by: Roland Dreier
Signed-off-by: Al Viro
17 Dec, 2009
6 commits
-
Commit 3d1e4631 ("get rid of init_file()") removed the export of
alloc_file() -- possibly inadvertently, since that commit mainly
consisted of deleting the lines between the end of alloc_file() and
the start of the code in init_file().There is in fact one modular use of alloc_file() in the tree, in
drivers/infiniband/core/uverbs_main.c, so re-add the export to fix:ERROR: "alloc_file" [drivers/infiniband/core/ib_uverbs.ko] undefined!
when CONFIG_INFINIBAND_USER_ACCESS=m.
Cc: Al Viro
Signed-off-by: Roland Dreier
Signed-off-by: Linus Torvalds -
There are 2 groups of alloc_file() callers:
* ones that are followed by ima_counts_get
* ones giving non-regular files
So let's pull that ima_counts_get() into alloc_file();
it's a no-op in case of non-regular files.Signed-off-by: Al Viro
-
All users outside of fs/ of get_empty_filp() have been removed. This patch
moves the definition from the include/ directory to internal.h so no new
users crop up and removes the EXPORT_SYMBOL. I'd love to see open intents
stop using it too, but that's a problem for another day and a smarter
developer!Signed-off-by: Eric Paris
Acked-by: Miklos Szeredi
Signed-off-by: Al Viro -
... and have the caller grab both mnt and dentry; kill
leak in infiniband, while we are at it.Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
25 Oct, 2009
1 commit
-
Based on discussions on LKML and LSM, where there are consecutive
security_ and ima_ calls in the vfs layer, move the ima_ calls to
the existing security_ hooks.Signed-off-by: Mimi Zohar
Signed-off-by: James Morris
24 Sep, 2009
1 commit
-
It's unused.
It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan
Cc: David Howells
Cc: "Eric W. Biederman"
Cc: Al Viro
Cc: Ralf Baechle
Cc: Martin Schwidefsky
Cc: Ingo Molnar
Cc: "David S. Miller"
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Jun, 2009
2 commits
-
This function walks the s_files lock, and operates primarily on the
files in a superblock, so it better belongs here (eg. see also
fs_may_remount_ro).[AV: ... and it shouldn't be static after that move]
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro -
This patch speeds up lmbench lat_mmap test by about another 2% after the
first patch.Before:
avg = 462.286
std = 5.46106After:
avg = 453.12
std = 9.58257(50 runs of each, stddev gives a reasonable confidence)
It does this by introducing mnt_clone_write, which avoids some heavyweight
operations of mnt_want_write if called on a vfsmount which we know already
has a write count; and mnt_want_write_file, which can call mnt_clone_write
if the file is open for write.After these two patches, mnt_want_write and mnt_drop_write go from 7% on
the profile down to 1.3% (including mnt_clone_write).[AV: mnt_want_write_file() should take file alone and derive mnt from it;
not only all callers have that form, but that's the only mnt about which
we know that it's already held for write if file is opened for write]Cc: Dave Hansen
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
30 Mar, 2009
1 commit
-
'struct path' is not used in alloc_file().
Signed-off-by: Tero Roponen
Signed-off-by: Jiri Kosina
27 Mar, 2009
1 commit
-
* 'bkl-removal' of git://git.lwn.net/linux-2.6:
Rationalize fasync return values
Move FASYNC bit handling to f_op->fasync()
Use f_lock to protect f_flags
Rename struct file->f_ep_lock
16 Mar, 2009
1 commit
-
This lock moves out of the CONFIG_EPOLL ifdef and becomes f_lock. For now,
epoll remains the only user, but a future patch will use it to protect
f_flags as well.Cc: Davide Libenzi
Reviewed-by: Christoph Hellwig
Signed-off-by: Jonathan Corbet
06 Feb, 2009
2 commits
-
Conflicts:
fs/namei.cManually merged per:
diff --cc fs/namei.c
index 734f2b5,bbc15c2..0000000
--- a/fs/namei.c
+++ b/fs/namei.c
@@@ -860,9 -848,8 +849,10 @@@ static int __link_path_walk(const char
nd->flags |= LOOKUP_CONTINUE;
err = exec_permission_lite(inode);
if (err == -EAGAIN)
- err = vfs_permission(nd, MAY_EXEC);
+ err = inode_permission(nd->path.dentry->d_inode,
+ MAY_EXEC);
+ if (!err)
+ err = ima_path_check(&nd->path, MAY_EXEC);
if (err)
break;@@@ -1525,14 -1506,9 +1509,14 @@@ int may_open(struct path *path, int acc
flag &= ~O_TRUNC;
}- error = vfs_permission(nd, acc_mode);
+ error = inode_permission(inode, acc_mode);
if (error)
return error;
+
- error = ima_path_check(&nd->path,
++ error = ima_path_check(path,
+ acc_mode & (MAY_READ | MAY_WRITE | MAY_EXEC));
+ if (error)
+ return error;
/*
* An append-only file must be opened in append mode for writing.
*/Signed-off-by: James Morris
-
This patch replaces the generic integrity hooks, for which IMA registered
itself, with IMA integrity hooks in the appropriate places directly
in the fs directory.Signed-off-by: Mimi Zohar
Acked-by: Serge Hallyn
Signed-off-by: James Morris
01 Jan, 2009
1 commit
-
Instead of creating the "filp" kmem_cache in vfs_caches_init(),
we can do it a litle be later in files_init(), so that filp_cachep
is static to fs/file_table.cAcked-by: Paul E. McKenney
Signed-off-by: Eric Dumazet
Signed-off-by: Al Viro
14 Nov, 2008
3 commits
-
Attach creds to file structs and discard f_uid/f_gid.
file_operations::open() methods (such as hppfs_open()) should use file->f_cred
rather than current_cred(). At the moment file->f_cred will be current_cred()
at this point.Signed-off-by: David Howells
Reviewed-by: James Morris
Signed-off-by: James Morris -
Wrap current->cred and a few other accessors to hide their actual
implementation.Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris -
Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.With comment fixes Signed-off-by: Marc Dionne
Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris