18 Aug, 2010
2 commits
-
fs: scale files_lock
Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variale in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from different cpu's list.However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU are allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.Testing results:
On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.Booting: locks= 25049 cpu-hits= 23174 (92.5%) node-hits= 23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64 locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.
throughput
2.6.34-rc2 24.5
+patch 24.9us sys idle IO wait (in %)
2.6.34-rc2 51.25 28.25 17.25 3.25
+patch 53.75 18.5 19 8.75So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.Single threaded performance difference was within the noise of microbenchmarks.
That is not to say penalty does not exist, the code is larger and more memory
accesses required so it will be slightly slower.Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen
Cc: Andi Kleen
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro -
fs: cleanup files_lock locking
Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig
Cc: Alan Cox
Acked-by: Andi Kleen
Acked-by: Greg Kroah-Hartman
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
13 Aug, 2010
1 commit
-
This reverts commit 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e (and the
accompanying commit c1e5c954020e "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.Fix up various conflicts due to later fsnotify work.
Signed-off-by: Linus Torvalds
11 Aug, 2010
1 commit
-
Improve the description of fget_light(), which is currently incorrect
about needing a prior refcnt (judging by the way it is actually used).Signed-off-by: Tony Battersby
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Jul, 2010
1 commit
-
fanotify almost works like so:
user context calls fsnotify_* function with a struct file.
fsnotify takes a reference on the struct path
user context goes about it's buissinessat some later point in time the fsnotify listener gets the struct path
fanotify listener calls dentry_open() to create a file which userspace can deal with
listener drops the reference on the struct path
at some later point the listener calls close() on it's new fileWith the switch from struct path to struct file this presents a problem for
fput() and fsnotify_close(). fsnotify_close() is called when the filp has
already reached 0 and __fput() wants to do it's cleanup.The solution presented here is a bit odd. If an event is created from a
struct file we take a reference on the file. We check however if the f_count
was already 0 and if so we take an EXTRA reference EVEN THOUGH IT WAS ZERO.
In __fput() (where we know the f_count hit 0 once) we check if the f_count is
non-zero and if so we drop that 'extra' ref and return without destroying the
file.Signed-off-by: Eric Paris
28 May, 2010
1 commit
-
__aio_put_req() plays sick games with file refcount. What
it wants is fput() from atomic context; it's almost always
done with f_count > 1, so they only have to deal with delayed
work in rare cases when their reference happens to be the
last one. Current code decrements f_count and if it hasn't
hit 0, everything is fine. Otherwise it keeps a pointer
to struct file (with zero f_count!) around and has delayed
work do __fput() on it.Better way to do it: use atomic_long_add_unless( , -1, 1)
instead of !atomic_long_dec_and_test(). IOW, decrement it
only if it's not the last reference, leave refcount alone
if it was. And use normal fput() in delayed work.I've made that atomic_long_add_unless call a new helper -
fput_atomic(). Drops a reference to file if it's safe to
do in atomic (i.e. if that's not the last one), tells if
it had been able to do that. aio.c converted to it, __fput()
use is gone. req->ki_file *always* contributes to refcount
now. And __fput() became static.Signed-off-by: Al Viro
07 Mar, 2010
1 commit
-
We'll introduce FMODE_RANDOM which will be runtime modified. So protect
all runtime modification to f_mode with f_lock to avoid races.Signed-off-by: Wu Fengguang
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Trond Myklebust
Cc: Chuck Lever
Cc: [2.6.33.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Feb, 2010
1 commit
-
Hooks: Just Say No.
Signed-off-by: Al Viro
23 Dec, 2009
1 commit
-
When alloc_file() and init_file() were combined, the error handling of
mnt_clone_write() was taken into alloc_file() in a somewhat obfuscated
way. Since we don't use the error code for anything except warning,
we might as well warn directly without an extra variable.Signed-off-by: Roland Dreier
Signed-off-by: Al Viro
17 Dec, 2009
6 commits
-
Commit 3d1e4631 ("get rid of init_file()") removed the export of
alloc_file() -- possibly inadvertently, since that commit mainly
consisted of deleting the lines between the end of alloc_file() and
the start of the code in init_file().There is in fact one modular use of alloc_file() in the tree, in
drivers/infiniband/core/uverbs_main.c, so re-add the export to fix:ERROR: "alloc_file" [drivers/infiniband/core/ib_uverbs.ko] undefined!
when CONFIG_INFINIBAND_USER_ACCESS=m.
Cc: Al Viro
Signed-off-by: Roland Dreier
Signed-off-by: Linus Torvalds -
There are 2 groups of alloc_file() callers:
* ones that are followed by ima_counts_get
* ones giving non-regular files
So let's pull that ima_counts_get() into alloc_file();
it's a no-op in case of non-regular files.Signed-off-by: Al Viro
-
All users outside of fs/ of get_empty_filp() have been removed. This patch
moves the definition from the include/ directory to internal.h so no new
users crop up and removes the EXPORT_SYMBOL. I'd love to see open intents
stop using it too, but that's a problem for another day and a smarter
developer!Signed-off-by: Eric Paris
Acked-by: Miklos Szeredi
Signed-off-by: Al Viro -
... and have the caller grab both mnt and dentry; kill
leak in infiniband, while we are at it.Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
25 Oct, 2009
1 commit
-
Based on discussions on LKML and LSM, where there are consecutive
security_ and ima_ calls in the vfs layer, move the ima_ calls to
the existing security_ hooks.Signed-off-by: Mimi Zohar
Signed-off-by: James Morris
24 Sep, 2009
1 commit
-
It's unused.
It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan
Cc: David Howells
Cc: "Eric W. Biederman"
Cc: Al Viro
Cc: Ralf Baechle
Cc: Martin Schwidefsky
Cc: Ingo Molnar
Cc: "David S. Miller"
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Jun, 2009
2 commits
-
This function walks the s_files lock, and operates primarily on the
files in a superblock, so it better belongs here (eg. see also
fs_may_remount_ro).[AV: ... and it shouldn't be static after that move]
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro -
This patch speeds up lmbench lat_mmap test by about another 2% after the
first patch.Before:
avg = 462.286
std = 5.46106After:
avg = 453.12
std = 9.58257(50 runs of each, stddev gives a reasonable confidence)
It does this by introducing mnt_clone_write, which avoids some heavyweight
operations of mnt_want_write if called on a vfsmount which we know already
has a write count; and mnt_want_write_file, which can call mnt_clone_write
if the file is open for write.After these two patches, mnt_want_write and mnt_drop_write go from 7% on
the profile down to 1.3% (including mnt_clone_write).[AV: mnt_want_write_file() should take file alone and derive mnt from it;
not only all callers have that form, but that's the only mnt about which
we know that it's already held for write if file is opened for write]Cc: Dave Hansen
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
30 Mar, 2009
1 commit
-
'struct path' is not used in alloc_file().
Signed-off-by: Tero Roponen
Signed-off-by: Jiri Kosina
27 Mar, 2009
1 commit
-
* 'bkl-removal' of git://git.lwn.net/linux-2.6:
Rationalize fasync return values
Move FASYNC bit handling to f_op->fasync()
Use f_lock to protect f_flags
Rename struct file->f_ep_lock
16 Mar, 2009
1 commit
-
This lock moves out of the CONFIG_EPOLL ifdef and becomes f_lock. For now,
epoll remains the only user, but a future patch will use it to protect
f_flags as well.Cc: Davide Libenzi
Reviewed-by: Christoph Hellwig
Signed-off-by: Jonathan Corbet
06 Feb, 2009
2 commits
-
Conflicts:
fs/namei.cManually merged per:
diff --cc fs/namei.c
index 734f2b5,bbc15c2..0000000
--- a/fs/namei.c
+++ b/fs/namei.c
@@@ -860,9 -848,8 +849,10 @@@ static int __link_path_walk(const char
nd->flags |= LOOKUP_CONTINUE;
err = exec_permission_lite(inode);
if (err == -EAGAIN)
- err = vfs_permission(nd, MAY_EXEC);
+ err = inode_permission(nd->path.dentry->d_inode,
+ MAY_EXEC);
+ if (!err)
+ err = ima_path_check(&nd->path, MAY_EXEC);
if (err)
break;@@@ -1525,14 -1506,9 +1509,14 @@@ int may_open(struct path *path, int acc
flag &= ~O_TRUNC;
}- error = vfs_permission(nd, acc_mode);
+ error = inode_permission(inode, acc_mode);
if (error)
return error;
+
- error = ima_path_check(&nd->path,
++ error = ima_path_check(path,
+ acc_mode & (MAY_READ | MAY_WRITE | MAY_EXEC));
+ if (error)
+ return error;
/*
* An append-only file must be opened in append mode for writing.
*/Signed-off-by: James Morris
-
This patch replaces the generic integrity hooks, for which IMA registered
itself, with IMA integrity hooks in the appropriate places directly
in the fs directory.Signed-off-by: Mimi Zohar
Acked-by: Serge Hallyn
Signed-off-by: James Morris
01 Jan, 2009
1 commit
-
Instead of creating the "filp" kmem_cache in vfs_caches_init(),
we can do it a litle be later in files_init(), so that filp_cachep
is static to fs/file_table.cAcked-by: Paul E. McKenney
Signed-off-by: Eric Dumazet
Signed-off-by: Al Viro
14 Nov, 2008
3 commits
-
Attach creds to file structs and discard f_uid/f_gid.
file_operations::open() methods (such as hppfs_open()) should use file->f_cred
rather than current_cred(). At the moment file->f_cred will be current_cred()
at this point.Signed-off-by: David Howells
Reviewed-by: James Morris
Signed-off-by: James Morris -
Wrap current->cred and a few other accessors to hide their actual
implementation.Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris -
Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.With comment fixes Signed-off-by: Marc Dionne
Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris
02 Nov, 2008
1 commit
-
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds
21 Oct, 2008
1 commit
-
Signed-off-by: Al Viro
27 Jul, 2008
1 commit
-
make it atomic_long_t; while we are at it, get rid of useless checks in affs,
hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0.Signed-off-by: Al Viro
02 May, 2008
1 commit
-
Initial splitoff of the low-level stuff; taken to fdtable.h
Signed-off-by: Al Viro
19 Apr, 2008
3 commits
-
There have been a few oopses caused by 'struct file's with NULL f_vfsmnts.
There was also a set of potentially missed mnt_want_write()s from
dentry_open() calls.This patch provides a very simple debugging framework to catch these kinds of
bugs. It will WARN_ON() them, but should stop us from having any oopses or
mnt_writer count imbalances.I'm quite convinced that this is a good thing because it found bugs in the
stuff I was working on as soon as I wrote it.[hch: made it conditional on a debug option.
But it's still a little bit too ugly][hch: merged forced remount r/o fix from Dave and akpm's fix for the fix]
Signed-off-by: Dave Hansen
Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro -
This is the first really tricky patch in the series. It elevates the writer
count on a mount each time a non-special file is opened for write.We used to do this in may_open(), but Miklos pointed out that __dentry_open()
is used as well to create filps. This will cover even those cases, while a
call in may_open() would not have.There is also an elevated count around the vfs_create() call in open_namei().
See the comments for more details, but we need this to fix a 'create, remount,
fail r/w open()' race.Some filesystems forego the use of normal vfs calls to create
struct files. Make sure that these users elevate the mnt
writer count because they will get __fput(), and we need
to make sure they're balanced.Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro -
If someone decides to demote a file from r/w to just
r/o, they can use this same code as __fput().NFS does just that, and will use this in the next
patch.AV: drop write access in __fput() only after we evict from file list.
Signed-off-by: Dave Hansen
Cc: Erez Zadok
Cc: Trond Myklebust
Cc: "J Bruce Fields"
Acked-by: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
19 Mar, 2008
1 commit
-
Some new uses of get_empty_filp() have crept in; switched
to alloc_file() to make sure that pieces of initialization
won't be missing.We really need to kill get_empty_filp().
[AV] fixed dentry leak on failure exit in anon_inode_getfd()
Cc: Erez Zadok
Cc: Trond Myklebust
Cc: "J Bruce Fields"
Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Dave Hansen
Signed-off-by: Al Viro
09 Feb, 2008
1 commit
-
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds