Eric Lee / linux-smarc-t335x-v3.2

18 Aug, 2010

2 commits

6416ccb78 fs: scale files_lock

Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).

One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variable in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from a different CPU's list.

However, loads with frequent removal of files imply a short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.

A worst-case test of 1 CPU allocating files that are subsequently freed by N
CPUs degenerates to contending on a single lock, which is no worse than before.
When more than one CPU is allocating files, even if they are always freed by
different CPUs, there will be more parallelism than in the single-lock case.
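
The per-cpu list idea described above can be sketched in userspace C. This is
a minimal illustration, not the kernel code: every demo_* name, the CPU count
and the atomic_flag spinlock standing in for the lglock are assumptions made
for the example. It shows a file remembering which list it was added to, so
that removal from another CPU still takes the right lock, and a slow snapshot
path that takes every lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4              /* hypothetical CPU count */

struct demo_file {
    struct demo_file *prev, *next;
    int list_cpu;                   /* which per-CPU list this file is on */
};

struct demo_cpu_list {
    atomic_flag lock;               /* stand-in for one CPU's part of the lglock */
    struct demo_file head;          /* sentinel of a circular doubly linked list */
};

static struct demo_cpu_list demo_lists[DEMO_NR_CPUS];

static void demo_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set(l)) {
        /* spin until the flag is released */
    }
}

static void demo_unlock(atomic_flag *l)
{
    atomic_flag_clear(l);
}

static void demo_lists_init(void)
{
    for (int i = 0; i < DEMO_NR_CPUS; i++) {
        atomic_flag_clear(&demo_lists[i].lock);
        demo_lists[i].head.prev = &demo_lists[i].head;
        demo_lists[i].head.next = &demo_lists[i].head;
    }
}

/* Fast path: add to the caller's own list, taking only that list's lock. */
static void demo_file_add(struct demo_file *f, int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    struct demo_cpu_list *l = &demo_lists[cpu];

    demo_lock(&l->lock);
    f->list_cpu = cpu;              /* remember where the file lives */
    f->next = l->head.next;
    f->prev = &l->head;
    l->head.next->prev = f;
    l->head.next = f;
    demo_unlock(&l->lock);
}

/* Removal may run on any CPU: lock the list recorded in the file. */
static void demo_file_del(struct demo_file *f)
{
    struct demo_cpu_list *l = &demo_lists[f->list_cpu];

    demo_lock(&l->lock);
    f->prev->next = f->next;
    f->next->prev = f->prev;
    f->prev = f->next = f;
    demo_unlock(&l->lock);
}

/* Slow path: take every lock (in a fixed order) for a consistent snapshot. */
static size_t demo_count_all(void)
{
    size_t n = 0;

    for (int i = 0; i < DEMO_NR_CPUS; i++)
        demo_lock(&demo_lists[i].lock);
    for (int i = 0; i < DEMO_NR_CPUS; i++) {
        const struct demo_file *head = &demo_lists[i].head;
        for (const struct demo_file *f = head->next; f != head; f = f->next)
            n++;
    }
    for (int i = DEMO_NR_CPUS - 1; i >= 0; i--)
        demo_unlock(&demo_lists[i].lock);
    return n;
}
```

Adds on different CPUs never touch the same lock in this sketch, while a
cross-CPU removal contends only with the one list the file was added to.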

Testing results:

On a 2 socket, 8 core Opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.

Booting:     locks=  25049  cpu-hits=  23174 (92.5%)  node-hits=  23945 (95.6%)
kbuild -j16: locks=2281913  cpu-hits=2208126 (96.8%)  node-hits=2252674 (98.7%)
dbench 64:   locks=4306582  cpu-hits=4287247 (99.6%)  node-hits=4299527 (99.8%)

So a file is removed by the same CPU that added it over 90% of the time, and
by a CPU on the same node over 95% of the time.

Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

             throughput
2.6.34-rc2   24.5
+patch       24.9

             us     sys    idle   IO wait (in %)
2.6.34-rc2   51.25  28.25  17.25  3.25
+patch       53.75  18.5   19     8.75

So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.

Single threaded performance difference was within the noise of microbenchmarks.
That is not to say a penalty does not exist: the code is larger and more memory
accesses are required, so it will be slightly slower.

Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen
Cc: Andi Kleen
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

Nick Piggin
2010-08-18 20:35:48 +0800
ee2ffa0df fs: cleanup files_lock locking

Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig
Cc: Alan Cox
Acked-by: Andi Kleen
Acked-by: Greg Kroah-Hartman
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

Nick Piggin
2010-08-18 20:35:47 +0800

13 Aug, 2010

1 commit

2069601b3 Revert "fsnotify: store struct file not struct path"

This reverts commit 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e (and the
accompanying commit c1e5c954020e "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).

The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.

Fix up various conflicts due to later fsnotify work.

Signed-off-by: Linus Torvalds

Linus Torvalds
2010-08-13 05:23:04 +0800

11 Aug, 2010

1 commit

58939473b vfs: improve comment describing fget_light()

Improve the description of fget_light(), which is currently incorrect
about needing a prior refcnt (judging by the way it is actually used).

Signed-off-by: Tony Battersby
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tony Battersby
2010-08-11 23:59:02 +0800

28 Jul, 2010

1 commit

c1e5c9540 vfs/fsnotify: fsnotify_close can delay the final work in fput

fanotify almost works like so:

user context calls fsnotify_* function with a struct file.
fsnotify takes a reference on the struct path
user context goes about its business

at some later point in time the fsnotify listener gets the struct path
fanotify listener calls dentry_open() to create a file which userspace can deal with
listener drops the reference on the struct path
at some later point the listener calls close() on its new file

With the switch from struct path to struct file this presents a problem for
fput() and fsnotify_close(). fsnotify_close() is called when the filp's f_count
has already reached 0 and __fput() wants to do its cleanup.

The solution presented here is a bit odd. If an event is created from a
struct file we take a reference on the file. We check however if the f_count
was already 0 and if so we take an EXTRA reference EVEN THOUGH IT WAS ZERO.
In __fput() (where we know the f_count hit 0 once) we check if the f_count is
non-zero and if so we drop that 'extra' ref and return without destroying the
file.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:51 +0800

28 May, 2010

1 commit

d7065da03 get rid of the magic around f_count in aio

__aio_put_req() plays sick games with file refcount. What
it wants is fput() from atomic context; it's almost always
done with f_count > 1, so they only have to deal with delayed
work in rare cases when their reference happens to be the
last one. Current code decrements f_count and if it hasn't
hit 0, everything is fine. Otherwise it keeps a pointer
to struct file (with zero f_count!) around and has delayed
work do __fput() on it.

Better way to do it: use atomic_long_add_unless( , -1, 1)
instead of !atomic_long_dec_and_test(). IOW, decrement it
only if it's not the last reference, leave refcount alone
if it was. And use normal fput() in delayed work.

I've made that atomic_long_add_unless call a new helper -
fput_atomic(). Drops a reference to file if it's safe to
do in atomic (i.e. if that's not the last one), tells if
it had been able to do that. aio.c converted to it, __fput()
use is gone. req->ki_file *always* contributes to refcount
now. And __fput() became static.
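
The "decrement only if it's not the last reference" behaviour described above
can be sketched with C11 atomics. This is a userspace illustration under
assumptions: demo_add_unless() imitates the semantics of the kernel's
atomic_long_add_unless() with a compare-and-swap loop, and the demo_* names
and struct are hypothetical, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file, reduced to its reference count. */
struct demo_aio_file {
    atomic_long f_count;
};

/* Add 'a' to *v unless *v equals 'u'; return true if the add happened. */
static bool demo_add_unless(atomic_long *v, long a, long u)
{
    long old = atomic_load(v);

    while (old != u) {
        /* On failure the CAS reloads 'old' with the current value. */
        if (atomic_compare_exchange_weak(v, &old, old + a))
            return true;
    }
    return false;
}

/*
 * Drop a reference only if it is not the last one. Returns true when the
 * reference was dropped; false means the caller still holds the last
 * reference and must arrange a normal fput() from a context that may sleep.
 */
static bool demo_fput_atomic(struct demo_aio_file *f)
{
    assert(atomic_load(&f->f_count) > 0);
    return demo_add_unless(&f->f_count, -1, 1);
}
```

With a count of 2 the call drops one reference and returns true; with a count
of 1 it leaves the count alone and returns false, so the count never reaches
zero in atomic context.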

Signed-off-by: Al Viro

Al Viro
2010-05-28 10:03:07 +0800

07 Mar, 2010

1 commit

42e496086 vfs: take f_lock on modifying f_mode after open time

We'll introduce FMODE_RANDOM which will be runtime modified. So protect
all runtime modification to f_mode with f_lock to avoid races.
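
A rough userspace sketch of the locking rule stated above: every
read-modify-write of f_mode after open time happens under f_lock, so two
concurrent updates cannot lose each other's bits. The demo_* names, the flag
value and the atomic_flag spinlock are assumptions for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_FMODE_RANDOM 0x1000u   /* hypothetical bit, not the kernel's value */

/* Hypothetical stand-in for struct file with only the fields of interest. */
struct demo_mode_file {
    atomic_flag f_lock;             /* stand-in for the f_lock spinlock */
    unsigned int f_mode;
};

static void demo_mode_file_init(struct demo_mode_file *f, unsigned int mode)
{
    atomic_flag_clear(&f->f_lock);
    f->f_mode = mode;               /* open-time value, set before sharing */
}

/* Set and clear bits in f_mode as one update performed under f_lock. */
static void demo_update_mode(struct demo_mode_file *f,
                             unsigned int set, unsigned int clear)
{
    assert((set & clear) == 0);     /* a bit cannot be both set and cleared */

    while (atomic_flag_test_and_set(&f->f_lock)) {
        /* spin until the lock is free */
    }
    f->f_mode = (f->f_mode & ~clear) | set;
    atomic_flag_clear(&f->f_lock);
}
```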

Signed-off-by: Wu Fengguang
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Trond Myklebust
Cc: Chuck Lever
Cc: [2.6.33.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2010-03-07 03:26:25 +0800

07 Feb, 2010

1 commit

89068c576 Take ima_file_free() to proper place.

Hooks: Just Say No.

Signed-off-by: Al Viro

Al Viro
2010-02-07 16:07:29 +0800

23 Dec, 2009

1 commit

385e3ed4f alloc_file(): simplify handling of mnt_clone_write() errors

When alloc_file() and init_file() were combined, the error handling of
mnt_clone_write() was taken into alloc_file() in a somewhat obfuscated
way. Since we don't use the error code for anything except warning,
we might as well warn directly without an extra variable.

Signed-off-by: Roland Dreier
Signed-off-by: Al Viro

Roland Dreier
2009-12-23 01:27:33 +0800

17 Dec, 2009

6 commits

73efc4681 re-export alloc_file()

Commit 3d1e4631 ("get rid of init_file()") removed the export of
alloc_file() -- possibly inadvertently, since that commit mainly
consisted of deleting the lines between the end of alloc_file() and
the start of the code in init_file().

There is in fact one modular use of alloc_file() in the tree, in
drivers/infiniband/core/uverbs_main.c, so re-add the export to fix:

ERROR: "alloc_file" [drivers/infiniband/core/ib_uverbs.ko] undefined!

when CONFIG_INFINIBAND_USER_ACCESS=m.

Cc: Al Viro
Signed-off-by: Roland Dreier
Signed-off-by: Linus Torvalds

Roland Dreier
2009-12-17 05:29:19 +0800
0552f879d Untangling ima mess, part 1: alloc_file()

There are 2 groups of alloc_file() callers:
* ones that are followed by ima_counts_get
* ones giving non-regular files
So let's pull that ima_counts_get() into alloc_file();
it's a no-op in case of non-regular files.

Signed-off-by: Al Viro

Al Viro
2009-12-17 01:16:47 +0800
e81e3f4dc fs: move get_empty_filp() deffinition to internal.h

All users outside of fs/ of get_empty_filp() have been removed. This patch
moves the definition from the include/ directory to internal.h so no new
users crop up and removes the EXPORT_SYMBOL. I'd love to see open intents
stop using it too, but that's a problem for another day and a smarter
developer!

Signed-off-by: Eric Paris
Acked-by: Miklos Szeredi
Signed-off-by: Al Viro

Eric Paris
2009-12-17 01:16:45 +0800
2c48b9c45 switch alloc_file() to passing struct path

... and have the caller grab both mnt and dentry; kill
leak in infiniband, while we are at it.

Signed-off-by: Al Viro

Al Viro
2009-12-17 01:16:42 +0800
3d1e46315 get rid of init_file()

Signed-off-by: Al Viro

Al Viro
2009-12-17 01:16:42 +0800
732741274 unexport get_empty_filp()

Signed-off-by: Al Viro

Al Viro
2009-12-17 01:16:41 +0800

25 Oct, 2009

1 commit

6c21a7fb4 LSM: imbed ima calls in the security hooks

Based on discussions on LKML and LSM, where there are consecutive
security_ and ima_ calls in the vfs layer, move the ima_ calls to
the existing security_ hooks.

Signed-off-by: Mimi Zohar
Signed-off-by: James Morris

Mimi Zohar
2009-10-25 12:22:48 +0800

24 Sep, 2009

1 commit

8d65af789 sysctl: remove "struct file *" argument of ->proc_handler

It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.

Signed-off-by: Alexey Dobriyan
Cc: David Howells
Cc: "Eric W. Biederman"
Cc: Al Viro
Cc: Ralf Baechle
Cc: Martin Schwidefsky
Cc: Ingo Molnar
Cc: "David S. Miller"
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-09-24 22:21:04 +0800

12 Jun, 2009

2 commits

864d7c4c0 fs: move mark_files_ro into file_table.c

This function walks the s_files list, and operates primarily on the
files in a superblock, so it better belongs here (e.g. see also
fs_may_remount_ro).

[AV: ... and it shouldn't be static after that move]

Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

npiggin@suse.de
2009-06-12 09:36:02 +0800
96029c4e0 fs: introduce mnt_clone_write

This patch speeds up lmbench lat_mmap test by about another 2% after the
first patch.

Before:
avg = 462.286
std = 5.46106

After:
avg = 453.12
std = 9.58257

(50 runs of each, stddev gives a reasonable confidence)

It does this by introducing mnt_clone_write, which avoids some heavyweight
operations of mnt_want_write if called on a vfsmount which we know already
has a write count; and mnt_want_write_file, which can call mnt_clone_write
if the file is open for write.

After these two patches, mnt_want_write and mnt_drop_write go from 7% on
the profile down to 1.3% (including mnt_clone_write).

[AV: mnt_want_write_file() should take file alone and derive mnt from it;
not only all callers have that form, but that's the only mnt about which
we know that it's already held for write if file is opened for write]

Cc: Dave Hansen
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

npiggin@suse.de
2009-06-12 09:36:02 +0800

30 Mar, 2009

1 commit

a4e49cb69 trivial: remove unused variable 'path' in alloc_file()

'struct path' is not used in alloc_file().

Signed-off-by: Tero Roponen
Signed-off-by: Jiri Kosina

Tero Roponen
2009-03-30 21:22:03 +0800

27 Mar, 2009

1 commit

8e9d20897 Merge branch 'bkl-removal' of git://git.lwn.net/linux-2.6

* 'bkl-removal' of git://git.lwn.net/linux-2.6:
Rationalize fasync return values
Move FASYNC bit handling to f_op->fasync()
Use f_lock to protect f_flags
Rename struct file->f_ep_lock

Linus Torvalds
2009-03-27 07:14:02 +0800

16 Mar, 2009

1 commit

684999149 Rename struct file->f_ep_lock

This lock moves out of the CONFIG_EPOLL ifdef and becomes f_lock. For now,
epoll remains the only user, but a future patch will use it to protect
f_flags as well.

Cc: Davide Libenzi
Reviewed-by: Christoph Hellwig
Signed-off-by: Jonathan Corbet

Jonathan Corbet
2009-03-16 22:32:27 +0800

06 Feb, 2009

2 commits

cb5629b10 Merge branch 'master' into next

Conflicts:
fs/namei.c

Manually merged per:

diff --cc fs/namei.c
index 734f2b5,bbc15c2..0000000
--- a/fs/namei.c
+++ b/fs/namei.c
@@@ -860,9 -848,8 +849,10 @@@ static int __link_path_walk(const char
nd->flags |= LOOKUP_CONTINUE;
err = exec_permission_lite(inode);
if (err == -EAGAIN)
- err = vfs_permission(nd, MAY_EXEC);
+ err = inode_permission(nd->path.dentry->d_inode,
+ MAY_EXEC);
+ if (!err)
+ err = ima_path_check(&nd->path, MAY_EXEC);
if (err)
break;

@@@ -1525,14 -1506,9 +1509,14 @@@ int may_open(struct path *path, int acc
flag &= ~O_TRUNC;
}

- error = vfs_permission(nd, acc_mode);
+ error = inode_permission(inode, acc_mode);
if (error)
return error;
+
- error = ima_path_check(&nd->path,
++ error = ima_path_check(path,
+ acc_mode & (MAY_READ | MAY_WRITE | MAY_EXEC));
+ if (error)
+ return error;
/*
* An append-only file must be opened in append mode for writing.
*/

Signed-off-by: James Morris

James Morris
2009-02-06 08:01:45 +0800
6146f0d5e integrity: IMA hooks

This patch replaces the generic integrity hooks, for which IMA registered
itself, with IMA integrity hooks in the appropriate places directly
in the fs directory.

Signed-off-by: Mimi Zohar
Acked-by: Serge Hallyn
Signed-off-by: James Morris

Mimi Zohar
2009-02-06 06:05:30 +0800

01 Jan, 2009

1 commit

b6b3fdead filp_cachep can be static in fs/file_table.c

Instead of creating the "filp" kmem_cache in vfs_caches_init(),
we can do it a little bit later in files_init(), so that filp_cachep
is static to fs/file_table.c

Acked-by: Paul E. McKenney

Signed-off-by: Eric Dumazet
Signed-off-by: Al Viro

Eric Dumazet
2009-01-01 07:07:42 +0800

14 Nov, 2008

3 commits

d76b0d9b2 CRED: Use creds in file structs

Attach creds to file structs and discard f_uid/f_gid.

file_operations::open() methods (such as hppfs_open()) should use file->f_cred
rather than current_cred(). At the moment file->f_cred will be current_cred()
at this point.

Signed-off-by: David Howells
Reviewed-by: James Morris
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:25 +0800
86a264abe CRED: Wrap current->cred and a few other accessors

Wrap current->cred and a few other accessors to hide their actual
implementation.

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:18 +0800
b6dff3ec5 CRED: Separate task security context from task_struct

Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:16 +0800

02 Nov, 2008

1 commit

233e70f42 saner FASYNC handling on file close

As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.

So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.
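
The centralised check described above can be pictured with a small userspace
sketch. Assumptions: all demo_* names, the flag value and the callback
bookkeeping are made up for illustration; only the idea (the final put tests
the FASYNC flag and calls ->fasync() itself before ->release()) comes from the
commit message.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FASYNC 0x2000u         /* hypothetical flag bit */

struct demo_async_file;

/* Hypothetical subset of file_operations. */
struct demo_fops {
    int (*fasync)(int fd, struct demo_async_file *f, int on);
    int (*release)(struct demo_async_file *f);
};

struct demo_async_file {
    unsigned int f_flags;
    const struct demo_fops *f_op;
    int fasync_off_calls;           /* demo bookkeeping only */
    int released;                   /* demo bookkeeping only */
};

/* Driver callback: on == 0 means "remove this file from the fasync list". */
static int demo_drv_fasync(int fd, struct demo_async_file *f, int on)
{
    (void)fd;
    if (!on)
        f->fasync_off_calls++;
    return 0;
}

/* Driver release that does NOT remember to clean up fasync state. */
static int demo_drv_release(struct demo_async_file *f)
{
    f->released = 1;
    return 0;
}

static const struct demo_fops demo_drv_fops = {
    .fasync = demo_drv_fasync,
    .release = demo_drv_release,
};

/* Final put: do the fasync eviction centrally, then call ->release(). */
static void demo_final_fput(struct demo_async_file *f)
{
    assert(f->f_op != NULL);
    if ((f->f_flags & DEMO_FASYNC) && f->f_op->fasync)
        f->f_op->fasync(-1, f, 0);
    if (f->f_op->release)
        f->f_op->release(f);
}
```

Because the eviction happens in one place, a ->release() that forgets it (as
demo_drv_release() does here) no longer leaves the file on a fasync list.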

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2008-11-02 00:49:46 +0800

21 Oct, 2008

1 commit

aeb5d7270 [PATCH] introduce fmode_t, do annotations

Signed-off-by: Al Viro

Al Viro
2008-10-21 19:47:06 +0800

27 Jul, 2008

1 commit

516e0cc56 [PATCH] f_count may wrap around

make it atomic_long_t; while we are at it, get rid of useless checks in affs,
hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0.

Signed-off-by: Al Viro

Al Viro
2008-07-27 08:53:40 +0800

02 May, 2008

1 commit

9f3acc314 [PATCH] split linux/file.h

Initial splitoff of the low-level stuff; taken to fdtable.h

Signed-off-by: Al Viro

Al Viro
2008-05-02 01:08:16 +0800

19 Apr, 2008

3 commits

ad775f5a8 [PATCH] r/o bind mounts: debugging for missed calls

There have been a few oopses caused by 'struct file's with NULL f_vfsmnts.
There was also a set of potentially missed mnt_want_write()s from
dentry_open() calls.

This patch provides a very simple debugging framework to catch these kinds of
bugs. It will WARN_ON() them, but should stop us from having any oopses or
mnt_writer count imbalances.

I'm quite convinced that this is a good thing because it found bugs in the
stuff I was working on as soon as I wrote it.

[hch: made it conditional on a debug option.
But it's still a little bit too ugly]

[hch: merged forced remount r/o fix from Dave and akpm's fix for the fix]

Signed-off-by: Dave Hansen
Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Dave Hansen
2008-04-19 12:29:28 +0800
4a3fd211c [PATCH] r/o bind mounts: elevate write count for open()s

This is the first really tricky patch in the series. It elevates the writer
count on a mount each time a non-special file is opened for write.

We used to do this in may_open(), but Miklos pointed out that __dentry_open()
is used as well to create filps. This will cover even those cases, while a
call in may_open() would not have.

There is also an elevated count around the vfs_create() call in open_namei().
See the comments for more details, but we need this to fix a 'create, remount,
fail r/w open()' race.

Some filesystems forego the use of normal vfs calls to create
struct files. Make sure that these users elevate the mnt
writer count because they will get __fput(), and we need
to make sure they're balanced.

Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Dave Hansen
2008-04-19 12:29:25 +0800
aceaf78da [PATCH] r/o bind mounts: create helper to drop file write access

If someone decides to demote a file from r/w to just
r/o, they can use this same code as __fput().

NFS does just that, and will use this in the next
patch.

AV: drop write access in __fput() only after we evict from file list.

Signed-off-by: Dave Hansen
Cc: Erez Zadok
Cc: Trond Myklebust
Cc: "J Bruce Fields"
Acked-by: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Dave Hansen
2008-04-19 12:25:32 +0800

19 Mar, 2008

1 commit

430e285e0 [PATCH] fix up new filp allocators

Some new uses of get_empty_filp() have crept in; switched
to alloc_file() to make sure that pieces of initialization
won't be missing.

We really need to kill get_empty_filp().

[AV] fixed dentry leak on failure exit in anon_inode_getfd()

Cc: Erez Zadok
Cc: Trond Myklebust
Cc: "J Bruce Fields"
Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Dave Hansen
Signed-off-by: Al Viro

Dave Hansen
2008-03-19 18:54:05 +0800

09 Feb, 2008

1 commit

fc9b52cd8 fs: remove fastcall, it is always empty

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Harvey Harrison
2008-02-09 01:22:31 +0800