Eric Lee / smarc-fsl-linux-kernel

13 Nov, 2013

1 commit

9bc9ccd7d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:

- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixes

plus misc stuff all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...

Linus Torvalds
2013-11-13 14:34:18 +0800

09 Nov, 2013

1 commit

eee5cc270 get rid of s_files and files_lock

The only thing we need it for is alt-sysrq-r (emergency remount r/o)
and these days we can do just as well without going through the
list of files.

Signed-off-by: Al Viro

Al Viro
2013-11-09 13:16:20 +0800

25 Oct, 2013

1 commit

72c2d5319 file->f_op is never NULL...

Signed-off-by: Al Viro

Al Viro
2013-10-25 11:34:54 +0800

20 Oct, 2013

1 commit

c7314d74f nfsd regression since delayed fput()

Background: nfsd v[23] had throughput regression since delayed fput
went in; every read or write ends up doing fput() and we get a pair
of extra context switches out of that (plus quite a bit of work
in queue_work itself, apparently). Use of schedule_delayed_work()
gives it a chance to accumulate a bit before we do __fput() on all
of them. I'm not too happy about that solution, but... on at least
one real-world setup it reverts about 10% throughput loss we got from
switch to delayed fput.

Signed-off-by: Al Viro

Al Viro
2013-10-20 20:44:39 +0800

12 Sep, 2013

1 commit

be49b30a9 fs/file_table.c:fput(): make comment more truthful

Cc: "Eric W. Biederman"
Cc: Andrey Vagin
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2013-09-12 06:59:01 +0800

04 Sep, 2013

1 commit

184cacabe only regular files with FMODE_WRITE need to be on s_files

Signed-off-by: Al Viro

Al Viro
2013-09-04 10:50:28 +0800

13 Jul, 2013

2 commits

4f5e65a1c fput: turn "list_head delayed_fput_list" into llist_head

fput() and delayed_fput() can use llist and avoid the locking.

This is an unlikely path, so the change is not expected to improve
performance, but the code looks simpler this way.
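
As a rough illustration of why a lock-free list needs no lock here, the sketch below models the two operations involved with C11 atomics in user space: producers push a node with a compare-and-swap loop, and the consumer detaches the entire list with a single atomic exchange. The `model_` names are hypothetical and this is not the kernel's llist implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space model of a lock-free singly linked list. */
struct model_node {
    struct model_node *next;
    int id;
};

struct model_llist {
    _Atomic(struct model_node *) first;
};

/* Producer: push one node. The CAS is retried until the head we
 * linked against is still the head at the moment of the swap. */
void model_llist_add(struct model_llist *list, struct model_node *node)
{
    struct model_node *old;

    assert(list != NULL && node != NULL);
    old = atomic_load(&list->first);
    do {
        node->next = old;   /* on CAS failure, old holds the new head */
    } while (!atomic_compare_exchange_weak(&list->first, &old, node));
}

/* Consumer: detach the whole list in one atomic exchange. The
 * returned chain is private to the caller and can be walked freely. */
struct model_node *model_llist_del_all(struct model_llist *list)
{
    assert(list != NULL);
    return atomic_exchange(&list->first, NULL);
}
```

Nodes come back in reverse order of insertion, which does not matter for a consumer that simply processes every entry.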

Signed-off-by: Oleg Nesterov
Suggested-by: Andrew Morton
Cc: Al Viro
Cc: Andrey Vagin
Cc: "Eric W. Biederman"
Cc: David Howells
Cc: Huang Ying
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Oleg Nesterov
2013-07-13 17:29:10 +0800
64372501e fs/file_table.c:fput(): add comment

A missed update to "fput: task_work_add() can fail if the caller has
passed exit_task_work()".

Cc: "Eric W. Biederman"
Cc: Al Viro
Cc: Andrey Vagin
Cc: David Howells
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Andrew Morton
2013-07-13 17:27:42 +0800

29 Jun, 2013

1 commit

c77cecee5 Replace a bunch of file->dentry->d_inode refs with file_inode()

Replace a bunch of file->dentry->d_inode refs with file_inode().

In __fput(), use file->f_inode instead so as not to be affected by any tricks
that file_inode() might grow.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2013-06-29 16:57:13 +0800

15 Jun, 2013

1 commit

e7b2c4069 fput: task_work_add() can fail if the caller has passed exit_task_work()

fput() assumes that it can't be called after exit_task_work() but
this is not true, for example free_ipc_ns()->shm_destroy() can do
this. In this case fput() silently leaks the file.

Change it to fall back to delayed_fput_work if task_work_add() fails.
The patch looks complicated but it is not; it changes the code from

    if (PF_KTHREAD) {
        schedule_work(...);
        return;
    }
    task_work_add(...)

to

    if (!PF_KTHREAD) {
        if (!task_work_add(...))
            return;
        /* fallback */
    }
    schedule_work(...);

As for shm_destroy() in particular, we could make another fix but I
think this change makes sense anyway. There could be another similar
user, it is not safe to assume that task_work_add() can't fail.
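
The control-flow change described in this message can be modelled in user space as follows. This is only a sketch under stated assumptions: `struct task`, `model_task_work_add()` and `model_fput()` are hypothetical stand-ins, and the return value merely records which path would have been taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; none of this is kernel code. */
struct task {
    bool is_kthread;        /* models the PF_KTHREAD check */
    bool task_work_closed;  /* models "exit_task_work() has already run" */
};

enum fput_path { FPUT_TASK_WORK, FPUT_DELAYED_WORK };

/* Returns 0 on success, nonzero if the task no longer accepts work. */
static int model_task_work_add(const struct task *t)
{
    return t->task_work_closed ? -1 : 0;
}

/* New flow: try task work first, fall back to the work queue
 * instead of silently dropping the request. */
enum fput_path model_fput(const struct task *t)
{
    assert(t != NULL);
    if (!t->is_kthread) {
        if (!model_task_work_add(t))
            return FPUT_TASK_WORK;
        /* fallback: task_work_add() failed */
    }
    return FPUT_DELAYED_WORK;   /* models schedule_work(...) */
}
```

With the old flow, the third case (an ordinary task that has already passed exit_task_work()) had no path to the work queue; in this model it now reaches `FPUT_DELAYED_WORK`.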

Reported-by: Andrey Vagin
Signed-off-by: Oleg Nesterov
Signed-off-by: Al Viro

Oleg Nesterov
2013-06-15 09:39:08 +0800

02 Mar, 2013

1 commit

dd37978c5 cache the value of file_inode() in struct file

Note that this thing does *not* contribute to inode refcount;
it's pinned down by dentry.

Signed-off-by: Al Viro

Al Viro
2013-03-02 08:48:30 +0800

23 Feb, 2013

3 commits

39b652527 fs: Preserve error code in get_empty_filp(), part 2

Allocating a file structure in function get_empty_filp() might fail because
of several reasons:
- not enough memory for file structures
- operation is not allowed
- user is over its limit

Currently the function returns NULL in all cases and we lose the exact
reason of the error. All callers of get_empty_filp() assume that the function
can fail with ENFILE only.

Return error through pointer. Change all callers to preserve this error code.

[AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
(things remaining here deal with alloc_file()), removed pipe(2) behaviour change]
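
"Return error through pointer" refers to the kernel idiom of encoding a negative errno value in the returned pointer. The sketch below re-creates that idiom in user space so the three failure reasons listed above stay distinguishable to the caller. All `model_` names are hypothetical, and the errno chosen for each reason is an illustrative assumption rather than a statement of what the patch returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO 4095

/* User-space re-creation of the pointer-encoded error idiom. */
static inline void *model_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static inline long model_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

static inline int model_is_err(const void *p)
{
    /* Error values occupy the top MODEL_MAX_ERRNO addresses. */
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_file {
    int flags;
};

/* Hypothetical allocator with three distinct failure reasons. */
struct model_file *model_get_empty_filp(long nr_open, long max_files,
                                        int allowed)
{
    struct model_file *f;

    assert(max_files >= 0);
    if (nr_open >= max_files)
        return model_err_ptr(-ENFILE);  /* over the limit (assumed errno) */
    if (!allowed)
        return model_err_ptr(-EPERM);   /* not allowed (assumed errno) */
    f = calloc(1, sizeof(*f));
    if (!f)
        return model_err_ptr(-ENOMEM);  /* out of memory */
    return f;
}
```

A caller checks `model_is_err(f)` and, on failure, propagates `model_ptr_err(f)` instead of assuming ENFILE.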

Signed-off-by: Anatol Pomozov
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Al Viro

Anatol Pomozov
2013-02-23 12:31:32 +0800
1afc99bea propagate error from get_empty_filp() to its callers

Based on parts from Anatol's patch (the rest is the next commit).

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:32 +0800
496ad9aa8 new helper: file_inode(file)

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:31 +0800

21 Dec, 2012

1 commit

72651cac8 fs: Fix imbalance in freeze protection in mark_files_ro()

File descriptors (even those for writing) do not hold freeze protection.
Thus mark_files_ro() must call __mnt_drop_write() to only drop protection
against remount read-only. Calling mnt_drop_write_file() as we do now
results in:

[ BUG: bad unlock balance detected! ]
3.7.0-rc6-00028-g88e75b6 #101 Not tainted
-------------------------------------
kworker/1:2/79 is trying to release lock (sb_writers) at:
[] mnt_drop_write+0x24/0x30
but there are no more locks to release!

Reported-by: Zdenek Kabelac
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2012-12-21 02:57:36 +0800

10 Oct, 2012

1 commit

4b2c551f7 lglock: add DEFINE_STATIC_LGLOCK()

When the lglock doesn't need to be exported we can use
DEFINE_STATIC_LGLOCK().

Signed-off-by: Lai Jiangshan
Cc: Alexander Viro
Cc: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Lai Jiangshan
2012-10-10 13:15:44 +0800

03 Oct, 2012

1 commit

88265322c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull security subsystem updates from James Morris:
"Highlights:

- Integrity: add local fs integrity verification to detect offline
attacks
- Integrity: add digital signature verification
- Simple stacking of Yama with other LSMs (per LSS discussions)
- IBM vTPM support on ppc64
- Add new driver for Infineon I2C TIS TPM
- Smack: add rule revocation for subject labels"

Fixed conflicts with the user namespace support in kernel/auditsc.c and
security/integrity/ima/ima_policy.c.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (39 commits)
Documentation: Update git repository URL for Smack userland tools
ima: change flags container data type
Smack: setprocattr memory leak fix
Smack: implement revoking all rules for a subject label
Smack: remove task_wait() hook.
ima: audit log hashes
ima: generic IMA action flag handling
ima: rename ima_must_appraise_or_measure
audit: export audit_log_task_info
tpm: fix tpm_acpi sparse warning on different address spaces
samples/seccomp: fix 31 bit build on s390
ima: digital signature verification support
ima: add support for different security.ima data types
ima: add ima_inode_setxattr/removexattr function and calls
ima: add inode_post_setattr call
ima: replace iint spinblock with rwlock/read_lock
ima: allocating iint improvements
ima: add appraise action keywords and default rules
ima: integrity appraisal extension
vfs: move ima_file_free before releasing the file
...

Linus Torvalds
2012-10-03 12:38:48 +0800

27 Sep, 2012

1 commit

0ee8cdfe6 take fget() and friends to fs/file.c

Signed-off-by: Al Viro

Al Viro
2012-09-27 09:08:56 +0800

08 Sep, 2012

1 commit

4199d35cb vfs: move ima_file_free before releasing the file

ima_file_free(), called on __fput(), currently flags files that have
changed, so that the file is re-measured. For appraising a file's
integrity, the file's hash must be re-calculated and stored in the
'security.ima' xattr to reflect any changes.

This patch moves the ima_file_free() call to before releasing the file
in preparation of ima-appraisal measuring the file and updating the
'security.ima' xattr.

Signed-off-by: Mimi Zohar
Acked-by: Serge Hallyn
Acked-by: Dmitry Kasatkin

Mimi Zohar
2012-09-08 02:57:27 +0800

31 Jul, 2012

1 commit

eb04c2828 fs: Add freezing handling to mnt_want_write() / mnt_drop_write()

Most of the places where we want freeze protection coincide with the places
where we also have remount-ro protection. So make mnt_want_write() and
mnt_drop_write() (and their _file alternatives) prevent freezing as well.
For the few cases that are really interested only in remount-ro protection,
provide new function variants.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa
Tested-by: Peter M. Petrakis
Tested-by: Dann Frazier
Tested-by: Massimo Morana
Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2012-07-31 13:40:38 +0800

30 Jul, 2012

1 commit

5c33b183a uninline file_free_rcu()

What inline? Its only use is passing its address to call_rcu(), for fuck sake!

Signed-off-by: Al Viro

Al Viro
2012-07-30 01:24:17 +0800

23 Jul, 2012

1 commit

4a9d4b024 switch fput to task_work_add

... and schedule_work() for interrupt/kernel_thread callers
(and yes, now it *is* OK to call from interrupt).

We are guaranteed that __fput() will be done before we return
to userland (or exit). Note that for fput() from a kernel
thread we get an async behaviour; it's almost always OK, but
sometimes you might need to have __fput() completed before
you do anything else. There are two mechanisms for that -
a general barrier (flush_delayed_fput()) and explicit
__fput_sync(). Both should be used with care (as was the
case for fput() from kernel threads all along). See comments
in fs/file_table.c for details.

Signed-off-by: Al Viro

Al Viro
2012-07-23 03:57:58 +0800

14 Jul, 2012

1 commit

85d7d618c mark_files_ro(): don't bother with mntget/mntput

mnt_drop_write_file() is safe under any lock

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:35:46 +0800

30 May, 2012

2 commits

962830df3 brlocks/lglocks: API cleanups

lglocks and brlocks are currently generated with some complicated macros
in lglock.h. But there's no reason to not just use common utility
functions and put all the data into a common data structure.

In preparation, this patch changes the API to look more like normal
function calls with pointers, not magic macros.

The patch is rather large because I move over all users in one go to keep
it bisectable. This impacts the VFS somewhat in terms of lines changed.
But no actual behaviour change.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Andi Kleen
Cc: Al Viro
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Rusty Russell
Signed-off-by: Al Viro

Andi Kleen
2012-05-30 11:28:41 +0800
eea62f831 brlocks/lglocks: turn into functions

lglocks and brlocks are currently generated with some complicated macros
in lglock.h. But there's no reason to not just use common utility
functions and put all the data into a common data structure.

Since there are at least two users it makes sense to share this code in a
library. This is also easier to maintain than a macro forest.

This will also make it later possible to dynamically allocate lglocks and
also use them in modules (this would both still need some additional, but
now straightforward, code)

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Andi Kleen
Cc: Al Viro
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Rusty Russell
Signed-off-by: Al Viro

Andi Kleen
2012-05-30 11:28:41 +0800

21 Mar, 2012

1 commit

b57ce9694 vfs: drop_file_write_access() made static

Signed-off-by: Al Viro

Al Viro
2012-03-21 09:29:32 +0800

07 Jan, 2012

1 commit

8e8b87964 vfs: prevent remount read-only if pending removes

If there are any inodes on the super block that have been unlinked
(i_nlink == 0) but have not yet been deleted, then prevent
remounting the super block read-only.

Reported-by: Toshiyuki Okajima
Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Al Viro

Miklos Szeredi
2012-01-07 12:20:13 +0800

27 Jul, 2011

1 commit

60063497a atomic: use <linux/atomic.h>

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>.
Signed-off-by: Arun Sharma
Reviewed-by: Eric Dumazet
Cc: Ingo Molnar
Cc: David Miller
Cc: Eric Dumazet
Acked-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arun Sharma
2011-07-27 07:49:47 +0800

17 Mar, 2011

3 commits

2e270d842 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fix cdev leak on O_PATH final fput()

Linus Torvalds
2011-03-17 04:26:17 +0800
60ed8cf78 fix cdev leak on O_PATH final fput()

__fput doesn't need a cdev_put() for O_PATH handles.

Signed-off-by: mszeredi@suse.cz
Signed-off-by: Al Viro

Miklos Szeredi
2011-03-17 04:18:39 +0800
0f6e0e844 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits)
AppArmor: kill unused macros in lsm.c
AppArmor: cleanup generated files correctly
KEYS: Add an iovec version of KEYCTL_INSTANTIATE
KEYS: Add a new keyctl op to reject a key with a specified error code
KEYS: Add a key type op to permit the key description to be vetted
KEYS: Add an RCU payload dereference macro
AppArmor: Cleanup make file to remove cruft and make it easier to read
SELinux: implement the new sb_remount LSM hook
LSM: Pass -o remount options to the LSM
SELinux: Compute SID for the newly created socket
SELinux: Socket retains creator role and MLS attribute
SELinux: Auto-generate security_is_socket_class
TOMOYO: Fix memory leak upon file open.
Revert "selinux: simplify ioctl checking"
selinux: drop unused packet flow permissions
selinux: Fix packet forwarding checks on postrouting
selinux: Fix wrong checks for selinux_policycap_netpeer
selinux: Fix check for xfrm selinux context algorithm
ima: remove unnecessary call to ima_must_measure
IMA: remove IMA imbalance checking
...

Linus Torvalds
2011-03-17 00:15:43 +0800

15 Mar, 2011

2 commits

326be7b48 Allow passing O_PATH descriptors via SCM_RIGHTS datagrams

Just need to make sure that AF_UNIX garbage collector won't
confuse O_PATHed socket on filesystem for real AF_UNIX opened
socket.

Signed-off-by: Al Viro

Al Viro
2011-03-15 14:21:45 +0800
1abf0c718 New kind of open files - "location only".

New flag for open(2) - O_PATH. Semantics:
* pathname is resolved, but the file itself is _NOT_ opened
as far as filesystem is concerned.
* almost all operations on the resulting descriptors shall
fail with -EBADF. Exceptions are:
1) operations on descriptors themselves (i.e.
close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD),
fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD),
fcntl(fd, F_SETFD, ...))
2) fcntl(fd, F_GETFL), for a common non-destructive way to
check if descriptor is open
3) "dfd" arguments of ...at(2) syscalls, i.e. the starting
points of pathname resolution
* closing such descriptor does *NOT* affect dnotify or
posix locks.
* permissions are checked as usual along the way to file;
no permission checks are applied to the file itself. Of course,
giving such thing to syscall will result in permission checks (at
the moment it means checking that starting point of ....at() is
a directory and caller has exec permissions on it).

fget() and fget_light() return NULL on such descriptors; use of
fget_raw() and fget_raw_light() is needed to get them. That protects
existing code from dealing with those things.

There are two things still missing (they come in the next commits):
one is handling of symlinks (right now we refuse to open them that
way; see the next commit for semantics related to those) and another
is descriptor passing via SCM_RIGHTS datagrams.
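
From user space, the semantics listed above can be observed with a short experiment. The sketch below is written against a Linux system whose libc exposes O_PATH when _GNU_SOURCE is defined; the helper names `o_path_read_errno()` and `o_path_as_dfd()` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Opens `path` with O_PATH and tries to read from the descriptor.
 * Returns the errno reported by read(), 0 if read() did not fail,
 * or -1 if the open itself failed. */
int o_path_read_errno(const char *path)
{
    char buf[1];
    int err = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_PATH);
    if (fd < 0)
        return -1;
    if (read(fd, buf, sizeof(buf)) < 0)
        err = errno;
    close(fd);
    return err;
}

/* Uses an O_PATH descriptor as the "dfd" starting point of
 * fstatat(2), one of the exceptions listed in the message. */
int o_path_as_dfd(const char *dir, const char *name, struct stat *st)
{
    int ret;
    int dfd;

    assert(dir != NULL && name != NULL && st != NULL);
    dfd = open(dir, O_PATH | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    ret = fstatat(dfd, name, st, 0);
    close(dfd);
    return ret;
}
```

On a kernel with O_PATH support, `o_path_read_errno("/")` should report EBADF, while `o_path_as_dfd("/", ".", &st)` should succeed because the descriptor is only used as a starting point for pathname resolution.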

Signed-off-by: Al Viro

Al Viro
2011-03-15 14:21:45 +0800

08 Mar, 2011

1 commit

1cc26bada Merge branch 'master'; commit 'v2.6.38-rc7' into next

James Morris
2011-03-08 07:55:06 +0800

10 Feb, 2011

1 commit

890275b5e IMA: maintain i_readcount in the VFS layer

ima_counts_get() updated the readcount and invalidated the PCR,
as necessary. Only update the i_readcount in the VFS layer.
Move the PCR invalidation checks to ima_file_check(), where it
belongs.

Maintaining the i_readcount in the VFS layer will allow other
subsystems to use i_readcount.

Signed-off-by: Mimi Zohar
Acked-by: Eric Paris

Mimi Zohar
2011-02-10 20:51:44 +0800

05 Feb, 2011

1 commit

78d297887 CRED: Fix kernel panic upon security_file_alloc() failure.

In get_empty_filp() since 2.6.29, file_free(f) is called with f->f_cred == NULL
when security_file_alloc() returned an error. As a result, kernel will panic()
due to put_cred(NULL) call within RCU callback.

Fix this bug by assigning f->f_cred before calling security_file_alloc().

Signed-off-by: Tetsuo Handa
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

Tetsuo Handa
2011-02-05 02:40:29 +0800

17 Jan, 2011

1 commit

3bc0ba430 fs: Remove unlikely() from fget_light()

There's an unlikely() in fget_light() that assumes the file ref count
will be 1. Running the annotate branch profiler on a desktop that is
performing daily tasks (running firefox, evolution, xchat and is also part
of a distcc farm), it shows that the ref count is not 1 that often.

   correct  incorrect   %  Function    File          Line
   -------  ---------   -  --------    ----          ----
1035099358 6209599193  85  fget_light  file_table.c   315
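
The percentage column can be checked from the two counters. The sketch below assumes the profiler reports the share of incorrect predictions using truncating integer division, which is an assumption about its output format; `incorrect_percent()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Share of branch hints that were wrong, as a truncated percentage.
 * Inputs are the "correct" and "incorrect" counters from the table. */
unsigned int incorrect_percent(uint64_t correct, uint64_t incorrect)
{
    uint64_t total = correct + incorrect;

    assert(total != 0);
    return (unsigned int)(incorrect * 100 / total);
}
```

For the row above, 6209599193 of 7244698551 evaluations took the path marked unlikely(), so `incorrect_percent(1035099358, 6209599193)` yields 85: the hint was wrong far more often than it was right.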

Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Steven Rostedt
Signed-off-by: Al Viro

Steven Rostedt
2011-01-17 16:26:27 +0800

27 Oct, 2010

1 commit

518de9b39 fs: allow for more than 2^31 files

Robin Holt tried to boot a 16TB system and found af_unix was overflowing
a 32bit value :

We were seeing a failure which prevented boot. The kernel was incapable
of creating either a named pipe or unix domain socket. This comes down
to a common kernel function called unix_create1() which does:

    atomic_inc(&unix_nr_socks);
    if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
        goto out;

The function get_max_files() is a simple return of files_stat.max_files.
files_stat.max_files is a signed integer and is computed in
fs/file_table.c's files_init().

    n = (mempages * (PAGE_SIZE / 1024)) / 10;
    files_stat.max_files = n;

In our case, mempages (total_ram_pages) is approx 3,758,096,384
(0xe0000000). That leaves max_files at approximately 1,503,238,553.
This causes 2 * get_max_files() to integer overflow.

Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
integers, and change af_unix to use an atomic_long_t instead of atomic_t.

get_max_files() is changed to return an unsigned long. get_nr_files() is
changed to return a long.

unix_nr_socks is changed from atomic_t to atomic_long_t, although this is
not strictly needed to address Robin's problem.
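
The arithmetic behind the failure can be reproduced with 64-bit integers. In the sketch below, the page size of 4096 bytes is an assumption inferred from the quoted result (1,503,238,553), and the `model_` and `doubled_` helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The files_init() formula quoted above, evaluated in 64-bit
 * arithmetic so the intermediate product cannot wrap. */
int64_t model_max_files(int64_t mempages, int64_t page_size)
{
    assert(page_size >= 1024);
    return (mempages * (page_size / 1024)) / 10;
}

/* True when 2 * max_files no longer fits in a signed 32-bit int,
 * which is the comparison unix_create1() performed. */
int doubled_limit_overflows_int(int64_t max_files)
{
    return 2 * max_files > (int64_t)INT_MAX;
}
```

With mempages = 3,758,096,384 and 4 KiB pages, `model_max_files()` gives 1,503,238,553, which still fits in a signed 32-bit int, but doubling it gives 3,006,477,106, which exceeds INT_MAX (2,147,483,647). Widening the types to long removes the overflow on 64-bit kernels.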

Before patch (on a 64bit kernel) :
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
-18446744071562067968

After patch:
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
2147483648
# cat /proc/sys/fs/file-nr
704 0 2147483648

Reported-by: Robin Holt
Signed-off-by: Eric Dumazet
Acked-by: David Miller
Reviewed-by: Robin Holt
Tested-by: Robin Holt
Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2010-10-27 07:52:15 +0800

18 Aug, 2010

2 commits

6416ccb78 fs: scale files_lock

fs: scale files_lock

Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).

One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variable in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from different cpu's list.

However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.

A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU is allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.

Testing results:

On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.

Booting: locks= 25049 cpu-hits= 23174 (92.5%) node-hits= 23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64 locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)

So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.

Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

              throughput
2.6.34-rc2    24.5
+patch        24.9

              us      sys     idle    IO wait (in %)
2.6.34-rc2    51.25   28.25   17.25   3.25
+patch        53.75   18.5    19      8.75

So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.

Single-threaded performance difference was within the noise of microbenchmarks.
That is not to say a penalty does not exist: the code is larger and more memory
accesses are required, so it will be slightly slower.

Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen
Cc: Andi Kleen
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

Nick Piggin
2010-08-18 20:35:48 +0800
ee2ffa0df fs: cleanup files_lock locking

fs: cleanup files_lock locking

Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig
Cc: Alan Cox
Acked-by: Andi Kleen
Acked-by: Greg Kroah-Hartman
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

Nick Piggin
2010-08-18 20:35:47 +0800