Eric Lee / smarc-fsl-linux-kernel

08 Jan, 2011

1 commit

b4a45f5fe Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin

* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
fs: scale mntget/mntput
fs: rename vfsmount counter helpers
fs: implement faster dentry memcmp
fs: prefetch inode data in dcache lookup
fs: improve scalability of pseudo filesystems
fs: dcache per-inode inode alias locking
fs: dcache per-bucket dcache hash locking
bit_spinlock: add required includes
kernel: add bl_list
xfs: provide simple rcu-walk ACL implementation
btrfs: provide simple rcu-walk ACL implementation
ext2,3,4: provide simple rcu-walk ACL implementation
fs: provide simple rcu-walk generic_check_acl implementation
fs: provide rcu-walk aware permission i_ops
fs: rcu-walk aware d_revalidate method
fs: cache optimise dentry and inode for rcu-walk
fs: dcache reduce branches in lookup path
fs: dcache remove d_mounted
fs: fs_struct use seqlock
fs: rcu-walk for path lookup
...

Linus Torvalds
2011-01-08 00:56:33 +0800

07 Jan, 2011

6 commits

31e6b01f4 fs: rcu-walk for path lookup

Perform common cases of path lookups without any stores or locking in the
ancestor dentry elements. This is called rcu-walk, as opposed to the current
algorithm which is a refcount based walk, or ref-walk.

This results in far fewer atomic operations on every path element,
significantly improving path lookup performance. It also avoids cacheline
bouncing on common dentries, significantly improving scalability.

The overall design is like this:
* LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk.
* Take the RCU lock for the entire path walk, starting with the acquiring
of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
not required for dentry persistence.
* synchronize_rcu is called when unregistering a filesystem, so we can
access d_ops and i_ops during rcu-walk.
* Similarly take the vfsmount lock for the entire path walk. So now mnt
refcounts are not required for persistence. Also we are free to perform mount
lookups, and to assume dentry mount points and mount roots are stable up and
down the path.
* Have a per-dentry seqlock to protect the dentry name, parent, and inode,
so we can load this tuple atomically, and also check whether any of its
members have changed.
* Dentry lookups (based on parent, candidate string tuple) recheck the parent
sequence after the child is found in case anything changed in the parent
during the path walk.
* inode is also RCU protected so we can load d_inode and use the inode for
limited things.
* i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
* i_op can be loaded.

When we reach the destination dentry, we lock it, recheck lookup sequence,
and increment its refcount and mountpoint refcount. RCU and vfsmount locks
are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
not match, we can not drop rcu-walk gracefully at the current point in the
lookup, so instead return -ECHILD (for want of a better errno). This signals the
path walking code to re-do the entire lookup with a ref-walk.
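
The retry protocol can be sketched as follows. This is an illustration only: `WALK_RCU`, `walk_fn`, `lookup_with_fallback` and the toy walker are hypothetical names, and the only thing taken from the text is that -ECHILD means "redo the entire lookup with a ref-walk".

```c
#include <errno.h>

#define WALK_RCU 0x1	/* hypothetical flag, standing in for LOOKUP_RCU */

/* Hypothetical walker type: returns 0 on success or a negative errno. */
typedef int (*walk_fn)(const char *path, unsigned flags);

/* Try the lockless walk first; redo the whole lookup only on -ECHILD. */
static int lookup_with_fallback(walk_fn walk, const char *path)
{
	int err = walk(path, WALK_RCU);

	if (err == -ECHILD)
		err = walk(path, 0);	/* full ref-walk from the start */
	return err;
}

/* Toy walker for illustration: counts calls and pretends that paths
 * starting with '!' cannot be completed in rcu-walk mode. */
static int demo_calls;

static int demo_walk(const char *path, unsigned flags)
{
	demo_calls++;
	if ((flags & WALK_RCU) && path[0] == '!')
		return -ECHILD;
	return 0;
}
```

With the toy walker, an ordinary path costs one walk and a path that trips the fallback costs two, which is the double-lookup overhead the message wants to keep rare.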

Aside from the final dentry, there are other situations that may be encountered
where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
a reference on the last good dentry) and continue with a ref-walk. Again, if
we cannot drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
using ref-walk. But it is very important that we can continue with ref-walk
for most cases, particularly to avoid the overhead of double lookups, and to
gain the scalability advantages on common path elements (like cwd and root).

The cases where rcu-walk cannot continue are:
* NULL dentry (ie. any uncached path element)
* parent with d_inode->i_op->permission or ACLs
* dentries with d_revalidate
* Following links

In future patches, permission checks and d_revalidate become rcu-walk aware. It
may be possible eventually to make following links rcu-walk aware.

Uncached path elements will always require dropping to ref-walk mode, at the
very least because i_mutex needs to be grabbed, and objects allocated.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:27 +0800
dc0474be3 fs: dcache rationalise dget variants

dget_locked was a shortcut to avoid the lazy lru manipulation when we already
held dcache_lock (lru manipulation was relatively cheap at that point).
However, now that the lru lock is an innermost one, we never hold it at any
caller, so the lock cost can now be avoided. We already have well working lazy
dcache LRU, so it should be fine to defer LRU manipulations to scan time.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:24 +0800
b5c84bf6f fs: dcache remove dcache_lock

dcache_lock no longer protects anything. Remove it.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:23 +0800
2fd6b7f50 fs: dcache scale subdirs

Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
using dcache_lock for these anyway (eg. using i_mutex).

Note: if we change the locking rule in future so that ->d_child protection is
provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
But it would be an exception to an otherwise regular locking scheme, so we'd
have to see some good results. Probably not worthwhile.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:21 +0800
da5029563 fs: dcache scale d_unhashed

Protect d_unhashed(dentry) condition with d_lock. This means keeping
DCACHE_UNHASHED bit in synch with hash manipulations.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:21 +0800
abb359450 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1436 commits)
cassini: Use local-mac-address prom property for Cassini MAC address
net: remove the duplicate #ifdef __KERNEL__
net: bridge: check the length of skb after nf_bridge_maybe_copy_header()
netconsole: clarify stopping message
netconsole: don't announce stopping if nothing happened
cnic: Fix the type field in SPQ messages
netfilter: fix export secctx error handling
netfilter: fix the race when initializing nf_ct_expect_hash_rnd
ipv4: IP defragmentation must be ECN aware
net: r6040: Return proper error for r6040_init_one
dcb: use after free in dcb_flushapp()
dcb: unlock on error in dcbnl_ieee_get()
net: ixp4xx_eth: Return proper error for eth_init_one
include/linux/if_ether.h: Add #define ETH_P_LINK_CTL for HPNA and wlan local tunnel
net: add POLLPRI to sock_def_readable()
af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks.
net_sched: pfifo_head_drop problem
mac80211: remove stray extern
mac80211: implement off-channel TX using hw r-o-c offload
mac80211: implement hardware offload for remain-on-channel
...

Linus Torvalds
2011-01-07 04:30:19 +0800

06 Jan, 2011

1 commit

3610cda53 af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks.

unix_release() can asynchronously set socket->sk to NULL, and
it does so without holding the unix_state_lock() on "other"
during stream connects.

However, the reverse mapping, sk->sk_socket, is only transitioned
to NULL under the unix_state_lock().

Therefore make the security hooks follow the reverse mapping instead
of the forward mapping.

Reported-by: Jeremy Fitzhardinge
Reported-by: Linus Torvalds
Signed-off-by: David S. Miller

David S. Miller
2011-01-06 07:38:53 +0800

04 Jan, 2011

1 commit

867c20265 ima: fix add LSM rule bug

If security_filter_rule_init() doesn't return a rule, then not everything
is as fine as the return code implies.

This bug only occurs when the LSM (eg. SELinux) is disabled at runtime.

Adding an empty LSM rule causes ima_match_rules() to always succeed,
ignoring any remaining rules.

default IMA TCB policy:
# PROC_SUPER_MAGIC
dont_measure fsmagic=0x9fa0
# SYSFS_MAGIC
dont_measure fsmagic=0x62656572
# DEBUGFS_MAGIC
dont_measure fsmagic=0x64626720
# TMPFS_MAGIC
dont_measure fsmagic=0x01021994
# SECURITYFS_MAGIC
dont_measure fsmagic=0x73636673

< LSM specific rule >
dont_measure obj_type=var_log_t

measure func=BPRM_CHECK
measure func=FILE_MMAP mask=MAY_EXEC
measure func=FILE_CHECK mask=MAY_READ uid=0

Thus without the patch, with the boot parameters 'tcb selinux=0', adding
the above 'dont_measure obj_type=var_log_t' rule to the default IMA TCB
measurement policy, would result in nothing being measured. The patch
prevents the default TCB policy from being replaced.
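
The failure mode can be sketched in a few lines of C. This is a hypothetical model, not the IMA code: `filter_rule_init()` only imitates the behaviour the message attributes to security_filter_rule_init() when the LSM is disabled, and the choice of -EINVAL is an assumption made for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for security_filter_rule_init(): when no LSM is
 * active it reports success (0) but leaves *rule set to NULL. */
static int filter_rule_init(int lsm_enabled, void **rule)
{
	static int dummy_rule;

	*rule = lsm_enabled ? (void *)&dummy_rule : NULL;
	return 0;
}

/* Caller-side check: a zero return with no rule is still a failure. */
static int add_lsm_rule(int lsm_enabled, void **rule)
{
	int err = filter_rule_init(lsm_enabled, rule);

	if (err)
		return err;
	if (*rule == NULL)
		return -EINVAL;	/* assumed errno: refuse to install an empty rule */
	return 0;
}
```

Rejecting the empty rule at load time means a rule that would match everything never reaches the policy, so the existing policy stays in place.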

Signed-off-by: Mimi Zohar
Cc: James Morris
Acked-by: Serge Hallyn
Cc: David Safford
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mimi Zohar
2011-01-04 08:36:33 +0800

27 Dec, 2010

1 commit

17f7f4d9f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
net/ipv4/fib_frontend.c

David S. Miller
2010-12-27 14:37:05 +0800

24 Dec, 2010

1 commit

3fc5e98d8 KEYS: Don't call up_write() if __key_link_begin() returns an error

In construct_alloc_key(), up_write() is called in the error path if
__key_link_begin() fails, but this is incorrect as __key_link_begin() only
returns with the nominated keyring locked if it returns successfully.

Without this patch, you might see the following in dmesg:

=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
mount.cifs/5769 is trying to release lock (&key->sem) at:
[] request_key_and_link+0x263/0x3fc
but there are no more locks to release!

other info that might help us debug this:
3 locks held by mount.cifs/5769:
#0: (&type->s_umount_key#41/1){+.+.+.}, at: [] sget+0x278/0x3e7
#1: (&ret_buf->session_mutex){+.+.+.}, at: [] cifs_get_smb_ses+0x35a/0x443 [cifs]
#2: (root_key_user.cons_lock){+.+.+.}, at: [] request_key_and_link+0x10a/0x3fc

stack backtrace:
Pid: 5769, comm: mount.cifs Not tainted 2.6.37-rc6+ #1
Call Trace:
[] ? request_key_and_link+0x263/0x3fc
[] print_unlock_inbalance_bug+0xca/0xd5
[] lock_release_non_nested+0xc1/0x263
[] ? request_key_and_link+0x263/0x3fc
[] ? request_key_and_link+0x263/0x3fc
[] lock_release+0x17d/0x1a4
[] up_write+0x23/0x3b
[] request_key_and_link+0x263/0x3fc
[] ? cifs_get_spnego_key+0x61/0x21f [cifs]
[] request_key+0x41/0x74
[] cifs_get_spnego_key+0x200/0x21f [cifs]
[] CIFS_SessSetup+0x55d/0x1273 [cifs]
[] cifs_setup_session+0x90/0x1ae [cifs]
[] cifs_get_smb_ses+0x37f/0x443 [cifs]
[] cifs_mount+0x1aa1/0x23f3 [cifs]
[] ? alloc_debug_processing+0xdb/0x120
[] ? cifs_get_spnego_key+0x1ef/0x21f [cifs]
[] cifs_do_mount+0x165/0x2b3 [cifs]
[] vfs_kern_mount+0xaf/0x1dc
[] do_kern_mount+0x4d/0xef
[] do_mount+0x6f4/0x733
[] sys_mount+0x88/0xc2
[] system_call_fastpath+0x16/0x1b
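
The locking rule behind this fix can be modelled with a toy lock. The sketch below is an assumption-laden illustration: `toy_sem`, `link_begin()` and `construct_key()` are hypothetical, and only the rule itself (the lock is held on return only if the begin step succeeded) comes from the commit message.

```c
#include <errno.h>

/* Toy lock: a counter that must return to zero when balanced. */
struct toy_sem {
	int held;
};

static void toy_down_write(struct toy_sem *s) { s->held++; }
static void toy_up_write(struct toy_sem *s)   { s->held--; }

/* Hypothetical model of __key_link_begin(): returns with the lock held
 * only on success; on failure it has already released it. */
static int link_begin(struct toy_sem *s, int fail)
{
	toy_down_write(s);
	if (fail) {
		toy_up_write(s);
		return -ENOMEM;
	}
	return 0;
}

static int construct_key(struct toy_sem *s, int fail)
{
	int ret = link_begin(s, fail);

	if (ret < 0)
		return ret;	/* lock is not held here: do not unlock */

	/* ... link the key while holding the lock ... */
	toy_up_write(s);
	return 0;
}
```

If the error path called toy_up_write() as well, the counter would drop below zero, which is the toy equivalent of the "bad unlock balance" report shown above.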

Reported-by: Jeff Layton
Signed-off-by: David Howells
Reviewed-and-Tested-by: Jeff Layton
Signed-off-by: Linus Torvalds

David Howells
2010-12-24 07:31:48 +0800

24 Nov, 2010

2 commits

2fe66ec24 SELinux: indicate fatal error in compat netfilter code

The SELinux ip postroute code indicates when policy rejected a packet and
passes the error back up the stack. The compat code does not. This patch
sends the same kind of error back up the stack in the compat code.

Based-on-patch-by: Paul Moore
Signed-off-by: Eric Paris
Reviewed-by: Paul Moore
Signed-off-by: David S. Miller

Eric Paris
2010-11-24 02:50:17 +0800
04f6d70f6 SELinux: Only return netlink error when we know the return is fatal

Some of the SELinux netlink code returns a fatal error when the error might
actually be transient. This patch just silently drops packets on
potentially transient errors but continues to return a permanent error
indicator when the denial was because of policy.

Based-on-comments-by: Paul Moore
Signed-off-by: Eric Paris
Reviewed-by: Paul Moore
Signed-off-by: David S. Miller

Eric Paris
2010-11-24 02:50:17 +0800

18 Nov, 2010

1 commit

1f1aaf828 SELinux: return -ECONNREFUSED from ip_postroute to signal fatal error

The SELinux netfilter hooks just return NF_DROP if they drop a packet. We
want to signal that a drop in this hook is a permanent fatal error and is not
transient. If we do this, the error will be passed back up the stack in some
places and applications will get a faster indication that something went
wrong.

Signed-off-by: Eric Paris
Signed-off-by: David S. Miller

Eric Paris
2010-11-18 02:54:35 +0800

16 Nov, 2010

1 commit

12b3052c3 capabilities/syslog: open code cap_syslog logic to fix build failure

The addition of CONFIG_SECURITY_DMESG_RESTRICT resulted in a build
failure when CONFIG_PRINTK=n. This is because the capabilities code
which used the new option was built even though the variable in question
didn't exist.

The patch here fixes this by moving the capabilities checks out of the
LSM and into the caller. All (known) LSMs should have been calling the
capabilities hook already so it actually makes the code organization
better to eliminate the hook altogether.

Signed-off-by: Eric Paris
Acked-by: James Morris
Signed-off-by: Linus Torvalds

Eric Paris
2010-11-16 07:40:01 +0800

13 Nov, 2010

1 commit

fe7e96f66 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
APPARMOR: Fix memory leak of apparmor_init()
APPARMOR: Fix memory leak of alloc_namespace()

Linus Torvalds
2010-11-13 00:00:25 +0800

12 Nov, 2010

1 commit

eaf06b241 Restrict unprivileged access to kernel syslog

The kernel syslog contains debugging information that is often useful
during exploitation of other vulnerabilities, such as kernel heap
addresses. Rather than futilely attempt to sanitize hundreds (or
thousands) of printk statements and simultaneously cripple useful
debugging functionality, it is far simpler to create an option that
prevents unprivileged users from reading the syslog.

This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the
dmesg_restrict sysctl. When set to "0", the default, no restrictions are
enforced. When set to "1", only users with CAP_SYS_ADMIN can read the
kernel syslog via dmesg(8) or other mechanisms.
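
The access rule reduces to one condition, sketched here as a userspace model. The function and parameter names are hypothetical, the boolean stands in for a CAP_SYS_ADMIN check on the caller, and returning -EPERM is an assumption for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: restrict_on mirrors the dmesg_restrict sysctl and
 * has_cap_sys_admin stands in for a capability check on the caller. */
static int may_read_syslog(int restrict_on, bool has_cap_sys_admin)
{
	if (restrict_on && !has_cap_sys_admin)
		return -EPERM;	/* assumed errno for a denied read */
	return 0;
}
```

With the sysctl at its default of 0, the capability is never consulted, so existing behaviour is unchanged.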

[akpm@linux-foundation.org: explain the config option in kernel.txt]
Signed-off-by: Dan Rosenberg
Acked-by: Ingo Molnar
Acked-by: Eugene Teo
Acked-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Rosenberg
2010-11-12 23:55:32 +0800

11 Nov, 2010

2 commits

a26d279ea APPARMOR: Fix memory leak of apparmor_init()

set_init_cxt() allocates sizeof(struct aa_task_cxt) bytes for cxt.
If register_security() then fails, that memory is leaked.

Signed-off-by: Zhitong Wang
Signed-off-by: John Johansen
Signed-off-by: James Morris

wzt.wzt@gmail.com
2010-11-11 04:36:22 +0800
246c3fb16 APPARMOR: Fix memory leak of alloc_namespace()

policy->name is a substring of policy->hname. If prefix is not NULL,
policy_init() allocates strlen(prefix) + strlen(name) + 3 bytes for policy->hname.
Calling kzfree(ns->base.name) will therefore leak memory if alloc_namespace() fails.

Signed-off-by: Zhitong Wang
Signed-off-by: John Johansen
Signed-off-by: James Morris

wzt.wzt@gmail.com
2010-11-11 04:36:18 +0800

29 Oct, 2010

2 commits

fc14f2fef convert get_sb_single() users

Signed-off-by: Al Viro

Al Viro
2010-10-29 16:16:28 +0800
27d637989 Fix install_process_keyring error handling

Fix an incorrect error check that returns 1 for error instead of the
expected error code.

Signed-off-by: Andi Kleen
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

Andi Kleen
2010-10-29 00:02:15 +0800

27 Oct, 2010

13 commits

426e1f5ce Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
split invalidate_inodes()
fs: skip I_FREEING inodes in writeback_sb_inodes
fs: fold invalidate_list into invalidate_inodes
fs: do not drop inode_lock in dispose_list
fs: inode split IO and LRU lists
fs: switch bdev inode bdi's correctly
fs: fix buffer invalidation in invalidate_list
fsnotify: use dget_parent
smbfs: use dget_parent
exportfs: use dget_parent
fs: use RCU read side protection in d_validate
fs: clean up dentry lru modification
fs: split __shrink_dcache_sb
fs: improve DCACHE_REFERENCED usage
fs: use percpu counter for nr_dentry and nr_dentry_unused
fs: simplify __d_free
fs: take dcache_lock inside __d_path
fs: do not assign default i_ino in new_inode
fs: introduce a per-cpu last_ino allocator
new helper: ihold()
...

Linus Torvalds
2010-10-27 08:58:44 +0800
f9ba5375a Merge branch 'ima-memory-use-fixes'

* ima-memory-use-fixes:
IMA: fix the ToMToU logic
IMA: explicit IMA i_flag to remove global lock on inode_delete
IMA: drop refcnt from ima_iint_cache since it isn't needed
IMA: only allocate iint when needed
IMA: move read counter into struct inode
IMA: use i_writecount rather than a private counter
IMA: use inode->i_lock to protect read and write counters
IMA: convert internal flags from long to char
IMA: use unsigned int instead of long for counters
IMA: drop the inode opencount since it isn't needed for operation
IMA: use rbtree instead of radix tree for inode information cache

Linus Torvalds
2010-10-27 02:37:48 +0800
bade72d60 IMA: fix the ToMToU logic

Current logic looks like this:

    rc = ima_must_measure(NULL, inode, MAY_READ, FILE_CHECK);
    if (rc < 0)
        goto out;

    if (mode & FMODE_WRITE) {
        if (inode->i_readcount)
            send_tomtou = true;
        goto out;
    }

    if (atomic_read(&inode->i_writecount) > 0)
        send_writers = true;

Let's assume we have a policy which states that all files opened for read
by root must be measured.

Let's assume the file has permissions 777.

Let's assume that root has the given file open for read.

Let's assume that a non-root process opens the file for write.

The non-root process will get to ima_counts_get() and will check the
ima_must_measure(). Since it is not supposed to measure it will goto
out.

We should check the i_readcount no matter what since we might be causing
a ToMToU violation!
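
One way to picture the reordering is to look at the counters before the policy is consulted at all. The sketch below is a hypothetical userspace model: `toy_inode`, `violations` and `check_violations()` are invented for illustration and the policy call is deliberately left out.

```c
#include <stdbool.h>

struct toy_inode {
	unsigned int readcount;	/* open readers */
	int writecount;		/* positive when writers exist */
};

struct violations {
	bool tomtou;		/* writer arrived while readers exist */
	bool open_writers;	/* reader arrived while writers exist */
};

/* Check the counters first, so a writer that the policy would not measure
 * can still be reported as a possible ToMToU violation. */
static struct violations check_violations(const struct toy_inode *inode,
					  bool opening_for_write)
{
	struct violations v = { false, false };

	if (opening_for_write) {
		if (inode->readcount)
			v.tomtou = true;
	} else if (inode->writecount > 0) {
		v.open_writers = true;
	}
	return v;
}
```

In the scenario above (root holds the file open for read, a non-root process opens it for write), this check flags the violation regardless of whether the writer itself falls under the measurement policy.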

This is close to correct, but still not quite perfect. The situation
could have been that root, which was interested in the measurement, opened
and closed the file, and another process which is not interested in the
measurement is the one holding the i_readcount ATM. This is just overly
strict on ToMToU violations, which is better than not strict enough...

Signed-off-by: Eric Paris
Acked-by: Mimi Zohar
Signed-off-by: Linus Torvalds

Eric Paris
2010-10-27 02:37:19 +0800
196f51812 IMA: explicit IMA i_flag to remove global lock on inode_delete

Currently for every removed inode IMA must take a global lock and search
the IMA rbtree looking for an associated integrity structure. Instead
we explicitly mark an inode when we add an integrity structure so we
only have to take the global lock and do the removal if it exists.

Signed-off-by: Eric Paris
Acked-by: Mimi Zohar
Signed-off-by: Linus Torvalds

Eric Paris
2010-10-27 02:37:19 +0800
64c62f06b IMA: drop refcnt from ima_iint_cache since it isn't needed

Since finding a struct ima_iint_cache requires a valid struct inode, and
the struct ima_iint_cache is supposed to have the same lifetime as a
struct inode (technically they die together but don't need to be created
at the same time) we don't have to worry about the ima_iint_cache
outliving or dying before the inode. So the refcnt isn't useful. Just
get rid of it and free the structure when the inode is freed.

Signed-off-by: Eric Paris
Acked-by: Mimi Zohar
Signed-off-by: Linus Torvalds

Eric Paris
2010-10-27 02:37:19 +0800
bc7d2a3e6 IMA: only allocate iint when needed

IMA always allocates an integrity structure to hold information about
every inode, but only needed this structure to track the number of
readers and writers currently accessing a given inode. Since that
information was moved into struct inode instead of the integrity struct
this patch stops allocating the integrity structure until it is needed,
thus greatly reducing memory usage.

Signed-off-by: Eric Paris
Acked-by: Mimi Zohar
Signed-off-by: Linus Torvalds

Eric Paris
2010-10-27 02:37:18 +0800
a178d2027 IMA: move read counter into struct inode

IMA currently allocates an inode integrity structure for every inode in
core. This structure is about 120 bytes long. Most files however
(especially on a system which doesn't make use of IMA) will never need
any of this space. The problem is that if IMA is enabled we need to
know information about the number of readers and the number of writers
for every inode on the box. At the moment we collect that information
in the per inode iint structure and waste the rest of the space. This
patch moves those counters into the struct inode so we can eventually
stop allocating an IMA integrity structure except when absolutely
needed.

This patch does the minimum needed to move the location of the data.
Further cleanups, especially the location of counter updates, may still
be possible.

Signed-off-by: Eric Paris
Acked-by: Mimi Zohar
Signed-off-by: Linus Torvalds

Eric Paris
2010-10-27 02:37:18 +0800
b9593d309 IMA: use i_writecount rather than a private counter

IMA tracks the number of struct files which are holding a given inode
readonly and the number which are holding the inode write or r/w. It
needs this information so when a new reader or writer comes in it can
tell if this new file will be able to invalidate results it already made
about existing files.

In other words, if a task is holding a struct file open RO, IMA measured the
file and recorded those measurements, and then a task opens the file RW, IMA
needs to note in the logs that the old measurement may not be correct.
It's called a "Time of Measure Time of Use" (ToMToU) issue. The same is
true if a RO file is opened to an inode which has an open writer. We
cannot, with any validity, measure the file in question since it could
be changing.

This patch attempts to use the i_writecount field to track writers. The
i_writecount field actually embeds more information in its value than
IMA needs but it should work for our purposes and allow us to shrink the
struct inode even more.

Signed-off-by: Eric Paris
Acked-by: Mimi Zohar
Signed-off-by: Linus Torvalds

Eric Paris
2010-10-27 02:37:18 +0800
ad16ad00c IMA: use inode->i_lock to protect read and write counters

Currently IMA uses the iint->mutex to protect the i_readcount and
i_writecount. This patch uses the inode->i_lock since we are going to
start keeping these counters in the inode, and that is the most appropriate lock.

Signed-off-by: Eric Paris
Acked-by: Mimi Zohar
Signed-off-by: Linus Torvalds

Eric Paris
2010-10-27 02:37:18 +0800
15aac6767 IMA: convert internal flags from long to char

The IMA flags field is an unsigned long but there is only 1 flag defined.
Let's save a little space and make it a char. This packs nicely next to
the array of u8's.

Signed-off-by: Eric Paris
Acked-by: Mimi Zohar
Signed-off-by: Linus Torvalds

Eric Paris
2010-10-27 02:37:18 +0800
497f32337 IMA: use unsigned int instead of long for counters

Currently IMA uses 2 longs in struct inode. To save space (and as it
seems impossible to overflow 32 bits) we switch these to unsigned int.
The switch to unsigned does require slightly different checks for
underflow, but it isn't complex.
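
The "slightly different checks" follow from unsigned arithmetic: a test such as "count < 0" after a decrement can never fire, so the check has to happen before the decrement. A minimal sketch, with a hypothetical helper name:

```c
#include <stdbool.h>

/* Decrement an unsigned counter, reporting instead of wrapping at zero.
 * With a signed counter the check could be "count < 0" after the
 * decrement; for an unsigned type that comparison is always false. */
static bool counter_dec(unsigned int *count)
{
	if (*count == 0)
		return false;	/* would underflow: leave the counter alone */
	(*count)--;
	return true;
}
```

A caller can treat a false return as the signal to log an imbalance rather than silently wrapping to a huge value.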

Signed-off-by: Eric Paris
Acked-by: Mimi Zohar
Signed-off-by: Linus Torvalds

Eric Paris
2010-10-27 02:37:18 +0800
b575156da IMA: drop the inode opencount since it isn't needed for operation

The opencount was used to help debugging to make sure that everything
which created a struct file also correctly made the IMA calls. Since we
moved all of that into the VFS this isn't as necessary. We should be
able to get the same amount of debugging out of just the reader and
writer counts.

Signed-off-by: Eric Paris
Acked-by: Mimi Zohar
Signed-off-by: Linus Torvalds

Eric Paris
2010-10-27 02:37:17 +0800
854916414 IMA: use rbtree instead of radix tree for inode information cache

The IMA code needs to store the number of tasks which have an open fd
granting permission to write a file even when IMA is not in use. It
needs this information in order to be enabled at a later point in time
without losing its integrity guarantees.

At the moment that means we store a little bit of data about every inode
in a cache. We use a radix tree key'd on the inode's memory address.
Dave Chinner pointed out that a radix tree is a terrible data structure
for such a sparse key space. This patch switches to using an rbtree
which should be more efficient.

Bug report from Dave:

"I just noticed that slabtop was reporting an awfully high usage of
radix tree nodes:

   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
4200331 2778082  66%    0.55K 144839       29   2317424K radix_tree_node
2321500 2060290  88%    1.00K  72581       32   2322592K xfs_inode
2235648 2069791  92%    0.12K  69864       32    279456K iint_cache

That is, 2.7M radix tree nodes are allocated, and the cache itself is
consuming 2.3GB of RAM. I know that the XFS inode caches are indexed
by radix tree node, but for 2 million cached inodes that would mean a
density of 1 inode per radix tree node, which for a system with 16M
inodes in the filesystems is an impossibly low density. The worst I've
seen in a production system like kernel.org is about 20-25% density,
which would mean about 150-200k radix tree nodes for that many inodes.
So it's not the inode cache.

So I looked up what the iint_cache was. It appears to be used for
storing per-inode IMA information, and uses a radix tree for indexing.
It uses the *address* of the struct inode as the indexing key. That
means the key space is extremely sparse - for XFS the struct inode
addresses are approximately 1000 bytes apart, which means the closest
the radix tree index keys get is ~1000. Which means that there is a
single entry per radix tree leaf node, so the radix tree is using
roughly 550 bytes for every 120byte structure being cached. For the
above example, it's probably wasting close to 1GB of RAM...."
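
The estimate in the report can be reproduced with simple arithmetic. The helper below is a sketch with a hypothetical name; the 560-byte node size is an assumption chosen to match the 0.55K that slabtop shows, and the entry count is the ACTIVE figure for iint_cache from the table.

```c
#include <stdint.h>

/* Bytes consumed by radix tree leaf nodes if every cached entry ends up
 * alone in its own leaf node, as the report describes for sparse keys. */
static uint64_t radix_leaf_bytes(uint64_t entries, uint64_t node_bytes)
{
	return entries * node_bytes;
}
```

For example, radix_leaf_bytes(2069791, 560) is 1159082960 bytes, roughly 1.08 GiB, while the cached 120-byte structures themselves come to 2069791 * 120 = 248374920 bytes. That ratio is consistent with the report's "close to 1GB" of overhead.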

Reported-by: Dave Chinner
Signed-off-by: Eric Paris
Acked-by: Mimi Zohar
Signed-off-by: Linus Torvalds

Eric Paris
2010-10-27 02:37:17 +0800

26 Oct, 2010

2 commits

be148247c fs: take dcache_lock inside __d_path

All callers take dcache_lock just around the call to __d_path, so
take the lock into it in preparation of getting rid of dcache_lock.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:26:12 +0800
85fe4025c fs: do not assign default i_ino in new_inode

Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now the set of callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves. For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.

Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Chinner
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:26:11 +0800

23 Oct, 2010

1 commit

092e0e7e5 Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl

* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
vfs: make no_llseek the default
vfs: don't use BKL in default_llseek
llseek: automatically add .llseek fop
libfs: use generic_file_llseek for simple_attr
mac80211: disallow seeks in minstrel debug code
lirc: make chardev nonseekable
viotape: use noop_llseek
raw: use explicit llseek file operations
ibmasmfs: use generic_file_llseek
spufs: use llseek in all file operations
arm/omap: use generic_file_llseek in iommu_debug
lkdtm: use generic_file_llseek in debugfs
net/wireless: use generic_file_llseek in debugfs
drm: use noop_llseek

Linus Torvalds
2010-10-23 01:52:56 +0800

21 Oct, 2010

3 commits

f0d3d9894 selinux: include vmalloc.h for vmalloc_user

Include vmalloc.h for vmalloc_user (fixes ppc build warning).

Acked-by: Eric Paris
Signed-off-by: James Morris

Stephen Rothwell
2010-10-21 07:13:01 +0800
845ca30fe selinux: implement mmap on /selinux/policy

/selinux/policy allows a user to copy the policy back out of the kernel.
This patch allows userspace to actually mmap that file and use it directly.
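
From userspace, using such a file directly would look like an ordinary read-only mapping. The helper below is a generic POSIX sketch, not code from the patch: `map_file_ro()` is a hypothetical name, and it assumes the file reports its size through fstat() and permits a private read-only mapping.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on failure; on success stores
 * the length in *len. The caller unmaps with munmap(addr, *len). */
static void *map_file_ro(const char *path, size_t *len)
{
	struct stat st;
	void *addr;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}
	addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after the descriptor is closed */
	if (addr == MAP_FAILED)
		return NULL;
	*len = (size_t)st.st_size;
	return addr;
}
```

A caller would pass the policy path named in the message (for example "/selinux/policy") and read the mapped bytes in place instead of copying them out with read().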

Signed-off-by: Eric Paris
Signed-off-by: James Morris

Eric Paris
2010-10-21 07:12:59 +0800
cee74f47a SELinux: allow userspace to read policy back out of the kernel

There is interest in being able to see what the actual policy is that was
loaded into the kernel. The patch creates a new selinuxfs file
/selinux/policy which can be read by userspace. The actual policy that is
loaded into the kernel will be written back out to userspace.

Signed-off-by: Eric Paris
Signed-off-by: James Morris

Eric Paris
2010-10-21 07:12:58 +0800