14 Apr, 2009
1 commit
-
| ipc/mq_sysctl.c:26: warning: 'get_mq' defined but not used
Signed-off-by: Geert Uytterhoeven
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Apr, 2009
3 commits
-
Largely inspired from ipc/ipc_sysctl.c. This patch isolates the mqueue
sysctl stuff in its own file.[akpm@linux-foundation.org: build fix]
Signed-off-by: Cedric Le Goater
Signed-off-by: Nadia Derbey
Signed-off-by: Serge E. Hallyn
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Implement multiple mounts of the mqueue file system, and link it to usage
of CLONE_NEWIPC.Each ipc ns has a corresponding mqueuefs superblock. When a user does
clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an
internal mount of a new mqueuefs sb linked to the new ipc ns.When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the
mqueuefs superblock.Posix message queues can be worked with both through the mq_* system calls
(see mq_overview(7)), and through the VFS through the mqueue mount. Any
usage of mq_open() and friends will work with the acting task's ipc
namespace. Any actions through the VFS will work with the mqueuefs in
which the file was created. So if a user doesn't remount mqueuefs after
unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls
/dev/mqueue".If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns,
ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1)
ipc_ns:1 will be freed, (2) it's superblock will live on until task b
umounts the corresponding mqueuefs, and vfs actions will continue to
succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to
the deceased ipc_ns:1.To make this happen, we must protect the ipc reference count when
a) a task exits and drops its ipcns->count, since it might be dropping
it to 0 and freeing the ipcnsb) a task accesses the ipcns through its mqueuefs interface, since it
bumps the ipcns refcount and might race with the last task in the ipcns
exiting.So the kref is changed to an atomic_t so we can use
atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns
through ns = mqueuefs_sb->s_fs_info is protected by the same lock.Signed-off-by: Cedric Le Goater
Signed-off-by: Serge E. Hallyn
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
posix message queue namespaces and the SYSV ipc namespaces.The sysctl code will be fixed separately in patch 3. After just this
patch, making a change to posix mqueue tunables always changes the values
in the initial ipc namespace.Signed-off-by: Cedric Le Goater
Signed-off-by: Serge E. Hallyn
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Apr, 2009
3 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
Remove two unneeded exports and make two symbols static in fs/mpage.c
Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225
Trim includes of fdtable.h
Don't crap into descriptor table in binfmt_som
Trim includes in binfmt_elf
Don't mess with descriptor table in load_elf_binary()
Get rid of indirect include of fs_struct.h
New helper - current_umask()
check_unsafe_exec() doesn't care about signal handlers sharing
New locking/refcounting for fs_struct
Take fs_struct handling to new file (fs/fs_struct.c)
Get rid of bumping fs_struct refcount in pivot_root(2)
Kill unsharing fs_struct in __set_personality() -
As pointed out by Cedric Le Goater (in response to Alexey's original
comment wrt mqns), ipc_sysctl.c and utsname_sysctl.c are using
CONFIG_PROC_FS, not CONFIG_PROC_SYSCTL, to determine whether to define
the proc_handlers. Change that.Signed-off-by: Serge E. Hallyn
Cc: Cedric Le Goater
Acked-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
shm_get_stat() assumes idr_find(&shm_ids(ns).ipcs_idr) returns "struct
shmid_kernel *"; all other callers assume that it returns "struct
kern_ipc_perm *". This works because "struct kern_ipc_perm" is currently
the first member of "struct shmid_kernel", but it would be better to use
container_of() to prevent future breakage.Signed-off-by: Tony Battersby
Cc: Jiri Olsa
Cc: Jiri Kosina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Apr, 2009
1 commit
-
current->fs->umask is what most of fs_struct users are doing.
Put that into a helper function.Signed-off-by: Al Viro
27 Mar, 2009
1 commit
-
* 'bkl-removal' of git://git.lwn.net/linux-2.6:
Rationalize fasync return values
Move FASYNC bit handling to f_op->fasync()
Use f_lock to protect f_flags
Rename struct file->f_ep_lock
24 Mar, 2009
1 commit
16 Mar, 2009
1 commit
-
Traditionally, changes to struct file->f_flags have been done under BKL
protection, or with no protection at all. This patch causes all f_flags
changes after file open/creation time to be done under protection of
f_lock. This allows the removal of some BKL usage and fixes a number of
longstanding (if microscopic) races.Reviewed-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Jonathan Corbet
11 Feb, 2009
1 commit
-
When overcommit is disabled, the core VM accounts for pages used by anonymous
shared, private mappings and special mappings. It keeps track of VMAs that
should be accounted for with VM_ACCOUNT and VMAs that never had a reserve
with VM_NORESERVE.Overcommit for hugetlbfs is much riskier than overcommit for base pages
due to contiguity requirements. It avoids overcommiting on both shared and
private mappings using reservation counters that are checked and updated
during mmap(). This ensures (within limits) that hugepages exist in the
future when faults occurs or it is too easy to applications to be SIGKILLed.As hugetlbfs makes its own reservations of a different unit to the base page
size, VM_ACCOUNT should never be set. Even if the units were correct, we would
double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may
be set because an application can request no reserves be made for hugetlbfs
at the risk of getting killed later.With commit fc8744adc870a8d4366908221508bb113d8b72ee, VM_NORESERVE and
VM_ACCOUNT are getting unconditionally set for hugetlbfs-backed mappings. This
breaks the accounting for both the core VM and hugetlbfs, can trigger an
OOM storm when hugepage pools are too small lockups and corrupted counters
otherwise are used. This patch brings hugetlbfs more in line with how the
core VM treats VM_NORESERVE but prevents VM_ACCOUNT being set.Signed-off-by: Mel Gorman
Signed-off-by: Linus Torvalds
06 Feb, 2009
3 commits
-
Conflicts:
fs/namei.cManually merged per:
diff --cc fs/namei.c
index 734f2b5,bbc15c2..0000000
--- a/fs/namei.c
+++ b/fs/namei.c
@@@ -860,9 -848,8 +849,10 @@@ static int __link_path_walk(const char
nd->flags |= LOOKUP_CONTINUE;
err = exec_permission_lite(inode);
if (err == -EAGAIN)
- err = vfs_permission(nd, MAY_EXEC);
+ err = inode_permission(nd->path.dentry->d_inode,
+ MAY_EXEC);
+ if (!err)
+ err = ima_path_check(&nd->path, MAY_EXEC);
if (err)
break;@@@ -1525,14 -1506,9 +1509,14 @@@ int may_open(struct path *path, int acc
flag &= ~O_TRUNC;
}- error = vfs_permission(nd, acc_mode);
+ error = inode_permission(inode, acc_mode);
if (error)
return error;
+
- error = ima_path_check(&nd->path,
++ error = ima_path_check(path,
+ acc_mode & (MAY_READ | MAY_WRITE | MAY_EXEC));
+ if (error)
+ return error;
/*
* An append-only file must be opened in append mode for writing.
*/Signed-off-by: James Morris
-
The number of calls to ima_path_check()/ima_file_free()
should be balanced. An extra call to fput(), indicates
the file could have been accessed without first being
measured.Although f_count is incremented/decremented in places other
than fget/fput, like fget_light/fput_light and get_file, the
current task must already hold a file refcnt. The call to
__fput() is delayed until the refcnt becomes 0, resulting
in ima_file_free() flagging any changes.- add hook to increment opencount for IPC shared memory(SYSV),
shmat files, and /dev/zero
- moved NULL iint test in opencount_get()Signed-off-by: Mimi Zohar
Acked-by: Serge Hallyn
Signed-off-by: James Morris -
shm_get_stat() assumes that the inode is a "struct shmem_inode_info",
which is incorrect for !CONFIG_SHMEM (see fs/ramfs/inode.c:
ramfs_get_inode() vs. mm/shmem.c: shmem_get_inode()).This bad assumption can cause shmctl(SHM_INFO) to lockup when
shm_get_stat() tries to spin_lock(&info->lock). Users of !CONFIG_SHMEM
may encounter this lockup simply by invoking the 'ipcs' command.Reported by Jiri Olsa back in February 2008:
http://lkml.org/lkml/2008/2/29/74Signed-off-by: Tony Battersby
Cc: Jiri Kosina
Reported-by: Jiri Olsa
Cc: Hugh Dickins
Cc: [2.6.everything]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Feb, 2009
1 commit
-
The mmap_region() code would temporarily set the VM_ACCOUNT flag for
anonymous shared mappings just to inform shmem_zero_setup() that it
should enable accounting for the resulting shm object. It would then
clear the flag after calling ->mmap (for the /dev/zero case) or doing
shmem_zero_setup() (for the MAP_ANON case).This just resulted in vma merge issues, but also made for just
unnecessary confusion. Use the already-existing VM_NORESERVE flag for
this instead, and let shmem_{zero|file}_setup() just figure it out from
that.This also happens to make it obvious that the new DRI2 GEM layer uses a
non-reserving backing store for its object allocation - which is quite
possibly not intentional. But since I didn't want to change semantics
in this patch, I left it alone, and just updated the caller to use the
new flag semantics.Signed-off-by: Linus Torvalds
14 Jan, 2009
5 commits
-
Signed-off-by: Heiko Carstens
-
Signed-off-by: Heiko Carstens
-
Signed-off-by: Heiko Carstens
-
System calls with an unsigned long long argument can't be converted with
the standard wrappers since that would include a cast to long, which in
turn means that we would lose the upper 32 bit on 32 bit architectures.
Also semctl can't use the standard wrapper since it has a 'union'
parameter.So we handle them as special case and add some extra wrappers instead.
Signed-off-by: Heiko Carstens
-
Convert all system calls to return a long. This should be a NOP since all
converted types should have the same size anyway.
With the exception of sys_exit_group which returned void. But that doesn't
matter since the system call doesn't return.Signed-off-by: Heiko Carstens
10 Jan, 2009
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu:
NOMMU: Support XIP on initramfs
NOMMU: Teach kobjsize() about VMA regions.
FLAT: Don't attempt to expand the userspace stack to fill the space allocated
FDPIC: Don't attempt to expand the userspace stack to fill the space allocated
NOMMU: Improve procfs output using per-MM VMAs
NOMMU: Make mmap allocation page trimming behaviour configurable.
NOMMU: Make VMAs per MM as for MMU-mode linux
NOMMU: Delete askedalloc and realalloc variables
NOMMU: Rename ARM's struct vm_region
NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area()
09 Jan, 2009
1 commit
-
If a process registers for asynchronous notification on a POSIX message
queue, it gets a signal and a siginfo_t structure when a message arrives
on the message queue. The si_pid in the siginfo_t structure is set to the
PID of the process that sent the message to the message queue.The principle is the following:
. when mq_notify(SIGEV_SIGNAL) is called, the caller registers for
notification when a msg arrives. The associated pid structure is stroed into
inode_info->notify_owner. Let's call this process P1.
. when mq_send() is called by say P2, P2 sends a signal to P1 to notify
him about msg arrival.The way .si_pid is set today is not correct, since it doesn't take into account
the fact that the process that is sending the message might not be in the
same namespace as the notified one.This patch proposes to set si_pid to the sender's pid into the notify_owner
namespace.Signed-off-by: Nadia Derbey
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Oleg Nesterov
Cc: Roland McGrath
Cc: Bastian Blank
Cc: Pavel Emelyanov
Cc: Eric W. Biederman
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Jan, 2009
1 commit
-
Make VMAs per mm_struct as for MMU-mode linux. This solves two problems:
(1) In SYSV SHM where nattch for a segment does not reflect the number of
shmat's (and forks) done.(2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
that a VMA might be shared and already have its vm_mm assigned to another
process or a dead process.A new struct (vm_region) is introduced to track a mapped region and to remember
the circumstances under which it may be shared and the vm_list_struct structure
is discarded as it's no longer required.This patch makes the following additional changes:
(1) Regions are now allocated with alloc_pages() rather than kmalloc() and
with no recourse to __GFP_COMP, so the pages are not composite. Instead,
each page has a reference on it held by the region. Anything else that is
interested in such a page will have to get a reference on it to retain it.
When the pages are released due to unmapping, each page is passed to
put_page() and will be freed when the page usage count reaches zero.(2) Excess pages are trimmed after an allocation as the allocation must be
made as a power-of-2 quantity of pages.(3) VMAs are added to the parent MM's R/B tree and mmap lists. As an MM may
end up with overlapping VMAs within the tree, the VMA struct address is
appended to the sort key.(4) Non-anonymous VMAs are now added to the backing inode's prio list.
(5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
the backing region. The VMA and region structs will be split if
necessary.(6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
segment instead of all the attachments at that addresss. Multiple
shmat()'s return the same address under NOMMU-mode instead of different
virtual addresses as under MMU-mode.(7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.
(8) /proc/maps is now the global list of mapped regions, and may list bits
that aren't actually mapped anywhere.(9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
of RAM currently allocated by mmap to hold mappable regions that can't be
mapped directly. These are copies of the backing device or file if not
anonymous.These changes make NOMMU mode more similar to MMU mode. The downside is that
NOMMU mode requires some extra memory to track things over NOMMU without this
patch (VMAs are no longer shared, and there are now region structs).Signed-off-by: David Howells
Tested-by: Mike Frysinger
Acked-by: Paul Mundt
07 Jan, 2009
3 commits
-
proc_ipcauto_dointvec_minmax() is the only user of ipc_auto_callback(),
since the former function is protected by CONFIG_PROC_FS, so should be the
latter one.Just move its definition down.
Signed-off-by: WANG Cong
Cc: Eric Biederman
Cc: Nadia Derbey
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Denis V. Lunev
Reviewed-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use the macro shm_ids().
Remove useless check for a userspace pointer, because copy_to_user()
will check it.Some style cleanups.
Signed-off-by: WANG Cong
Cc: Nadia Derbey
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 Jan, 2009
3 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
inotify: fix type errors in interfaces
fix breakage in reiserfs_new_inode()
fix the treatment of jfs special inodes
vfs: remove duplicate code in get_fs_type()
add a vfs_fsync helper
sys_execve and sys_uselib do not call into fsnotify
zero i_uid/i_gid on inode allocation
inode->i_op is never NULL
ntfs: don't NULL i_op
isofs check for NULL ->i_op in root directory is dead code
affs: do not zero ->i_op
kill suid bit only for regular files
vfs: lseek(fd, 0, SEEK_CUR) race condition -
Signed-off-by: Alan Cox
Signed-off-by: Linus Torvalds -
... and don't bother in callers. Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.i_mode is not worth the effort; it has no common default value.
Signed-off-by: Al Viro
05 Jan, 2009
6 commits
-
* don't bother with allocations
* don't do double copy_from_user()
* don't duplicate parts of check for audit_dummy_context()Signed-off-by: Al Viro
-
* logging the original value of *msg_prio in mq_timedreceive(2)
is insane - the argument is write-only (i.e. syscall always
ignores the original value and only overwrites it).
* merge __audit_mq_timed{send,receive}
* don't do copy_from_user() twice
* don't mess with allocations in auditsc part
* ... and don't bother checking !audit_enabled and !context in there -
we'd already checked for audit_dummy_context().Signed-off-by: Al Viro
-
* don't copy_from_user() twice
* don't bother with allocations
* don't duplicate parts of audit_dummy_context()
* make it return voidSigned-off-by: Al Viro
-
* get rid of allocations
* make it return void
* don't duplicate parts of audit_dummy_context()Signed-off-by: Al Viro
-
* get rid of allocations
* make it return void
* simplify callersSigned-off-by: Al Viro
-
* get rid of allocations
* make it return void
* simplify callersSigned-off-by: Al Viro
04 Dec, 2008
1 commit
-
Conflicts:
fs/nfsd/nfs4recover.cManually fixed above to use new creds API functions, e.g.
nfs4_save_creds().Signed-off-by: James Morris
20 Nov, 2008
1 commit
-
A problem was found while reviewing the code after Bugzilla bug
http://bugzilla.kernel.org/show_bug.cgi?id=11796.In ipc_addid(), the newly allocated ipc structure is inserted into the
ipcs tree (i.e made visible to readers) without locking it. This is not
correct since its initialization continues after it has been inserted in
the tree.This patch moves the ipc structure lock initialization + locking before
the actual insertion.Signed-off-by: Nadia Derbey
Reported-by: Clement Calmels
Cc: Manfred Spraul
Cc: [2.6.27.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Nov, 2008
2 commits
-
Pass credentials through dentry_open() so that the COW creds patch can have
SELinux's flush_unauthorized_files() pass the appropriate creds back to itself
when it opens its null chardev.The security_dentry_open() call also now takes a creds pointer, as does the
dentry_open hook in struct security_operations.Signed-off-by: David Howells
Acked-by: James Morris
Signed-off-by: James Morris -
Wrap current->cred and a few other accessors to hide their actual
implementation.Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris