Eric Lee / smarc-fsl-linux-kernel

17 Dec, 2014

1 commit

603ba7e41 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs pile #2 from Al Viro:
"Next pile (and there'll be one or two more).

The large piece in this one is getting rid of /proc/*/ns/* weirdness;
among other things, it allows to (finally) make nameidata completely
opaque outside of fs/namei.c, making for easier further cleanups in
there"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coda_venus_readdir(): use file_inode()
fs/namei.c: fold link_path_walk() call into path_init()
path_init(): don't bother with LOOKUP_PARENT in argument
fs/namei.c: new helper (path_cleanup())
path_init(): store the "base" pointer to file in nameidata itself
make default ->i_fop have ->open() fail with ENXIO
make nameidata completely opaque outside of fs/namei.c
kill proc_ns completely
take the targets of /proc/*/ns/* symlinks to separate fs
bury struct proc_ns in fs/proc
copy address of proc_ns_ops into ns_common
new helpers: ns_alloc_inum/ns_free_inum
make proc_ns_operations work with struct ns_common * instead of void *
switch the rest of proc_ns_operations to working with &...->ns
netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
common object embedded into various struct ....ns

Linus Torvalds
2014-12-17 07:53:03 +0800

14 Dec, 2014

1 commit

0050ee059 ipc/msg: increase MSGMNI, remove scaling ... Browse Code »

SysV can be abused to allocate locked kernel memory. For most systems, a
small limit doesn't make sense, see the discussion with regards to SHMMAX.

Therefore: increase MSGMNI to the maximum supported.

And: If we ignore the risk of locking too much memory, then an automatic
scaling of MSGMNI doesn't make sense. Therefore the logic can be removed.

The code preserves auto_msgmni to avoid breaking any user space applications
that expect that the value exists.

Notes:
1) If an administrator must limit the memory allocations, then he can set
MSGMNI as necessary.

Or he can disable sysv entirely (as e.g. done by Android).

2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
to control latency vs. throughput:
If MSGMNB is large, then msgsnd() just returns and more messages can be queued
before a task switch to a task that calls msgrcv() is forced.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul
Cc: Davidlohr Bueso
Cc: Rafael Aquini
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2014-12-14 04:42:52 +0800

05 Dec, 2014

5 commits

33c429405 copy address of proc_ns_ops into ns_common ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2014-12-05 03:34:47 +0800
6344c433a new helpers: ns_alloc_inum/ns_free_inum ... Browse Code »

take struct ns_common *, for now simply wrappers around proc_{alloc,free}_inum()

Signed-off-by: Al Viro

Al Viro
2014-12-05 03:34:36 +0800
64964528b make proc_ns_operations work with struct ns_common * instead of void * ... Browse Code »

We can do that now. And kill ->inum(), while we are at it - all instances
are identical.

Signed-off-by: Al Viro

Al Viro
2014-12-05 03:34:17 +0800
3c0411846 switch the rest of proc_ns_operations to working with &...->ns ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2014-12-05 03:34:11 +0800
435d5f4bb common object embedded into various struct ....ns ... Browse Code »

for now - just move corresponding ->proc_inum instances over there

Acked-by: "Eric W. Biederman"
Signed-off-by: Al Viro

Al Viro
2014-12-05 03:31:00 +0800

30 Jul, 2014

1 commit

728dba3a3 namespaces: Use task_lock and not rcu to protect nsproxy ... Browse Code »

The synchronous syncrhonize_rcu in switch_task_namespaces makes setns
a sufficiently expensive system call that people have complained.

Upon inspect nsproxy no longer needs rcu protection for remote reads.
remote reads are rare. So optimize for same process reads and write
by switching using rask_lock instead.

This yields a simpler to understand lock, and a faster setns system call.

In particular this fixes a performance regression observed
by Rafael David Tinoco .

This is effectively a revert of Pavel Emelyanov's commit
cf7b708c8d1d7a27736771bcf4c457b332b0f818 Make access to task's nsproxy lighter
from 2007. The race this originialy fixed no longer exists as
do_notify_parent uses task_active_pid_ns(parent) instead of
parent->nsproxy.

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2014-07-30 09:08:50 +0800

12 Sep, 2013

2 commits

32a275001 ipc: drop ipc_lock_by_ptr ... Browse Code »

After previous cleanups and optimizations, this function is no longer
heavily used and we don't have a good reason to keep it. Update the few
remaining callers and get rid of it.

Signed-off-by: Davidlohr Bueso
Cc: Sedat Dilek
Cc: Rik van Riel
Cc: Manfred Spraul
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2013-09-12 06:59:44 +0800
d9a605e40 ipc: rename ids->rw_mutex ... Browse Code »

Since in some situations the lock can be shared for readers, we shouldn't
be calling it a mutex, rename it to rwsem.

Signed-off-by: Davidlohr Bueso
Tested-by: Sedat Dilek
Cc: Rik van Riel
Cc: Manfred Spraul
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2013-09-12 06:59:42 +0800

31 Aug, 2013

1 commit

c7b96acf1 userns: Kill nsown_capable it makes the wrong thing easy ... Browse Code »

nsown_capable is a special case of ns_capable essentially for just CAP_SETUID and
CAP_SETGID. For the existing users it doesn't noticably simplify things and
from the suggested patches I have seen it encourages people to do the wrong
thing. So remove nsown_capable.

Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-08-31 14:44:11 +0800

02 May, 2013

1 commit

0bb80f240 proc: Split the namespace stuff out into linux/proc_ns.h ... Browse Code »

Split the proc namespace stuff out into linux/proc_ns.h.

Signed-off-by: David Howells
cc: netdev@vger.kernel.org
cc: Serge E. Hallyn
cc: Eric W. Biederman
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:39 +0800

15 Dec, 2012

1 commit

5e4a08476 userns: Require CAP_SYS_ADMIN for most uses of setns. ... Browse Code »

Andy Lutomirski found a nasty little bug in
the permissions of setns. With unprivileged user namespaces it
became possible to create new namespaces without privilege.

However the setns calls were relaxed to only require CAP_SYS_ADMIN in
the user nameapce of the targed namespace.

Which made the following nasty sequence possible.

pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
if (pid == 0) { /* child */
system("mount --bind /home/me/passwd /etc/passwd");
}
else if (pid != 0) { /* parent */
char path[PATH_MAX];
snprintf(path, sizeof(path), "/proc/%u/ns/mnt");
fd = open(path, O_RDONLY);
setns(fd, 0);
system("su -");
}

Prevent this possibility by requiring CAP_SYS_ADMIN
in the current user namespace when joing all but the user namespace.

Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-12-15 08:12:03 +0800

20 Nov, 2012

3 commits

98f842e67 proc: Usable inode numbers for the namespace file descriptors. ... Browse Code »

Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.

A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.

This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.

We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowd by the design if it becomes important.

I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.

Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-11-20 20:19:49 +0800
bcf58e725 userns: Make create_new_namespaces take a user_ns parameter ... Browse Code »

Modify create_new_namespaces to explicitly take a user namespace
parameter, instead of implicitly through the task_struct.

This allows an implementation of unshare(CLONE_NEWUSER) where
the new user namespace is not stored onto the current task_struct
until after all of the namespaces are created.

Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-11-20 20:17:43 +0800
142e1d1d5 userns: Allow unprivileged use of setns. ... Browse Code »

- Push the permission check from the core setns syscall into
the setns install methods where the user namespace of the
target namespace can be determined, and used in a ns_capable
call.

Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-11-20 20:17:42 +0800

08 Apr, 2012

1 commit

c4a4d6037 userns: Use cred->user_ns instead of cred->user->user_ns ... Browse Code »

Optimize performance and prepare for the removal of the user_ns reference
from user_struct. Remove the slow long walk through cred->user->user_ns and
instead go straight to cred->user_ns.

Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-04-08 07:55:51 +0800

11 May, 2011

1 commit

a00eaf11a ns proc: Add support for the ipc namespace ... Browse Code »

Acked-by: Daniel Lezcano
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2011-05-11 05:35:47 +0800

26 Mar, 2011

1 commit

be4d250ab ipcns: fix use after free in free_ipc_ns() ... Browse Code »

commit b515498 ("userns: add a user namespace owner of ipc ns") added a
user namespace owner of ipc ns, but it also introduced a use after free in
free_ipc_ns().

Signed-off-by: Xiaotian Feng
Acked-by: "Serge E. Hallyn"
Acked-by: David Howells
Cc: "Eric W. Biederman"
Cc: Daniel Lezcano
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xiaotian Feng
2011-03-26 08:45:16 +0800

24 Mar, 2011

2 commits

b0e77598f userns: user namespaces: convert several capable() calls ... Browse Code »

CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
because the resource comes from current's own ipc namespace.

setuid/setgid are to uids in own namespace, so again checks can be against
current_user_ns().

Changelog:
Jan 11: Use task_ns_capable() in place of sched_capable().
Jan 11: Use nsown_capable() as suggested by Bastian Blank.
Jan 11: Clarify (hopefully) some logic in futex and sched.c
Feb 15: use ns_capable for ipc, not nsown_capable
Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
Feb 23: pass ns down rather than taking it from current

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:47:08 +0800
b515498f5 userns: add a user namespace owner of ipc ns ... Browse Code »

Changelog:
Feb 15: Don't set new ipc->user_ns if we didn't create a new
ipc_ns.
Feb 23: Move extern declaration to ipc_namespace.h, and group
fwd declarations at top.

Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:47:07 +0800

19 Jun, 2009

3 commits

b4188def4 ipcns: make free_ipc_ns() static ... Browse Code »

Signed-off-by: Alexey Dobriyan
Reviewed-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-06-19 04:03:56 +0800
612ce478f ipcns: extract create_ipc_ns() ... Browse Code »

clone_ipc_ns() is misnamed, it doesn't clone anything and doesn't use
passed parameter. Rename it.

create_ipc_ns() will be used by C/R to create fresh ipcns.

Signed-off-by: Alexey Dobriyan
Acked-by: Serge Hallyn
Reviewed-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-06-19 04:03:55 +0800
64424289d ipcns: remove useless get/put while CLONE_NEWIPC ... Browse Code »

copy_ipcs() doesn't actually copy anything. If new ipcns is created, it's
created from scratch, in this case get/put on old ipcns isn't needed.

Signed-off-by: Alexey Dobriyan
Acked-by: Serge Hallyn
Reviewed-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-06-19 04:03:55 +0800

07 Apr, 2009

2 commits

7eafd7c74 namespaces: ipc namespaces: implement support for posix msqueues ... Browse Code »

Implement multiple mounts of the mqueue file system, and link it to usage
of CLONE_NEWIPC.

Each ipc ns has a corresponding mqueuefs superblock. When a user does
clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an
internal mount of a new mqueuefs sb linked to the new ipc ns.

When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the
mqueuefs superblock.

Posix message queues can be worked with both through the mq_* system calls
(see mq_overview(7)), and through the VFS through the mqueue mount. Any
usage of mq_open() and friends will work with the acting task's ipc
namespace. Any actions through the VFS will work with the mqueuefs in
which the file was created. So if a user doesn't remount mqueuefs after
unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls
/dev/mqueue".

If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns,
ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1)
ipc_ns:1 will be freed, (2) it's superblock will live on until task b
umounts the corresponding mqueuefs, and vfs actions will continue to
succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to
the deceased ipc_ns:1.

To make this happen, we must protect the ipc reference count when

a) a task exits and drops its ipcns->count, since it might be dropping
it to 0 and freeing the ipcns

b) a task accesses the ipcns through its mqueuefs interface, since it
bumps the ipcns refcount and might race with the last task in the ipcns
exiting.

So the kref is changed to an atomic_t so we can use
atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns
through ns = mqueuefs_sb->s_fs_info is protected by the same lock.

Signed-off-by: Cedric Le Goater
Signed-off-by: Serge E. Hallyn
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2009-04-07 23:31:09 +0800
614b84cf4 namespaces: mqueue ns: move mqueue_mnt into struct ipc_namespace ... Browse Code »

Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
posix message queue namespaces and the SYSV ipc namespaces.

The sysctl code will be fixed separately in patch 3. After just this
patch, making a change to posix mqueue tunables always changes the values
in the initial ipc namespace.

Signed-off-by: Cedric Le Goater
Signed-off-by: Serge E. Hallyn
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2009-04-07 23:31:09 +0800

29 Apr, 2008

3 commits

e2c284d8a ipc: recompute msgmni on ipc namespace creation/removal ... Browse Code »

Introduce a notification mechanism that aims at recomputing msgmni each time
an ipc namespace is created or removed.

The ipc namespace notifier chain already defined for memory hotplug management
is used for that purpose too.

Each time a new ipc namespace is allocated or an existing ipc namespace is
removed, the ipcns notifier chain is notified. The callback routine for each
registered ipc namespace is then activated in order to recompute msgmni for
that namespace.

Signed-off-by: Nadia Derbey
Cc: Yasunori Goto
Cc: Matt Helsley
Cc: Mingming Cao
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-04-29 23:06:12 +0800
b6b337ad1 ipc: recompute msgmni on memory add / remove ... Browse Code »

Introduce the registration of a callback routine that recomputes msg_ctlmni
upon memory add / remove.

A single notifier block is registered in the hotplug memory chain for all the
ipc namespaces.

Since the ipc namespaces are not linked together, they have their own
notification chain: one notifier_block is defined per ipc namespace.

Each time an ipc namespace is created (removed) it registers (unregisters) its
notifier block in (from) the ipcns chain. The callback routine registered in
the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event.
Each callback routine registered in the ipcns namespace, in turn, recomputes
msgmni for the owning namespace.

Signed-off-by: Nadia Derbey
Cc: Yasunori Goto
Cc: Matt Helsley
Cc: Mingming Cao
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-04-29 23:06:12 +0800
4d89dc6ab ipc: scale msgmni to the number of ipc namespaces ... Browse Code »

Since all the namespaces see the same amount of memory (the total one) this
patch introduces a new variable that counts the ipc namespaces and divides
msg_ctlmni by this counter.

Signed-off-by: Nadia Derbey
Cc: Yasunori Goto
Cc: Matt Helsley
Cc: Mingming Cao
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-04-29 23:06:12 +0800

09 Feb, 2008

3 commits

01b8b07a5 IPC: consolidate sem_exit_ns(), msg_exit_ns() and shm_exit_ns() ... Browse Code »

sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an
ipc_namespace is released to free all ipcs of each type. But in fact, they
do the same thing: they loop around all ipcs to free them individually by
calling a specific routine.

This patch proposes to consolidate this by introducing a common function,
free_ipcs(), that do the job. The specific routine to call on each
individual ipcs is passed as parameter. For this, these ipc-specific
'free' routines are reworked to take a generic 'struct ipc_perm' as
parameter.

Signed-off-by: Pierre Peiffer
Cc: Cedric Le Goater
Cc: Pavel Emelyanov
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pierre Peiffer
2008-02-09 01:22:26 +0800
ed2ddbf88 IPC: make struct ipc_ids static in ipc_namespace ... Browse Code »

Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (3 for
msg, sem and shm, structure used to store all ipcs) These 'struct ipc_ids'
are dynamically allocated for each icp_namespace as the ipc_namespace
itself (for the init namespace, they are initialized with pointers to
static variables instead)

It is so for historical reason: in fact, before the use of idr to store the
ipcs, the ipcs were stored in tables of variable length, depending of the
maximum number of ipc allowed. Now, these 'struct ipc_ids' have a fixed
size. As they are allocated in any cases for each new ipc_namespace, there
is no gain of memory in having them allocated separately of the struct
ipc_namespace.

This patch proposes to make this table static in the struct ipc_namespace.
Thus, we can allocate all in once and get rid of all the code needed to
allocate and free these ipc_ids separately.

Signed-off-by: Pierre Peiffer
Acked-by: Cedric Le Goater
Cc: Pavel Emelyanov
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pierre Peiffer
2008-02-09 01:22:26 +0800
ae5e1b22f namespaces: move the IPC namespace under IPC_NS option ... Browse Code »

Currently the IPC namespace management code is spread over the ipc/*.c files.
I moved this code into ipc/namespace.c file which is compiled out when needed.

The linux/ipc_namespace.h file is used to store the prototypes of the
functions in namespace.c and the stubs for NAMESPACES=n case. This is done
so, because the stub for copy_ipc_namespace requires the knowledge of the
CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in
included into many many .c files via the sys.h->sem.h sequence so adding the
sched.h into it will make all these .c depend on sched.h which is not that
good. On the other hand the knowledge about the namespaces stuff is required
in 4 .c files only.

Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
msg.c and shm.c files. It turned out that moving these functions into
namespaces.c is not that easy because they use many other calls and macros
from the original file. Moving them would make this patch complicated. On
the other hand all these functions can be consolidated, so I will send a
separate patch doing this a bit later.

Signed-off-by: Pavel Emelyanov
Acked-by: Serge Hallyn
Cc: Cedric Le Goater
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-02-09 01:22:23 +0800