27 May, 2018
1 commit
-
... and get rid of pointless struct inode *dir argument of those,
while we are at it.Signed-off-by: Al Viro
23 May, 2018
1 commit
-
First of all, calling pid_revalidate() in the end of /* lookups
is *not* about closing any kind of races; that used to be true once
upon a time, but these days those comments are actively misleading.
Especially since pid_revalidate() doesn't even do d_drop() on
failure anymore. It doesn't matter, anyway, since once
pid_revalidate() starts returning false, ->d_delete() of those
dentries starts saying "don't keep"; they won't get stuck in
dcache any longer than they are pinned.These calls cannot be just removed, though - the side effect of
pid_revalidate() (updating i_uid/i_gid/etc.) is what we are calling
it for here.Let's separate the "update ownership" into a new helper (pid_update_inode())
and use it, both in lookups and in pid_revalidate() itself.The comments in pid_revalidate() are also out of date - they refer to
the time when pid_revalidate() used to call d_drop() directly...Signed-off-by: Al Viro
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
09 May, 2017
1 commit
-
pid_ns_for_children set by a task is known only to the task itself, and
it's impossible to identify it from outside.It's a big problem for checkpoint/restore software like CRIU, because it
can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of
their work.This patch solves the problem, and it exposes pid_ns_for_children to ns
directory in standard way with the name "pid_for_children":~# ls /proc/5531/ns -l | grep pid
lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286]Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai
Cc: Andrei Vagin
Cc: Andreas Gruenbacher
Cc: Kees Cook
Cc: Michael Kerrisk
Cc: Al Viro
Cc: Oleg Nesterov
Cc: Paul Moore
Cc: Eric Biederman
Cc: Andy Lutomirski
Cc: Ingo Molnar
Cc: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Nov, 2016
1 commit
-
Pass the file mode of the proc inode to be created to
proc_pid_make_inode. In proc_pid_make_inode, initialize inode->i_mode
before calling security_task_to_inode. This allows selinux to set
isec->sclass right away without introducing "half-initialized" inode
security structs.Signed-off-by: Andreas Gruenbacher
Signed-off-by: Paul Moore
03 May, 2016
1 commit
-
Signed-off-by: Al Viro
17 Feb, 2016
1 commit
-
Introduce the ability to create new cgroup namespace. The newly created
cgroup namespace remembers the cgroup of the process at the point
of creation of the cgroup namespace (referred as cgroupns-root).
The main purpose of cgroup namespace is to virtualize the contents
of /proc/self/cgroup file. Processes inside a cgroup namespace
are only able to see paths relative to their namespace root
(unless they are moved outside of their cgroupns-root, at which point
they will see a relative path from their cgroupns-root).
For a correctly setup container this enables container-tools
(like libcontainer, lxc, lmctfy, etc.) to create completely virtualized
containers without leaking system level cgroup hierarchy to the task.
This patch only implements the 'unshare' part of the cgroupns.Signed-off-by: Aditya Kali
Signed-off-by: Serge Hallyn
Signed-off-by: Tejun Heo
21 Jan, 2016
1 commit
-
By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.The problem was that when a privileged task had temporarily dropped its
privileges, e.g. by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:/proc/$pid/stat - uses the check for determining whether pointers
should be visible, useful for bypassing ASLR
/proc/$pid/maps - also useful for bypassing ASLR
/proc/$pid/cwd - useful for gaining access to restricted
directories that contain files with lax permissions, e.g. in
this scenario:
lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
drwx------ root root /root
drwxr-xr-x root root /root/foobar
-rw-r--r-- root root /root/foobar/secretTherefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn
Acked-by: Kees Cook
Cc: Casey Schaufler
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: James Morris
Cc: "Serge E. Hallyn"
Cc: Andy Shevchenko
Cc: Andy Lutomirski
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Willy Tarreau
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Dec, 2015
1 commit
-
Signed-off-by: Al Viro
09 Dec, 2015
1 commit
-
new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.It's a flagday change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.Signed-off-by: Al Viro
11 May, 2015
2 commits
-
its only use is getting passed to nd_jump_link(), which can obtain
it from current->nameidataSigned-off-by: Al Viro
-
a) instead of storing the symlink body (via nd_set_link()) and returning
an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
that opaque pointer (into void * passed by address by caller) and returns
the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic
symlinks) and pointer to symlink body for normal symlinks. Stored pointer
is ignored in all cases except the last one.Storing NULL for opaque pointer (or not storing it at all) means no call
of ->put_link().b) the body used to be passed to ->put_link() implicitly (via nameidata).
Now only the opaque pointer is. In the cases when we used the symlink body
to free stuff, ->follow_link() now should store it as opaque pointer in addition
to returning it.Signed-off-by: Al Viro
16 Apr, 2015
1 commit
-
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells
Signed-off-by: Al Viro
11 Dec, 2014
2 commits
-
procfs inodes need only the ns_ops part; nsfs inodes don't need it at all
Signed-off-by: Al Viro
-
New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
It's not mountable (not even registered, so it's not in /proc/filesystems,
etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
get_proc_ns() is a macro now (it's simply returning ->i_private; would
have been an inline, if not for header ordering headache).
proc_ns_inode() is an ex-parrot. The interface used in procfs is
ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).Dentries and inodes are never hashed; a non-counting reference to dentry
is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
of that mechanism.As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
it does nd_jump_link() on a consistent pair it gets
from ns_get_path().Signed-off-by: Al Viro
05 Dec, 2014
2 commits
-
a) make get_proc_ns() return a pointer to struct ns_common
b) mirror ns_ops in dentry->d_fsdata of ns dentries, so that
is_mnt_ns_file() could get away with fewer dereferences.That way struct proc_ns becomes invisible outside of fs/proc/*.c
Signed-off-by: Al Viro
-
We can do that now. And kill ->inum(), while we are at it - all instances
are identical.Signed-off-by: Al Viro
02 Apr, 2014
1 commit
-
Signed-off-by: Al Viro
16 Nov, 2013
1 commit
-
Rename simple_delete_dentry() to always_delete_dentry() and export it.
Export simple_dentry_operations, while we are at it, and get rid of
their duplicatesSigned-off-by: Al Viro
29 Jun, 2013
2 commits
-
all instances always return ERR_PTR(-E...) or NULL, anyway
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
02 May, 2013
1 commit
-
Split the proc namespace stuff out into linux/proc_ns.h.
Signed-off-by: David Howells
cc: netdev@vger.kernel.org
cc: Serge E. Hallyn
cc: Eric W. Biederman
Signed-off-by: Al Viro
09 Mar, 2013
1 commit
-
Update proc_ns_follow_link to use nd_jump_link instead of just
manually updating nd.path.dentry.This fixes the BUG_ON(nd->inode != parent->d_inode) reported by Dave
Jones and reproduced trivially with mkdir /proc/self/ns/uts/a.Sigh it looks like the VFS change to require use of nd_jump_link
happend while proc_ns_follow_link was baking and since the common case
of proc_ns_follow_link continued to work without problems the need for
making this change was overlooked.Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman"
20 Nov, 2012
3 commits
-
Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowd by the design if it becomes important.I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.Signed-off-by: Eric W. Biederman
-
Change the proc namespace files into symlinks so that
we won't cache the dentries for the namespace files
which can bypass the ptrace_may_access checks.To support the symlinks create an additional namespace
inode with it's own set of operations distinct from the
proc pid inode and dentry methods as those no longer
make sense.Signed-off-by: Eric W. Biederman
-
This allows entering a user namespace, and the ability
to store a reference to a user namespace with a bind
mount.Addition of missing userns_ns_put in userns_install
from Gao fengAcked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"
19 Nov, 2012
2 commits
-
setns support for the mount namespace is a little tricky as an
arbitrary decision must be made about what to set fs->root and
fs->pwd to, as there is no expectation of a relationship between
the two mount namespaces. Therefore I arbitrarily find the root
mount point, and follow every mount on top of it to find the top
of the mount stack. Then I set fs->root and fs->pwd to that
location. The topmost root of the mount stack seems like a
reasonable place to be.Bind mount support for the mount namespace inodes has the
possibility of creating circular dependencies between mount
namespaces. Circular dependencies can result in loops that
prevent mount namespaces from every being freed. I avoid
creating those circular dependencies by adding a sequence number
to the mount namespace and require all bind mounts be of a
younger mount namespace into an older mount namespace.Add a helper function proc_ns_inode so it is possible to
detect when we are attempting to bind mound a namespace inode.Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman -
- Pid namespaces are designed to be inescapable so verify that the
passed in pid namespace is a child of the currently active
pid namespace or the currently active pid namespace itself.Allowing the currently active pid namespace is important so
the effects of an earlier setns can be cancelled.Signed-off-by: Eric W. Biederman
14 Jul, 2012
2 commits
-
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...Signed-off-by: Al Viro
-
Just the lookup flags. Die, bastard, die...
Signed-off-by: Al Viro
29 Mar, 2012
1 commit
-
If CONFIG_NET_NS, CONFIG_UTS_NS and CONFIG_IPC_NS are disabled,
ns_entries[] becomes empty and things like
ns_entries[ARRAY_SIZE(ns_entries) - 1] will explode.Reported-by: Richard Weinberger
Cc: "Eric W. Biederman"
Cc: Daniel Lezcano
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Mar, 2012
1 commit
-
The namespace cleanup path leaks a dentry which holds a reference count
on a network namespace. Keeping that network namespace from being freed
when the last user goes away. Leaving things like vlan devices in the
leaked network namespace.If you use ip netns add for much real work this problem becomes apparent
pretty quickly. It light testing the problem hides because frequently
you simply don't notice the leak.Use d_set_d_op() so that DCACHE_OP_* flags are set correctly.
This issue exists back to 3.0.
Acked-by: "Eric W. Biederman"
Reported-by: Justin Pettit
Signed-off-by: Pravin B Shelar
Signed-off-by: Jesse Gross
Cc: David Miller
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Jan, 2012
1 commit
-
[folded fix for missing magic.h from Tetsuo Handa]
Signed-off-by: Al Viro
16 Jun, 2011
1 commit
-
Don't call iput with the inode half setup to be a namespace filedescriptor.
Instead rearrange the code so that we don't initialize ei->ns_ops until
after I ns_ops->get succeeds, preventing us from invoking ns_ops->put
when ns_ops->get failed.Reported-by: Ingo Saitz
Signed-off-by: Eric W. Biederman
25 May, 2011
1 commit
-
Spotted-by: Nathan Lynch
Signed-off-by: Eric W. Biederman
11 May, 2011
4 commits
-
Acked-by: Daniel Lezcano
Signed-off-by: Eric W. Biederman -
Acked-by: Daniel Lezcano
Signed-off-by: Eric W. Biederman -
Implementing file descriptors for the network namespace
is simple and straight forward.Acked-by: David S. Miller
Acked-by: Daniel Lezcano
Signed-off-by: Eric W. Biederman -
Create files under /proc//ns/ to allow controlling the
namespaces of a process.This addresses three specific problems that can make namespaces hard to
work with.
- Namespaces require a dedicated process to pin them in memory.
- It is not possible to use a namespace unless you are the child
of the original creator.
- Namespaces don't have names that userspace can use to talk about
them.The namespace files under /proc//ns/ can be opened and the
file descriptor can be used to talk about a specific namespace, and
to keep the specified namespace alive.A namespace can be kept alive by either holding the file descriptor
open or bind mounting the file someplace else. aka:
mount --bind /proc/self/ns/net /some/filesystem/path
mount --bind /proc/self/fd/ /some/filesystem/pathThis allows namespaces to be named with userspace policy.
It requires additional support to make use of these filedescriptors
and that will be comming in the following patches.Acked-by: Daniel Lezcano
Signed-off-by: Eric W. Biederman