27 May, 2018

1 commit


23 May, 2018

1 commit

  • First of all, calling pid_revalidate() in the end of /* lookups
    is *not* about closing any kind of races; that used to be true once
    upon a time, but these days those comments are actively misleading.
    Especially since pid_revalidate() doesn't even do d_drop() on
    failure anymore. It doesn't matter, anyway, since once
    pid_revalidate() starts returning false, ->d_delete() of those
    dentries starts saying "don't keep"; they won't get stuck in
    dcache any longer than they are pinned.

    These calls cannot be just removed, though - the side effect of
    pid_revalidate() (updating i_uid/i_gid/etc.) is what we are calling
    it for here.

    Let's separate the "update ownership" into a new helper (pid_update_inode())
    and use it, both in lookups and in pid_revalidate() itself.

    The comments in pid_revalidate() are also out of date - they refer to
    the time when pid_revalidate() used to call d_drop() directly...

    Signed-off-by: Al Viro

    Al Viro
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

09 May, 2017

1 commit

  • pid_ns_for_children set by a task is known only to the task itself, and
    it's impossible to identify it from outside.

    It's a big problem for checkpoint/restore software like CRIU, because it
    can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of
    their work.

    This patch solves the problem, and it exposes pid_ns_for_children to ns
    directory in standard way with the name "pid_for_children":

    ~# ls /proc/5531/ns -l | grep pid
    lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836]
    lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286]

    Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Cc: Andrei Vagin
    Cc: Andreas Gruenbacher
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: Paul Moore
    Cc: Eric Biederman
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     

15 Nov, 2016

1 commit

  • Pass the file mode of the proc inode to be created to
    proc_pid_make_inode. In proc_pid_make_inode, initialize inode->i_mode
    before calling security_task_to_inode. This allows selinux to set
    isec->sclass right away without introducing "half-initialized" inode
    security structs.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Paul Moore

    Andreas Gruenbacher
     

03 May, 2016

1 commit


17 Feb, 2016

1 commit

  • Introduce the ability to create new cgroup namespace. The newly created
    cgroup namespace remembers the cgroup of the process at the point
    of creation of the cgroup namespace (referred as cgroupns-root).
    The main purpose of cgroup namespace is to virtualize the contents
    of /proc/self/cgroup file. Processes inside a cgroup namespace
    are only able to see paths relative to their namespace root
    (unless they are moved outside of their cgroupns-root, at which point
    they will see a relative path from their cgroupns-root).
    For a correctly setup container this enables container-tools
    (like libcontainer, lxc, lmctfy, etc.) to create completely virtualized
    containers without leaking system level cgroup hierarchy to the task.
    This patch only implements the 'unshare' part of the cgroupns.

    Signed-off-by: Aditya Kali
    Signed-off-by: Serge Hallyn
    Signed-off-by: Tejun Heo

    Aditya Kali
     

21 Jan, 2016

1 commit

  • By checking the effective credentials instead of the real UID / permitted
    capabilities, ensure that the calling process actually intended to use its
    credentials.

    To ensure that all ptrace checks use the correct caller credentials (e.g.
    in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
    flag), use two new flags and require one of them to be set.

    The problem was that when a privileged task had temporarily dropped its
    privileges, e.g. by calling setreuid(0, user_uid), with the intent to
    perform following syscalls with the credentials of a user, it still passed
    ptrace access checks that the user would not be able to pass.

    While an attacker should not be able to convince the privileged task to
    perform a ptrace() syscall, this is a problem because the ptrace access
    check is reused for things in procfs.

    In particular, the following somewhat interesting procfs entries only rely
    on ptrace access checks:

    /proc/$pid/stat - uses the check for determining whether pointers
    should be visible, useful for bypassing ASLR
    /proc/$pid/maps - also useful for bypassing ASLR
    /proc/$pid/cwd - useful for gaining access to restricted
    directories that contain files with lax permissions, e.g. in
    this scenario:
    lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
    drwx------ root root /root
    drwxr-xr-x root root /root/foobar
    -rw-r--r-- root root /root/foobar/secret

    Therefore, on a system where a root-owned mode 6755 binary changes its
    effective credentials as described and then dumps a user-specified file,
    this could be used by an attacker to reveal the memory layout of root's
    processes or reveal the contents of files he is not allowed to access
    (through /proc/$pid/cwd).

    [akpm@linux-foundation.org: fix warning]
    Signed-off-by: Jann Horn
    Acked-by: Kees Cook
    Cc: Casey Schaufler
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: James Morris
    Cc: "Serge E. Hallyn"
    Cc: Andy Shevchenko
    Cc: Andy Lutomirski
    Cc: Al Viro
    Cc: "Eric W. Biederman"
    Cc: Willy Tarreau
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jann Horn
     

31 Dec, 2015

1 commit


09 Dec, 2015

1 commit

  • new method: ->get_link(); replacement of ->follow_link(). The differences
    are:
    * inode and dentry are passed separately
    * might be called both in RCU and non-RCU mode;
    the former is indicated by passing it a NULL dentry.
    * when called that way it isn't allowed to block
    and should return ERR_PTR(-ECHILD) if it needs to be called
    in non-RCU mode.

    It's a flagday change - the old method is gone, all in-tree instances
    converted. Conversion isn't hard; said that, so far very few instances
    do not immediately bail out when called in RCU mode. That'll change
    in the next commits.

    Signed-off-by: Al Viro

    Al Viro
     

11 May, 2015

2 commits

  • its only use is getting passed to nd_jump_link(), which can obtain
    it from current->nameidata

    Signed-off-by: Al Viro

    Al Viro
     
  • a) instead of storing the symlink body (via nd_set_link()) and returning
    an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
    that opaque pointer (into void * passed by address by caller) and returns
    the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic
    symlinks) and pointer to symlink body for normal symlinks. Stored pointer
    is ignored in all cases except the last one.

    Storing NULL for opaque pointer (or not storing it at all) means no call
    of ->put_link().

    b) the body used to be passed to ->put_link() implicitly (via nameidata).
    Now only the opaque pointer is. In the cases when we used the symlink body
    to free stuff, ->follow_link() now should store it as opaque pointer in addition
    to returning it.

    Signed-off-by: Al Viro

    Al Viro
     

16 Apr, 2015

1 commit


11 Dec, 2014

2 commits

  • procfs inodes need only the ns_ops part; nsfs inodes don't need it at all

    Signed-off-by: Al Viro

    Al Viro
     
  • New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
    It's not mountable (not even registered, so it's not in /proc/filesystems,
    etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().

    This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
    get_proc_ns() is a macro now (it's simply returning ->i_private; would
    have been an inline, if not for header ordering headache).
    proc_ns_inode() is an ex-parrot. The interface used in procfs is
    ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

    Dentries and inodes are never hashed; a non-counting reference to dentry
    is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
    if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
    of that mechanism.

    As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
    it does nd_jump_link() on a consistent pair it gets
    from ns_get_path().

    Signed-off-by: Al Viro

    Al Viro
     

05 Dec, 2014

2 commits


02 Apr, 2014

1 commit


16 Nov, 2013

1 commit


29 Jun, 2013

2 commits


02 May, 2013

1 commit


09 Mar, 2013

1 commit

  • Update proc_ns_follow_link to use nd_jump_link instead of just
    manually updating nd.path.dentry.

    This fixes the BUG_ON(nd->inode != parent->d_inode) reported by Dave
    Jones and reproduced trivially with mkdir /proc/self/ns/uts/a.

    Sigh it looks like the VFS change to require use of nd_jump_link
    happend while proc_ns_follow_link was baking and since the common case
    of proc_ns_follow_link continued to work without problems the need for
    making this change was overlooked.

    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

20 Nov, 2012

3 commits

  • Assign a unique proc inode to each namespace, and use that
    inode number to ensure we only allocate at most one proc
    inode for every namespace in proc.

    A single proc inode per namespace allows userspace to test
    to see if two processes are in the same namespace.

    This has been a long requested feature and only blocked because
    a naive implementation would put the id in a global space and
    would ultimately require having a namespace for the names of
    namespaces, making migration and certain virtualization tricks
    impossible.

    We still don't have per superblock inode numbers for proc, which
    appears necessary for application unaware checkpoint/restart and
    migrations (if the application is using namespace file descriptors)
    but that is now allowd by the design if it becomes important.

    I have preallocated the ipc and uts initial proc inode numbers so
    their structures can be statically initialized.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • Change the proc namespace files into symlinks so that
    we won't cache the dentries for the namespace files
    which can bypass the ptrace_may_access checks.

    To support the symlinks create an additional namespace
    inode with it's own set of operations distinct from the
    proc pid inode and dentry methods as those no longer
    make sense.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • This allows entering a user namespace, and the ability
    to store a reference to a user namespace with a bind
    mount.

    Addition of missing userns_ns_put in userns_install
    from Gao feng

    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

19 Nov, 2012

2 commits

  • setns support for the mount namespace is a little tricky as an
    arbitrary decision must be made about what to set fs->root and
    fs->pwd to, as there is no expectation of a relationship between
    the two mount namespaces. Therefore I arbitrarily find the root
    mount point, and follow every mount on top of it to find the top
    of the mount stack. Then I set fs->root and fs->pwd to that
    location. The topmost root of the mount stack seems like a
    reasonable place to be.

    Bind mount support for the mount namespace inodes has the
    possibility of creating circular dependencies between mount
    namespaces. Circular dependencies can result in loops that
    prevent mount namespaces from every being freed. I avoid
    creating those circular dependencies by adding a sequence number
    to the mount namespace and require all bind mounts be of a
    younger mount namespace into an older mount namespace.

    Add a helper function proc_ns_inode so it is possible to
    detect when we are attempting to bind mound a namespace inode.

    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • - Pid namespaces are designed to be inescapable so verify that the
    passed in pid namespace is a child of the currently active
    pid namespace or the currently active pid namespace itself.

    Allowing the currently active pid namespace is important so
    the effects of an earlier setns can be cancelled.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

14 Jul, 2012

2 commits


29 Mar, 2012

1 commit


24 Mar, 2012

1 commit

  • The namespace cleanup path leaks a dentry which holds a reference count
    on a network namespace. Keeping that network namespace from being freed
    when the last user goes away. Leaving things like vlan devices in the
    leaked network namespace.

    If you use ip netns add for much real work this problem becomes apparent
    pretty quickly. It light testing the problem hides because frequently
    you simply don't notice the leak.

    Use d_set_d_op() so that DCACHE_OP_* flags are set correctly.

    This issue exists back to 3.0.

    Acked-by: "Eric W. Biederman"
    Reported-by: Justin Pettit
    Signed-off-by: Pravin B Shelar
    Signed-off-by: Jesse Gross
    Cc: David Miller
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pravin B Shelar
     

04 Jan, 2012

1 commit


16 Jun, 2011

1 commit


25 May, 2011

1 commit


11 May, 2011

4 commits

  • Acked-by: Daniel Lezcano
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • Acked-by: Daniel Lezcano
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • Implementing file descriptors for the network namespace
    is simple and straight forward.

    Acked-by: David S. Miller
    Acked-by: Daniel Lezcano
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • Create files under /proc//ns/ to allow controlling the
    namespaces of a process.

    This addresses three specific problems that can make namespaces hard to
    work with.
    - Namespaces require a dedicated process to pin them in memory.
    - It is not possible to use a namespace unless you are the child
    of the original creator.
    - Namespaces don't have names that userspace can use to talk about
    them.

    The namespace files under /proc//ns/ can be opened and the
    file descriptor can be used to talk about a specific namespace, and
    to keep the specified namespace alive.

    A namespace can be kept alive by either holding the file descriptor
    open or bind mounting the file someplace else. aka:
    mount --bind /proc/self/ns/net /some/filesystem/path
    mount --bind /proc/self/fd/ /some/filesystem/path

    This allows namespaces to be named with userspace policy.

    It requires additional support to make use of these filedescriptors
    and that will be comming in the following patches.

    Acked-by: Daniel Lezcano
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman