02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.
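
    As an illustration of how a compliance tool might consume the tag, the
    sketch below extracts the license expression from a line such as
    "// SPDX-License-Identifier: GPL-2.0" or its /* ... */ form. The helper
    name spdx_license_id() is hypothetical and is not part of any kernel
    tool or of the scanners mentioned in this message.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the license expression that follows "SPDX-License-Identifier:" on
 * `line` into `out`. Returns 1 if a tag was found, 0 otherwise.
 * Hypothetical helper for illustration only.
 */
static int spdx_license_id(const char *line, char *out, size_t outlen)
{
	static const char tag[] = "SPDX-License-Identifier:";
	const char *p = strstr(line, tag);
	const char *end;
	size_t len;

	if (!p || outlen == 0)
		return 0;
	p += sizeof(tag) - 1;
	while (*p == ' ' || *p == '\t')
		p++;

	/* Stop at the end of the line or at a closing C comment marker. */
	len = strcspn(p, "\r\n");
	end = strstr(p, "*/");
	if (end && (size_t)(end - p) < len)
		len = (size_t)(end - p);
	while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'))
		len--;

	if (len == 0 || len >= outlen)
		return 0;
	memcpy(out, p, len);
	out[len] = '\0';
	return 1;
}
```

    A scanner built this way only needs to read the first line or two of
    each file, which is the practical advantage over matching full
    boilerplate text.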

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be applied
    to a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to each file. She confirmed any determination that was
    not immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

06 Jul, 2017

1 commit


09 May, 2017

1 commit

  • Patch series "Expose task pid_ns_for_children to userspace".

    The pid_ns_for_children set by a task is known only to the task itself,
    and it is impossible to identify it from outside.

    This is a big problem for checkpoint/restore software like CRIU, because
    it can't correctly handle tasks that call setns(CLONE_NEWPID) in the
    course of their work. If they have a custom pid_ns_for_children before
    dump, they must have the same ns after restore. Otherwise, the restored
    task ends up in an environment it does not expect.

    This patchset solves the problem. It exposes pid_ns_for_children in the
    ns directory in the standard way, under the name "pid_for_children":

    ~# ls /proc/5531/ns -l | grep pid
    lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836]
    lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286]

    This patch (of 2):

    Make it possible for the link content prefix yyy to differ from the link
    name xxx:

    $ readlink /proc/[pid]/ns/xxx
    yyy:[4026531838]

    This will be used in the next patch.
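
    For user-space consumers, the practical consequence is that the prefix
    must be read from the link content rather than assumed from the link
    name. A minimal sketch of such parsing follows; parse_ns_link() is a
    hypothetical helper, not an existing API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Split a namespace link target of the form "name:[inode]" into its
 * prefix and inode number. Returns 0 on success, -1 on malformed input.
 * Hypothetical helper for illustration only.
 */
static int parse_ns_link(const char *target, char *prefix, size_t prefixlen,
			 unsigned long *ino)
{
	const char *colon = strchr(target, ':');
	size_t n;
	char closing;

	if (!colon)
		return -1;
	n = (size_t)(colon - target);
	if (n == 0 || n >= prefixlen)
		return -1;
	/* Expect ":[", a decimal inode number, then the closing bracket. */
	if (sscanf(colon, ":[%lu%c", ino, &closing) != 2 || closing != ']')
		return -1;
	memcpy(prefix, target, n);
	prefix[n] = '\0';
	return 0;
}
```

    Fed the readlink() result for a "pid_for_children" link, this yields
    the prefix "pid", matching the listing shown earlier in this message.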

    Link: http://lkml.kernel.org/r/149201120318.6007.7362655181033883000.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Reviewed-by: Cyrill Gorcunov
    Acked-by: Andrei Vagin
    Cc: Andreas Gruenbacher
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: Paul Moore
    Cc: Eric Biederman
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     

20 Apr, 2017

1 commit

  • Andrey reported a use-after-free in __ns_get_path():

    spin_lock include/linux/spinlock.h:299 [inline]
    lockref_get_not_dead+0x19/0x80 lib/lockref.c:179
    __ns_get_path+0x197/0x860 fs/nsfs.c:66
    open_related_ns+0xda/0x200 fs/nsfs.c:143
    sock_ioctl+0x39d/0x440 net/socket.c:1001
    vfs_ioctl fs/ioctl.c:45 [inline]
    do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685
    SYSC_ioctl fs/ioctl.c:700 [inline]
    SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691

    We are under rcu read lock protection at that point:

    rcu_read_lock();
    d = atomic_long_read(&ns->stashed);
    if (!d)
            goto slow;
    dentry = (struct dentry *)d;
    if (!lockref_get_not_dead(&dentry->d_lockref))
            goto slow;
    rcu_read_unlock();

    but don't use a proper RCU API on the free path, therefore a parallel
    __d_free() could free it at the same time. We need to mark the stashed
    dentry with DCACHE_RCUACCESS so that __d_free() will be called after all
    readers leave RCU.
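
    To make the reader side concrete, here is a user-space model of the
    "take a reference unless dead" step, written with C11 atomics. It is
    only a sketch: struct ref_model and both functions are hypothetical
    stand-ins, not the kernel's lockref implementation, and the model
    assumes a negative count means "dead".

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for a lockref; a negative count means "dead". */
struct ref_model {
	atomic_int count;
};

/* Take a reference unless the object is already marked dead. */
static bool ref_get_not_dead(struct ref_model *r)
{
	int old = atomic_load(&r->count);

	while (old >= 0) {
		/* On failure `old` is reloaded and the dead check repeats. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Mark the object dead so later lookups must take the slow path. */
static void ref_mark_dead(struct ref_model *r)
{
	atomic_store(&r->count, -128);
}
```

    The check is only safe while the memory behind the pointer is still
    valid. That is the guarantee the free path has to provide by deferring
    the free until readers have left their RCU read-side sections.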

    Fixes: e149ed2b805f ("take the targets of /proc/*/ns/* symlinks to separate fs")
    Cc: Alexander Viro
    Cc: Andrew Morton
    Reported-by: Andrey Konovalov
    Signed-off-by: Cong Wang
    Signed-off-by: Linus Torvalds

    Cong Wang
     

03 Feb, 2017

1 commit

  • I'd like to write code that discovers the user namespace hierarchy on a
    running system, and also shows who owns the various user namespaces.
    Currently, there is no way of getting the owner UID of a user namespace.
    Therefore, this patch adds a new NS_GET_CREATOR_UID ioctl() that fetches
    the UID (as seen in the user namespace of the caller) of the creator of
    the user namespace referred to by the specified file descriptor.

    If the supplied file descriptor does not refer to a user namespace,
    the operation fails with the error EINVAL. If the owner UID does
    not have a mapping in the caller's user namespace, the operation
    returns the overflow UID, as that appears easier to deal with in
    practice in user-space applications.

    -- EWB Changed the handling of unmapped UIDs from -EOVERFLOW
    back to the overflow uid. Per conversation with
    Michael Kerrisk after examining his test code.
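
    A hedged sketch of a caller follows. It assumes the operation is
    spelled NS_GET_OWNER_UID with request value _IO(0xb7, 0x4), as in
    include/uapi/linux/nsfs.h to the best of my knowledge; check both
    against your headers. userns_owner_uid() is a hypothetical helper.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed request number (NSIO 0xb7, operation 0x4); verify locally. */
#ifndef NS_GET_OWNER_UID
#define NS_GET_OWNER_UID _IO(0xb7, 0x4)
#endif

/*
 * Store the owner UID of the user namespace at `path` in `*uid`.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int userns_owner_uid(const char *path, uid_t *uid)
{
	int fd = open(path, O_RDONLY);
	int ret, saved;

	if (fd < 0)
		return -1;
	ret = ioctl(fd, NS_GET_OWNER_UID, uid);
	saved = errno;
	close(fd);		/* do not let close() clobber errno */
	errno = saved;
	return ret < 0 ? -1 : 0;
}
```

    A typical call would pass "/proc/self/ns/user". Passing a file that is
    not a user namespace is expected to fail, per the description above.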

    Acked-by: Andrey Vagin
    Signed-off-by: Michael Kerrisk
    Signed-off-by: Eric W. Biederman

    Michael Kerrisk (man-pages)
     

25 Jan, 2017

1 commit

  • Linux 4.9 added two ioctl() operations that can be used to discover:

    * the parental relationships for hierarchical namespaces (user and PID)
    [NS_GET_PARENT]
    * the user namespace that owns a specified non-user namespace
    [NS_GET_USERNS]

    For no good reason that I can glean, NS_GET_USERNS was made synonymous
    with NS_GET_PARENT for user namespaces. It might have been better if
    NS_GET_USERNS had returned an error if the supplied file descriptor
    referred to a user namespace, since it suggests that the caller may be
    confused. More particularly, if it had generated an error, then I wouldn't
    need the new ioctl() operation proposed here. (On the other hand, what
    I propose here may be more generally useful.)

    I would like to write code that discovers namespace relationships for
    the purpose of understanding the namespace setup on a running system.
    In particular, given a file descriptor (or pathname) for a namespace,
    N, I'd like to obtain the corresponding user namespace. Namespace N
    might be a user namespace (in which case my code would just use N) or
    a non-user namespace (in which case my code will use NS_GET_USERNS to
    get the user namespace associated with N). The problem is that there
    is no way to tell the difference by looking at the file descriptor
    (and if I try to use NS_GET_USERNS on an N that is a user namespace, I
    get the parent user namespace of N, which is not what I want).

    This patch therefore adds a new ioctl(), NS_GET_NSTYPE, which, given
    a file descriptor that refers to a namespace, returns the
    namespace type (one of the CLONE_NEW* constants).
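
    The use case described above could be sketched as follows. The request
    value _IO(0xb7, 0x3) is my reading of include/uapi/linux/nsfs.h and
    should be treated as an assumption; ns_type_name() and ns_type_of_fd()
    are hypothetical helpers.

```c
#include <stddef.h>
#include <linux/sched.h>	/* CLONE_NEW* constants */
#include <sys/ioctl.h>

/* Assumed request number (NSIO 0xb7, operation 0x3); verify locally. */
#ifndef NS_GET_NSTYPE
#define NS_GET_NSTYPE _IO(0xb7, 0x3)
#endif

/* Map a CLONE_NEW* value to a short label; NULL if not recognised. */
static const char *ns_type_name(int nstype)
{
	switch (nstype) {
	case CLONE_NEWUSER:	return "user";
	case CLONE_NEWPID:	return "pid";
	case CLONE_NEWNET:	return "net";
	case CLONE_NEWNS:	return "mnt";
	case CLONE_NEWUTS:	return "uts";
	case CLONE_NEWIPC:	return "ipc";
#ifdef CLONE_NEWCGROUP
	case CLONE_NEWCGROUP:	return "cgroup";
#endif
	default:		return NULL;
	}
}

/* Return the CLONE_NEW* type of the namespace behind fd, or -1 on error. */
static int ns_type_of_fd(int fd)
{
	return ioctl(fd, NS_GET_NSTYPE);
}
```

    With this, code holding a namespace file descriptor can test for
    CLONE_NEWUSER first and only fall back to NS_GET_USERNS for the other
    types.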

    Signed-off-by: Michael Kerrisk
    Signed-off-by: Eric W. Biederman

    Michael Kerrisk (man-pages)
     

31 Oct, 2016

1 commit

  • Each socket operates in the network namespace where it was created,
    so if we want to dump and restore a socket, we have to know its network
    namespace.

    We have socket_diag to get information about sockets, but it doesn't
    report sockets which are not bound or connected.

    This patch introduces a new socket ioctl, called SIOCGSKNS, which is
    used to get a file descriptor for a socket's network namespace.

    A task must have CAP_NET_ADMIN in the target network namespace to
    use this ioctl.
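
    A minimal user-space sketch of the call is shown below. The fallback
    value 0x894C is taken from include/uapi/linux/sockios.h as far as I
    know and should be verified; socket_netns_fd() and same_namespace() are
    hypothetical helpers, and comparing st_dev/st_ino is one common way to
    tell whether two namespace descriptors refer to the same namespace.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>

/* Assumed request number; verify against linux/sockios.h. */
#ifndef SIOCGSKNS
#define SIOCGSKNS 0x894C
#endif

/* Return an fd for the network namespace of `sockfd`, or -1 on error. */
static int socket_netns_fd(int sockfd)
{
	return ioctl(sockfd, SIOCGSKNS);
}

/* Return 1 if both fds refer to the same file, 0 if not, -1 on error. */
static int same_namespace(int fd_a, int fd_b)
{
	struct stat a, b;

	if (fstat(fd_a, &a) < 0 || fstat(fd_b, &b) < 0)
		return -1;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

    A checkpoint tool could compare the returned descriptor against an
    open "/proc/self/ns/net" to decide whether a socket lives in the
    current network namespace. The caller needs CAP_NET_ADMIN as stated
    above, so an unprivileged call is expected to fail.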

    Cc: "David S. Miller"
    Cc: Eric W. Biederman
    Signed-off-by: Andrei Vagin
    Signed-off-by: David S. Miller

    Andrey Vagin
     

11 Oct, 2016

1 commit

  • Pull more vfs updates from Al Viro:
    "rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     

28 Sep, 2016

1 commit

  • The CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_time() instead.
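
    To picture what "the right granularity" means, here is a user-space
    model that rounds a timestamp down to a filesystem's granularity. It
    assumes the granularity is expressed in nanoseconds;
    trunc_to_granularity() is a hypothetical name and this is not the
    kernel's implementation.

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Round `t` down to a multiple of `gran` nanoseconds. A granularity of 1
 * leaves the value untouched; NSEC_PER_SEC keeps whole seconds only.
 * Other out-of-range values are left unchanged in this model.
 */
static struct timespec trunc_to_granularity(struct timespec t, long gran)
{
	if (gran == NSEC_PER_SEC)
		t.tv_nsec = 0;
	else if (gran > 1 && gran < NSEC_PER_SEC)
		t.tv_nsec -= t.tv_nsec % gran;
	return t;
}
```

    For example, a filesystem that stores microseconds would use a
    granularity of 1000, so a timestamp written to an inode compares equal
    to the value later read back from disk.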

    CURRENT_TIME is also not y2038 safe.

    This is also in preparation for the patch that transitions
    vfs timestamps to use 64 bit time and hence make them
    y2038 safe. As part of the effort current_time() will be
    extended to do range checks. Hence, it is necessary for all
    file system timestamps to use current_time(). Also,
    current_time() will be transitioned along with vfs to be
    y2038 safe.

    Note that whenever a single call to current_time() is used
    to change timestamps in different inodes, it is because they
    share the same time granularity.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Acked-by: Felipe Balbi
    Acked-by: Steven Whitehouse
    Acked-by: Ryusuke Konishi
    Acked-by: David Sterba
    Signed-off-by: Al Viro

    Deepa Dinamani
     

23 Sep, 2016

3 commits

  • Move mntget from the very beginning of __ns_get_path to
    the success path of __ns_get_path, and remove the mntget
    calls.

    This removes the possibility that there will be a mntget/mntput
    pair if __ns_get_path has to retry, and generally simplifies the code.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • PID and user namespaces are hierarchical, but currently there is no
    way to discover parent-child relationships.

    In the future we will use this interface to dump and restore nested
    namespaces.
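
    A sketch of how user space might walk such a hierarchy is shown below.
    It assumes the operation is NS_GET_PARENT with request value
    _IO(0xb7, 0x2), which matches include/uapi/linux/nsfs.h as far as I
    know; visible_ancestors() is a hypothetical helper.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed request number (NSIO 0xb7, operation 0x2); verify locally. */
#ifndef NS_GET_PARENT
#define NS_GET_PARENT _IO(0xb7, 0x2)
#endif

/*
 * Count how many ancestors of the namespace behind `fd` the caller can
 * reach. The walk stops at the first failing ioctl(), so a descriptor
 * that is not a hierarchical namespace simply reports 0.
 * Returns -1 if `fd` itself is not a usable descriptor.
 */
static int visible_ancestors(int fd)
{
	int depth = 0;
	int cur = dup(fd);	/* keep the caller's descriptor open */

	if (cur < 0)
		return -1;
	for (;;) {
		int parent = ioctl(cur, NS_GET_PARENT);

		close(cur);
		if (parent < 0)
			break;
		cur = parent;
		depth++;
	}
	return depth;
}
```

    Called on an open "/proc/[pid]/ns/pid" from an ancestor namespace, the
    result gives the nesting depth of that task relative to the caller.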

    Acked-by: Serge Hallyn
    Signed-off-by: Andrei Vagin
    Signed-off-by: Eric W. Biederman

    Andrey Vagin
     
  • Each namespace has an owning user namespace, and currently there is no
    way to discover these relationships.

    Understanding namespace relationships allows us to answer the question:
    what capability does process X have to perform operations on a resource
    governed by namespace Y?

    After a long discussion, Eric W. Biederman proposed to use ioctls for
    this purpose.

    The NS_GET_USERNS ioctl returns a file descriptor to an owning user
    namespace.
    It returns EPERM if the target namespace is outside of the current user
    namespace.
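
    A minimal caller might look like the sketch below. The request value
    _IO(0xb7, 0x1) is my reading of include/uapi/linux/nsfs.h and should be
    checked; owning_userns_fd() is a hypothetical helper.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed request number (NSIO 0xb7, operation 0x1); verify locally. */
#ifndef NS_GET_USERNS
#define NS_GET_USERNS _IO(0xb7, 0x1)
#endif

/*
 * Open the namespace file at `path` and return a descriptor for its
 * owning user namespace, or -1 with errno set (for example EPERM when
 * the target is outside the caller's user namespace).
 */
static int owning_userns_fd(const char *path)
{
	int fd = open(path, O_RDONLY);
	int userns, saved;

	if (fd < 0)
		return -1;
	userns = ioctl(fd, NS_GET_USERNS);
	saved = errno;
	close(fd);		/* do not let close() clobber errno */
	errno = saved;
	return userns;
}
```

    For instance, owning_userns_fd("/proc/self/ns/net") would give the user
    namespace that governs the caller's network namespace, which is the
    relationship the capability question above depends on.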

    v2: rename parent to relative

    v3: Add a missing mntput when returning -EAGAIN --EWB

    Acked-by: Serge Hallyn
    Link: https://lkml.org/lkml/2016/7/6/158
    Signed-off-by: Andrei Vagin
    Signed-off-by: Eric W. Biederman

    Andrey Vagin
     

12 Sep, 2015

1 commit

  • The seq_ function return values were frequently misused.

    See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
    seq_has_overflowed() and make public")

    All uses of these return values have been removed, so convert the
    return types to void.
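
    The resulting calling convention can be modelled in user space as
    follows: writers return nothing, and the caller asks afterwards whether
    the buffer overflowed. struct seq_model and its functions are toy
    stand-ins for illustration, not the kernel's seq_file code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the buffer bookkeeping of a seq_file. */
struct seq_model {
	char *buf;
	size_t size;
	size_t count;
};

/* In this model a full buffer is the overflow marker. */
static bool seq_model_has_overflowed(const struct seq_model *m)
{
	return m->count == m->size;
}

/* Append a string; on lack of space, mark overflow and return early. */
static void seq_model_puts(struct seq_model *m, const char *s)
{
	size_t len = strlen(s);

	if (m->count + len >= m->size) {
		m->count = m->size;
		return;
	}
	memcpy(m->buf + m->count, s, len);
	m->count += len;
}
```

    Callers that previously tested a return value would instead emit all
    their output and check for overflow once at the end.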

    Miscellanea:

    o Move seq_put_decimal_ and seq_escape prototypes closer to the
    other seq_vprintf prototypes
    o Reorder seq_putc and seq_puts to return early on overflow
    o Add argument names to seq_vprintf and seq_printf
    o Update the seq_escape kernel-doc
    o Convert a couple of leading spaces to tabs in seq_escape

    Signed-off-by: Joe Perches
    Cc: Al Viro
    Cc: Steven Rostedt
    Cc: Mark Brown
    Cc: Stephen Rothwell
    Cc: Joerg Roedel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

12 Jul, 2015

1 commit


16 Apr, 2015

1 commit


11 Dec, 2014

1 commit

  • New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
    It's not mountable (not even registered, so it's not in /proc/filesystems,
    etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().

    This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
    get_proc_ns() is a macro now (it's simply returning ->i_private; would
    have been an inline, if not for header ordering headache).
    proc_ns_inode() is an ex-parrot. The interface used in procfs is
    ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

    Dentries and inodes are never hashed; a non-counting reference to dentry
    is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
    if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
    of that mechanism.

    As a result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
    it does nd_jump_link() on a consistent pair it gets
    from ns_get_path().

    Signed-off-by: Al Viro

    Al Viro