18 Dec, 2014

1 commit

  • Pull user namespace related fixes from Eric Biederman:
    "As these are bug fixes almost all of these changes are marked for
    backporting to stable.

    The first change (implicitly adding MNT_NODEV on remount) addresses a
    regression that was created when security issues with unprivileged
    remount were closed. I go on to update the remount test to make it
    easy to detect if this issue reoccurs.

    Then there are a handful of mount and umount related fixes.

    Then half of the changes deal with a recently discovered design
    bug in the permission checks of gid_map. Unix since the beginning has
    allowed setting group permissions on files to less than the user and
    other permissions (aka ---rwx---rwx). As the unix permission checks
    stop as soon as a group matches, and setgroups allows setting groups
    that can not later be dropped, this results in a situation where it is
    possible to legitimately use a group to assign fewer privileges to a
    process. Which means dropping a group can increase a process's
    privileges.

    The fix I have adopted is that gid_map is now no longer writable
    without privilege unless the new file /proc/self/setgroups has been
    set to permanently disable setgroups.

    The bulk of applications using user namespaces, even the
    applications using user namespaces without privilege, remain
    unaffected by this change. Unfortunately this fix breaks a couple of
    user space applications that were relying on the problematic behavior
    (one of which was tools/testing/selftests/mount/unprivileged-remount-test.c).

    To hopefully prevent needing a regression fix on top of my security
    fix I rounded up folks who work with the container implementations most
    likely to be affected and encouraged them to test the changes.

    > So far nothing broke on my libvirt-lxc test bed. :-)
    > Tested with openSUSE 13.2 and libvirt 1.2.9.
    > Tested-by: Richard Weinberger

    > Tested on Fedora20 with libvirt 1.2.11, works fine.
    > Tested-by: Chen Hanxiao

    > Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
    > Just to be sure I was testing the right thing I also tested using
    > my unprivileged nsexec testcases, and they failed on setgroup/setgid
    > as now expected, and succeeded there without your patches.
    > Tested-by: Serge Hallyn

    > I tested this with Sandstorm. It breaks as is and it works if I add
    > the setgroups thing.
    > Tested-by: Andy Lutomirski # breaks things as designed :("

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Unbreak the unprivileged remount tests
    userns; Correct the comment in map_write
    userns: Allow setting gid_maps without privilege when setgroups is disabled
    userns: Add a knob to disable setgroups on a per user namespace basis
    userns: Rename id_map_mutex to userns_state_mutex
    userns: Only allow the creator of the userns unprivileged mappings
    userns: Check euid no fsuid when establishing an unprivileged uid mapping
    userns: Don't allow unprivileged creation of gid mappings
    userns: Don't allow setgroups until a gid mapping has been setablished
    userns: Document what the invariant required for safe unprivileged mappings.
    groups: Consolidate the setgroups permission checks
    mnt: Clear mnt_expire during pivot_root
    mnt: Carefully set CL_UNPRIVILEGED in clone_mnt
    mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.
    umount: Do not allow unmounting rootfs.
    umount: Disallow unprivileged mount force
    mnt: Update unprivileged remount test
    mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount

    Linus Torvalds
     

12 Dec, 2014

3 commits

  • It is important that all maps are less than PAGE_SIZE
    or else setting the last byte of the buffer to '\0'
    could write off the end of the allocated storage.

    Correct the misleading comment.
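
    A minimal user-space sketch of the constraint, not the kernel's
    map_write. The helper name copy_map_input and the constant BUF_SIZE
    (standing in for PAGE_SIZE) are assumptions made for illustration:

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 4096 /* stands in for PAGE_SIZE; an assumption of this sketch */

/*
 * Copy count bytes of input into a BUF_SIZE buffer and NUL-terminate it.
 * The terminator is written at buf[count], so count must be strictly less
 * than BUF_SIZE or the write would land past the end of the buffer.
 * Returns 0 on success, -1 if the input is too large.
 */
int copy_map_input(char buf[BUF_SIZE], const char *src, size_t count)
{
	if (count >= BUF_SIZE)
		return -1;
	memcpy(buf, src, count);
	buf[count] = '\0';
	return 0;
}
```

    An input of exactly BUF_SIZE bytes is rejected, while BUF_SIZE - 1
    bytes leaves room for the terminator in the last byte.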

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Now that setgroups can be disabled and not reenabled, setting gid_map
    without privilege can now be enabled when setgroups is disabled.

    This restores most of the functionality that was lost when unprivileged
    setting of gid_map was removed. Applications that use this functionality
    will need to check to see if they use setgroups or initgroups, and if they
    don't they can be fixed by simply disabling setgroups before writing to
    gid_map.
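
    The ordering described above might look roughly like this in an
    application. This is a sketch under assumptions: the helper names
    write_str, format_id_map and map_own_gid are hypothetical, and
    procdir is expected to be something like "/proc/self" in a process
    that has just created its user namespace:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a whole string to an existing file with a single write(2). */
int write_str(const char *path, const char *s)
{
	size_t len = strlen(s);
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, s, len);
	close(fd);
	return (n == (ssize_t)len) ? 0 : -1;
}

/* Format one mapping line: "<id inside> <id outside> <count>\n". */
int format_id_map(char *buf, size_t len, unsigned inside,
		  unsigned outside, unsigned count)
{
	int n = snprintf(buf, len, "%u %u %u\n", inside, outside, count);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Disable setgroups first, then map a single gid without privilege. */
int map_own_gid(const char *procdir, unsigned inside_gid, unsigned outside_gid)
{
	char path[256], map[64];
	int n;

	n = snprintf(path, sizeof(path), "%s/setgroups", procdir);
	if (n < 0 || (size_t)n >= sizeof(path) || write_str(path, "deny") < 0)
		return -1;
	if (format_id_map(map, sizeof(map), inside_gid, outside_gid, 1) < 0)
		return -1;
	n = snprintf(path, sizeof(path), "%s/gid_map", procdir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return write_str(path, map);
}
```

    If the write to setgroups is skipped, the unprivileged write to
    gid_map is expected to fail on a kernel carrying this change.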

    Cc: stable@vger.kernel.org
    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • - Expose the knob to user space through a proc file /proc/<pid>/setgroups

      A value of "deny" means the setgroups system call is disabled in the
      current process's user namespace and can not be enabled in the
      future in this user namespace.

      A value of "allow" means the setgroups system call is enabled.

    - Descendant user namespaces inherit the value of setgroups from
      their parents.

    - A proc file is used (instead of a sysctl) as sysctls currently do
      not allow checking the permissions at open time.

    - Writing to the proc file is restricted to before the gid_map
      for the user namespace is set.

      This ensures that disabling setgroups at a user namespace
      level will never remove the ability to call setgroups
      from a process that already has that ability.

      A process may opt in to the setgroups disable for itself by
      creating, entering and configuring a user namespace or by calling
      setns on an existing user namespace with setgroups disabled.
      Processes without privileges already can not call setgroups so this
      is a noop. Processes with privilege become processes without
      privilege when entering a user namespace and as with any other path
      to dropping privilege they would not have the ability to call
      setgroups. So this remains within the bounds of what is possible
      without a knob to disable setgroups permanently in a user namespace.
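
    A small sketch of how user space might interpret the text read back
    from the proc file. The enum and the function parse_setgroups are
    hypothetical names; only the two values "allow" and "deny" come from
    the description above:

```c
#include <string.h>

enum setgroups_state {
	SETGROUPS_UNKNOWN = -1,	/* anything other than the two known values */
	SETGROUPS_DENY    = 0,	/* setgroups is permanently disabled */
	SETGROUPS_ALLOW   = 1	/* setgroups is enabled */
};

/* Interpret the contents of the setgroups file, ignoring a trailing newline. */
enum setgroups_state parse_setgroups(const char *text)
{
	size_t len = strcspn(text, "\n");

	if (len == 5 && strncmp(text, "allow", 5) == 0)
		return SETGROUPS_ALLOW;
	if (len == 4 && strncmp(text, "deny", 4) == 0)
		return SETGROUPS_DENY;
	return SETGROUPS_UNKNOWN;
}
```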

    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

10 Dec, 2014

5 commits

  • Generalize id_map_mutex so it can be used for more state of a user namespace.

    Cc: stable@vger.kernel.org
    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • If you did not create the user namespace and are allowed
    to write to uid_map or gid_map you should already have the necessary
    privilege in the parent user namespace to establish any mapping
    you want so this will not affect userspace in practice.

    Limiting unprivileged uid mapping establishment to the creator of the
    user namespace makes it easier to verify all credentials obtained with
    the uid mapping can be obtained without the uid mapping without
    privilege.

    Limiting unprivileged gid mapping establishment (which is temporarily
    absent) to the creator of the user namespace also ensures that the
    combination of uid and gid can already be obtained without privilege.

    This is part of the fix for CVE-2014-8989.

    Cc: stable@vger.kernel.org
    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • setresuid allows the euid to be set to any of uid, euid, suid, and
    fsuid. Therefore it is safe to allow an unprivileged user to map
    their euid and use CAP_SETUID privileges with exactly that uid,
    as no new credentials can be obtained.

    I can not find a combination of existing system calls that allows setting
    uid, euid, suid, and fsuid from the fsuid, making the previous use
    of fsuid for allowing unprivileged mappings a bug.

    This is part of a fix for CVE-2014-8989.

    Cc: stable@vger.kernel.org
    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • As any gid mapping will allow, and must allow for backwards
    compatibility, dropping groups, don't allow any gid mappings to be
    established without CAP_SETGID in the parent user namespace.

    For a small class of applications this change breaks userspace
    and removes useful functionality. This small class of applications
    includes tools/testing/selftests/mount/unprivileged-remount-test.c

    Most of the removed functionality will be added back with the addition
    of a one way knob to disable setgroups. Once setgroups is disabled
    setting the gid_map becomes as safe as setting the uid_map.

    For more common applications that set the uid_map and the gid_map
    with privilege this change will have no effect.

    This is part of a fix for CVE-2014-8989.

    Cc: stable@vger.kernel.org
    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • setgroups is unique in not needing a valid mapping before it can be called,
    in the case of setgroups(0, NULL) which drops all supplemental groups.

    The design of the user namespace assumes that CAP_SETGID can not actually
    be used until a gid mapping is established. Therefore add a helper function
    to see if the user namespace gid mapping has been established and call
    that function in the setgroups permission check.

    This is part of the fix for CVE-2014-8989, being able to drop groups
    without privilege using user namespaces.

    Cc: stable@vger.kernel.org
    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

06 Dec, 2014

1 commit

  • The rule is simple. Don't allow anything that wouldn't be allowed
    without unprivileged mappings.

    It was previously overlooked that establishing gid mappings would
    allow dropping groups and potentially gaining permission to files and
    directories that had lesser permissions for a specific group than for
    all other users.
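
    The effect can be illustrated with a deliberately simplified
    user-space model of the owner/group/other check. It ignores root,
    capabilities and ACLs, treats the primary gid as just another entry
    in the group list, and the function name may_access is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Permission bits wanted, as they appear within one rwx class. */
enum { WANT_R = 4, WANT_W = 2, WANT_X = 1 };

/*
 * Simplified model: exactly one class is consulted. The check stops at
 * the first class that matches (owner, then group, then other), even if
 * a later class would have granted more.
 */
bool may_access(unsigned mode, unsigned file_uid, unsigned file_gid,
		unsigned uid, const unsigned *groups, size_t ngroups,
		unsigned want)
{
	if (uid == file_uid)
		return ((mode >> 6) & want) == want;
	for (size_t i = 0; i < ngroups; i++)
		if (groups[i] == file_gid)
			return ((mode >> 3) & want) == want;
	return (mode & want) == want;
}
```

    With a file of mode 0707 owned by group 50, a process that is a
    member of group 50 is denied by the group class, while the same
    process with an empty group list falls through to the other class
    and is allowed.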

    This is the rule needed to fix CVE-2014-8989 and prevent any other
    security issues with new_idmap_permitted.

    The reason for this rule is that the unix permission model is old and
    there are programs out there somewhere that take advantage of every
    little corner of it. So allowing a uid or gid mapping to be
    established without privilege that would allow anything that would not
    be allowed without that mapping will result in expectations from some
    code somewhere being violated. Violated expectations about the
    behavior of the OS is a long way of saying a security issue.

    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

05 Dec, 2014

5 commits


09 Aug, 2014

1 commit

  • proc_uid_seq_operations, proc_gid_seq_operations and
    proc_projid_seq_operations are only called in proc_id_map_open with
    seq_open as const struct seq_operations so we can constify the 3
    structures and update proc_id_map_open prototype.

       text    data     bss     dec     hex filename
       6817     404    1984    9205    23f5 kernel/user_namespace.o-before
       6913     308    1984    9205    23f5 kernel/user_namespace.o-after

    Signed-off-by: Fabian Frederick
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

07 Jun, 2014

1 commit


15 Apr, 2014

1 commit

  • smp_read_barrier_depends() can be used if there is a data dependency
    between the readers - i.e. if the read operation after the barrier uses
    an address that was obtained from the read operation before the barrier.

    In this file, there is only a control dependency, no data dependency, so
    the use of smp_read_barrier_depends() is incorrect. The code could fail in
    the following way:
    * the cpu predicts that idx < extents is true and starts executing the
      body of the for loop
    * the cpu fetches map->extent[0].first and map->extent[0].count
    * the cpu fetches map->nr_extents
    * the cpu verifies that idx < extents is true, so it commits the
      instructions in the body of the for loop

    The problem is that in this scenario, the cpu read map->extent[0].first
    and map->nr_extents in the wrong order. We need a full read memory barrier
    to prevent it.
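
    A user-space analogue of the required ordering, written with C11
    atomics rather than the kernel's barrier primitives. The acquire
    load plays the role of the read barrier and the release store that
    of the matching write barrier. The struct layout, MAX_EXTENTS and
    the function names are assumptions for this sketch only:

```c
#include <stdatomic.h>

#define MAX_EXTENTS 5 /* illustrative limit, an assumption of this sketch */

struct extent {
	unsigned first, lower_first, count;
};

struct id_map {
	atomic_uint nr_extents;
	struct extent extent[MAX_EXTENTS];
};

/* Writer: fill in the extents, then publish the count with release order. */
int publish_map(struct id_map *map, const struct extent *src, unsigned n)
{
	if (n > MAX_EXTENTS)
		return -1;
	for (unsigned i = 0; i < n; i++)
		map->extent[i] = src[i];
	atomic_store_explicit(&map->nr_extents, n, memory_order_release);
	return 0;
}

/*
 * Reader: the acquire load orders the read of nr_extents before the
 * reads of extent[], which a control dependency alone would not do.
 * Returns the mapped id, or (unsigned)-1 if no extent covers it.
 */
unsigned map_id_down(struct id_map *map, unsigned id)
{
	unsigned extents = atomic_load_explicit(&map->nr_extents,
						memory_order_acquire);

	for (unsigned idx = 0; idx < extents; idx++) {
		unsigned first = map->extent[idx].first;
		unsigned last = first + map->extent[idx].count - 1;

		if (id >= first && id <= last)
			return id - first + map->extent[idx].lower_first;
	}
	return (unsigned)-1;
}
```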

    Signed-off-by: Mikulas Patocka
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
     

04 Apr, 2014

1 commit

  • Code that is obj-y (always built-in) or dependent on a bool Kconfig
    (built-in or absent) can never be modular. So using module_init as an
    alias for __initcall can be somewhat misleading.

    Fix these up now, so that we can relocate module_init from init.h into
    module.h in the future. If we don't do this, we'd have to add module.h
    to obviously non-modular code, and that would be a worse thing.

    The audit targets the following module_init users for change:
      kernel/user.c              obj-y
      kernel/kexec.c             bool KEXEC (one instance per arch)
      kernel/profile.c           bool PROFILING
      kernel/hung_task.c         bool DETECT_HUNG_TASK
      kernel/sched/stats.c       bool SCHEDSTATS
      kernel/user_namespace.c    bool USER_NS

    Note that direct use of __initcall is discouraged, vs. one of the
    priority categorized subgroups. As __initcall gets mapped onto
    device_initcall, our use of subsys_initcall (which makes sense for these
    files) will thus change this registration from level 6-device to level
    4-subsys (i.e. slightly earlier). However no observable impact of that
    difference has been observed during testing.

    Also, two instances of missing ";" at EOL are fixed in kexec.

    Signed-off-by: Paul Gortmaker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Eric Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     

21 Feb, 2014

1 commit


24 Sep, 2013

1 commit

  • Add support for per-user_namespace registers of persistent per-UID kerberos
    caches held within the kernel.

    This allows the kerberos cache to be retained beyond the life of all a user's
    processes so that the user's cron jobs can work.

    The kerberos cache is envisioned as a keyring/key tree looking something like:

    struct user_namespace
      \___ .krb_cache keyring               - The register
            \___ _krb.0 keyring             - Root's Kerberos cache
            \___ _krb.5000 keyring          - User 5000's Kerberos cache
            \___ _krb.5001 keyring          - User 5001's Kerberos cache
                  \___ tkt785 big_key       - A ccache blob
                  \___ tkt12345 big_key     - Another ccache blob

    Or possibly:

    struct user_namespace
      \___ .krb_cache keyring               - The register
            \___ _krb.0 keyring             - Root's Kerberos cache
            \___ _krb.5000 keyring          - User 5000's Kerberos cache
            \___ _krb.5001 keyring          - User 5001's Kerberos cache
                  \___ tkt785 keyring       - A ccache
                        \___ krbtgt/REDHAT.COM@REDHAT.COM big_key
                        \___ http/REDHAT.COM@REDHAT.COM user
                        \___ afs/REDHAT.COM@REDHAT.COM user
                        \___ nfs/REDHAT.COM@REDHAT.COM user
                        \___ krbtgt/KERNEL.ORG@KERNEL.ORG big_key
                        \___ http/KERNEL.ORG@KERNEL.ORG big_key

    What goes into a particular Kerberos cache is entirely up to userspace. Kernel
    support is limited to giving you the Kerberos cache keyring that you want.

    The user asks for their Kerberos cache by:

    krb_cache = keyctl_get_krbcache(uid, dest_keyring);

    The uid is -1 or the user's own UID for the user's own cache or the uid of some
    other user's cache (requires CAP_SETUID). This permits rpc.gssd or whatever to
    mess with the cache.

    The cache returned is a keyring named "_krb.<uid>" that the possessor can
    read, search, clear, invalidate, unlink from and add links to. Active LSMs
    get a chance to rule on whether the caller is permitted to make a link.

    Each uid's cache keyring is created when it is first accessed and is given a
    timeout that is extended each time this function is called so that the keyring
    goes away after a while. The timeout is configurable by sysctl but defaults to
    three days.

    Each user_namespace struct gets a lazily-created keyring that serves as the
    register. The cache keyrings are added to it. This means that standard key
    search and garbage collection facilities are available.

    The user_namespace struct's register goes away when it does and anything left
    in it is then automatically gc'd.

    Signed-off-by: David Howells
    Tested-by: Simo Sorce
    cc: Serge E. Hallyn
    cc: Eric W. Biederman

    David Howells
     

08 Sep, 2013

1 commit

  • Pull namespace changes from Eric Biederman:
    "This is an assorted mishmash of small cleanups, enhancements and bug
    fixes.

    The major theme is user namespace mount restrictions. nsown_capable
    is killed as it encourages not thinking about details that need to be
    considered. A very hard to hit pid namespace exiting bug was finally
    tracked and fixed. A couple of cleanups to the basic namespace
    infrastructure.

    Finally there is an enhancement that makes per user namespace
    capabilities usable as capabilities, and an enhancement that allows
    the per userns root to nice other processes in the user namespace"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Kill nsown_capable it makes the wrong thing easy
    capabilities: allow nice if we are privileged
    pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
    userns: Allow PR_CAPBSET_DROP in a user namespace.
    namespaces: Simplify copy_namespaces so it is clear what is going on.
    pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
    sysfs: Restrict mounting sysfs
    userns: Better restrictions on when proc and sysfs can be mounted
    vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces
    kernel/nsproxy.c: Improving a snippet of code.
    proc: Restrict mounting the proc filesystem
    vfs: Lock in place mounts from more privileged users

    Linus Torvalds
     

27 Aug, 2013

1 commit

  • Rely on the fact that another flavor of the filesystem is already
    mounted and do not rely on state in the user namespace.

    Verify that the mounted filesystem is not covered in any significant
    way. I would love to verify that the previously mounted filesystem
    has no mounts on top but there are at least the directories
    /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
    for other filesystems to mount on top of.

    Refactor the test into a function named fs_fully_visible and call that
    function from the mount routines of proc and sysfs. This makes this
    test local to the filesystems involved and keeps the results current
    as of when the mounts take place, removing a weird threading of the
    user namespace, the mount namespace and the filesystems themselves.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

09 Aug, 2013

1 commit


07 Aug, 2013

1 commit

  • unshare_userns(new_cred) does *new_cred = prepare_creds() before
    create_user_ns() which can fail. However, the caller expects that
    it doesn't need to take care of new_cred if unshare_userns() fails.

    We could change the single caller, sys_unshare(), but I think it
    would be cleaner to avoid the side effects on failure, so with
    this patch unshare_userns() does put_cred() itself and initializes
    *new_cred only if create_user_ns() succeeds.
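
    The pattern can be sketched in plain C as follows. All names here
    (cred_sketch, create_ns_sketch, unshare_userns_sketch) are
    hypothetical stand-ins, with calloc/free taking the place of
    prepare_creds()/put_cred():

```c
#include <stdlib.h>

struct cred_sketch {
	int user_ns_level;	/* illustrative payload only */
};

/* Stand-in for create_user_ns(): fails on request. */
static int create_ns_sketch(struct cred_sketch *cred, int fail)
{
	if (fail)
		return -1;
	cred->user_ns_level++;
	return 0;
}

/*
 * The output parameter is written only on success. On failure the
 * temporary object is released here, so the caller has nothing to
 * clean up and *new_cred keeps its previous value.
 */
int unshare_userns_sketch(int fail, struct cred_sketch **new_cred)
{
	struct cred_sketch *cred = calloc(1, sizeof(*cred));
	int err;

	if (!cred)
		return -1;
	err = create_ns_sketch(cred, fail);
	if (err)
		free(cred);
	else
		*new_cred = cred;
	return err;
}
```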

    Cc: stable@vger.kernel.org
    Signed-off-by: Oleg Nesterov
    Reviewed-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

02 May, 2013

2 commits

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     
  • Split the proc namespace stuff out into linux/proc_ns.h.

    Signed-off-by: David Howells
    cc: netdev@vger.kernel.org
    cc: Serge E. Hallyn
    cc: Eric W. Biederman
    Signed-off-by: Al Viro

    David Howells
     

15 Apr, 2013

3 commits


27 Mar, 2013

2 commits

  • Only allow unprivileged mounts of proc and sysfs if they are already
    mounted when the user namespace is created.

    proc and sysfs are interesting because they have content that is
    per namespace, and so fresh mounts are needed when new namespaces
    are created while at the same time proc and sysfs have content that
    is shared between every instance.

    Respect the policy of who may see the shared content of proc and sysfs
    by only allowing new mounts if there was an existing mount at the time
    the user namespace was created.

    In practice there are only two interesting cases: proc and sysfs are
    mounted at their usual places, or proc and sysfs are not mounted at all
    (some form of mount namespace jail).

    Cc: stable@vger.kernel.org
    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Guarantee that the policy of which files may be accessed that is
    established by setting the root directory will not be violated
    by user namespaces by verifying that the root directory points
    to the root of the mount namespace at the time of user namespace
    creation.

    Changing the root is a privileged operation, and as a matter of policy
    it serves to limit unprivileged processes to files below the current
    root directory.

    For reasons of simplicity and comprehensibility the privilege to
    change the root directory is gated solely on the CAP_SYS_CHROOT
    capability in the user namespace. Therefore when creating a user
    namespace we must ensure that the policy of which files may be accessed
    can not be violated by changing the root directory.

    Anyone who runs a process in a chroot and would like to use user
    namespaces can set up the same view of filesystems with a mount
    namespace instead, with the result that this is not a practical
    limitation for using user namespaces.

    Cc: stable@vger.kernel.org
    Acked-by: Serge Hallyn
    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

14 Mar, 2013

1 commit

  • Don't allow sharing the root directory with processes in a
    different user namespace. There doesn't seem to be any point, and to
    allow it would require the overhead of putting a user namespace
    reference in fs_struct (for permission checks) and incrementing that
    reference count on practically every call to fork.

    So just perform the inexpensive test of forbidding sharing fs_struct
    across processes in different user namespaces. We already disallow
    other forms of threading when unsharing a user namespace so this
    should be no real burden in practice.

    This updates setns, clone, and unshare to disallow multiple user
    namespaces sharing an fs_struct.

    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

27 Jan, 2013

2 commits

  • When I initially wrote the code for /proc/<pid>/uid_map I was lazy
    and avoided duplicate mappings by the simple expedient of ensuring the
    first number in a new extent was greater than any number in the
    previous extent.

    Unfortunately that precludes a number of valid mappings, and someone
    noticed and complained. So use a simple check to ensure that ranges
    in the mapping extents don't overlap.
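
    One way such a check could look, as a sketch rather than the code in
    this commit. It assumes every extent has count >= 1 and that ranges
    do not wrap around. The struct and function names are hypothetical:

```c
#include <stdbool.h>

struct map_extent {
	unsigned first;		/* first id inside the namespace */
	unsigned lower_first;	/* first id in the parent namespace */
	unsigned count;		/* number of ids covered, assumed >= 1 */
};

/* Two closed ranges overlap when each starts no later than the other ends. */
static bool ranges_overlap(unsigned a_first, unsigned a_count,
			   unsigned b_first, unsigned b_count)
{
	unsigned a_last = a_first + a_count - 1;
	unsigned b_last = b_first + b_count - 1;

	return a_first <= b_last && b_first <= a_last;
}

/* Extents collide if either their inside ranges or outside ranges overlap. */
bool extents_overlap(const struct map_extent *a, const struct map_extent *b)
{
	return ranges_overlap(a->first, a->count, b->first, b->count) ||
	       ranges_overlap(a->lower_first, a->count,
			      b->lower_first, b->count);
}
```

    Unlike the earlier ordering rule, this accepts extents listed in
    descending order as long as their ranges are disjoint.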

    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • When freeing a deeply nested user namespace free_user_ns calls
    put_user_ns on its parent which may in turn call free_user_ns again.
    When -fno-optimize-sibling-calls is passed to gcc one stack frame per
    user namespace is left on the stack, potentially overflowing the
    kernel stack. CONFIG_FRAME_POINTER forces -fno-optimize-sibling-calls
    so we can't count on gcc to optimize this code.

    Remove struct kref and use a plain atomic_t. Making the code more
    flexible and easier to comprehend. Make the loop in free_user_ns
    explicit to guarantee that the stack does not overflow with
    CONFIG_FRAME_POINTER enabled.

    I have tested this fix with a simple program that uses unshare to
    create a deeply nested user namespace structure and then calls exit.
    With 1000 nested user namespaces before this change running my test
    program causes the kernel to die a horrible death. With 10,000,000
    nested user namespaces after this change my test program runs to
    completion and causes no harm.
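
    The idea of replacing the recursion with an explicit loop can be
    sketched in user space like this. The names ns_sketch, new_ns,
    put_ns and freed_namespaces are hypothetical, and a plain int stands
    in for the atomic_t reference count:

```c
#include <stdlib.h>

struct ns_sketch {
	struct ns_sketch *parent;
	int count;	/* plain reference count for this sketch */
};

int freed_namespaces;	/* counts frees so the behaviour can be observed */

/* Create a namespace; the child holds one reference on its parent. */
struct ns_sketch *new_ns(struct ns_sketch *parent)
{
	struct ns_sketch *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	ns->parent = parent;
	ns->count = 1;
	if (parent)
		parent->count++;
	return ns;
}

/*
 * Drop a reference. When a namespace is freed, the reference it held on
 * its parent is dropped in the same loop instead of by a recursive call,
 * so stack usage stays constant however deep the nesting is.
 */
void put_ns(struct ns_sketch *ns)
{
	while (ns && --ns->count == 0) {
		struct ns_sketch *parent = ns->parent;

		free(ns);
		freed_namespaces++;
		ns = parent;
	}
}
```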

    Acked-by: Serge Hallyn
    Pointed-out-by: Vasily Kulikov
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

15 Dec, 2012

1 commit


20 Nov, 2012

4 commits

  • Assign a unique proc inode to each namespace, and use that
    inode number to ensure we only allocate at most one proc
    inode for every namespace in proc.

    A single proc inode per namespace allows userspace to test
    to see if two processes are in the same namespace.
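
    From user space the comparison might be done as below. This is a
    sketch: the function name same_namespace is hypothetical, and the
    paths passed in are expected to be namespace files such as
    /proc/<pid>/ns/user:

```c
#include <sys/stat.h>

/*
 * Compare two namespace files by device and inode number.
 * Returns 1 if they refer to the same object, 0 if they differ,
 * and -1 if either path can not be examined.
 */
int same_namespace(const char *path_a, const char *path_b)
{
	struct stat a, b;

	if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0)
		return -1;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```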

    This has been a long requested feature and only blocked because
    a naive implementation would put the id in a global space and
    would ultimately require having a namespace for the names of
    namespaces, making migration and certain virtualization tricks
    impossible.

    We still don't have per superblock inode numbers for proc, which
    appears necessary for application unaware checkpoint/restart and
    migrations (if the application is using namespace file descriptors)
    but that is now allowed by the design if it becomes important.

    I have preallocated the ipc and uts initial proc inode numbers so
    their structures can be statically initialized.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • To keep things sane in the context of file descriptor passing derive the
    user namespace that uids are mapped into from the opener of the file
    instead of from current.

    When writing to the maps file the lower user namespace must always
    be the parent user namespace, or setting the mapping simply does
    not make sense. Enforce that the opener of the file was in
    the parent user namespace or the user namespace whose mapping
    is being set.

    Acked-by: Serge E. Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • - Add CLONE_THREAD to the unshare flags if CLONE_NEWUSER is selected,
      as changing user namespaces is only valid if there is only
      a single thread.
    - Restore the code to add CLONE_VM if CLONE_THREAD is selected and
      the code to add CLONE_SIGHAND if CLONE_VM is selected,
      making the constraints in the code clear.
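
    The chain of implied flags can be sketched as a small pure function.
    The SK_ constants mirror the values of the corresponding CLONE_
    flags in the Linux headers and are defined locally so the sketch
    stands alone. The function name expand_unshare_flags is hypothetical:

```c
/* Local copies of the CLONE_* flag values, for illustration only. */
enum {
	SK_CLONE_VM      = 0x00000100,
	SK_CLONE_SIGHAND = 0x00000800,
	SK_CLONE_THREAD  = 0x00010000,
	SK_CLONE_NEWUSER = 0x10000000
};

/*
 * Apply the implications in order: NEWUSER implies THREAD,
 * THREAD implies VM, and VM implies SIGHAND.
 */
unsigned long expand_unshare_flags(unsigned long flags)
{
	if (flags & SK_CLONE_NEWUSER)
		flags |= SK_CLONE_THREAD;
	if (flags & SK_CLONE_THREAD)
		flags |= SK_CLONE_VM;
	if (flags & SK_CLONE_VM)
		flags |= SK_CLONE_SIGHAND;
	return flags;
}
```

    Because each step feeds the next, requesting only the user
    namespace flag ends up with all four flags set.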

    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • This allows entering a user namespace, and the ability
    to store a reference to a user namespace with a bind
    mount.

    Addition of missing userns_ns_put in userns_install
    from Gao feng

    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman