07 Oct, 2016

1 commit

  • Pull namespace updates from Eric Biederman:
    "This set of changes is a number of smaller things that have been
    overlooked in other development cycles focused on more fundamental
    change. The devpts changes are small things that were a distraction
    until we managed to kill off DEVPTS_MULTIPLE_INSTANCES. There is a
    trivial regression fix to autofs for the unprivileged mount changes
    that went in last cycle. A pair of ioctls has been added by Andrey
    Vagin making it possible to discover the relationships between
    namespaces when referring to them through file descriptors.

    The big user visible change is starting to add simple resource limits
    to catch programs that misbehave. With namespaces in general and user
    namespaces in particular allowing users to use more kinds of
    resources, it has become important to have something to limit errant
    programs. Because the purpose of these limits is to catch errant
    programs, the code needs to be inexpensive to use, as it is always on,
    and the default limits need to be high enough that well-behaved
    programs on well-behaved systems don't encounter them.

    To this end, after some review I have implemented per-user,
    per-user-namespace limits, and used them to limit the number of
    namespaces. Because the limits are per user, one user cannot exhaust
    the limits of another user. Because the limits are per user
    namespace, there can be contexts where the limit is 0, and
    security-conscious folks can remove the code used to manage
    namespaces from their threat analysis (as they historically could
    when it was root only). At the same time, the per-user-namespace
    limits allow other parts of the system to use namespaces.

    Namespaces are increasingly being used in application sandboxing
    scenarios, so an all-or-nothing, system-wide disable for the
    security-conscious folks would make increasing use of these sandboxes
    impossible.

    A limit has also been added on the maximum number of mounts present in
    a single mount namespace. It is nontrivial to guess what a reasonable
    system-wide limit on the number of mount structures in the kernel would
    be, especially as it varies based on how a system is using
    containers. A limit on the number of mounts in a mount namespace,
    however, is much easier to understand and set. In practice, most
    cases use only about 1000 mounts. Given that some autofs scenarios
    have the potential to reach 30,000 to 50,000 mounts, I have set the
    default limit for the number of mounts at 100,000, which is well
    above every known set of users but low enough that the mount hash
    tables don't degrade unreasonably.

    These limits are a start. I expect this establishes a pattern that
    other limits for resources that namespaces use will follow. There has
    been interest in making inotify event limits per user, per user
    namespace, as well as interest expressed in making details about what
    is going on in the kernel more visible."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (28 commits)
    autofs: Fix automounts by using current_real_cred()->uid
    mnt: Add a per mount namespace limit on the number of mounts
    netns: move {inc,dec}_net_namespaces into #ifdef
    nsfs: Simplify __ns_get_path
    tools/testing: add a test to check nsfs ioctl-s
    nsfs: add ioctl to get a parent namespace
    nsfs: add ioctl to get an owning user namespace for ns file descriptor
    kernel: add a helper to get an owning user namespace for a namespace
    devpts: Change the owner of /dev/pts/ptmx to the mounter of /dev/pts
    devpts: Remove sync_filesystems
    devpts: Make devpts_kill_sb safe if fsi is NULL
    devpts: Simplify devpts_mount by using mount_nodev
    devpts: Move the creation of /dev/pts/ptmx into fill_super
    devpts: Move parse_mount_options into fill_super
    userns: When the per user per user namespace limit is reached return ENOSPC
    userns; Document per user per user namespace limits.
    mntns: Add a limit on the number of mount namespaces.
    netns: Add a limit on the number of net namespaces
    cgroupns: Add a limit on the number of cgroup namespaces
    ipcns: Add a limit on the number of ipc namespaces
    ...

    Linus Torvalds
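The per-user, per-user-namespace accounting described above can be pictured with a small userspace sketch. It is a hypothetical model, not the kernel's ucounts code: `charge_namespace`, `uncharge_namespace` and the fixed-size table are illustrative names chosen here. It only shows the two properties the message relies on: counts are kept separately for each (user namespace, uid) pair, so one user cannot use up another user's budget, and creation fails with ENOSPC once the limit is reached, including a configured limit of 0.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: every name here is illustrative, not the kernel API. */
#define MAX_PAIRS 16

struct ucount_pair {
    int userns_id;  /* user namespace the count is charged to */
    int uid;        /* user inside that namespace */
    int count;      /* namespaces currently held by this (userns, uid) pair */
};

static struct ucount_pair pairs[MAX_PAIRS];
static size_t n_pairs;

/* Return the counter for (userns_id, uid), or NULL if none exists yet. */
static struct ucount_pair *find_pair(int userns_id, int uid)
{
    for (size_t i = 0; i < n_pairs; i++)
        if (pairs[i].userns_id == userns_id && pairs[i].uid == uid)
            return &pairs[i];
    return NULL;
}

/* Charge one new namespace to (userns_id, uid). `limit` is the maximum
 * configured for that user namespace; 0 disables creation entirely.
 * Returns 0 on success, -ENOSPC when the limit (or this table) is full. */
static int charge_namespace(int userns_id, int uid, int limit)
{
    struct ucount_pair *p = find_pair(userns_id, uid);

    if (p == NULL) {
        if (limit <= 0 || n_pairs == MAX_PAIRS)
            return -ENOSPC;
        p = &pairs[n_pairs++];
        *p = (struct ucount_pair){ userns_id, uid, 0 };
    }
    if (p->count >= limit)
        return -ENOSPC;
    p->count++;
    return 0;
}

/* Give the slot back when a namespace is destroyed. */
static void uncharge_namespace(int userns_id, int uid)
{
    struct ucount_pair *p = find_pair(userns_id, uid);

    if (p != NULL && p->count > 0)
        p->count--;
}
```

As far as I know, the kernel additionally charges every ancestor user namespace and exposes the per-type maximums as sysctls under /proc/sys/user/; this sketch checks a single level only.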
     

01 Oct, 2016

1 commit

  • The capability check should not be audited since it is only being used
    to determine the inode permissions. A failed check does not indicate a
    violation of security policy but, when an LSM is enabled, a denial audit
    message was being generated.

    The denial audit message caused confusion for some application authors
    because root-running Go applications always triggered the denial. To
    prevent this confusion, the capability check in net_ctl_permissions() is
    switched to the noaudit variant.

    BugLink: https://launchpad.net/bugs/1465724

    Signed-off-by: Tyler Hicks
    Acked-by: Serge E. Hallyn
    Signed-off-by: James Morris
    [dtor: reapplied after e79c6a4fc923 ("net: make net namespace sysctls
    belong to container's owner") accidentally reverted the change.]
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: David S. Miller

    Tyler Hicks
     

15 Aug, 2016

1 commit

  • If a net namespace is attached to a user namespace, make the
    container's root the owner of the sysctls affecting that network
    namespace instead of the global root.

    This also allows us to clean up net_ctl_permissions() because we no
    longer need to fudge permissions for the container's owner, since it
    now owns the objects in question.

    Acked-by: "Eric W. Biederman"
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: David S. Miller

    Dmitry Torokhov
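The ownership change can be sketched in userspace by modelling a single `uid_map` extent. The struct and helper names below are hypothetical; `map_id` only loosely mirrors what `make_kuid()` does, and falling back to global root (uid 0) when the namespace's root is unmapped is an assumption chosen to match the "instead of the global root" default in the message.

```c
#include <stdint.h>

/* Hypothetical single-extent id map, modelled on one line of
 * /proc/<pid>/uid_map: "first_inside first_outside count". */
struct id_map {
    uint32_t first_inside;   /* first id as seen inside the namespace */
    uint32_t first_outside;  /* corresponding id in the global view */
    uint32_t count;          /* length of the range; 0 means no mapping */
};

#define INVALID_ID UINT32_MAX

/* Translate an in-namespace id to a global id, or INVALID_ID if it is
 * unmapped (loosely like make_kuid() returning an invalid kuid). */
static uint32_t map_id(const struct id_map *m, uint32_t inside)
{
    if (m->count != 0 && inside >= m->first_inside &&
        inside - m->first_inside < m->count)
        return m->first_outside + (inside - m->first_inside);
    return INVALID_ID;
}

/* Owner of a per-netns sysctl: the namespace's root (uid 0 inside it)
 * if that maps to a valid global uid, otherwise keep global root (0). */
static uint32_t sysctl_owner_uid(const struct id_map *userns_map)
{
    uint32_t ns_root = map_id(userns_map, 0);

    return ns_root != INVALID_ID ? ns_root : 0;
}
```

A container whose uid_map is `0 100000 65536` would get sysctls owned by global uid 100000; the same rule would apply to the group via gid_map.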
     

08 Aug, 2016

1 commit


06 Jun, 2016

1 commit

  • The capability check should not be audited since it is only being used
    to determine the inode permissions. A failed check does not indicate a
    violation of security policy but, when an LSM is enabled, a denial audit
    message was being generated.

    The denial audit message caused confusion for some application authors
    because root-running Go applications always triggered the denial. To
    prevent this confusion, the capability check in net_ctl_permissions() is
    switched to the noaudit variant.

    BugLink: https://launchpad.net/bugs/1465724

    Signed-off-by: Tyler Hicks
    Acked-by: Serge E. Hallyn
    Signed-off-by: James Morris

    Tyler Hicks
     

23 Oct, 2015

1 commit

  • The buffer returned by register_sysctl() is stored in the net_header
    variable, but net_header is never used afterwards, so the compiler may
    optimise the variable out, which leads kmemleak to report the warning below:

    comm "swapper/0", pid 1, jiffies 4294937448 (age 267.270s)
    hex dump (first 32 bytes):
    90 38 8b 01 c0 ff ff ff 00 00 00 00 01 00 00 00 .8..............
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] create_object+0x10c/0x2a0
    [] kmemleak_alloc+0x54/0xa0
    [] __kmalloc+0x1f8/0x4f8
    [] __register_sysctl_table+0x64/0x5a0
    [] register_sysctl+0x30/0x40
    [] net_sysctl_init+0x20/0x58
    [] sock_init+0x10/0xb0
    [] do_one_initcall+0x90/0x1b8
    [] kernel_init_freeable+0x218/0x2f0
    [] kernel_init+0x1c/0xe8
    [] ret_from_fork+0xc/0x50
    [] 0xffffffffffffffff <>

    Before the fix, the objdump output on ARM64:
    0000000000000000 :
    0: a9be7bfd stp x29, x30, [sp,#-32]!
    4: 90000001 adrp x1, 0
    8: 90000000 adrp x0, 0
    c: 910003fd mov x29, sp
    10: 91000021 add x1, x1, #0x0
    14: 91000000 add x0, x0, #0x0
    18: a90153f3 stp x19, x20, [sp,#16]
    1c: 12800174 mov w20, #0xfffffff4 // #-12
    20: 94000000 bl 0
    24: b4000120 cbz x0, 48
    28: 90000013 adrp x19, 0
    2c: 91000273 add x19, x19, #0x0
    30: 9101a260 add x0, x19, #0x68
    34: 94000000 bl 0
    38: 2a0003f4 mov w20, w0
    3c: 35000060 cbnz w0, 48
    40: aa1303e0 mov x0, x19
    44: 94000000 bl 0
    48: 2a1403e0 mov w0, w20
    4c: a94153f3 ldp x19, x20, [sp,#16]
    50: a8c27bfd ldp x29, x30, [sp],#32
    54: d65f03c0 ret
    After:
    0000000000000000 :
    0: a9bd7bfd stp x29, x30, [sp,#-48]!
    4: 90000000 adrp x0, 0
    8: 910003fd mov x29, sp
    c: a90153f3 stp x19, x20, [sp,#16]
    10: 90000013 adrp x19, 0
    14: 91000000 add x0, x0, #0x0
    18: 91000273 add x19, x19, #0x0
    1c: f90013f5 str x21, [sp,#32]
    20: aa1303e1 mov x1, x19
    24: 12800175 mov w21, #0xfffffff4 // #-12
    28: 94000000 bl 0
    2c: f9002260 str x0, [x19,#64]
    30: b40001a0 cbz x0, 64
    34: 90000014 adrp x20, 0
    38: 91000294 add x20, x20, #0x0
    3c: 9101a280 add x0, x20, #0x68
    40: 94000000 bl 0
    44: 2a0003f5 mov w21, w0
    48: 35000080 cbnz w0, 58
    4c: aa1403e0 mov x0, x20
    50: 94000000 bl 0
    54: 14000004 b 64
    58: f9402260 ldr x0, [x19,#64]
    5c: 94000000 bl 0
    60: f900227f str xzr, [x19,#64]
    64: 2a1503e0 mov w0, w21
    68: f94013f5 ldr x21, [sp,#32]
    6c: a94153f3 ldp x19, x20, [sp,#16]
    70: a8c37bfd ldp x29, x30, [sp],#48
    74: d65f03c0 ret

    Add error handling that frees net_header when the subsequent
    registration fails; this keeps net_header referenced and removes the
    kmemleak warning.

    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller

    Li RongQing
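The fix pattern can be sketched in userspace as follows. The three helpers are hypothetical stand-ins for register_sysctl(), register_pernet_subsys() and unregister_sysctl_table(), and the sketch omits the other work the real init function does; it only shows keeping the header in a file-scope variable and releasing it on the error path, in the goto style kernel code uses.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct table_header { int unused; };

static bool fail_second_step;   /* test hook: force the second step to fail */

/* Hypothetical stand-in for register_sysctl(): allocates a header. */
static struct table_header *register_table(void)
{
    return calloc(1, sizeof(struct table_header));
}

/* Hypothetical stand-in for unregister_sysctl_table(). */
static void unregister_table(struct table_header *h)
{
    free(h);
}

/* Hypothetical stand-in for register_pernet_subsys(). */
static int register_second_step(void)
{
    return fail_second_step ? -ENOMEM : 0;
}

/* File scope, so the pointer stays reachable and is used on the error path. */
static struct table_header *net_header;

static int sysctl_init(void)
{
    int ret = -ENOMEM;

    net_header = register_table();
    if (!net_header)
        goto out;
    ret = register_second_step();
    if (ret)
        goto out_unregister;
out:
    return ret;
out_unregister:
    /* Error path: free the header instead of leaking it. */
    unregister_table(net_header);
    net_header = NULL;
    goto out;
}
```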
     

07 Oct, 2013

1 commit


19 Nov, 2012

3 commits

  • Get rid of duplicate code in net_ctl_permissions and fix the comment.

    Signed-off-by: Zhao Hongjiang
    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Zhao Hongjiang
     
  • - Allow anyone with CAP_NET_ADMIN rights in the user namespace of the
    network namespace to change sysctls.
    - Give anyone with the uid of the user namespace root the same
    permissions over the network namespace sysctls as the global root.
    - Give anyone with the gid of the user namespace root group the same
    permissions over the network namespace sysctls as the global root group.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • - Current is implicitly available, so passing current->nsproxy isn't useful.
    - The ctl_table_header is needed to find how the sysctl table is connected
    to the rest of sysctl.
    - ctl_table_root is available in the ctl_table_header, so there is no need to pass it.

    With these changes it becomes possible to write a version of
    net_sysctl_permission that takes into account the network namespace of
    the sysctl table, an important feature in extending the user namespace.

    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
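A userspace sketch of the uid/gid rules from the second commit above follows. The `struct caller` flags are hypothetical stand-ins for what the kernel derives from ns_capable(), current_uid()/current_gid() and make_kuid()/make_kgid(); keeping the owner bits unchanged in the group-only case is an assumption of this sketch, not something the messages state.

```c
#include <stdbool.h>

/* Hypothetical description of the caller relative to a netns's user namespace. */
struct caller {
    bool cap_net_admin;   /* holds CAP_NET_ADMIN in that user namespace */
    bool is_ns_root_uid;  /* uid equals the user namespace's root uid */
    bool is_ns_root_gid;  /* gid equals the user namespace's root gid */
};

/* Effective mode bits for a network namespace sysctl. */
static int sysctl_permissions(int mode, const struct caller *c)
{
    if (c->cap_net_admin || c->is_ns_root_uid) {
        /* Same access as root: owner rwx applied to every class. */
        int owner = (mode >> 6) & 7;
        return (owner << 6) | (owner << 3) | owner;
    }
    if (c->is_ns_root_gid) {
        /* Same access as the root group: group rwx also applied to other. */
        int group = (mode >> 3) & 7;
        return (mode & 0700) | (group << 3) | group;
    }
    return mode;
}
```

For a 0640 sysctl, an administrator or the namespace's root uid sees 0666, the namespace's root group sees 0644, and anyone else sees 0640.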
     

16 May, 2012

1 commit

  • We are going to delete the Token ring support. This removes any
    special processing in the core networking for token ring, (aside
    from net/tr.c itself), leaving the drivers and remaining tokenring
    support present but inert.

    The mass removal of the drivers and net/tr.c will be in a separate
    commit, so that the history of these files that we still care
    about won't have the giant deletion tied into their history.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

21 Apr, 2012

5 commits

  • All of the users have been converted to use register_net_sysctl, so we
    no longer need register_net_sysctl_table.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • register_sysctl_rotable never caught on as an interesting way to
    register sysctls. My take on the situation is that what we want are
    sysctls that we can only see in the initial network namespace. What we
    have implemented with register_sysctl_rotable are sysctls that we can
    see in all of the network namespaces and can only change in the initial
    network namespace.

    That is a very silly way to go. Just register the network sysctls
    in the initial network namespace and we don't have any weird special
    cases to deal with.

    The sysctls affected are:
    /proc/sys/net/ipv4/ipfrag_secret_interval
    /proc/sys/net/ipv4/ipfrag_max_dist
    /proc/sys/net/ipv6/ip6frag_secret_interval
    /proc/sys/net/ipv6/mld_max_msf

    I really don't expect anyone will miss them if they can't read them in a
    child user namespace.

    CC: Pavel Emelyanov
    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • If the netfilter code is modified to use register_net_sysctl_table, the
    kernel fails to boot because the per-net sysctl infrastructure is not set
    up early enough. So to avoid races, call net_sysctl_init from sock_init().

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Implementation limitations of the sysctl core won't let /proc/sys/net
    reside in a network namespace. /proc/sys/net at least must be registered
    as a normal sysctl. So register /proc/sys/net early as an empty directory
    to guarantee we don't violate this constraint and hit bugs in the sysctl
    implementation.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Right now all of the networking sysctl registrations are running in a
    compatibility mode. The native sysctl registration API takes a cstring
    for a path and a simple ctl_table. Implement register_net_sysctl so
    that we can register network sysctls without needing to use
    compatibility code in the sysctl core.

    Switching from a ctl_path to a cstring results in less boilerplate
    and denser code that is a little easier to read.

    I would simply have changed the arguments to register_net_sysctl_table
    instead of keeping two functions in parallel, but gcc will allow a
    ctl_path pointer to be passed as a char * pointer while only issuing a
    warning, which means completely incorrect code can be built. Since I
    have to change the function name, I am taking advantage of the situation
    to let both register_net_sysctl and register_net_sysctl_table live in
    parallel for a short time, which makes clean conversion patches a bit
    easier to read and write.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
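The difference between the two path forms can be seen in a small userspace sketch that joins an old-style, NULL-terminated `struct ctl_path` array into the cstring form (for example "net/ipv4"). The struct here is a simplified stand-in, and `ctl_path_to_string` is a hypothetical helper, not a kernel function. Because C only warns when a `struct ctl_path *` is passed where a `char *` is expected, the two forms are easy to mix up, which is why the message prefers a new function name over changing the argument type of the existing one.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the old struct ctl_path: an array of path
 * components terminated by an entry whose procname is NULL. */
struct ctl_path {
    const char *procname;
};

/* Join the components into a "net/ipv4"-style string.
 * Returns 0 on success, or -1 if buf is too small. */
static int ctl_path_to_string(const struct ctl_path *path, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return -1;
    for (const struct ctl_path *p = path; p->procname != NULL; p++) {
        size_t n = strlen(p->procname);
        size_t need = n + (used ? 1 : 0);   /* component plus '/' separator */

        if (used + need + 1 > len)          /* +1 for the terminating NUL */
            return -1;
        if (used)
            buf[used++] = '/';
        memcpy(buf + used, p->procname, n);
        used += n;
    }
    buf[used] = '\0';
    return 0;
}
```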
     

25 Jan, 2012

5 commits


01 Nov, 2011

1 commit


18 May, 2010

1 commit

  • This patch removes from net/ (but not any netfilter files)
    all the unnecessary return; statements that precede the
    last closing brace of void functions.

    It does not remove the returns that are immediately
    preceded by a label as gcc doesn't like that.

    Done via:
    $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
    xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
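The label exception can be seen in a minimal example. ISO C before C23 requires a label to be followed by a statement, so an `out:` label directly before the closing brace needs the `return;` that the cleanup deliberately left in place. The function and counter names are hypothetical.

```c
static int work_done;

/* Ends in a label, so the trailing return must stay. */
static void do_work(int fail)
{
    if (fail)
        goto out;
    work_done++;
out:
    return;         /* "out: }" alone is not valid ISO C before C23 */
}

/* No label at the end, so the trailing "return;" is redundant and removed. */
static void do_more_work(void)
{
    work_done += 10;
}
```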
     

18 Jan, 2010

1 commit


16 Mar, 2009

1 commit


28 Jul, 2008

1 commit

  • Piss-poor sysctl registration API strikes again, film at 11...

    What we really need is _pathname_ required to be present in already
    registered table, so that kernel could warn about bad order. That's the
    next target for sysctl stuff (and generally saner and more explicit
    order of initialization of ipv[46] internals wouldn't hurt either).

    For the time being, here are the full fixups required by the ..._rotable()
    stuff; we make per-net sysctl sets descendants of the "ro" one and make sure
    that a sufficient skeleton is there before we start registering per-net
    sysctls.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

27 Jul, 2008

1 commit

  • New object: set of sysctls [currently - root and per-net-ns].
    Contains: pointer to parent set, list of tables and "should I see this set?"
    method (->is_seen(set)).
    Current lists of tables are subsumed by that; net-ns contains such a beast.
    ->lookup() for ctl_table_root returns pointer to ctl_table_set instead of
    that to ->list of that ctl_table_set.

    [folded compile fixes by rdd for configs without sysctl]

    Signed-off-by: Al Viro

    Al Viro
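A userspace model of such a set follows. All names are hypothetical and much simpler than the kernel's struct ctl_table_set: each set has a parent pointer, a table of names and an is_seen() callback, and a lookup walks from the most specific set towards the root, skipping sets the task cannot see. The per-net callback models the visibility rule from the January 2008 register_net_sysctl_table change, where per-net sysctls only show up to tasks in the same network namespace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct task { int netns_id; };          /* hypothetical: only the netns matters */

struct sysctl_set {
    struct sysctl_set *parent;          /* e.g. per-net set -> root set */
    const char *const *names;           /* NULL-terminated list of entries */
    int owner_netns;                    /* used by per-net sets only */
    bool (*is_seen)(const struct sysctl_set *, const struct task *);
};

/* Root set: visible to every task. */
static bool root_is_seen(const struct sysctl_set *s, const struct task *t)
{
    (void)s;
    (void)t;
    return true;
}

/* Per-net set: visible only to tasks in the owning network namespace. */
static bool net_is_seen(const struct sysctl_set *s, const struct task *t)
{
    return s->owner_netns == t->netns_id;
}

/* Is `name` visible to `t`, starting from `set` and falling back to parents? */
static bool sysctl_visible(const struct sysctl_set *set, const struct task *t,
                           const char *name)
{
    for (; set != NULL; set = set->parent) {
        if (!set->is_seen(set, t))
            continue;
        for (const char *const *n = set->names; *n != NULL; n++)
            if (strcmp(*n, name) == 0)
                return true;
    }
    return false;
}
```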
     

26 Jul, 2008

1 commit

  • Extend the permission check for networking sysctls to allow modification
    when the current process has the CAP_NET_ADMIN capability and is not root.
    This version uses the until-now unused permissions hook to override the
    mode value for /proc/sys/net if accessed by a user with capabilities.

    Found while working with Quagga. It is impossible to turn forwarding
    on/off through the command interface because Quagga uses the secure coding
    practice of dropping privileges during initialization and only raising them
    via capabilities when necessary. Since the daemon has reset its real/effective
    uid after initialization, all attempts to access /proc/sys/net variables
    will fail.

    Signed-off-by: Stephen Hemminger
    Acked-by: "Eric W. Biederman"
    Cc: Chris Wright
    Cc: Alexey Dobriyan
    Cc: Andrew Morgan
    Cc: Pavel Emelyanov
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Hemminger
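The mode override can be sketched as a pure function. The `has_net_admin` flag is a hypothetical stand-in for a capable(CAP_NET_ADMIN) check; the sketch copies the owner's rwx bits into the group and other classes, so the privileged but non-root caller gets the access root would have to a root-owned sysctl.

```c
#include <stdbool.h>

/* Effective mode for a /proc/sys/net entry; has_net_admin stands in
 * for a CAP_NET_ADMIN capability check. */
static int effective_mode(int table_mode, bool has_net_admin)
{
    if (has_net_admin) {
        int owner = (table_mode >> 6) & 7;          /* owner rwx bits */
        return (owner << 6) | (owner << 3) | owner; /* grant them to all */
    }
    return table_mode;                              /* unchanged otherwise */
}
```

For a typical 0644 forwarding sysctl, a Quagga-style daemon holding CAP_NET_ADMIN would see 0666 and could write it, while other users still see 0644.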
     

12 Jun, 2008

1 commit


20 May, 2008

1 commit


01 May, 2008

1 commit

  • drivers/net/8390.c:37:2: warning: returning void-valued expression
    drivers/net/bnx2.c:1635:3: warning: returning void-valued expression
    drivers/net/xen-netfront.c:1806:2: warning: returning void-valued expression
    net/ipv4/tcp_hybla.c:105:3: warning: returning void-valued expression
    net/ipv4/tcp_vegas.c:171:3: warning: returning void-valued expression
    net/ipv4/tcp_veno.c:123:3: warning: returning void-valued expression
    net/sysctl_net.c:85:2: warning: returning void-valued expression

    Signed-off-by: Harvey Harrison
    Acked-by: Alan Cox
    Signed-off-by: David S. Miller

    Harvey Harrison
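A minimal example of what sparse flags and of the usual fix: ISO C does not allow a `return` with an expression in a void function, even when that expression itself has type void, so the call is made first and followed by a plain `return;`. The function names are hypothetical.

```c
static int helper_calls;

static void helper(void)
{
    helper_calls++;
}

/* Before (flagged by sparse): if (early) return helper(); */
static void caller(int early)
{
    if (early) {
        helper();
        return;     /* fixed form: call, then a plain return */
    }
    helper();
    helper();
}
```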
     

29 Jan, 2008

5 commits

  • I have removed all the entries from this table (core_table,
    ipv4_table and tr_table), so now we can safely drop it.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The same thing for token-ring - use ctl paths and get
    rid of external references on the tr_table.

    Unfortunately, I couldn't split this patch into cleanup and
    use-the-paths parts.

    As a lame excuse I can say that the cleanup is just moving
    the tr_table from one file to another, closer to the single
    variable that this ctl table tunes. Since the source file
    becomes empty after the move, I remove it.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This is the same as I did for the net/core/ table in the
    second patch in this series: use the paths and isolate the
    whole table in the .c file.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Using ctl paths we can put all the stuff, related to net/core/
    sysctl table, into one file and remove all the references on it.

    As a good side effect this hides the "core_table" name from
    the global scope :)

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The user interface is: register_net_sysctl_table and
    unregister_net_sysctl_table. Very much like the current
    interface except there is a network namespace parameter.

    With this any sysctl registered with register_net_sysctl_table
    will only show up to tasks in the same network namespace.

    All other sysctls continue to be globally visible.

    Signed-off-by: Eric W. Biederman
    Cc: Serge Hallyn
    Cc: Daniel Lezcano
    Cc: Cedric Le Goater
    Cc: Pavel Emelyanov
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

01 Jul, 2006

1 commit


06 Jun, 2006

1 commit


04 Oct, 2005

1 commit

  • During the build for ARM machine type "fortunet", this error occurred:

    CC net/sysctl_net.o
    net/sysctl_net.c:36: error: 'core_table' undeclared here (not in a function)

    It appears that the following configuration settings cause this error
    due to a missing include:
    CONFIG_SYSCTL=y
    CONFIG_NET=y
    # CONFIG_INET is not set

    core_table appears to be declared in net/sock.h. If CONFIG_INET were
    defined, net/sock.h would have been included via:
    sysctl_net.c -> net/ip.h -> linux/ip.h -> net/sock.h

    Without CONFIG_INET that chain is absent, so include it directly.

    Signed-off-by: Russell King
    Signed-off-by: David S. Miller

    Russell King
     

30 Aug, 2005

1 commit

  • Of this type, mostly:

    CHECK net/ipv6/netfilter.c
    net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
    net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo