01 Feb, 2009

1 commit


30 Oct, 2008

1 commit

  • On Linux all filesystems are supposed to operate under POSIX
    restricted chown. Restricted chown means chown is restricted to the owner
    unless you have CAP_FOWNER.

    NOTE: 2 files outside of fs/xfs have also been modified for this
    change.

    Reviewed-by: Dave Chinner

    SGI-PV: 988919

    SGI-Modid: 2.6.x-xfs-melb:linux:32413b

    Signed-off-by: Tim Shimmin
    Signed-off-by: Christoph Hellwig
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Tim Shimmin
     

26 Jul, 2008

1 commit


09 Feb, 2008

1 commit

  • The question remains whether it is intended that many, perhaps even large,
    tables are compiled in without ever having a chance to get used, i.e.
    whether #ifdef CONFIG_xxx guards should be added.

    [akpm@linux-foundation.org: fix cut-n-paste error]
    Signed-off-by: Jan Beulich
    Acked-by: "Eric W. Biederman"
    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

06 Feb, 2008

1 commit

  • The capability bounding set is a set beyond which capabilities cannot grow.
    Currently cap_bset is per-system. It can be manipulated through sysctl,
    but only init can add capabilities. Root can remove capabilities. By
    default it includes all caps except CAP_SETPCAP.

    This patch makes the bounding set per-process when file capabilities are
    enabled. It is inherited at fork from the parent. No one can add elements;
    CAP_SETPCAP is required to remove them.
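
    The rules in the paragraph above can be sketched as a small userspace
    model. This is only an illustration, not kernel code: the struct, the
    function names, the MODEL_ constants and the choice of errno-style return
    values are all assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Capability numbers follow the order of the captable[] listing in this
 * commit message; the MODEL_ names themselves are hypothetical. */
#define MODEL_CAP_SETPCAP 8
#define MODEL_CAP_MKNOD   27
#define MODEL_NUM_CAPS    32

struct task_caps {
    uint32_t bset;      /* per-process capability bounding set */
    uint32_t effective; /* pE */
};

/* fork(): the child starts with a copy of the parent's bounding set. */
static struct task_caps model_fork(const struct task_caps *parent)
{
    return *parent;
}

/*
 * Drop one capability from the bounding set.
 * Returns 0 on success, -EPERM if pE lacks CAP_SETPCAP, and -EINVAL for
 * an out-of-range capability number (return values are an assumption of
 * this model). There is deliberately no matching "add" operation.
 */
static int model_bset_drop(struct task_caps *t, unsigned int cap)
{
    if (!(t->effective & (UINT32_C(1) << MODEL_CAP_SETPCAP)))
        return -EPERM;
    if (cap >= MODEL_NUM_CAPS)
        return -EINVAL;
    t->bset &= ~(UINT32_C(1) << cap);
    return 0;
}
```

    In this model a parent holding CAP_SETPCAP can drop CAP_MKNOD, and every
    task later created with model_fork() starts without it in its bounding
    set.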

    One example use of this is to start a safer container. For instance, until
    device namespaces or per-container device whitelists are introduced, it is
    best to take CAP_MKNOD away from a container.

    The bounding set will not affect pP and pE immediately. It will only
    affect pP' and pE' after subsequent exec()s. It also does not affect pI,
    and exec() does not constrain pI'. So to really start a shell with no way
    of regaining CAP_MKNOD, you would do

    prctl(PR_CAPBSET_DROP, CAP_MKNOD);
    cap_t cap = cap_get_proc();
    cap_value_t caparray[1];
    caparray[0] = CAP_MKNOD;
    cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP);
    cap_set_proc(cap);
    cap_free(cap);

    The following test program will get and set the bounding
    set (but not pI). For instance

    ./bset get
    (lists capabilities in bset)
    ./bset drop cap_net_raw
    (starts shell with new bset)
    (use capset, setuid binary, or binary with
    file capabilities to try to increase caps)

    ************************************************************
    cap_bound.c
    ************************************************************
    #include <sys/prctl.h>
    #include <linux/capability.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #ifndef PR_CAPBSET_READ
    #define PR_CAPBSET_READ 23
    #endif

    #ifndef PR_CAPBSET_DROP
    #define PR_CAPBSET_DROP 24
    #endif

    int usage(char *me)
    {
        printf("Usage: %s get\n", me);
        printf("       %s drop <capability>\n", me);
        return 1;
    }

    #define numcaps 32
    char *captable[numcaps] = {
    "cap_chown",
    "cap_dac_override",
    "cap_dac_read_search",
    "cap_fowner",
    "cap_fsetid",
    "cap_kill",
    "cap_setgid",
    "cap_setuid",
    "cap_setpcap",
    "cap_linux_immutable",
    "cap_net_bind_service",
    "cap_net_broadcast",
    "cap_net_admin",
    "cap_net_raw",
    "cap_ipc_lock",
    "cap_ipc_owner",
    "cap_sys_module",
    "cap_sys_rawio",
    "cap_sys_chroot",
    "cap_sys_ptrace",
    "cap_sys_pacct",
    "cap_sys_admin",
    "cap_sys_boot",
    "cap_sys_nice",
    "cap_sys_resource",
    "cap_sys_time",
    "cap_sys_tty_config",
    "cap_mknod",
    "cap_lease",
    "cap_audit_write",
    "cap_audit_control",
    "cap_setfcap"
    };

    int getbcap(void)
    {
        int comma=0;
        unsigned long i;
        int ret;

        printf("i know of %d capabilities\n", numcaps);
        printf("capability bounding set:");
        for (i=0; i<numcaps; i++) {
            ret = prctl(PR_CAPBSET_READ, i);
            if (ret < 0)
                perror("prctl");
            else if (ret==1)
                printf("%s%s", (comma++) ? ", " : " ", captable[i]);
        }
        printf("\n");
        return 0;
    }

    int capdrop(char *str)
    {
        unsigned long i;

        int found=0;
        for (i=0; i<numcaps; i++) {
        [...]

    Signed-off-by: Andrew G. Morgan
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Chris Wright
    Cc: Casey Schaufler
    Signed-off-by: "Serge E. Hallyn"
    Tested-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

29 Jan, 2008

1 commit

  • This patch implements the basic infrastructure for per namespace sysctls.

    A list of lists of sysctl headers is added, allowing each namespace to have
    its own list of sysctl headers.

    Each list of sysctl headers has a lookup function to find the first
    sysctl header in the list, allowing the lists to have a per namespace
    instance.

    register_sysctl_root is added to tell sysctl.c about additional
    lists of sysctl_headers. As all of the users are expected to be in
    the kernel, no unregister function is provided.

    sysctl_head_next is updated to walk through the list of lists.

    __register_sysctl_paths is added to add a new sysctl table on
    a non-default sysctl list.

    The only intrusive part of this patch is propagating the information
    needed to decide which list of sysctls to use for sysctl_check_table.
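
    The "list of lists" arrangement described above can be pictured with a
    simplified userspace model. Every type and function name below (header,
    ns, root, register_root, count_visible and the two lookup callbacks) is
    hypothetical and chosen only for the sketch; the kernel's real structures
    and locking are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's types. */
struct header {
    const char *name;
    struct header *next;
};

struct ns {
    struct header *net_headers; /* headers private to this namespace */
};

struct root {
    struct root *next;
    /* Returns the first header of this root's list for namespace ns. */
    struct header *(*lookup)(const struct root *root, const struct ns *ns);
    struct header *global_headers; /* used by the default root only */
};

/* Default root: the same list regardless of namespace. */
static struct header *lookup_global(const struct root *root, const struct ns *ns)
{
    (void)ns;
    return root->global_headers;
}

/* Extra root: the list is taken from the namespace itself. */
static struct header *lookup_per_ns(const struct root *root, const struct ns *ns)
{
    (void)root;
    return ns->net_headers;
}

/* Append an additional root; mirroring the text, there is no unregister. */
static void register_root(struct root *first, struct root *extra)
{
    while (first->next)
        first = first->next;
    extra->next = NULL;
    first->next = extra;
}

/* Walk every list of every root and count the headers visible from ns. */
static size_t count_visible(const struct root *first, const struct ns *ns)
{
    size_t n = 0;

    for (const struct root *r = first; r; r = r->next)
        for (const struct header *h = r->lookup(r, ns); h; h = h->next)
            n++;
    return n;
}
```

    Two namespaces walking the same chain of roots then see the shared
    headers plus only their own private ones, which is the effect the lookup
    function is meant to provide.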

    Signed-off-by: Eric W. Biederman
    Cc: Serge Hallyn
    Cc: Daniel Lezcano
    Cc: Cedric Le Goater
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

26 Jan, 2008

1 commit


18 Dec, 2007

1 commit

  • Fix:

    sysctl table check failed: /net/ax25/ax0/ax25_default_mode .3.9.1.2 Unknown
    sysctl binary path
    Pid: 2936, comm: kissattach Not tainted 2.6.24-rc5 #1
    [] set_fail+0x3b/0x43
    [] sysctl_check_table+0x408/0x456
    [] sysctl_check_table+0x41c/0x456
    [] sysctl_check_table+0x41c/0x456
    ...

    Signed-off-by: Eric W. Biederman
    Cc: Bernard Pidoux
    Cc: "David S. Miller"
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

05 Dec, 2007

1 commit


27 Nov, 2007

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (41 commits)
    [XFRM]: Fix leak of expired xfrm_states
    [ATM]: [he] initialize lock and tasklet earlier
    [IPV4]: Remove bogus ifdef mess in arp_process
    [SKBUFF]: Free old skb properly in skb_morph
    [IPV4]: Fix memory leak in inet_hashtables.h when NUMA is on
    [IPSEC]: Temporarily remove locks around copying of non-atomic fields
    [TCP] MTUprobe: Cleanup send queue check (no need to loop)
    [TCP]: MTUprobe: receiver window & data available checks fixed
    [MAINTAINERS]: tlan list is subscribers-only
    [SUNRPC]: Remove SPIN_LOCK_UNLOCKED
    [SUNRPC]: Make xprtsock.c:xs_setup_{udp,tcp}() static
    [PFKEY]: Sending an SADB_GET responds with an SADB_GET
    [IRDA]: Compilation for CONFIG_INET=n case
    [IPVS]: Fix compiler warning about unused register_ip_vs_protocol
    [ARP]: Fix arp reply when sender ip 0
    [IPV6] TCPMD5: Fix deleting key operation.
    [IPV6] TCPMD5: Check return value of tcp_alloc_md5sig_pool().
    [IPV4] TCPMD5: Use memmove() instead of memcpy() because we have overlaps.
    [IPV4] TCPMD5: Omit redundant NULL check for kfree() argument.
    ieee80211: Stop net_ratelimit/IEEE80211_DEBUG_DROP log pollution
    ...

    Linus Torvalds
     

20 Nov, 2007

5 commits

  • Remove binary sysctls that never worked due to missing strategy functions.

    Cc: "Eric W. Biederman"
    Cc: Christian Borntraeger
    Cc: Gerald Schaefer
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Remove binary sysctls that never worked due to missing strategy functions.

    Cc: Christian Borntraeger
    Signed-off-by: Heiko Carstens
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Switch the remaining IPVS sysctl entries over to use CTL_UNNUMBERED.
    I strongly doubt that anyone is using the sys_sysctl interface to
    these variables.

    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     
  • sysctl table check failed: /net/ipv4/vs/lblc_expiration .3.5.21.19 Missing strategy
    [...]
    sysctl table check failed: /net/ipv4/vs/lblcr_expiration .3.5.21.20 Missing strategy

    Switch these entries over to use CTL_UNNUMBERED as clearly
    the sys_sysctl portion wasn't working.

    This is along the same lines as Christian Borntraeger's patch that fixes
    up entries with no strategy in net/ipv4/ipvs/ip_vs_ctl.c

    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     
  • Running the latest git code I get the following messages during boot:
    sysctl table check failed: /net/ipv4/vs/drop_entry .3.5.21.4 Missing strategy
    [...]
    sysctl table check failed: /net/ipv4/vs/drop_packet .3.5.21.5 Missing strategy
    [...]
    sysctl table check failed: /net/ipv4/vs/secure_tcp .3.5.21.6 Missing strategy
    [...]
    sysctl table check failed: /net/ipv4/vs/sync_threshold .3.5.21.24 Missing strategy

    I removed the binary sysctl handler for those messages and also removed
    the definitions in ip_vs.h. The alternative would be to implement a
    proper strategy handler, but syscall sysctl is deprecated.

    There are other sysctl definitions that are commented out or work with
    the default sysctl_data strategy. I did not touch these.

    Signed-off-by: Christian Borntraeger
    Acked-by: Simon Horman
    Signed-off-by: David S. Miller

    Christian Borntraeger
     

14 Nov, 2007

1 commit


06 Nov, 2007

1 commit


23 Oct, 2007

1 commit

  • Gabriel C reported that modprobing appletalk on current git gives a
    warning in dmesg:

    "sysctl table check failed: /net/appletalk .3.7 procname does not match binary path procname"

    Oops. My apologies, it appears I made a mistake when creating my table
    to check up on sysctl values.

    Signed-off-by: "Eric W. Biederman"
    Tested-by: Gabriel C
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

19 Oct, 2007

4 commits

  • The non-filesystem capability meaning of CAP_SETPCAP is that a process, p1,
    can change the capabilities of another process, p2. This is not the
    meaning that was intended for this capability at all, and this
    implementation came about purely because, without filesystem capabilities,
    there was no way to use capabilities without one process bestowing them on
    another.

    Since we now have filesystem support for capabilities we can fix the
    implementation of CAP_SETPCAP.

    The most significant thing about this change is that, with it in effect, no
    process can set the capabilities of another process.

    The capabilities of a program are set via the capability convolution
    rules:

    pI(post-exec) = pI(pre-exec)
    pP(post-exec) = (X(aka cap_bset) & fP) | (pI(post-exec) & fI)
    pE(post-exec) = fE ? pP(post-exec) : 0

    at exec() time. As such, the only influence the pre-exec() program can
    have on the post-exec() program's capabilities is through the pI
    capability set.

    The correct implementation for CAP_SETPCAP (and that enabled by this patch)
    is that it can be used to add extra pI capabilities to the current process
    - to be picked up by subsequent exec()s when the above convolution rules
    are applied.
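
    As a rough illustration of the convolution rules above, the following
    sketch models each capability set as a 32-bit mask and applies the three
    equations in order. The names capset_t, proc_caps and caps_after_exec are
    made up for this example and do not come from the patch.

```c
#include <stdint.h>

/* One bit per capability; a modelling assumption, not the kernel's type. */
typedef uint32_t capset_t;

struct proc_caps {
    capset_t inheritable; /* pI */
    capset_t permitted;   /* pP */
    capset_t effective;   /* pE */
};

/*
 * Apply the exec() rules quoted in the commit message:
 *   pI' = pI
 *   pP' = (X & fP) | (pI' & fI)
 *   pE' = fE ? pP' : 0
 * bset is X (cap_bset), fP and fI are the file's permitted and
 * inheritable sets, and fE is the file's effective flag.
 */
static struct proc_caps caps_after_exec(struct proc_caps pre, capset_t bset,
                                        capset_t fP, capset_t fI, int fE)
{
    struct proc_caps post;

    post.inheritable = pre.inheritable;
    post.permitted = (bset & fP) | (post.inheritable & fI);
    post.effective = fE ? post.permitted : 0;
    return post;
}
```

    Note that pre.permitted and pre.effective never appear on the right-hand
    side: in this model, as in the text, only pI carries over from the
    pre-exec() program.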

    Here is how it works:

    Let's say we have a process, p. It has capability sets, pE, pP and pI.
    Generally, p can change the value of its own pI to pI' where

    (pI' & ~pI) & ~pP = 0.

    That is, the only new things in pI' that were not present in pI need to
    be present in pP.

    The role of CAP_SETPCAP is basically to permit changes to pI beyond
    the above:

    if (pE & CAP_SETPCAP) {
        pI' = anything; /* i.e., even (pI' & ~pI) & ~pP != 0 */
    }
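
    The rule for changing pI can be written as a single predicate. The sketch
    below is a userspace model with capability sets as 32-bit masks; the
    function name and the CAP_SETPCAP_BIT macro are hypothetical, with bit 8
    chosen to match the position of cap_setpcap in the capability list.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit for CAP_SETPCAP in this model (capability number 8). */
#define CAP_SETPCAP_BIT (UINT32_C(1) << 8)

/*
 * May a process with sets pE, pP and pI change its inheritable set to
 * new_pI?  Without CAP_SETPCAP in pE, every bit newly added to pI must
 * already be present in pP; with it, any new_pI is accepted.
 */
static bool may_set_inheritable(uint32_t pE, uint32_t pP, uint32_t pI,
                                uint32_t new_pI)
{
    if (pE & CAP_SETPCAP_BIT)
        return true;
    return ((new_pI & ~pI) & ~pP) == 0;
}
```

    Removing bits from pI always passes the check, since only newly added
    bits are compared against pP.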

    This capability is useful for things like login, which (say, via
    pam_cap) might want to raise certain inheritable capabilities for use
    by the children of the logged-in user's shell, but those capabilities
    are not useful to or needed by the login program itself.

    One such use might be to limit who can run ping. You set the
    capabilities of the 'ping' program to be "= cap_net_raw+i", and then
    only shells that have (pI & CAP_NET_RAW) will be able to run
    it. Without CAP_SETPCAP implemented as described above, login(pam_cap)
    would have to also have (pP & CAP_NET_RAW) in order to raise this
    capability and pass it on through the inheritable set.

    Signed-off-by: Andrew Morgan
    Signed-off-by: Serge E. Hallyn
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Casey Schaufler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morgan
     
  • It turns out that the net/irda code didn't register any of its binary paths
    in the global sysctl.h header file, so I missed them completely when making an
    authoritative list of binary sysctl paths in the kernel. So add them to the
    list of valid binary sysctl paths.

    Signed-off-by: Eric W. Biederman
    Acked-by: Samuel Ortiz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Well, it turns out that after I dug into the problems a little more, I was
    returning a few false positives, so this patch updates my logic to remove them.

    - Don't complain about 0 ctl_names in sysctl_check_binary_path.
      It is valid for someone to remove the sysctl binary interface
      and still keep the same sysctl proc interface.

    - Count ctl_names and procnames as matching if they both don't
      exist.

    - Only warn about missing min&max when the generic functions care.
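
    The second rule in the list above amounts to a null-tolerant comparison.
    A minimal sketch, assuming names are plain C strings and a missing name
    is a NULL pointer (the helper name names_match is hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare two optional names.  Two missing names count as a match;
 * one missing and one present do not; otherwise compare the text.
 */
static bool names_match(const char *a, const char *b)
{
    if (a == NULL && b == NULL)
        return true;
    if (a == NULL || b == NULL)
        return false;
    return strcmp(a, b) == 0;
}
```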

    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • After going through the kernel's sysctl tables several times it has become
    clear that code review and testing are just not effective in preventing
    problematic sysctl tables from being used in the stable kernel. I certainly
    can't seem to fix the problems as fast as they are introduced.

    Therefore this patch adds sysctl_check_table which is called when a sysctl
    table is registered and checks to see if we have a problematic sysctl table.

    The biggest part of the code is the table of valid binary sysctl entries, but
    since we have frozen our set of binary sysctls this table should not need to
    change, and it makes it much easier to detect when someone unintentionally
    adds a new binary sysctl value.

    As best as I can determine all of the several hundred errors spewed on boot up
    now are legitimate.
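
    To give a feel for what a registration-time check against a frozen table
    of binary paths might look like, here is a heavily simplified userspace
    sketch. It handles a single path level only, the types and the
    check_entry function are hypothetical, and the has_strategy flag is a
    stand-in for whatever the real check inspects; the message strings are
    the ones quoted in other entries of this log.

```c
#include <stddef.h>
#include <string.h>

/* One row of the frozen table of valid binary sysctl entries (model). */
struct known_path {
    int ctl_name;
    const char *procname;
};

/* A table entry being registered (model). */
struct entry {
    int ctl_name;         /* 0 means "no binary interface" in this model */
    const char *procname;
    int has_strategy;     /* simplification: non-zero if a strategy is set */
};

/* Returns NULL if the entry looks fine, else a description of the problem. */
static const char *check_entry(const struct entry *e,
                               const struct known_path *known, size_t nknown)
{
    size_t i;

    if (e->ctl_name == 0)
        return NULL; /* proc-only entry: nothing to check against */

    for (i = 0; i < nknown; i++) {
        if (known[i].ctl_name != e->ctl_name)
            continue;
        if (!e->procname || strcmp(known[i].procname, e->procname) != 0)
            return "procname does not match binary path procname";
        if (!e->has_strategy)
            return "Missing strategy";
        return NULL;
    }
    return "Unknown sysctl binary path";
}
```

    Because the known table is fixed, any entry with a new binary number
    falls through to the last return, which is how an unintentionally added
    binary sysctl would be flagged in this model.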

    [bunk@kernel.org: kernel/sysctl_check.c must #include ]
    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman