04 Jan, 2012

1 commit


03 Nov, 2011

1 commit

  • Adding support for poll() in sysctl fs allows userspace to receive
    notifications of changes in sysctl entries. This adds an infrastructure to
    allow files in sysctl fs to be pollable and implements it for hostname and
    domainname.
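
    From userspace, this poll() support could be used roughly as follows.
    This is a minimal sketch, not code from the patch: the helper name
    sysctl_wait_change is hypothetical, and it assumes the kernel signals a
    changed entry as POLLERR | POLLPRI on the open file.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for a pollable sysctl file to change.
 * Returns 1 if a change was signalled, 0 on timeout, -1 on error.
 * Assumption: a change is reported as POLLERR | POLLPRI.
 */
int sysctl_wait_change(const char *path, int timeout_ms)
{
	struct pollfd pfd;
	int ret;

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;

	pfd.events = POLLPRI;	/* POLLERR is always reported by poll() */
	pfd.revents = 0;
	ret = poll(&pfd, 1, timeout_ms);
	close(pfd.fd);

	if (ret < 0)
		return -1;
	return (ret > 0 && (pfd.revents & (POLLPRI | POLLERR))) ? 1 : 0;
}
```

    For example, sysctl_wait_change("/proc/sys/kernel/hostname", -1) would
    block until the hostname is changed, on a kernel that has this support.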

    [akpm@linux-foundation.org: s/declare/define/ for definitions]
    Signed-off-by: Lucas De Marchi
    Cc: Greg KH
    Cc: Kay Sievers
    Cc: Al Viro
    Cc: "Eric W. Biederman"
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     

04 Oct, 2011

1 commit


10 Mar, 2011

1 commit


08 Mar, 2011

1 commit

  • a) struct inode is not going to be freed under ->d_compare();
    however, the thing PROC_I(inode)->sysctl points to just might.
    Fortunately, it's enough to make freeing that sucker delayed,
    provided that we don't step on its ->unregistering, clear
    the pointer to it in PROC_I(inode) before dropping the reference
    and check if it's NULL in ->d_compare().

    b) I'm not sure that we *can* walk into NULL inode here (we recheck
    dentry->seq between verifying that it's still hashed / fetching
    dentry->d_inode and passing it to ->d_compare() and there's no
    negative hashed dentries in /proc/sys/*), but if we can walk into
    that, we really should not have ->d_compare() return 0 on it!
    Said that, I really suspect that this check can be simply killed.
    Nick?

    Signed-off-by: Al Viro

    Al Viro
     

16 May, 2010

1 commit

  • The new function can be used to read/write large bitmaps via /proc. A
    comma separated range format is used for compact output and input
    (e.g. 1,3-4,10-10).

    Writing into the file will first reset the bitmap then update it
    based on the given input.
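
    The range format and the reset-then-update write semantics can be
    illustrated in userspace. This is a sketch under stated assumptions, not
    the kernel implementation: the name parse_bit_ranges is hypothetical, the
    bitmap is limited to 64 bits, and leaving the caller's bitmap untouched
    on bad input is a choice made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a comma-separated list of bit numbers and ranges, such as
 * "1,3-4,10-10", into a 64-bit bitmap. The result starts from an empty
 * bitmap, so earlier contents are discarded on success.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_bit_ranges(const char *s, uint64_t *bitmap)
{
	uint64_t result = 0;

	while (*s) {
		char *end;
		unsigned long lo, hi;

		if (*s < '0' || *s > '9')
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		if (*end == '-') {
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (lo > hi || hi >= 64)
			return -1;
		for (unsigned long bit = lo; bit <= hi; bit++)
			result |= UINT64_C(1) << bit;

		if (*end == ',') {
			end++;
			if (*end == '\0')
				return -1;	/* trailing comma */
		} else if (*end != '\0') {
			return -1;
		}
		s = end;
	}
	*bitmap = result;
	return 0;
}
```

    With the example string from above, "1,3-4,10-10" sets bits 1, 3, 4 and
    10 and clears everything else.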

    Signed-off-by: Octavian Purdila
    Signed-off-by: WANG Cong
    Cc: Eric W. Biederman
    Signed-off-by: David S. Miller

    Octavian Purdila
     

17 Feb, 2010

2 commits


11 Jan, 2010

1 commit


07 Jan, 2010

1 commit

  • This is to be used together with switch technologies, like RFC 3069,
    where the individual ports are not allowed to communicate with
    each other, but they are allowed to talk to the upstream router. As
    described in RFC 3069, it is possible to allow these hosts to
    communicate through the upstream router by proxy_arp'ing.

    This patch basically allows proxy arp replies back to the same
    interface (from which the ARP request/solicitation was received).

    Tunable per device via proc "proxy_arp_pvlan":
    /proc/sys/net/ipv4/conf/*/proxy_arp_pvlan
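
    Setting the tunable for one device from a program could look like the
    sketch below. Both helper names are hypothetical; the only thing taken
    from this commit is the /proc path. Writing the file normally requires
    root.

```c
#include <stdio.h>

/*
 * Build "/proc/sys/net/ipv4/conf/<dev>/proxy_arp_pvlan" into buf.
 * Returns 0 on success, -1 if the result did not fit.
 */
int pvlan_sysctl_path(const char *dev, char *buf, size_t len)
{
	int n = snprintf(buf, len,
			 "/proc/sys/net/ipv4/conf/%s/proxy_arp_pvlan", dev);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the per-device tunable. Returns 0 on success. */
int set_proxy_arp_pvlan(const char *dev, int enable)
{
	char path[256];
	FILE *f;
	int ok;

	if (pvlan_sysctl_path(dev, path, sizeof(path)) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```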

    This switch technology is known by different vendor names:
    - In RFC 3069 it is called VLAN Aggregation.
    - Cisco and Allied Telesyn call it Private VLAN.
    - Hewlett-Packard call it Source-Port filtering or port-isolation.
    - Ericsson call it MAC-Forced Forwarding (RFC Draft).

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

26 Dec, 2009

1 commit

  • When using policy routing and the skb mark,
    there are cases where back path validation requires us
    to use a different routing table for src ip validation than
    the one used for mapping ingress dst ip.
    One such case is transparent proxying, where we pretend to be
    the destination system and therefore the local table
    is used for incoming packets but possibly a main table would
    be used on outbound.
    Make the default behavior allow the above; users who
    need the symmetry can turn it on via the sysctl src_valid_mark.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

10 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
    tree-wide: fix misspelling of "definition" in comments
    reiserfs: fix misspelling of "journaled"
    doc: Fix a typo in slub.txt.
    inotify: remove superfluous return code check
    hdlc: spelling fix in find_pvc() comment
    doc: fix regulator docs cut-and-pasteism
    mtd: Fix comment in Kconfig
    doc: Fix IRQ chip docs
    tree-wide: fix assorted typos all over the place
    drivers/ata/libata-sff.c: comment spelling fixes
    fix typos/grammos in Documentation/edac.txt
    sysctl: add missing comments
    fs/debugfs/inode.c: fix comment typos
    sgivwfb: Make use of ARRAY_SIZE.
    sky2: fix sky2_link_down copy/paste comment error
    tree-wide: fix typos "couter" -> "counter"
    tree-wide: fix typos "offest" -> "offset"
    fix kerneldoc for set_irq_msi()
    spidev: fix double "of of" in comment
    comment typo fix: sybsystem -> subsystem
    ...

    Linus Torvalds
     

08 Dec, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
    mac80211: fix reorder buffer release
    iwmc3200wifi: Enable wimax core through module parameter
    iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
    iwmc3200wifi: Coex table command does not expect a response
    iwmc3200wifi: Update wiwi priority table
    iwlwifi: driver version track kernel version
    iwlwifi: indicate uCode type when fail dump error/event log
    iwl3945: remove duplicated event logging code
    b43: fix two warnings
    ipw2100: fix rebooting hang with driver loaded
    cfg80211: indent regulatory messages with spaces
    iwmc3200wifi: fix NULL pointer dereference in pmkid update
    mac80211: Fix TX status reporting for injected data frames
    ath9k: enable 2GHz band only if the device supports it
    airo: Fix integer overflow warning
    rt2x00: Fix padding bug on L2PAD devices.
    WE: Fix set events not propagated
    b43legacy: avoid PPC fault during resume
    b43: avoid PPC fault during resume
    tcp: fix a timewait refcnt race
    ...

    Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
    CTL_UNNUMBERED removed) in
    kernel/sysctl_check.c
    net/ipv4/sysctl_net_ipv4.c
    net/ipv6/addrconf.c
    net/sctp/sysctl.c

    Linus Torvalds
     

04 Dec, 2009

2 commits


19 Nov, 2009

1 commit


18 Nov, 2009

1 commit


12 Nov, 2009

1 commit


11 Nov, 2009

1 commit

  • The ctl_name and strategy fields are unused, now that sys_sysctl
    is a compatibility wrapper around /proc/sys. No longer looking
    at them in the generic code is effectively what we are doing
    now and provides the guarantee that during further cleanups
    we can just remove references to those fields and everything
    will work ok.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

06 Nov, 2009

1 commit


24 Sep, 2009

1 commit

  • It's unused.

    It isn't needed -- read or write flag is already passed and sysctl
    shouldn't care about the rest.

    It _was_ used in two places at arch/frv for some reason.

    Signed-off-by: Alexey Dobriyan
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Ralf Baechle
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

01 Feb, 2009

1 commit


17 Oct, 2008

1 commit

  • The name and nlen parameters passed to the ->strategy hook are unused, so
    remove them. In general a ->strategy hook should know what it's doing, and
    shouldn't do anything tricky for which, say, a pointer to the original
    userspace array (name) may be needed.

    Signed-off-by: Alexey Dobriyan
    Acked-by: David S. Miller [ networking bits ]
    Cc: Ralf Baechle
    Cc: David Howells
    Cc: Matt Mackall
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

27 Jul, 2008

4 commits

  • * keep references to ctl_table_head and ctl_table in /proc/sys inodes
    * grab the former during operations, use the latter for access to
    entry if that succeeds
    * have ->d_compare() check if table should be seen for one who does lookup;
    that allows us to avoid flipping inodes - if we have the same name resolve
    to different things, we'll just keep several dentries and ->d_compare()
    will reject the wrong ones.
    * have ->lookup() and ->readdir() scan the table of our inode first, then
    walk all ctl_table_header and scan ->attached_by for those that are
    attached to our directory.
    * implement ->getattr().
    * get rid of insane amounts of tree-walking
    * get rid of the need to know dentry in ->permission() and of the contortions
    induced by that.

    Signed-off-by: Al Viro

    Al Viro
     
  • In a sense, that's the heart of the series. It's based on the following
    property of the trees we are actually asked to add: they can be split into
    stem that is already covered by registered trees and crown that is entirely
    new. IOW, if a/b and a/c/d are introduced by our tree, then a/c is also
    introduced by it.

    That allows us to associate a tree and a table entry with each node in the
    union; while directory nodes might be covered by many trees, only one will
    cover the node by its crown. And that will allow much saner logic for
    /proc/sys in the next patches. This patch introduces the data structures
    needed to keep track of that.

    When adding a sysctl table, we find a "parent" one. Which is to say, we
    find the deepest node on its stem that is already present in one of the
    tables from our table set or its ancestor sets. That table will be our
    parent and that node in it the attachment point. Add our table to the list
    anchored in the parent, have it refer to the parent and to the contents of
    the attachment point. Also remember where its crown lives.
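
    The "deepest node on the stem" search can be modelled in userspace with
    plain path strings. This is only an illustration of the idea: the
    function name stem_length and the flat array of registered directories
    are assumptions made for the sketch, not the kernel's data structures.

```c
#include <string.h>

/*
 * Return the length of the longest leading part of `path`, cut at a '/'
 * boundary or at the end, that equals one of the already registered
 * directory paths. 0 means that only the root is shared.
 */
size_t stem_length(const char *path, const char *const *registered, size_t n)
{
	size_t best = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(registered[i]);

		/* A match must end on a component boundary. */
		if (len > best &&
		    strncmp(path, registered[i], len) == 0 &&
		    (path[len] == '/' || path[len] == '\0'))
			best = len;
	}
	return best;
}
```

    With "net" and "net/ipv4" registered, adding "net/ipv4/conf/eth0" gives
    the stem "net/ipv4" (the attachment point); "conf/eth0" is the crown.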

    Signed-off-by: Al Viro

    Al Viro
     
  • Refcount the sucker; instead of freeing it by the end of unregistration
    just drop the refcount and free only when it hits zero. Make sure that
    we _always_ make ->unregistering non-NULL in start_unregistering().

    That allows anybody to get a reference to such puppy, preventing its
    freeing and reuse. It does *not* block unregistration. Anybody who
    holds such a reference can
    * try to grab a "use" reference (ctl_head_grab()); that will
    succeed if and only if it hadn't entered unregistration yet. If it
    succeeds, we can use it in all normal ways until we release the "use"
    reference (with ctl_head_finish()). Note that this relies on having
    ->unregistering become non-NULL in all cases when one starts to unregister
    the sucker.
    * keep pointers to ctl_table entries; they *can* be freed if
    the entire thing is unregistered. However, if ctl_head_grab() succeeds,
    we know that unregistration had not happened (and will not happen until
    ctl_head_finish()) and such pointers can be used safely.

    IOW, now we can have inodes under /proc/sys keep references to ctl_table
    entries, protecting them with references to ctl_table_header and
    grabbing the latter for the duration of operations that require access
    to ctl_table. That won't cause deadlocks, since unregistration will not
    be stopped by merely keeping a reference to ctl_table_header.
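
    The split between lifetime references and "use" references can be
    modelled in a few lines of single-threaded userspace C. This is a sketch
    only: struct head and the head_* helpers are hypothetical stand-ins, and
    the locking and waiting done by the real code are left out.

```c
#include <stdbool.h>
#include <stdlib.h>

struct head {
	int count;		/* lifetime references; memory is freed at zero */
	int used;		/* "use" references held by running operations */
	bool unregistering;	/* set as soon as unregistration starts */
};

struct head *head_register(void)
{
	struct head *h = calloc(1, sizeof(*h));

	if (h)
		h->count = 1;	/* reference owned by the registration */
	return h;
}

/* Take a lifetime reference; does not block unregistration. */
void head_get(struct head *h)
{
	h->count++;
}

/* Drop a lifetime reference; returns true if the object was freed. */
bool head_put(struct head *h)
{
	if (--h->count > 0)
		return false;
	free(h);
	return true;
}

/* Take a "use" reference; fails once unregistration has started. */
bool head_grab(struct head *h)
{
	if (h->unregistering)
		return false;
	h->used++;
	return true;
}

void head_finish(struct head *h)
{
	h->used--;
}

/* Start unregistration and drop the registration's own reference. */
bool head_unregister(struct head *h)
{
	h->unregistering = true;
	return head_put(h);
}
```

    A holder of a lifetime reference can still call head_grab() safely after
    head_unregister(); the call simply fails, and the memory stays valid
    until that holder's head_put().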

    Signed-off-by: Al Viro

    Al Viro
     
  • New object: set of sysctls [currently - root and per-net-ns].
    Contains: pointer to parent set, list of tables and "should I see this set?"
    method (->is_seen(set)).
    Current lists of tables are subsumed by that; net-ns contains such a beast.
    ->lookup() for ctl_table_root returns pointer to ctl_table_set instead of
    that to ->list of that ctl_table_set.

    [folded compile fixes by rdd for configs without sysctl]

    Signed-off-by: Al Viro

    Al Viro
     

29 Apr, 2008

3 commits

  • When reading from/writing to some table, a root, which this table came from,
    may affect this table's permissions, depending on who is working with the
    table.

    The core hunk is at the bottom of this patch. All the rest is just pushing
    the ctl_table_root argument up to the sysctl_perm() function.

    This will be mostly (only?) used in the net sysctls.

    Signed-off-by: Pavel Emelyanov
    Acked-by: David S. Miller
    Cc: "Eric W. Biederman"
    Cc: Alexey Dobriyan
    Cc: Denis V. Lunev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The do_sysctl_strategy isn't used outside kernel/sysctl.c, so this can be
    static and without a prototype in header.

    Besides, move this one and parse_table() above their callers and drop the
    forward declarations of the latter call.

    One more "besides" - fix two checkpatch warnings: space before a ( and an
    extra space at the end of a line.

    Signed-off-by: Pavel Emelyanov
    Acked-by: David S. Miller
    Cc: "Eric W. Biederman"
    Cc: Alexey Dobriyan
    Cc: Denis V. Lunev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Remove an empty #else.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

06 Feb, 2008

1 commit

  • The capability bounding set is a set beyond which capabilities cannot grow.
    Currently cap_bset is per-system. It can be manipulated through sysctl,
    but only init can add capabilities. Root can remove capabilities. By
    default it includes all caps except CAP_SETPCAP.

    This patch makes the bounding set per-process when file capabilities are
    enabled. It is inherited at fork from parent. No one can add elements;
    CAP_SETPCAP is required to remove them.

    One example use of this is to start a safer container. For instance, until
    device namespaces or per-container device whitelists are introduced, it is
    best to take CAP_MKNOD away from a container.

    The bounding set will not affect pP and pE immediately. It will only
    affect pP' and pE' after subsequent exec()s. It also does not affect pI,
    and exec() does not constrain pI'. So to really start a shell with no way
    of regaining CAP_MKNOD, you would do

    prctl(PR_CAPBSET_DROP, CAP_MKNOD);
    cap_t cap = cap_get_proc();
    cap_value_t caparray[1];
    caparray[0] = CAP_MKNOD;
    cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP);
    cap_set_proc(cap);
    cap_free(cap);

    The following test program will get and set the bounding
    set (but not pI). For instance

    ./bset get
    (lists capabilities in bset)
    ./bset drop cap_net_raw
    (starts shell with new bset)
    (use capset, setuid binary, or binary with
    file capabilities to try to increase caps)

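    ************************************************************
    cap_bound.c
    ************************************************************
    #include <sys/prctl.h>
    #include <linux/capability.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #ifndef PR_CAPBSET_READ
    #define PR_CAPBSET_READ 23
    #endif

    #ifndef PR_CAPBSET_DROP
    #define PR_CAPBSET_DROP 24
    #endif

    int usage(char *me)
    {
        printf("Usage: %s get\n", me);
        printf("       %s drop <capability>\n", me);
        return 1;
    }

    #define numcaps 32
    char *captable[numcaps] = {
        "cap_chown",
        "cap_dac_override",
        "cap_dac_read_search",
        "cap_fowner",
        "cap_fsetid",
        "cap_kill",
        "cap_setgid",
        "cap_setuid",
        "cap_setpcap",
        "cap_linux_immutable",
        "cap_net_bind_service",
        "cap_net_broadcast",
        "cap_net_admin",
        "cap_net_raw",
        "cap_ipc_lock",
        "cap_ipc_owner",
        "cap_sys_module",
        "cap_sys_rawio",
        "cap_sys_chroot",
        "cap_sys_ptrace",
        "cap_sys_pacct",
        "cap_sys_admin",
        "cap_sys_boot",
        "cap_sys_nice",
        "cap_sys_resource",
        "cap_sys_time",
        "cap_sys_tty_config",
        "cap_mknod",
        "cap_lease",
        "cap_audit_write",
        "cap_audit_control",
        "cap_setfcap"
    };

    /* Print every capability that is still in the bounding set. */
    int getbcap(void)
    {
        int comma=0;
        unsigned long i;
        int ret;

        printf("i know of %d capabilities\n", numcaps);
        printf("capability bounding set:");
        for (i=0; i<numcaps; i++) {
            ret = prctl(PR_CAPBSET_READ, i);
            if (ret < 0)
                perror("prctl");
            else if (ret==1)
                printf("%s%s", (comma++) ? ", " : " ", captable[i]);
        }
        printf("\n");
        return 0;
    }

    /*
     * Drop the named capability from the bounding set.
     * The source text breaks off inside the loop below; the rest of this
     * function and main() are a reconstruction inferred from the usage
     * text above.
     */
    int capdrop(char *str)
    {
        unsigned long i;

        int found=0;
        for (i=0; i<numcaps; i++) {
            if (strcmp(captable[i], str) == 0) {
                found=1;
                break;
            }
        }
        if (!found)
            return 1;
        if (prctl(PR_CAPBSET_DROP, i)) {
            perror("prctl");
            return 1;
        }
        return 0;
    }

    int main(int argc, char *argv[])
    {
        if (argc<2)
            return usage(argv[0]);
        if (strcmp(argv[1], "get")==0)
            return getbcap();
        if (strcmp(argv[1], "drop")!=0 || argc<3)
            return usage(argv[0]);
        if (capdrop(argv[2])) {
            printf("unknown capability\n");
            return 1;
        }
        /* Start a shell that runs with the reduced bounding set. */
        return execl("/bin/bash", "/bin/bash", (char *)NULL);
    }
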
    Signed-off-by: Andrew G. Morgan
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Chris Wright
    Cc: Casey Schaufler
    Signed-off-by: "Serge E. Hallyn"
    Tested-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

01 Feb, 2008

1 commit

  • Current ip route cache implementation is not suited to large caches.

    We can consume a lot of CPU when cache must be invalidated, since we
    currently need to evict all cache entries, and this eviction is
    sometimes asynchronous. min_delay & max_delay can somewhat control this
    asynchronous behavior, but the whole thing is a kludge, regularly triggering
    infamous soft lockup messages. When entries are still in use, this also
    consumes a lot of ram, filling dst_garbage.list.

    A better scheme is to use a generation identifier on each entry,
    so that cache invalidation can be performed by changing the table
    identifier, without having to scan all entries.
    No more delayed flushing, no more stalling when secret_interval expires.

    Invalidated entries will then be freed at GC time (controlled by
    ip_rt_gc_timeout or stress), or when an invalidated entry is found
    in a chain when an insert is done.
    Thus we keep a normal equilibrium.
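
    The generation-identifier scheme can be shown with a toy cache. This is a
    sketch of the general technique, not the route cache code: struct cache,
    the fixed slot array and the cache_* helpers are all made up for the
    example.

```c
#include <stdbool.h>

#define CACHE_SLOTS 8

struct entry {
	bool in_use;
	unsigned int key;
	int value;
	unsigned int genid;	/* generation the entry was created in */
};

struct cache {
	unsigned int genid;	/* current generation of the whole table */
	struct entry slot[CACHE_SLOTS];
};

/* Insert or overwrite; a stale entry in the slot is simply replaced. */
void cache_insert(struct cache *c, unsigned int key, int value)
{
	struct entry *e = &c->slot[key % CACHE_SLOTS];

	e->in_use = true;
	e->key = key;
	e->value = value;
	e->genid = c->genid;
}

/* An entry counts as a hit only if it belongs to the current generation. */
bool cache_lookup(const struct cache *c, unsigned int key, int *value)
{
	const struct entry *e = &c->slot[key % CACHE_SLOTS];

	if (!e->in_use || e->key != key || e->genid != c->genid)
		return false;
	*value = e->value;
	return true;
}

/* Invalidate every entry at once, without scanning the table. */
void cache_invalidate(struct cache *c)
{
	c->genid++;
}
```

    Invalidation is a single increment; stale entries are never returned and
    are reclaimed lazily, here when a later insert reuses their slot.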

    This patch :
    - renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
    - Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
    - Checks entry->rt_genid at appropriate places :

    Eric Dumazet
     

29 Jan, 2008

3 commits

  • This patch implements the basic infrastructure for per namespace sysctls.

    A list of lists of sysctl headers is added, allowing each namespace to have
    its own list of sysctl headers.

    Each list of sysctl headers has a lookup function to find the first
    sysctl header in the list, allowing the lists to have a per namespace
    instance.

    register_sysctl_root is added to tell sysctl.c about additional
    lists of sysctl_headers. As all of the users are expected to be in
    kernel no unregister function is provided.

    sysctl_head_next is updated to walk through the list of lists.

    __register_sysctl_paths is added to add a new sysctl table on
    a non-default sysctl list.

    The only intrusive part of this patch is propagating the information
    to decide which list of sysctls to use for sysctl_check_table.

    Signed-off-by: Eric W. Biederman
    Cc: Serge Hallyn
    Cc: Daniel Lezcano
    Cc: Cedric Le Goater
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • By doing this we allow users of register_sysctl_paths that build
    and dynamically allocate their ctl_table to be simpler. This allows
    them to just remember the ctl_table_header returned from
    register_sysctl_paths from which they can now find the
    ctl_table array they need to free.

    Signed-off-by: Eric W. Biederman
    Cc: Serge Hallyn
    Cc: Daniel Lezcano
    Cc: Cedric Le Goater
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • There are a number of modules that register a sysctl table
    somewhere deeply nested in the sysctl hierarchy, such as
    fs/nfs, fs/xfs, dev/cdrom, etc.

    They all specify several dummy ctl_tables for the path name.
    This patch implements register_sysctl_paths that takes
    an additional path name, and makes up dummy sysctl nodes
    for each component.

    This patch was originally written by Olaf Kirch and
    brought to my attention and reworked some by Olaf Hering.
    I have changed a few additional things so the bugs are mine.

    After converting all of the easy callers, Olaf Hering observed that for
    allyesconfig with ARCH=i386 the patch reduces the final binary size by
    6111 bytes.

    .text +897
    .data -7008

    text     data    bss     dec      hex     filename
    26959310 4045899 4718592 35723801 2211a19 ../vmlinux-vanilla
    26960207 4038891 4718592 35717690 221023a ../O-allyesconfig/vmlinux

    So this change is both a space savings and a code simplification.

    CC: Olaf Kirch
    CC: Olaf Hering
    Signed-off-by: Eric W. Biederman
    Cc: Serge Hallyn
    Cc: Daniel Lezcano
    Cc: Cedric Le Goater
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

20 Nov, 2007

2 commits


19 Oct, 2007

3 commits

  • After going through the kernel's sysctl tables several times it has become
    clear that code review and testing is just not effective in preventing
    problematic sysctl tables from being used in the stable kernel. I certainly
    can't seem to fix the problems as fast as they are introduced.

    Therefore this patch adds sysctl_check_table which is called when a sysctl
    table is registered and checks to see if we have a problematic sysctl table.

    The biggest part of the code is the table of valid binary sysctl entries, but
    since we have frozen our set of binary sysctls this table should not need to
    change, and it makes it much easier to detect when someone unintentionally
    adds a new binary sysctl value.

    As best as I can determine all of the several hundred errors spewed on boot up
    now are legitimate.

    [bunk@kernel.org: kernel/sysctl_check.c must #include ]
    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Grumble. These numbers should have been in sysctl.h from the beginning if we
    ever expected anyone to use them. Oh well put them there now so we can find
    them and make maintenance easier.

    Signed-off-by: Eric W. Biederman
    Acked-by: Samuel Ortiz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • There has been no easy way to wrap the default sysctl strategy routine except
    for returning 0. Which is not always what we want. The few instances I have
    seen that want different behaviour have written their own version of
    sysctl_data. While not too hard it is unnecessary code and has the potential
    for extra bugs.

    So to make these situations easier and make that part of sysctl more symmetric
    I have factored sysctl_data out of do_sysctl_strategy and exported it as a
    function everyone can use.

    Further, having sysctl_data be an explicit function makes checking for badly
    formed sysctl tables much easier.

    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman