19 Nov, 2012

1 commit

  • - Current is implicitly available so passing current->nsproxy isn't useful.
    - The ctl_table_header is needed to find how the sysctl table is connected
    to the rest of sysctl.
    - ctl_table_root is available in the ctl_table_header so there is no need to pass it.

    With these changes it becomes possible to write a version of
    net_sysctl_permission that takes into account the network namespace of
    the sysctl table, an important feature in extending the user namespace.

    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

13 Oct, 2012

1 commit


25 Jan, 2012

16 commits

  • The plan is to convert all callers of register_sysctl_table
    and register_sysctl_paths to register_sysctl. The interface
    to register_sysctl is enough nicer that this should make the callers
    a bit more readable. Additionally after the conversion the
    230 lines of backwards compatibility can be removed.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • One of the most important jobs of sysctl is to export network stack
    tunables. Several of those tunables are per network device. In
    several instances people are running with 1000+ network devices in
    their network stacks, which makes the simple per directory linked list
    in sysctl a scaling bottleneck. Replace O(N^2) sysctl insertion and
    lookup times with O(NlogN) by using an rbtree to index the sysctl
    directories.

    Benchmark before:
    make-dummies 0 999 -> 0.32s
    rmmod dummy -> 0.12s
    make-dummies 0 9999 -> 1m17s
    rmmod dummy -> 17s

    Benchmark after:
    make-dummies 0 999 -> 0.074s
    rmmod dummy -> 0.070s
    make-dummies 0 9999 -> 3.4s
    rmmod dummy -> 0.44s

    Benchmark after (without dev_snmp6):
    make-dummies 0 9999 -> 0.75s
    rmmod dummy -> 0.44s
    make-dummies 0 99999 -> 11s
    rmmod dummy -> 4.3s

    At 10,000 dummy devices the bottleneck becomes the time to add and
    remove the files under /proc/sys/net/dev_snmp6. I have commented
    out the code that adds and removes files under /proc/sys/net/dev_snmp6
    and taken measurements of creating and destroying 100,000 dummies to
    verify the sysctl continues to scale.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
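
The scaling argument in the commit above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the class names `LinearDir` and `SortedDir` are hypothetical, and because Python's standard library has no red-black tree, a sorted list searched with `bisect` stands in for the kernel's rbtree. It shows the logarithmic lookup, but inserting into a Python list still shifts elements, so it does not reproduce the rbtree's insertion cost.

```python
import bisect


class LinearDir:
    """Directory as an unsorted list: every lookup scans entries one by one."""

    def __init__(self):
        self.names = []
        self.comparisons = 0  # total name comparisons performed so far

    def lookup(self, name):
        for existing in self.names:
            self.comparisons += 1
            if existing == name:
                return True
        return False

    def insert(self, name):
        # The duplicate check scans the whole list when the name is new,
        # so N insertions cost on the order of N^2 comparisons.
        if self.lookup(name):
            raise ValueError(f"duplicate entry: {name}")
        self.names.append(name)


class SortedDir:
    """Directory kept in sorted order: lookups use binary search."""

    def __init__(self):
        self.names = []

    def lookup(self, name):
        i = bisect.bisect_left(self.names, name)
        return i < len(self.names) and self.names[i] == name

    def insert(self, name):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            raise ValueError(f"duplicate entry: {name}")
        self.names.insert(i, name)
```

Inserting 1000 distinct names into `LinearDir` performs 0 + 1 + ... + 999 = 499500 comparisons, which is the quadratic growth the benchmarks above reflect at 10,000 devices.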
     
  • Slightly enhance efficiency and clarity of the code by making the
    header list per directory instead of per set.

    Benchmark before:
    make-dummies 0 999 -> 0.63s
    rmmod dummy -> 0.12s
    make-dummies 0 9999 -> 2m35s
    rmmod dummy -> 18s

    Benchmark after:
    make-dummies 0 999 -> 0.32s
    rmmod dummy -> 0.12s
    make-dummies 0 9999 -> 1m17s
    rmmod dummy -> 17s

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • An nsproxy argument here has always been awkward and now the nsproxy argument
    is completely unnecessary so remove it, replacing it with the set we want
    the registered tables to show up in.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • Piecing together directories by looking first in one directory
    tree, then in another directory tree and finally in a third
    directory tree makes it hard to verify that some directory
    entries are not multiply defined and makes it hard to create
    efficient implementations of the sysctl filesystem.

    Replace the sysctl wide list of roots with autogenerated
    links from the core sysctl directory tree to the other
    sysctl directory trees.

    This simplifies sysctl directory reading and lookups as now
    only entries in a single sysctl directory tree need to be
    considered.

    Benchmark before:
    make-dummies 0 999 -> 0.44s
    rmmod dummy -> 0.065s
    make-dummies 0 9999 -> 1m36s
    rmmod dummy -> 0.4s

    Benchmark after:
    make-dummies 0 999 -> 0.63s
    rmmod dummy -> 0.12s
    make-dummies 0 9999 -> 2m35s
    rmmod dummy -> 18s

    The slowdown is caused by the lookups used in insert_headers
    and put_links to see if we need to add links or remove links.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • Simplify the code and the sysctl semantics by autogenerating
    sysctl directories when a sysctl table is registered that needs
    the directories and autodeleting the directories when there are
    no more sysctl tables registered that need them.

    Autogenerating directories keeps sysctl tables from depending
    on each other, removing all of the arcane register/unregister
    ordering constraints and makes it impossible to get the order
    wrong when registering and unregistering sysctl tables.

    Autogenerating directories yields one unique entity that dentries
    can point to, retaining the current effective use of the dcache.

    Add struct ctl_dir as the type of these new autogenerated
    directories.

    The attached_by and attached_to fields in ctl_table_header are
    removed as they are no longer needed.

    The child field in ctl_table is no longer needed by the core of
    the sysctl code. ctl_table.child can be removed once all of the
    existing users have been updated.

    Benchmark before:
    make-dummies 0 999 -> 0.7s
    rmmod dummy -> 0.07s
    make-dummies 0 9999 -> 1m10s
    rmmod dummy -> 0.4s

    Benchmark after:
    make-dummies 0 999 -> 0.44s
    rmmod dummy -> 0.065s
    make-dummies 0 9999 -> 1m36s
    rmmod dummy -> 0.4s

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
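
The autogenerate/autodelete behaviour described in the commit above can be modelled with per-directory use counts. This is a toy sketch, not the kernel's ctl_dir implementation: the class `SysctlTree` and its methods are hypothetical names, and the counting scheme is an assumption chosen to show why registration order stops mattering.

```python
class SysctlTree:
    """Toy model: directories exist only while some table needs them."""

    def __init__(self):
        self.dir_users = {}  # directory path -> number of tables at or below it
        self.tables = {}     # handle -> (path, file names)
        self._next_handle = 0

    @staticmethod
    def _ancestors(path):
        # "net/ipv4/conf" -> ["net", "net/ipv4", "net/ipv4/conf"]
        parts = path.split("/")
        return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]

    def register(self, path, files):
        # Create (or take another reference on) every directory on the path.
        for d in self._ancestors(path):
            self.dir_users[d] = self.dir_users.get(d, 0) + 1
        handle = self._next_handle
        self._next_handle += 1
        self.tables[handle] = (path, list(files))
        return handle

    def unregister(self, handle):
        path, _files = self.tables.pop(handle)
        # Drop one reference per directory; delete those no table needs.
        for d in self._ancestors(path):
            self.dir_users[d] -= 1
            if self.dir_users[d] == 0:
                del self.dir_users[d]

    def directories(self):
        return sorted(self.dir_users)
```

In this model a table under `net/ipv4/conf/eth0` can be registered before or after one under `net/ipv4`, and either can be unregistered first; the shared directories disappear only when the last user goes away.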
     
  • Add a ctl_table_root pointer to ctl_table set so it is easy to
    go from a ctl_table_set to a ctl_table_root.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • Add nreg to ctl_table_header. When nreg drops to 0 the ctl_table_header
    will be unregistered.

    Factor out drop_sysctl_table from unregister_sysctl_table, and add
    the logic for decrementing nreg.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • While useful at one time for selinux and the sysctl sanity
    checks those users no longer use the parent field and we can
    safely remove it.

    Inspired-by: Lucian Adrian Grijincu
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • Split the registration of a complex ctl_table array which may have
    arbitrary numbers of directories (->child != NULL) and tables of files
    into a series of simpler registrations that only register tables of files.

    Graphically:

    register('dir', { + file-a
                      + file-b
                      + subdir1
                        + file-c
                      + subdir2
                        + file-d
                        + file-e })

    is transformed into:
    wrapper->subheaders[0] = register('dir', {file-a, file-b})
    wrapper->subheaders[1] = register('dir/subdir1', {file-c})
    wrapper->subheaders[2] = register('dir/subdir2', {file-d, file-e})
    return wrapper

    This guarantees that __register_sysctl_table will only see a simple
    ctl_table array with all entries having (->child == NULL).

    Care was taken to pass the original simple ctl_table arrays to
    __register_sysctl_table whenever possible.

    This change is derived from a similar patch written
    by Lucian Grijincu.

    Inspired-by: Lucian Adrian Grijincu
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
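
The transformation in the commit above can be sketched as a recursive walk. This is an illustrative model only: `split_registrations` is a hypothetical name, and a nested dict (where `None` marks a file and a dict marks a subdirectory) stands in for a ctl_table array with `->child` pointers.

```python
def split_registrations(path, table):
    """Flatten a nested table into (path, [file names]) registrations.

    `table` maps each entry name to None for a file, or to another dict
    for a subdirectory (the analogue of ->child != NULL).
    """
    registrations = []
    files = [name for name, child in table.items() if child is None]
    if files:
        # One simple registration holding only this directory's files.
        registrations.append((path, files))
    for name, child in table.items():
        if child is not None:
            # Subdirectories become their own registrations under path/name.
            registrations.extend(split_registrations(f"{path}/{name}", child))
    return registrations


example = {
    "file-a": None,
    "file-b": None,
    "subdir1": {"file-c": None},
    "subdir2": {"file-d": None, "file-e": None},
}
for reg_path, reg_files in split_registrations("dir", example):
    print(reg_path, reg_files)
```

Each resulting pair contains only files, mirroring the guarantee that the core registration routine never sees an entry with a child table.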
     
  • Make __register_sysctl_table the core sysctl registration operation and
    make it take a char * string as path.

    Now that binary paths have been banished into the realm of backwards
    compatibility in kernel/binary_sysctl.c where they can be safely
    ignored there is no longer a need to use struct ctl_path to represent
    path names when registering ctl_tables.

    Start the transition to using normal char * strings to represent
    pathnames when registering sysctl tables. Normal strings are easier
    to deal with both in the internal sysctl implementation and for
    programmers registering sysctl tables.

    __register_sysctl_paths is turned into a backwards compatibility wrapper
    that converts a ctl_path array into a normal char * string.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
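
The compatibility wrapper described in the commit above amounts to joining path components into one string. A minimal sketch, assuming a ctl_path array is modelled as a list of names ending in a `None` sentinel; the function name `ctl_path_to_string` is hypothetical.

```python
def ctl_path_to_string(ctl_path):
    """Join path components into a single '/'-separated string.

    `ctl_path` is a list of directory names terminated by None, a
    stand-in for a sentinel-terminated array of path elements.
    """
    parts = []
    for name in ctl_path:
        if name is None:  # sentinel: end of the path array
            break
        parts.append(name)
    return "/".join(parts)


print(ctl_path_to_string(["net", "ipv4", "conf", None]))
```

Once the path is a plain string, callers can pass something like "net/ipv4/conf" directly instead of building an array.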
     
  • In sysctl_net register the two networking roots in the proper order.

    In register_sysctl walk the sysctl sets in the reverse order of the
    sysctl roots.

    Remove parent from ctl_table_set and setup_sysctl_set as it is no
    longer needed.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • This adds a small helper, retire_sysctl_set, to remove the intimate knowledge of
    how a sysctl_set is implemented from net/sysctl_net.c

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • Move the core sysctl code from kernel/sysctl.c and kernel/sysctl_check.c
    into fs/proc/proc_sysctl.c.

    Currently sysctl maintenance is hampered by the sysctl implementation
    being split across 3 files with artificial layering between them.
    Consolidate the entire sysctl implementation into 1 file so that
    it is easier to see what is going on and hopefully allowing for
    simpler maintenance.

    For functions that are now only used in fs/proc/proc_sysctl.c remove
    their declarations from sysctl.h and make them static in fs/proc/proc_sysctl.c

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • Simplify the code by treating the base sysctl table like any other
    sysctl table and register it with register_sysctl_table.

    To ensure this table is registered early enough to avoid problems
    call sysctl_init from proc_sys_init.

    Rename sysctl_net.c:sysctl_init() to net_sysctl_init() to avoid
    name conflicts now that kernel/sysctl.c:sysctl_init() is no longer
    static.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • - In sysctl.h move functions only available if CONFIG_SYSCTL
    is defined inside of #ifdef CONFIG_SYSCTL

    - Move the stub function definitions for !CONFIG_SYSCTL
    into sysctl.h and make them static inlines.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

04 Jan, 2012

1 commit


03 Nov, 2011

1 commit

  • Adding support for poll() in sysctl fs allows userspace to receive
    notifications of changes in sysctl entries. This adds an infrastructure to
    allow files in sysctl fs to be pollable and implements it for hostname and
    domainname.

    [akpm@linux-foundation.org: s/declare/define/ for definitions]
    Signed-off-by: Lucas De Marchi
    Cc: Greg KH
    Cc: Kay Sievers
    Cc: Al Viro
    Cc: "Eric W. Biederman"
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
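
From user space, waiting on a pollable sysctl file might look like the sketch below. It rests on assumptions the commit message does not state: that a change is signalled through `POLLPRI`/`POLLERR` rather than ordinary readability, and that the current value should be read before waiting. The helper name `wait_for_change` is hypothetical, and `select.poll` is not available on every platform.

```python
import os
import select


def wait_for_change(path, timeout_ms):
    """Return True if the file signalled an event before the timeout."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.read(fd, 4096)  # consume the current value before waiting
        poller = select.poll()
        # Assumption: changes are reported as an exceptional condition,
        # so we do not ask for plain POLLIN readiness.
        poller.register(fd, select.POLLPRI | select.POLLERR)
        events = poller.poll(timeout_ms)  # list of (fd, event mask) pairs
        return bool(events)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Blocks for up to 5 seconds waiting for a hostname change.
    if os.path.exists("/proc/sys/kernel/hostname"):
        print(wait_for_change("/proc/sys/kernel/hostname", 5000))
```

On a file that never signals such an event, the call simply returns False after the timeout.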
     

04 Oct, 2011

1 commit


10 Mar, 2011

1 commit


08 Mar, 2011

1 commit

  • a) struct inode is not going to be freed under ->d_compare();
    however, the thing PROC_I(inode)->sysctl points to just might.
    Fortunately, it's enough to make freeing that sucker delayed,
    provided that we don't step on its ->unregistering, clear
    the pointer to it in PROC_I(inode) before dropping the reference
    and check if it's NULL in ->d_compare().

    b) I'm not sure that we *can* walk into NULL inode here (we recheck
    dentry->seq between verifying that it's still hashed / fetching
    dentry->d_inode and passing it to ->d_compare() and there's no
    negative hashed dentries in /proc/sys/*), but if we can walk into
    that, we really should not have ->d_compare() return 0 on it!
    Said that, I really suspect that this check can be simply killed.
    Nick?

    Signed-off-by: Al Viro

    Al Viro
     

16 May, 2010

1 commit

  • The new function can be used to read/write large bitmaps via /proc. A
    comma separated range format is used for compact output and input
    (e.g. 1,3-4,10-10).

    Writing into the file will first reset the bitmap then update it
    based on the given input.

    Signed-off-by: Octavian Purdila
    Signed-off-by: WANG Cong
    Cc: Eric W. Biederman
    Signed-off-by: David S. Miller

    Octavian Purdila
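
The comma-separated range format from the commit above can be sketched in user space as a parser and a formatter. This is a model of the text format only, not the kernel function: the names `parse_ranges`, `format_ranges` and `write_bitmap` are hypothetical, and the exact output the kernel produces (for example whether a single bit prints as `10` or `10-10`) is not confirmed by the source.

```python
def parse_ranges(text):
    """Parse e.g. '1,3-4,10-10' into the set of bit numbers it names."""
    bits = set()
    for token in text.strip().split(","):
        if not token:
            continue  # tolerate empty input and stray commas
        first, sep, last = token.partition("-")
        lo = int(first)
        hi = int(last) if sep else lo
        if hi < lo:
            raise ValueError(f"descending range: {token}")
        bits.update(range(lo, hi + 1))
    return bits


def format_ranges(bits):
    """Render a set of bit numbers compactly, merging consecutive runs."""
    ordered = sorted(bits)
    out = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the run while the next bit is adjacent.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        out.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(out)


def write_bitmap(current, text):
    """Model of a write: the old contents are discarded, then input applied."""
    return parse_ranges(text)
```

For the example in the commit message, `parse_ranges("1,3-4,10-10")` yields bits 1, 3, 4 and 10, and `write_bitmap` returns only the newly written bits regardless of what was set before.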
     

17 Feb, 2010

2 commits


11 Jan, 2010

1 commit


07 Jan, 2010

1 commit

  • This is to be used together with switch technologies, like RFC 3069,
    where the individual ports are not allowed to communicate with
    each other, but they are allowed to talk to the upstream router. As
    described in RFC 3069, it is possible to allow these hosts to
    communicate through the upstream router by proxy_arp'ing.

    This patch basically allows proxy arp replies back to the same
    interface (from which the ARP request/solicitation was received).

    Tunable per device via proc "proxy_arp_pvlan":
    /proc/sys/net/ipv4/conf/*/proxy_arp_pvlan

    This switch technology is known by different vendor names:
    - In RFC 3069 it is called VLAN Aggregation.
    - Cisco and Allied Telesyn call it Private VLAN.
    - Hewlett-Packard call it Source-Port filtering or port-isolation.
    - Ericsson call it MAC-Forced Forwarding (RFC Draft).

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

26 Dec, 2009

1 commit

  • When using policy routing and the skb mark, there are cases where
    back path validation requires us to use a different routing table
    for src ip validation than the one used for mapping ingress dst ip.
    One such case is transparent proxying, where we pretend to be
    the destination system and therefore the local table
    is used for incoming packets but possibly a main table would
    be used on outbound.
    Make the default behavior allow the above; users who need the
    symmetry can turn it on via the sysctl src_valid_mark.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

10 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
    tree-wide: fix misspelling of "definition" in comments
    reiserfs: fix misspelling of "journaled"
    doc: Fix a typo in slub.txt.
    inotify: remove superfluous return code check
    hdlc: spelling fix in find_pvc() comment
    doc: fix regulator docs cut-and-pasteism
    mtd: Fix comment in Kconfig
    doc: Fix IRQ chip docs
    tree-wide: fix assorted typos all over the place
    drivers/ata/libata-sff.c: comment spelling fixes
    fix typos/grammos in Documentation/edac.txt
    sysctl: add missing comments
    fs/debugfs/inode.c: fix comment typos
    sgivwfb: Make use of ARRAY_SIZE.
    sky2: fix sky2_link_down copy/paste comment error
    tree-wide: fix typos "couter" -> "counter"
    tree-wide: fix typos "offest" -> "offset"
    fix kerneldoc for set_irq_msi()
    spidev: fix double "of of" in comment
    comment typo fix: sybsystem -> subsystem
    ...

    Linus Torvalds
     

08 Dec, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
    mac80211: fix reorder buffer release
    iwmc3200wifi: Enable wimax core through module parameter
    iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
    iwmc3200wifi: Coex table command does not expect a response
    iwmc3200wifi: Update wiwi priority table
    iwlwifi: driver version track kernel version
    iwlwifi: indicate uCode type when fail dump error/event log
    iwl3945: remove duplicated event logging code
    b43: fix two warnings
    ipw2100: fix rebooting hang with driver loaded
    cfg80211: indent regulatory messages with spaces
    iwmc3200wifi: fix NULL pointer dereference in pmkid update
    mac80211: Fix TX status reporting for injected data frames
    ath9k: enable 2GHz band only if the device supports it
    airo: Fix integer overflow warning
    rt2x00: Fix padding bug on L2PAD devices.
    WE: Fix set events not propagated
    b43legacy: avoid PPC fault during resume
    b43: avoid PPC fault during resume
    tcp: fix a timewait refcnt race
    ...

    Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
    CTL_UNNUMBERED removed) in
    kernel/sysctl_check.c
    net/ipv4/sysctl_net_ipv4.c
    net/ipv6/addrconf.c
    net/sctp/sysctl.c

    Linus Torvalds
     

04 Dec, 2009

2 commits


19 Nov, 2009

1 commit


18 Nov, 2009

1 commit


12 Nov, 2009

1 commit


11 Nov, 2009

1 commit

  • The ctl_name and strategy fields are unused, now that sys_sysctl
    is a compatibility wrapper around /proc/sys. No longer looking
    at them in the generic code is effectively what we are doing
    now and provides the guarantee that during further cleanups
    we can just remove references to those fields and everything
    will work ok.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

06 Nov, 2009

1 commit


24 Sep, 2009

1 commit

  • It's unused.

    It isn't needed -- read or write flag is already passed and sysctl
    shouldn't care about the rest.

    It _was_ used in two places at arch/frv for some reason.

    Signed-off-by: Alexey Dobriyan
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Ralf Baechle
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

01 Feb, 2009

1 commit