19 Oct, 2007

6 commits

  • Grumble. These numbers should have been in sysctl.h from the beginning if we
    ever expected anyone to use them. Oh well, put them there now so we can find
    them and make maintenance easier.

    Signed-off-by: Eric W. Biederman
    Acked-by: Samuel Ortiz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • No one has bothered to set the strategy routine for the netfilter sysctls that
    return jiffies to be sysctl_jiffies.

    So it appears the sys_sysctl path is unused and untested, so this patch
    removes the binary sysctl numbers.

    Which fixes the netfilter oops in 2.6.23-rc2-mm2 for me.

    Signed-off-by: Eric W. Biederman
    Cc: Patrick McHardy
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Currently tcp_available_congestion_control does not even attempt being read
    from sys_sysctl, and ipfrag_max_dist while it works allows setting of invalid
    values using sys_sysctl.

    So just kill the binary sys_sysctl support for these sysctls. If the support
    is not important enough to test and get right it probably isn't important
    enough to keep.

    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • This is debug code so no need to support binary sysctl, and the binary sysctls
    as they were written were not consistent with what showed up in /proc so
    remove the binary sysctl support.

    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Cc: "David S. Miller"
    Cc: Trond Myklebust
    Cc: Neil Brown
    Cc: "J. Bruce Fields"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • We don't properly support the sysctl binary path for flushing the ipv6
    routes. So remove support for a binary path.

    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • - In ipv6, fix ndisc_ifinfo_sysctl_change so it doesn't depend on binary
    sysctl names for a function that works with proc.

    - In neighbour.c reorder the table to put the possibly unused entries
    at the end so we can remove them by terminating the table early.

    - In neighbour.c kill the entries with questionable binary sysctl
    handling behavior.

    - In neighbour.c if we don't have a strategy routine remove the
    binary path, so we don't call the default sysctl strategy routine
    on data that is not ready for it.

    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

18 Oct, 2007

4 commits

  • A sysctl method was added to enable and disable debugging levels. After
    further review, it was decided that there are better approaches to doing this
    and the sysctl methodology isn't really desirable. This patch removes the
    sysctl code from 9p.

    Signed-off-by: Eric Van Hensbergen

    Eric Van Hensbergen
     
  • This patch moves transport dynamic registration and matching to the net
    module to prevent a bad Kconfig dependency between the net and fs 9p modules.

    Signed-off-by: Eric Van Hensbergen

    Eric Van Hensbergen
     
  • The 9P2000 protocol requires the authentication and permission checks to be
    done in the file server. For that reason every user that accesses the file
    server tree has to authenticate and attach to the server separately.
    Multiple users can share the same connection to the server.

    Currently v9fs does a single attach and executes all I/O operations as a
    single user. This makes using v9fs in a multiuser environment unsafe as it
    depends on the client doing the permission checking.

    This patch improves the 9P2000 support by allowing every user to attach
    separately. The patch defines three modes of access (new mount option
    'access'):

    - attach-per-user (access=user) (default mode for 9P2000.u)
    If a user tries to access a file served by v9fs for the first time, v9fs
    sends an attach command to the server (Tattach) specifying the user. If
    the attach succeeds, the user can access the v9fs tree.
    As there is no uname->uid (string->integer) mapping yet, this mode works
    only with the 9P2000.u dialect.

    - allow only one user to access the tree (access=<uid>)
    Only the user with uid <uid> can access the v9fs tree. Other users that
    attempt to access it will get an EPERM error.

    - do all operations as a single user (access=any) (default for 9P2000)
    V9fs does a single attach and all operations are done as a single user.
    If this mode is selected, the v9fs behavior is identical with the current
    one.

    Signed-off-by: Latchesar Ionkov
    Signed-off-by: Eric Van Hensbergen

    Latchesar Ionkov
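
    The three access modes described in the entry above can be sketched in
    userspace C. This is only an illustration, not the v9fs implementation:
    the names access_mode, parse_access and access_check are hypothetical,
    and the third form is assumed to take a plain decimal uid.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for the three modes of the 'access' mount option. */
enum access_mode {
    ACCESS_USER,     /* access=user: one attach per user */
    ACCESS_SINGLE,   /* access=<uid>: only that uid may use the tree */
    ACCESS_ANY,      /* access=any: single attach, single user */
    ACCESS_INVALID
};

/* Parse the value after "access="; *uid is written only for ACCESS_SINGLE. */
enum access_mode parse_access(const char *value, unsigned long *uid)
{
    char *end;
    unsigned long v;

    assert(value != NULL && uid != NULL);
    if (strcmp(value, "user") == 0)
        return ACCESS_USER;
    if (strcmp(value, "any") == 0)
        return ACCESS_ANY;
    if (*value < '0' || *value > '9')
        return ACCESS_INVALID;

    errno = 0;
    v = strtoul(value, &end, 10);
    if (errno != 0 || *end != '\0')
        return ACCESS_INVALID;
    *uid = v;
    return ACCESS_SINGLE;
}

/* Returns 0 if the uid may use the tree, -EPERM otherwise. */
int access_check(enum access_mode mode, unsigned long allowed_uid,
                 unsigned long uid)
{
    if (mode == ACCESS_SINGLE && uid != allowed_uid)
        return -EPERM;
    return 0;
}
```

    In this sketch only the single-uid mode refuses anyone on the client
    side; in the other two modes the decision is left to the file server.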
     
  • This patch abstracts out the interfaces to underlying transports so that
    new transports can be added as modules. This should also allow kernel
    configuration of transports without ifdef-hell.

    Signed-off-by: Eric Van Hensbergen

    Eric Van Hensbergen
     

17 Oct, 2007

5 commits

  • Why do we need r/o bind mounts?

    This feature allows a read-only view into a read-write filesystem. In the
    process of doing that, it also provides infrastructure for keeping track of
    the number of writers to any given mount.

    This has a number of uses. It allows chroots to have parts of filesystems
    writable. It will be useful for containers in the future because users may
    have root inside a container, but should not be allowed to write to
    some filesystems. This also replaces patches that vserver has had out of the
    tree for several years.

    It allows security enhancement by making sure that parts of your filesystem
    are read-only (such as when you don't trust your FTP server), when you don't want
    to have entire new filesystems mounted, or when you want atime selectively
    updated. I've been using the following script to test that the feature is
    working as desired. It takes a directory and makes a regular bind and a r/o
    bind mount of it. It then performs some normal filesystem operations on the
    three directories, including ones that are expected to fail, like creating a
    file on the r/o mount.

    This patch:

    Some filesystems forgo the vfs and may_open() and create their own 'struct
    file's.

    This patch creates a couple of helper functions which can be used by these
    filesystems, and will provide a unified place which the r/o bind mount code
    may patch.

    Also, rename an existing, static-scope init_file() to a less generic name.

    Signed-off-by: Dave Hansen
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • Make request_key() and co fundamentally asynchronous to make it easier for
    NFS to make use of them. There are now accessor functions that do
    asynchronous constructions, a wait function to wait for construction to
    complete, and a completion function for the key type to indicate completion
    of construction.

    Note that the construction queue is now gone. Instead, keys under
    construction are linked in to the appropriate keyring in advance, and
    anyone encountering one must wait for it to be complete before they can use
    it. This is done automatically for userspace.

    The following auxiliary changes are also made:

    (1) Key type implementation stuff is split from linux/key.h into
    linux/key-type.h.

    (2) AF_RXRPC provides a way to allocate null rxrpc-type keys so that AFS does
    not need to call key_instantiate_and_link() directly.

    (3) Adjust the debugging macros so that they're -Wformat checked even if
    they are disabled, and make it so they can be enabled simply by defining
    __KDEBUG to be consistent with other code of mine.

    (4) Documentation.

    [alan@lxorguk.ukuu.org.uk: keys: missing word in documentation]
    Signed-off-by: David Howells
    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Adrian Bunk points out that "unsafe" was used to mark modules touched by
    the deprecated MOD_INC_USE_COUNT interface, which has long gone. It's time
    to remove the member from the module structure, as well.

    If you want a module which can't unload, don't register an exit function.

    (Vlad Yasevich says SCTP is now safe to unload, so just remove the
    __unsafe there).

    Signed-off-by: Rusty Russell
    Acked-by: Shannon Nelson
    Acked-by: Dan Williams
    Acked-by: Vlad Yasevich
    Cc: Sridhar Samudrala
    Cc: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Slab constructors currently have a flags parameter that is never used. And
    the order of the arguments is opposite to other slab functions. The object
    pointer is placed before the kmem_cache pointer.

    Convert

    ctor(void *object, struct kmem_cache *s, unsigned long flags)

    to

    ctor(struct kmem_cache *s, void *object)

    throughout the kernel

    [akpm@linux-foundation.org: coupla fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
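
    The converted constructor signature from the entry above can be
    illustrated with a userspace mock. This is a sketch only: the struct
    kmem_cache below is a hypothetical stand-in with two fields, and
    toy_cache_alloc and conn_ctor are invented for the example. It does not
    reflect how the kernel slab allocators work internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's struct kmem_cache. */
struct kmem_cache {
    size_t object_size;
    /* New-style constructor: cache first, object second, no flags. */
    void (*ctor)(struct kmem_cache *s, void *object);
};

/* Example object type (invented for this sketch). */
struct conn {
    int refcount;
    char name[16];
};

/* Constructor written in the converted argument order. */
void conn_ctor(struct kmem_cache *s, void *object)
{
    struct conn *c = object;

    assert(s->object_size == sizeof(*c));
    memset(c, 0, sizeof(*c));
    c->refcount = 1;
}

/* Toy allocator: allocate one object and run the constructor on it. */
void *toy_cache_alloc(struct kmem_cache *s)
{
    void *object = malloc(s->object_size);

    if (object != NULL && s->ctor != NULL)
        s->ctor(s, object);
    return object;
}
```

    With the cache pointer first, the constructor follows the same argument
    order as the other slab functions, which is the consistency the commit
    message asks for.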
     
  • sparc64:

    net/sunrpc/xprtrdma/verbs.c:1264: warning: long long unsigned int format, u64 arg (arg 3)
    net/sunrpc/xprtrdma/verbs.c:1264: warning: long long unsigned int format, u64 arg (arg 4)

    Cc: Trond Myklebust
    Cc: "David S. Miller"
    Cc: "J. Bruce Fields"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

16 Oct, 2007

25 commits

  • The difference between the two functions is the "id" passed to
    rt6_select, so just pass it as an extra argument from
    two outer helpers.

    This is minus 60 lines of code and 360 bytes of .text

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
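
    The refactoring pattern in the entry above (one shared body, two thin
    outer helpers that differ only in the id they pass) might look roughly
    like this. All names are hypothetical, and treating the id as an
    incoming or outgoing interface index is an assumption; the real
    selection logic is replaced by a trivial computation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down flow description. */
struct flow_sketch {
    int iif;   /* incoming interface index */
    int oif;   /* outgoing interface index */
};

/*
 * Shared body. In this toy version the "route" is just a number derived
 * from the interface id, standing in for the real selection logic.
 */
int pol_route_common(const struct flow_sketch *fl, int ifindex)
{
    assert(fl != NULL);
    return ifindex * 10;
}

/* Outer helper for the input path: passes the incoming interface. */
int pol_route_input(const struct flow_sketch *fl)
{
    return pol_route_common(fl, fl->iif);
}

/* Outer helper for the output path: passes the outgoing interface. */
int pol_route_output(const struct flow_sketch *fl)
{
    return pol_route_common(fl, fl->oif);
}
```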
     
  • The pnigh_lookup is used to lookup proxy entries and to
    create them in case lookup failed.

    However, the "creation" code does not perform the re-lookup
    after GFP_KERNEL allocation. This is done because the code
    is expected to be protected with the RTNL lock, so add the
    assertion (mainly to address future questions from new network
    developers like me :) ).

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • kmalloc + memset -> kzalloc in frag_alloc_queue

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
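
    A userspace analogue of the kmalloc + memset -> kzalloc conversion, with
    malloc/memset and calloc standing in for the kernel calls. The struct and
    function names are invented for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Invented stand-in for a fragment queue. */
struct frag_queue_sketch {
    int refcnt;
    size_t len;
    void *fragments;
};

/* Before: allocate, then clear in a separate step (kmalloc + memset). */
struct frag_queue_sketch *alloc_queue_two_step(void)
{
    struct frag_queue_sketch *fq = malloc(sizeof(*fq));

    if (fq == NULL)
        return NULL;
    memset(fq, 0, sizeof(*fq));
    return fq;
}

/* After: one call that returns zeroed memory (calloc in place of kzalloc). */
struct frag_queue_sketch *alloc_queue_zeroed(void)
{
    return calloc(1, sizeof(struct frag_queue_sketch));
}
```

    Both functions hand back an all-zero object; the second simply cannot
    forget the clearing step.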
     
  • With all the users of the double pointers removed from the IPv6 input path,
    this patch converts all occurrences of sk_buff ** to sk_buff * in IPv6 input
    handlers.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • These ones use the generic data types too, so move
    them in one place.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • After the evictor code is consolidated there is no need to
    pass the extra pointer to the xxx_put() functions.

    The only place when it made sense was the evictor code itself.

    Maybe this change should go with the previous (or with the
    next) patch, but I try to keep them as short as
    possible to simplify the review (but they are still large
    anyway), so this change goes in a separate patch.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The evictors collect some statistics for ipv4 and ipv6,
    so make it return the number of evicted queues and account
    them all at once in the caller.

    The XXX_ADD_STATS_BH() macros are just for this case,
    but maybe there are places in code, that can make use of
    them as well.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
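
    The idea in the entry above, an evictor that returns how many queues it
    dropped so the caller can bump the statistic once, can be sketched as
    follows. The function names, the array of queue sizes and the plain
    counter are all assumptions made for this userspace illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Evict queues in order until memory use is no longer above the low
 * threshold. Returns the number of queues evicted.
 */
int frag_evictor_sketch(long *mem, long low_thresh,
                        const long *queue_sizes, size_t nqueues)
{
    int evicted = 0;
    size_t i;

    assert(mem != NULL);
    for (i = 0; i < nqueues && *mem > low_thresh; i++) {
        *mem -= queue_sizes[i];
        evicted++;
    }
    return evicted;
}

/* Caller: account for all evictions with a single counter update. */
void evict_and_account(long *mem, long low_thresh,
                       const long *queue_sizes, size_t nqueues,
                       unsigned long *fail_counter)
{
    int evicted = frag_evictor_sketch(mem, low_thresh, queue_sizes, nqueues);

    *fail_counter += (unsigned long)evicted;
}
```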
     
  • To make this possible we need to know the exact frag queue
    size for inet_frags->mem management and two callbacks:

    * to destroy the skb (optional, used in conntracks only)
    * to free the queue itself (mandatory, but later I plan to
    move the allocation and the destruction of frag_queues
    into the common place, so this callback will most likely
    be optional too).

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This code works with the generic data types as well, so
    move this into inet_fragment.c

    This move makes it possible to hide the secret_timer
    management and the secret_rebuild routine completely in
    the inet_fragment.c

    Introduce the ->hashfn() callback in inet_frags() to get
    the hashfun for a given inet_frag_queue() object.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Since all the xxx_frag_kill functions now work
    with the generic inet_frag_queue data type, this can
    be moved into a common place.

    The xxx_unlink() code is moved as well.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Some sysctl variables are used to tune the frag queues
    management and it will be useful to work with them in
    a common way in the future, so move them into one
    structure, moreover they are the same for all the frag
    management codes.

    I don't place them in the existing inet_frags object,
    introduced in the previous patch for two reasons:

    1. to keep them in the __read_mostly section;
    2. not to export the whole inet_frags objects outside.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • There are some objects that are common in all the places
    which are used to keep track of frag queues, they are:

    * hash table
    * LRU list
    * rw lock
    * rnd number for hash function
    * the number of queues
    * the amount of memory occupied by queues
    * secret timer

    Move all this stuff into one structure (struct inet_frags)
    to make it possible to use them uniformly in the future. Like
    with the previous patch this mostly consists of hunks like

    - write_lock(&ipfrag_lock);
    + write_lock(&ip4_frags.lock);

    To address the issue with exporting the number of queues and
    the amount of memory occupied by queues outside the .c file
    they are declared in, I introduce a couple of helpers.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
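
    A rough userspace sketch of the grouping described in the entry above:
    bookkeeping that used to be separate file-scope variables is collected
    into one structure, and small helpers expose the two counters so other
    files do not need the object itself. Only a few of the listed fields are
    modelled (no lock, LRU list or timer), and every name except ip4_frags
    is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define FRAG_HASHSZ 64   /* illustrative size only */

/* Bookkeeping that used to be separate file-scope variables. */
struct frags_sketch {
    void *hash[FRAG_HASHSZ];  /* hash table buckets */
    unsigned int rnd;         /* random number for the hash function */
    int nqueues;              /* number of queues */
    long mem;                 /* memory occupied by queues */
};

/* One instance per user of the fragment code; private to this file. */
static struct frags_sketch ip4_frags;

/* Helpers that export the counters without exporting the object. */
int frag_nqueues(void)
{
    return ip4_frags.nqueues;
}

long frag_mem(void)
{
    return ip4_frags.mem;
}

/* Account for a newly created queue of the given size. */
void frag_queue_added(long size)
{
    ip4_frags.nqueues++;
    ip4_frags.mem += size;
}

/* Account for a destroyed queue of the given size. */
void frag_queue_removed(long size)
{
    assert(ip4_frags.nqueues > 0);
    ip4_frags.nqueues--;
    ip4_frags.mem -= size;
}
```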
     
  • Introduce the struct inet_frag_queue in include/net/inet_frag.h
    file and place there all the common fields from three structs:

    * struct ipq in ipv4/ip_fragment.c
    * struct nf_ct_frag6_queue in nf_conntrack_reasm.c
    * struct frag_queue in ipv6/reassembly.c

    After this, replace these fields on appropriate structures with
    this structure instance and fix the users to use correct names
    i.e. hunks like

    - atomic_dec(&fq->refcnt);
    + atomic_dec(&fq->q.refcnt);

    (these occupy most of the patch)

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
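
    The embedding pattern from the entry above, sketched in userspace C. The
    field set is illustrative only (the real refcnt is an atomic counter, a
    plain int is used here), and ipq_sketch, frag_container_of,
    frag_queue_put and ipq_from_common are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the shared part; the fields are illustrative only. */
struct inet_frag_queue {
    int refcnt;
    int len;
};

/* A protocol-specific queue embeds the common part as member "q". */
struct ipq_sketch {
    struct inet_frag_queue q;
    unsigned int saddr;
    unsigned int daddr;
};

/* Recover the enclosing structure from a pointer to the embedded member. */
#define frag_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic code sees only the common part. */
void frag_queue_put(struct inet_frag_queue *q)
{
    assert(q->refcnt > 0);
    q->refcnt--;
}

/* Protocol code converts back when it needs its own fields. */
struct ipq_sketch *ipq_from_common(struct inet_frag_queue *q)
{
    return frag_container_of(q, struct ipq_sketch, q);
}
```

    Callers that previously touched fq->refcnt now go through fq->q.refcnt,
    which is the kind of hunk the commit message says makes up most of the
    patch.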
     
  • Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • Uninline netfilter okfns for those cases where gcc can generate tail-calls.

    Before:
    text data bss dec hex filename
    8994153 1016524 524652 10535329 a0c1a1 vmlinux

    After:
    text data bss dec hex filename
    8992761 1016524 524652 10533937 a0bc31 vmlinux
    -------------------------------------------------------
    -1392

    All cases have been verified to generate tail-calls with and without netfilter.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Now that we don't pass double skb pointers to nf_hook_slow anymore, gcc
    can generate tail calls for some of the netfilter hook okfn invocations,
    so there is no need to inline the functions anymore. This caused huge
    code bloat since we ended up with one inlined version and one out-of-line
    version since we pass the address to nf_hook_slow.

    Before:
    text data bss dec hex filename
    8997385 1016524 524652 10538561 a0ce41 vmlinux

    After:
    text data bss dec hex filename
    8994009 1016524 524652 10535185 a0c111 vmlinux
    -------------------------------------------------------
    -3376

    All cases have been verified to generate tail-calls with and without
    netfilter. The okfns in ipmr and xfrm4_input still remain inline because
    gcc can't generate tail-calls for them.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • TCP packets all have writable heads, that is, even though it's cloned, it is
    writable up to the end of the TCP header. This patch makes skb_checksum_help
    aware of this fact by using skb_clone_writable and avoiding a copy for TCP.

    I've also modified the BUG_ON tests to be unsigned. The only case where this
    makes a difference is if csum_start points to a location before skb->data.
    Since skb->data should always include the header where the checksum field
    is (and all current callers adhere to that), this change is safe and may
    uncover bugs later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
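
    Why an unsigned bounds test catches a start offset that lies before the
    data, as mentioned in the entry above, can be shown with plain
    arithmetic. The two functions below are hypothetical and model only the
    comparison, not skb_checksum_help itself.

```c
#include <assert.h>

/*
 * Signed version: if csum_start lies before the data, the offset is
 * negative and still passes the "offset < headlen" test.
 */
int offset_ok_signed(long csum_start, long headroom, long headlen)
{
    long offset = csum_start - headroom;

    return offset < headlen;
}

/*
 * Unsigned version: the same subtraction wraps around to a very large
 * value, so the single upper-bound test rejects it.
 */
int offset_ok_unsigned(unsigned long csum_start, unsigned long headroom,
                       unsigned long headlen)
{
    unsigned long offset = csum_start - headroom;

    return offset < headlen;
}
```

    With csum_start = 10 and headroom = 32, the signed check accepts the
    bogus offset of -22 while the unsigned check rejects it; for in-range
    values both agree.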
     
  • I got confused by the dual nature of the off variable in the
    function pskb_expand_head. The csum_start offset should use
    nhead instead of off which can change depending on whether we
    are using offsets or pointers.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The Coverity checker spotted that we'll leak the storage allocated
    to 'listeners' in netlink_kernel_create() when the
    if (!nl_table[unit].registered)
    check is false.

    This patch avoids the leak.

    Signed-off-by: Jesper Juhl
    Acked-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Jesper Juhl
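
    The shape of the leak described in the entry above, and one way to avoid
    it, in a userspace sketch. The struct and function names are invented,
    and this is not the actual netlink_kernel_create() code: memory is
    allocated up front, and on the path where it is not stored it has to be
    released.

```c
#include <assert.h>
#include <stdlib.h>

/* Invented stand-in for one entry of a registration table. */
struct table_slot {
    int registered;
    unsigned long *listeners;
};

/* Returns 0 on success, -1 if the allocation fails. */
int slot_create(struct table_slot *slot, size_t words)
{
    unsigned long *listeners;

    assert(slot != NULL && words > 0);
    listeners = calloc(words, sizeof(*listeners));
    if (listeners == NULL)
        return -1;

    if (!slot->registered) {
        slot->listeners = listeners;   /* ownership moves to the slot */
        slot->registered = 1;
    } else {
        free(listeners);               /* not stored anywhere: release it */
    }
    return 0;
}
```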
     
  • The Coverity checker spotted that we have already oops'ed if "dst" was
    NULL.

    Since "dst" being NULL doesn't seem to be possible at this point this
    patch removes the NULL check.

    Signed-off-by: Adrian Bunk
    Acked-by: Masahide NAKAMURA
    Acked-by: Noriaki TAKAMIYA
    Signed-off-by: David S. Miller

    Adrian Bunk
     
  • This patch replaces unnecessary uses of skb_copy by pskb_expand_head
    on the IPv6 input path.

    This allows us to remove the double pointers later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch implements the same change that was done to ip_defrag. It
    makes ipv6_frag_rcv return the last packet received of a train of fragments
    rather than the head of that sequence.

    This allows us to get rid of the sk_buff ** argument later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • With all the users of the double pointers removed, this patch mops up by
    finally replacing all occurrences of sk_buff ** in the netfilter API by
    sk_buff *.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch replaces unnecessary uses of skb_copy, pskb_copy and
    skb_realloc_headroom by functions such as skb_make_writable and
    pskb_expand_head.

    This allows us to remove the double pointers later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
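
    Why copy-based helpers force a double pointer while in-place expansion
    does not, in a userspace sketch. struct buf is a hypothetical stand-in
    for struct sk_buff (a descriptor that owns a separate data area), and
    both functions are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: a descriptor owning its data. */
struct buf {
    unsigned char *data;
    size_t len;
};

/*
 * Copy style: the handler substitutes a fresh descriptor, so the caller
 * must pass struct buf ** and use the updated pointer afterwards.
 */
int pad_by_copy(struct buf **pb, size_t extra)
{
    struct buf *old = *pb;
    struct buf *copy = malloc(sizeof(*copy));

    if (copy == NULL)
        return -1;
    copy->data = calloc(1, old->len + extra);
    if (copy->data == NULL) {
        free(copy);
        return -1;
    }
    memcpy(copy->data, old->data, old->len);
    copy->len = old->len + extra;
    free(old->data);
    free(old);
    *pb = copy;   /* the caller's pointer changes */
    return 0;
}

/*
 * In-place style: only the data area is reallocated. The descriptor stays
 * where it is, so a plain struct buf * is enough.
 */
int pad_in_place(struct buf *b, size_t extra)
{
    unsigned char *data = realloc(b->data, b->len + extra);

    if (data == NULL)
        return -1;
    memset(data + b->len, 0, extra);
    b->data = data;
    b->len += extra;
    return 0;
}
```

    Once every handler follows the second style, nothing needs to hand a new
    descriptor back to its caller, which is what allows the double pointers
    to be removed later.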