18 Oct, 2007

4 commits

  • A sysctl method was added to enable and disable debugging levels. After
    further review, it was decided that there are better approaches to doing this
    and the sysctl methodology isn't really desirable. This patch removes the
    sysctl code from 9p.

    Signed-off-by: Eric Van Hensbergen

    Eric Van Hensbergen
     
  • This patch moves transport dynamic registration and matching to the net
    module to prevent a bad Kconfig dependency between the net and fs 9p modules.

    Signed-off-by: Eric Van Hensbergen

    Eric Van Hensbergen
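
    As a rough illustration of what "dynamic registration and matching" can
    look like, here is a minimal userspace sketch. The names trans_module,
    register_trans and match_trans are hypothetical and chosen for this
    example only; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a transport descriptor. */
struct trans_module {
    const char *name;           /* name matched against a mount option */
    struct trans_module *next;  /* next entry in the registry list */
};

static struct trans_module *trans_list;  /* head of the registry */

/* Register a transport by pushing it onto the front of the list. */
void register_trans(struct trans_module *m)
{
    assert(m != NULL && m->name != NULL);
    m->next = trans_list;
    trans_list = m;
}

/* Return the transport whose name equals `name`, or NULL if none matches. */
struct trans_module *match_trans(const char *name)
{
    struct trans_module *m;

    for (m = trans_list; m != NULL; m = m->next)
        if (strcmp(m->name, name) == 0)
            return m;
    return NULL;
}
```

    Keeping the list and the lookup in one module means callers only need
    the two functions, which is the kind of one-way dependency the message
    describes. Locking is left out of this sketch.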
     
  • The 9P2000 protocol requires the authentication and permission checks to be
    done in the file server. For that reason every user that accesses the file
    server tree has to authenticate and attach to the server separately.
    Multiple users can share the same connection to the server.

    Currently v9fs does a single attach and executes all I/O operations as a
    single user. This makes using v9fs in a multiuser environment unsafe, as it
    depends on the client doing the permission checking.

    This patch improves the 9P2000 support by allowing every user to attach
    separately. The patch defines three modes of access (new mount option
    'access'):

    - attach-per-user (access=user) (default mode for 9P2000.u)
    If a user tries to access a file served by v9fs for the first time, v9fs
    sends an attach command to the server (Tattach) specifying the user. If
    the attach succeeds, the user can access the v9fs tree.
    As there is no uname->uid (string->integer) mapping yet, this mode works
    only with the 9P2000.u dialect.

    - allow only one user to access the tree (access=)
    Only the user with uid can access the v9fs tree. Other users that attempt
    to access it will get an EPERM error.

    - do all operations as a single user (access=any) (default for 9P2000)
    V9fs does a single attach and all operations are done as a single user.
    If this mode is selected, the v9fs behavior is identical with the current
    one.

    Signed-off-by: Latchesar Ionkov
    Signed-off-by: Eric Van Hensbergen

    Latchesar Ionkov
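
    The three modes could be told apart roughly as in the sketch below. It
    is a userspace illustration, not the v9fs option parser: the enum, the
    struct and both function names are hypothetical, and treating the
    single-user mode as a decimal uid value is an assumption based on
    "Only the user with uid" above.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum access_mode { ACCESS_USER, ACCESS_SINGLE, ACCESS_ANY, ACCESS_INVALID };

struct access_opt {
    enum access_mode mode;
    unsigned long uid;  /* only meaningful for ACCESS_SINGLE */
};

/* Classify the value of the 'access' option (assumed syntax, see lead-in). */
struct access_opt parse_access(const char *val)
{
    struct access_opt o = { ACCESS_INVALID, 0 };
    char *end;

    assert(val != NULL);
    if (strcmp(val, "user") == 0) {
        o.mode = ACCESS_USER;
    } else if (strcmp(val, "any") == 0) {
        o.mode = ACCESS_ANY;
    } else if (isdigit((unsigned char)*val)) {
        o.uid = strtoul(val, &end, 10);
        if (*end == '\0')           /* whole string was a number */
            o.mode = ACCESS_SINGLE;
    }
    return o;
}

/* In single-user mode every other uid is refused with -EPERM. */
int check_access(const struct access_opt *o, unsigned long uid)
{
    if (o->mode == ACCESS_SINGLE && o->uid != uid)
        return -EPERM;
    return 0;
}
```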
     
  • This patch abstracts out the interfaces to underlying transports so that
    new transports can be added as modules. This should also allow kernel
    configuration of transports without ifdef-hell.

    Signed-off-by: Eric Van Hensbergen

    Eric Van Hensbergen
     

17 Oct, 2007

5 commits

  • Why do we need r/o bind mounts?

    This feature allows a read-only view into a read-write filesystem. In the
    process of doing that, it also provides infrastructure for keeping track of
    the number of writers to any given mount.

    This has a number of uses. It allows chroots to have parts of filesystems
    writable. It will be useful for containers in the future because users may
    have root inside a container, but should not be allowed to write to
    some filesystems. This also replaces patches that vserver has had out of the
    tree for several years.

    It allows security enhancement by making sure that parts of your filesystem
    are read-only (such as when you don't trust your FTP server), when you don't
    want to have entire new filesystems mounted, or when you want atime
    selectively updated. I've been using a script to test that the feature is
    working as desired. It takes a directory and makes a regular bind and a r/o
    bind mount of it. It then performs some normal filesystem operations on the
    three directories, including ones that are expected to fail, like creating a
    file on the r/o mount.

    This patch:

    Some filesystems forgo the VFS and may_open() and create their own 'struct
    file's.

    This patch creates a couple of helper functions which can be used by these
    filesystems, and will provide a unified place which the r/o bind mount code
    may patch.

    Also, rename an existing, static-scope init_file() to a less generic name.

    Signed-off-by: Dave Hansen
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
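
    The idea of tracking writers per mount can be pictured with the small
    userspace model below. All names (struct mnt, mnt_get_write and so on)
    are hypothetical, and refusing the switch to read-only while writers
    are active is an assumption about how such a counter would be used,
    not something this message states.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-mount state: a read-only flag plus a writer count. */
struct mnt {
    int readonly;  /* nonzero for a read-only view */
    int writers;   /* writers currently active through this mount */
};

/* Take write access; fails on a read-only mount. */
int mnt_get_write(struct mnt *m)
{
    if (m->readonly)
        return -EROFS;
    m->writers++;
    return 0;
}

/* Drop write access taken with mnt_get_write(). */
void mnt_put_write(struct mnt *m)
{
    assert(m->writers > 0);
    m->writers--;
}

/* Switch to read-only only when nobody is writing (assumed policy). */
int mnt_make_readonly(struct mnt *m)
{
    if (m->writers > 0)
        return -EBUSY;
    m->readonly = 1;
    return 0;
}
```

    The sketch is single-threaded; a real implementation would need
    locking or atomic counters.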
     
  • Make request_key() and co fundamentally asynchronous to make it easier for
    NFS to make use of them. There are now accessor functions that do
    asynchronous constructions, a wait function to wait for construction to
    complete, and a completion function for the key type to indicate completion
    of construction.

    Note that the construction queue is now gone. Instead, keys under
    construction are linked into the appropriate keyring in advance, and
    anyone encountering one must wait for it to be complete before they can use
    it. This is done automatically for userspace.

    The following auxiliary changes are also made:

    (1) Key type implementation stuff is split from linux/key.h into
    linux/key-type.h.

    (2) AF_RXRPC provides a way to allocate null rxrpc-type keys so that AFS does
    not need to call key_instantiate_and_link() directly.

    (3) Adjust the debugging macros so that they're -Wformat checked even if
    they are disabled, and make it so they can be enabled simply by defining
    __KDEBUG to be consistent with other code of mine.

    (4) Documentation.

    [alan@lxorguk.ukuu.org.uk: keys: missing word in documentation]
    Signed-off-by: David Howells
    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Adrian Bunk points out that "unsafe" was used to mark modules touched by
    the deprecated MOD_INC_USE_COUNT interface, which is long gone. It's time
    to remove the member from the module structure, as well.

    If you want a module which can't unload, don't register an exit function.

    (Vlad Yasevich says SCTP is now safe to unload, so just remove the
    __unsafe there).

    Signed-off-by: Rusty Russell
    Acked-by: Shannon Nelson
    Acked-by: Dan Williams
    Acked-by: Vlad Yasevich
    Cc: Sridhar Samudrala
    Cc: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Slab constructors currently have a flags parameter that is never used. And
    the order of the arguments is opposite to other slab functions. The object
    pointer is placed before the kmem_cache pointer.

    Convert

    ctor(void *object, struct kmem_cache *s, unsigned long flags)

    to

    ctor(struct kmem_cache *s, void *object)

    throughout the kernel

    [akpm@linux-foundation.org: coupla fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
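
    For illustration, a constructor in the new argument order might look
    like the userspace mock below. struct kmem_cache_mock, struct item and
    mock_cache_alloc are hypothetical stand-ins; the mock also runs the
    constructor on every allocation, which is a simplification and not a
    claim about how the slab allocators invoke it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct kmem_cache. */
struct kmem_cache_mock {
    size_t size;                                         /* object size */
    void (*ctor)(struct kmem_cache_mock *s, void *object);
};

struct item {
    int refcount;
    char name[16];
};

/* New-style constructor: cache pointer first, object second, no flags. */
static void item_ctor(struct kmem_cache_mock *s, void *object)
{
    struct item *it = object;

    memset(it, 0, s->size);
    it->refcount = 1;
}

/* Allocate one object and run the constructor on it (mock behavior). */
void *mock_cache_alloc(struct kmem_cache_mock *s)
{
    void *obj;

    assert(s != NULL && s->size > 0);
    obj = malloc(s->size);
    if (obj != NULL && s->ctor != NULL)
        s->ctor(s, obj);
    return obj;
}
```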
     
  • sparc64:

    net/sunrpc/xprtrdma/verbs.c:1264: warning: long long unsigned int format, u64 arg (arg 3)
    net/sunrpc/xprtrdma/verbs.c:1264: warning: long long unsigned int format, u64 arg (arg 4)

    Cc: Trond Myklebust
    Cc: "David S. Miller"
    Cc: "J. Bruce Fields"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
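
    The message only quotes the warnings. A common way to silence this
    class of warning is to cast the 64-bit value to unsigned long long so
    that it matches the %ll format on every architecture; the sketch below
    assumes that remedy and uses a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value as hex. The cast makes the argument type match
 * "%llx" even where the 64-bit type is defined as plain unsigned long.
 */
int format_u64_hex(char *buf, size_t len, uint64_t v)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%llx", (unsigned long long)v);
}
```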
     

16 Oct, 2007

31 commits

  • The difference in both functions is in the "id" passed to
    the rt6_select, so just pass it as an extra argument from
    two outer helpers.

    This is minus 60 lines of code and 360 bytes of .text

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The pnigh_lookup is used to look up proxy entries and to
    create them in case the lookup fails.

    However, the "creation" code does not perform the re-lookup
    after GFP_KERNEL allocation. The re-lookup is omitted because the
    code is expected to be protected with the RTNL lock, so add the
    assertion (mainly to address future questions from new network
    developers like me :) ).

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • kmalloc + memset -> kzalloc in frag_alloc_queue

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
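
    The same transformation has a direct userspace analogue, with
    malloc + memset standing in for kmalloc + memset and calloc standing
    in for kzalloc. The struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical queue object, used only for this example. */
struct frag_queue_mock {
    int len;
    int meat;
    void *fragments;
};

/* Before: allocate, then clear by hand. */
struct frag_queue_mock *alloc_two_step(void)
{
    struct frag_queue_mock *fq = malloc(sizeof(*fq));

    if (fq == NULL)
        return NULL;
    memset(fq, 0, sizeof(*fq));
    return fq;
}

/* After: one call that returns zeroed memory. */
struct frag_queue_mock *alloc_zeroed(void)
{
    return calloc(1, sizeof(struct frag_queue_mock));
}

/* Return 1 when every byte of the object is zero. */
int queue_is_zeroed(const struct frag_queue_mock *fq)
{
    const unsigned char *p = (const unsigned char *)fq;
    size_t i;

    assert(fq != NULL);
    for (i = 0; i < sizeof(*fq); i++)
        if (p[i] != 0)
            return 0;
    return 1;
}
```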
     
  • With all the users of the double pointers removed from the IPv6 input path,
    this patch converts all occurrences of sk_buff ** to sk_buff * in IPv6 input
    handlers.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • These ones use the generic data types too, so move
    them into one place.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • After the evictor code is consolidated there is no need to
    pass the extra pointer to the xxx_put() functions.

    The only place where it made sense was the evictor code itself.

    Maybe this change should go with the previous (or with the
    next) patch, but I try to keep them as short as
    possible to simplify the review (they are still large
    anyway), so this change goes in a separate patch.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The evictors collect some statistics for ipv4 and ipv6,
    so make it return the number of evicted queues and account
    them all at once in the caller.

    The XXX_ADD_STATS_BH() macros are just for this case,
    but maybe there are places in code, that can make use of
    them as well.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
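
    A sketch of the "return the count, account once in the caller" pattern
    follows. The structure, its fields and both functions are hypothetical
    simplifications (every queue is given the same size here), not the
    kernel's evictor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping; invariant: mem == nqueues * queue_size. */
struct frags_mock {
    size_t mem;         /* memory currently used by queues */
    size_t low_thresh;  /* evict until mem is at or below this */
    size_t queue_size;  /* size accounted per queue */
    size_t nqueues;     /* number of live queues */
};

/* Evict queues until mem <= low_thresh; return how many were evicted. */
int frag_evictor(struct frags_mock *f)
{
    int evicted = 0;

    assert(f->queue_size > 0);
    while (f->mem > f->low_thresh && f->nqueues > 0) {
        f->mem -= f->queue_size;
        f->nqueues--;
        evicted++;
    }
    return evicted;
}

/* Caller side: one bulk add instead of one increment per evicted queue. */
void account_evictions(unsigned long *counter, int evicted)
{
    if (evicted > 0)
        *counter += (unsigned long)evicted;
}
```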
     
  • To make it possible we need to know the exact frag queue
    size for inet_frags->mem management and two callbacks:

    * to destroy the skb (optional, used in conntracks only)
    * to free the queue itself (mandatory, but later I plan to
    move the allocation and the destruction of frag_queues
    into a common place, so this callback will most likely
    be optional too).

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This code works with the generic data types as well, so
    move this into inet_fragment.c

    This move makes it possible to hide the secret_timer
    management and the secret_rebuild routine completely in
    the inet_fragment.c

    Introduce the ->hashfn() callback in inet_frags() to get
    the hashfun for a given inet_frag_queue() object.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Since all the xxx_frag_kill functions now work
    with the generic inet_frag_queue data type, this can
    be moved into a common place.

    The xxx_unlink() code is moved as well.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Some sysctl variables are used to tune the frag queues
    management and it will be useful to work with them in
    a common way in the future, so move them into one
    structure. Moreover, they are the same for all the frag
    management codes.

    I don't place them in the existing inet_frags object,
    introduced in the previous patch for two reasons:

    1. to keep them in the __read_mostly section;
    2. not to export the whole inet_frags objects outside.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • There are some objects that are common in all the places
    which are used to keep track of frag queues, they are:

    * hash table
    * LRU list
    * rw lock
    * rnd number for hash function
    * the number of queues
    * the amount of memory occupied by queues
    * secret timer

    Move all this stuff into one structure (struct inet_frags)
    to make it possible to use them uniformly in the future. Like
    with the previous patch this mostly consists of hunks like

    - write_lock(&ipfrag_lock);
    + write_lock(&ip4_frags.lock);

    To address the issue with exporting the number of queues and
    the amount of memory occupied by queues outside the .c file
    they are declared in, I introduce a couple of helpers.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
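
    The helper approach mentioned at the end can be pictured as a
    file-local object with small accessor functions, so that other files
    never touch the structure directly. Everything below (the _mock
    structure, the reduced field set and the function names) is
    hypothetical and only illustrates the shape of such helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for the grouped state; most listed fields are omitted. */
struct inet_frags_mock {
    unsigned int rnd;  /* random value for the hash function */
    int nqueues;       /* number of queues */
    size_t mem;        /* memory occupied by queues */
};

/* File-local instance: not visible outside this translation unit. */
static struct inet_frags_mock ip4_frags_mock;

/* Account for a newly added queue of the given size. */
void ip4_frag_account_add(size_t size)
{
    ip4_frags_mock.nqueues++;
    ip4_frags_mock.mem += size;
}

/* Account for a removed queue of the given size. */
void ip4_frag_account_del(size_t size)
{
    assert(ip4_frags_mock.nqueues > 0 && ip4_frags_mock.mem >= size);
    ip4_frags_mock.nqueues--;
    ip4_frags_mock.mem -= size;
}

/* Read-only helpers that export the two counters. */
int ip4_frag_nqueues(void)
{
    return ip4_frags_mock.nqueues;
}

size_t ip4_frag_mem(void)
{
    return ip4_frags_mock.mem;
}
```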
     
  • Introduce the struct inet_frag_queue in include/net/inet_frag.h
    file and place there all the common fields from three structs:

    * struct ipq in ipv4/ip_fragment.c
    * struct nf_ct_frag6_queue in nf_conntrack_reasm.c
    * struct frag_queue in ipv6/reassembly.c

    After this, replace these fields on appropriate structures with
    this structure instance and fix the users to use correct names
    i.e. hunks like

    - atomic_dec(&fq->refcnt);
    + atomic_dec(&fq->q.refcnt);

    (these occupy most of the patch)

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
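
    The embedding pattern behind the fq->q.refcnt hunks can be sketched in
    userspace as follows. The _mock types, their fields and the two
    functions are illustrative assumptions; only the idea of a common
    structure embedded as member q comes from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Common part shared by all queue flavors (fields are illustrative). */
struct inet_frag_queue_mock {
    int refcnt;
    int len;
};

/* One protocol-specific queue embedding the common part as 'q'. */
struct ipq_mock {
    struct inet_frag_queue_mock q;
    unsigned int saddr;
    unsigned int daddr;
};

/* Generic code sees only the common part. */
void frag_queue_put(struct inet_frag_queue_mock *q)
{
    assert(q->refcnt > 0);
    q->refcnt--;
}

/* Recover the enclosing structure from a pointer to its 'q' member. */
struct ipq_mock *ipq_from_queue(struct inet_frag_queue_mock *q)
{
    return (struct ipq_mock *)((char *)q - offsetof(struct ipq_mock, q));
}
```

    With this layout, generic helpers take a pointer to the common part,
    and protocol code converts back when it needs its own fields.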
     
  • Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • Uninline netfilter okfns for those cases where gcc can generate tail-calls.

    Before:
    text data bss dec hex filename
    8994153 1016524 524652 10535329 a0c1a1 vmlinux

    After:
    text data bss dec hex filename
    8992761 1016524 524652 10533937 a0bc31 vmlinux
    -------------------------------------------------------
    -1392

    All cases have been verified to generate tail-calls with and without netfilter.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Now that we don't pass double skb pointers to nf_hook_slow anymore, gcc
    can generate tail calls for some of the netfilter hook okfn invocations,
    so there is no need to inline the functions anymore. This caused huge
    code bloat since we ended up with one inlined version and one out-of-line
    version since we pass the address to nf_hook_slow.

    Before:
    text data bss dec hex filename
    8997385 1016524 524652 10538561 a0ce41 vmlinux

    After:
    text data bss dec hex filename
    8994009 1016524 524652 10535185 a0c111 vmlinux
    -------------------------------------------------------
    -3376

    All cases have been verified to generate tail-calls with and without
    netfilter. The okfns in ipmr and xfrm4_input still remain inline because
    gcc can't generate tail-calls for them.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • TCP packets all have writable heads, that is, even though it's cloned, it is
    writable up to the end of the TCP header. This patch makes skb_checksum_help
    aware of this fact by using skb_clone_writable and avoiding a copy for TCP.

    I've also modified the BUG_ON tests to be unsigned. The only case where this
    makes a difference is if csum_start points to a location before skb->data.
    Since skb->data should always include the header where the checksum field
    is (and all current callers adhere to that), this change is safe and may
    uncover bugs later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
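
    Why an unsigned comparison matters here can be shown with two small
    range checks. The function names are hypothetical and this is not the
    code from skb_checksum_help; it only demonstrates that a negative
    offset passes a signed bound check but is caught once converted to an
    unsigned type.

```c
#include <assert.h>
#include <stddef.h>

/* Signed check: a negative offset is "less than len" and slips through. */
int offset_out_of_range_signed(long offset, long len)
{
    assert(len >= 0);
    return offset >= len;
}

/* Unsigned check: a negative offset converts to a huge value and is caught. */
int offset_out_of_range_unsigned(long offset, size_t len)
{
    return (size_t)offset >= len;
}
```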
     
  • I got confused by the dual nature of the off variable in the
    function pskb_expand_head. The csum_start offset should use
    nhead instead of off which can change depending on whether we
    are using offsets or pointers.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The Coverity checker spotted that we'll leak the storage allocated
    to 'listeners' in netlink_kernel_create() when the
    if (!nl_table[unit].registered)
    check is false.

    This patch avoids the leak.

    Signed-off-by: Jesper Juhl
    Acked-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Jesper Juhl
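
    The shape of such a leak and its fix can be sketched in userspace: a
    buffer is allocated up front, and on the branch where it is not handed
    over it has to be freed. The structure and function below are
    hypothetical and do not reproduce netlink_kernel_create().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table slot that may take ownership of a buffer. */
struct table_slot_mock {
    int registered;
    unsigned long *listeners;
};

/* Returns 0 on success, -1 if the allocation fails. */
int slot_create(struct table_slot_mock *slot, size_t nwords)
{
    unsigned long *listeners;

    assert(slot != NULL && nwords > 0);
    listeners = calloc(nwords, sizeof(*listeners));
    if (listeners == NULL)
        return -1;

    if (!slot->registered) {
        slot->listeners = listeners;  /* ownership moves to the slot */
        slot->registered = 1;
    } else {
        free(listeners);              /* not used: free it instead of leaking */
    }
    return 0;
}
```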
     
  • The Coverity checker spotted that we have already oops'ed if "dst" was
    NULL.

    Since "dst" being NULL doesn't seem to be possible at this point this
    patch removes the NULL check.

    Signed-off-by: Adrian Bunk
    Acked-by: Masahide NAKAMURA
    Acked-by: Noriaki TAKAMIYA
    Signed-off-by: David S. Miller

    Adrian Bunk
     
  • This patch replaces unnecessary uses of skb_copy by pskb_expand_head
    on the IPv6 input path.

    This allows us to remove the double pointers later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch implements the same change that was done to ip_defrag. It
    makes ipv6_frag_rcv return the last packet received of a train of fragments
    rather than the head of that sequence.

    This allows us to get rid of the sk_buff ** argument later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • With all the users of the double pointers removed, this patch mops up by
    finally replacing all occurrences of sk_buff ** in the netfilter API by
    sk_buff *.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch replaces unnecessary uses of skb_copy, pskb_copy and
    skb_realloc_headroom by functions such as skb_make_writable and
    pskb_expand_head.

    This allows us to remove the double pointers later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch removes the IPVS-specific version of skb_make_writable and
    replaces it with the netfilter one.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that all callers of netfilter can guarantee that the skb is not shared,
    we no longer have to copy the skb in skb_make_writable.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Due to the special location of the bridging hook, it should never see a
    shared packet anyway (certainly not with any in-kernel code). So it
    makes sense to unshare the skb there if necessary as that will greatly
    simplify the code below it (in particular, netfilter).

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • As it is we always invoke pt_prev before ing_filter, even if there are no
    ingress filters attached. This can cause unnecessary cloning in pt_prev.

    This patch changes it so that we only invoke pt_prev if there are ingress
    filters attached.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that ip_frag always returns the packet given to it on input, we can
    change it to return an integer indicating error instead. This patch does
    that and updates all its callers accordingly.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch is a bit of a hack. However it is worth it if you consider that
    this is the only reason why we have to carry around the struct sk_buff **
    pointers in netfilter.

    It makes ip_defrag always return the packet that was given to it on input.
    It does this by cloning the packet and replacing its original contents with
    the head fragment if necessary.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu