16 Oct, 2007

34 commits

  • The difference in both functions is in the "id" passed to
    the rt6_select, so just pass it as an extra argument from
    two outer helpers.

    This is minus 60 lines of code and 360 bytes of .text

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The pneigh_lookup is used to look up proxy entries and to
    create them in case the lookup failed.

    However, the "creation" code does not perform the re-lookup
    after GFP_KERNEL allocation. This is done because the code
    is expected to be protected with the RTNL lock, so add the
    assertion (mainly to address future questions from new network
    developers like me :) ).

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • kmalloc + memset -> kzalloc in frag_alloc_queue

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
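
    The kmalloc + memset to kzalloc conversion above has a direct userspace
    analogue, sketched here with malloc/memset versus calloc. The struct and
    function names are hypothetical and do not reflect the kernel's
    frag_alloc_queue or its data layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a fragment queue; not the kernel's layout. */
struct demo_queue {
    int   len;
    int   refcnt;
    void *fragments;
};

/* Before: allocate, then clear by hand (the kmalloc + memset pattern). */
static struct demo_queue *queue_alloc_two_step(void)
{
    struct demo_queue *q = malloc(sizeof(*q));

    if (q == NULL)
        return NULL;
    memset(q, 0, sizeof(*q));
    return q;
}

/* After: a single call that returns zeroed memory (the kzalloc pattern). */
static struct demo_queue *queue_alloc_zeroed(void)
{
    return calloc(1, sizeof(struct demo_queue));
}
```

    Both helpers hand back an all-zero object; the second simply has one
    fewer step to get wrong.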
     
  • With all the users of the double pointers removed from the IPv6 input path,
    this patch converts all occurrences of sk_buff ** to sk_buff * in IPv6 input
    handlers.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • These ones use the generic data types too, so move
    them in one place.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • After the evictor code is consolidated there is no need to
    pass the extra pointer to the xxx_put() functions.

    The only place where it made sense was the evictor code itself.

    Maybe this change should go with the previous (or with the
    next) patch, but I try to keep them as short as
    possible to simplify the review (though they are still large
    anyway), so this change goes in a separate patch.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The evictors collect some statistics for ipv4 and ipv6,
    so make it return the number of evicted queues and account
    them all at once in the caller.

    The XXX_ADD_STATS_BH() macros are just for this case,
    but maybe there are other places in the code that can make
    use of them as well.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
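
    A minimal userspace sketch of the idea above: the evictor only reports
    how many queues it dropped, and the caller bumps the statistic once. All
    names here (demo_evict, demo_stats, reasm_fails) are hypothetical, and
    the fixed per-queue size is a simplifying assumption.

```c
#include <assert.h>

/* Hypothetical per-protocol counter block. */
struct demo_stats {
    unsigned long reasm_fails;
};

/*
 * Drop queues until memory use falls to low_thresh or below.
 * Returns the number of queues evicted instead of touching any statistics.
 */
static int demo_evict(int *mem, int low_thresh, int queue_size)
{
    int evicted = 0;

    assert(queue_size > 0);         /* otherwise the loop would never end */
    while (*mem > low_thresh) {
        *mem -= queue_size;         /* pretend one queue was freed */
        evicted++;
    }
    return evicted;
}

/* The protocol-specific caller accounts all evictions with one addition. */
static void demo_evictor_caller(struct demo_stats *st, int *mem,
                                int low_thresh, int queue_size)
{
    int n = demo_evict(mem, low_thresh, queue_size);

    st->reasm_fails += (unsigned long)n;
}
```

    Keeping the counter update in the caller is what lets the eviction loop
    itself stay protocol-independent.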
     
  • To make it possible we need to know the exact frag queue
    size for inet_frags->mem management and two callbacks:

    * to destroy the skb (optional, used in conntracks only)
    * to free the queue itself (mandatory, but later I plan to
    move the allocation and the destruction of frag_queues
    into the common place, so this callback will most likely
    be optional too).

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This code works with the generic data types as well, so
    move this into inet_fragment.c

    This move makes it possible to hide the secret_timer
    management and the secret_rebuild routine completely in
    the inet_fragment.c

    Introduce the ->hashfn() callback in struct inet_frags to get
    the hash function for a given inet_frag_queue object.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Since all the xxx_frag_kill functions now work
    with the generic inet_frag_queue data type, this can
    be moved into a common place.

    The xxx_unlink() code is moved as well.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Some sysctl variables are used to tune the frag queues
    management and it will be useful to work with them in
    a common way in the future, so move them into one
    structure, moreover they are the same for all the frag
    management codes.

    I don't place them in the existing inet_frags object,
    introduced in the previous patch for two reasons:

    1. to keep them in the __read_mostly section;
    2. not to export the whole inet_frags objects outside.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • There are some objects that are common in all the places
    which are used to keep track of frag queues, they are:

    * hash table
    * LRU list
    * rw lock
    * rnd number for hash function
    * the number of queues
    * the amount of memory occupied by queues
    * secret timer

    Move all this stuff into one structure (struct inet_frags)
    to make it possible to use them uniformly in the future. Like
    with the previous patch, this mostly consists of hunks like

    - write_lock(&ipfrag_lock);
    + write_lock(&ip4_frags.lock);

    To address the issue with exporting the number of queues and
    the amount of memory occupied by queues outside the .c file
    they are declared in, I introduce a couple of helpers.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
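
    The grouping described above can be sketched as follows. This is an
    illustration only: the field names, the hash size and the helper names
    are assumptions, and the rw lock and secret timer from the list are
    left out to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HASHSZ 64              /* assumed size, for illustration */

struct demo_queue;                  /* queue type stays opaque here */

/* One object per protocol instead of a handful of file-scope globals. */
struct demo_frags {
    struct demo_queue *hash[DEMO_HASHSZ];  /* hash table */
    struct demo_queue *lru_head;           /* LRU list */
    unsigned int       rnd;                /* random seed for the hash */
    int                nqueues;            /* number of queues */
    size_t             mem;                /* memory occupied by queues */
};

/* Stays private to this file, like the object in the commit. */
static struct demo_frags demo_ip4_frags;

/* Internal accounting helper used when a queue is created. */
static void demo_account_add(struct demo_frags *f, size_t bytes)
{
    f->nqueues++;
    f->mem += bytes;
}

/* Small accessors let other files read the counters without the struct. */
int demo_ip4_frag_nqueues(void)
{
    return demo_ip4_frags.nqueues;
}

size_t demo_ip4_frag_mem(void)
{
    return demo_ip4_frags.mem;
}
```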
     
  • Introduce the struct inet_frag_queue in include/net/inet_frag.h
    file and place there all the common fields from three structs:

    * struct ipq in ipv4/ip_fragment.c
    * struct nf_ct_frag6_queue in nf_conntrack_reasm.c
    * struct frag_queue in ipv6/reassembly.c

    After this, replace these fields on appropriate structures with
    this structure instance and fix the users to use correct names
    i.e. hunks like

    - atomic_dec(&fq->refcnt);
    + atomic_dec(&fq->q.refcnt);

    (these occupy most of the patch)

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
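
    The embedding pattern above, reduced to a userspace sketch. The types
    and helpers are hypothetical; only the shape (a common struct embedded
    as member "q", and users switching from fq->refcnt to fq->q.refcnt)
    follows the commit text.

```c
#include <assert.h>
#include <stddef.h>

/* Common part shared by every reassembly queue (fields are illustrative). */
struct demo_frag_queue {
    int refcnt;
    int len;
};

/* A protocol-specific queue embeds the common part as member "q". */
struct demo_ipq {
    struct demo_frag_queue q;
    unsigned int saddr;
    unsigned int daddr;
};

/* Generic code only ever sees the embedded member. */
static void demo_queue_put(struct demo_frag_queue *q)
{
    q->refcnt--;    /* callers now pass &fq->q rather than fq */
}

/* Recover the outer structure from a pointer to the embedded member. */
static struct demo_ipq *demo_ipq_from_queue(struct demo_frag_queue *q)
{
    return (struct demo_ipq *)((char *)q - offsetof(struct demo_ipq, q));
}
```

    The pointer arithmetic in demo_ipq_from_queue is the same trick the
    kernel's container_of() macro performs.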
     
  • Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • Uninline netfilter okfns for those cases where gcc can generate tail-calls.

    Before:
    text data bss dec hex filename
    8994153 1016524 524652 10535329 a0c1a1 vmlinux

    After:
    text data bss dec hex filename
    8992761 1016524 524652 10533937 a0bc31 vmlinux
    -------------------------------------------------------
    -1392

    All cases have been verified to generate tail-calls with and without netfilter.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Now that we don't pass double skb pointers to nf_hook_slow anymore, gcc
    can generate tail calls for some of the netfilter hook okfn invocations,
    so there is no need to inline the functions anymore. This caused huge
    code bloat since we ended up with one inlined version and one out-of-line
    version since we pass the address to nf_hook_slow.

    Before:
    text data bss dec hex filename
    8997385 1016524 524652 10538561 a0ce41 vmlinux

    After:
    text data bss dec hex filename
    8994009 1016524 524652 10535185 a0c111 vmlinux
    -------------------------------------------------------
    -3376

    All cases have been verified to generate tail-calls with and without
    netfilter. The okfns in ipmr and xfrm4_input still remain inline because
    gcc can't generate tail-calls for them.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
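
    A rough sketch of why the pointer change matters for tail calls. In the
    first variant the address of the local "pkt" escapes into the hook,
    which can stop a compiler from turning the final call into a jump; in
    the second nothing escapes. The names are hypothetical, and whether a
    tail call is actually emitted depends on the compiler and its
    optimization settings.

```c
#include <assert.h>

struct demo_pkt {
    int accept;     /* verdict the hook will return */
    int delivered;  /* set by the "okfn" */
};

/* Stand-in for an okfn: the function that continues packet processing. */
static int demo_finish(struct demo_pkt *pkt)
{
    pkt->delivered = 1;
    return 0;
}

/* Double-pointer hook: callers must pass the address of their local. */
static int demo_hook_pp(struct demo_pkt **pp)
{
    return (*pp)->accept;
}

static int demo_input_pp(struct demo_pkt *pkt)
{
    if (!demo_hook_pp(&pkt))    /* &pkt escapes from this frame */
        return -1;
    return demo_finish(pkt);
}

/* Single-pointer hook: nothing on the caller's stack is exposed. */
static int demo_hook_p(struct demo_pkt *p)
{
    return p->accept;
}

static int demo_input_p(struct demo_pkt *pkt)
{
    if (!demo_hook_p(pkt))
        return -1;
    return demo_finish(pkt);    /* call in tail position */
}
```

    Both variants behave identically; the difference is only in what the
    compiler is free to do with the last call.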
     
  • TCP packets all have writable heads, that is, even though it's cloned, it is
    writable up to the end of the TCP header. This patch makes skb_checksum_help
    aware of this fact by using skb_clone_writable and avoiding a copy for TCP.

    I've also modified the BUG_ON tests to be unsigned. The only case where this
    makes a difference is if csum_start points to a location before skb->data.
    Since skb->data should always include the header where the checksum field
    is (and all current callers adhere to that), this change is safe and may
    uncover bugs later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
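
    The effect of making the range tests unsigned can be shown in isolation.
    This is a sketch with hypothetical helper names, not the actual BUG_ON
    expressions from skb_checksum_help: a negative offset (a start that lies
    before the data pointer) slips through a signed comparison but is caught
    by an unsigned one.

```c
#include <assert.h>
#include <stdbool.h>

/* Signed test: a negative offset compares as "less than len" and passes. */
static bool demo_out_of_range_signed(int offset, int len)
{
    return offset >= len;
}

/*
 * Unsigned test: converting a negative offset to unsigned yields a huge
 * value, so the same comparison now rejects it as well.
 */
static bool demo_out_of_range_unsigned(int offset, unsigned int len)
{
    return (unsigned int)offset >= len;
}
```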
     
  • I got confused by the dual nature of the off variable in the
    function pskb_expand_head. The csum_start offset should use
    nhead instead of off which can change depending on whether we
    are using offsets or pointers.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The Coverity checker spotted that we'll leak the storage allocated
    to 'listeners' in netlink_kernel_create() when the
    if (!nl_table[unit].registered)
    check is false.

    This patch avoids the leak.

    Signed-off-by: Jesper Juhl
    Acked-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Jesper Juhl
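
    The shape of this kind of leak, sketched in userspace with hypothetical
    names: a buffer is allocated up front, but only one branch stores it
    somewhere, so the other branch has to free it. How the real patch
    releases the storage is not shown in the message above; freeing in the
    else branch is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical table entry that may take ownership of "listeners". */
struct demo_table {
    int            registered;
    unsigned long *listeners;
};

/* Returns 0 on success, -1 if the allocation fails. */
static int demo_register(struct demo_table *t)
{
    unsigned long *listeners = calloc(4, sizeof(*listeners));

    if (listeners == NULL)
        return -1;

    if (!t->registered) {
        t->listeners = listeners;   /* ownership moves to the table */
        t->registered = 1;
    } else {
        free(listeners);            /* this path would otherwise leak */
    }
    return 0;
}
```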
     
  • The Coverity checker spotted that we have already oops'ed if "dst" was
    NULL.

    Since "dst" being NULL doesn't seem to be possible at this point this
    patch removes the NULL check.

    Signed-off-by: Adrian Bunk
    Acked-by: Masahide NAKAMURA
    Acked-by: Noriaki TAKAMIYA
    Signed-off-by: David S. Miller

    Adrian Bunk
     
  • This patch replaces unnecessary uses of skb_copy by pskb_expand_head
    on the IPv6 input path.

    This allows us to remove the double pointers later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch implements the same change that was done to ip_defrag. It
    makes ipv6_frag_rcv return the last packet received of a train of fragments
    rather than the head of that sequence.

    This allows us to get rid of the sk_buff ** argument later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • With all the users of the double pointers removed, this patch mops up by
    finally replacing all occurrences of sk_buff ** in the netfilter API by
    sk_buff *.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch replaces unnecessary uses of skb_copy, pskb_copy and
    skb_realloc_headroom by functions such as skb_make_writable and
    pskb_expand_head.

    This allows us to remove the double pointers later.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch removes the IPVS-specific version of skb_make_writable and
    replaces it with the netfilter one.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that all callers of netfilter can guarantee that the skb is not shared,
    we no longer have to copy the skb in skb_make_writable.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Due to the special location of the bridging hook, it should never see a
    shared packet anyway (certainly not with any in-kernel code). So it
    makes sense to unshare the skb there if necessary as that will greatly
    simplify the code below it (in particular, netfilter).

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • As it is we always invoke pt_prev before ing_filter, even if there are no
    ingress filters attached. This can cause unnecessary cloning in pt_prev.

    This patch changes it so that we only invoke pt_prev if there are ingress
    filters attached.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that ip_frag always returns the packet given to it on input, we can
    change it to return an integer indicating error instead. This patch does
    that and updates all its callers accordingly.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
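
    A sketch of the interface change described above, using hypothetical
    types. Once the function can no longer hand back a different packet,
    the pointer return value carries no information beyond success or
    failure, so a status code is enough. The -EINPROGRESS value is chosen
    here for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_pkt {
    int complete;   /* non-zero once all fragments have arrived */
};

/* Old style: returns the packet to continue with, or NULL if queued. */
static struct demo_pkt *demo_defrag_ptr(struct demo_pkt *pkt)
{
    return pkt->complete ? pkt : NULL;
}

/* New style: the packet identity never changes, so report status only. */
static int demo_defrag_status(struct demo_pkt *pkt)
{
    return pkt->complete ? 0 : -EINPROGRESS;
}
```

    Callers of the second form keep using the pointer they already hold
    and only need to check for a non-zero result.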
     
  • This patch is a bit of a hack. However it is worth it if you consider that
    this is the only reason why we have to carry around the struct sk_buff **
    pointers in netfilter.

    It makes ip_defrag always return the packet that was given to it on input.
    It does this by cloning the packet and replacing its original contents with
    the head fragment if necessary.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch creates a new function skb_morph that's just like skb_clone
    except that it lets the user provide the spare skb that will be overwritten
    by the one that's to be cloned.

    This will be used by IP fragment reassembly so that we get back the same
    skb that went in last (rather than the head skb that we get now which
    requires us to carry around double pointers all over the place).

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
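
    The difference between cloning and morphing can be sketched with a toy
    reference-counted buffer. Everything here is hypothetical (demo_skb,
    demo_data, the "users" counter); it only mirrors the idea that a morph
    reuses a descriptor supplied by the caller instead of allocating one.

```c
#include <assert.h>
#include <stdlib.h>

/* Shared payload with a user count (toy model). */
struct demo_data {
    int users;
};

/* Descriptor pointing at shared payload. */
struct demo_skb {
    struct demo_data *data;
    int               len;
};

/* Clone: allocate a fresh descriptor that shares src's payload. */
static struct demo_skb *demo_clone(const struct demo_skb *src)
{
    struct demo_skb *n = malloc(sizeof(*n));

    if (n == NULL)
        return NULL;
    *n = *src;
    n->data->users++;
    return n;
}

/* Morph: overwrite the caller's descriptor so it becomes a clone of src. */
static struct demo_skb *demo_morph(struct demo_skb *dst,
                                   const struct demo_skb *src)
{
    assert(src->data != NULL);

    /* Drop whatever payload dst referenced before. */
    if (dst->data != NULL && --dst->data->users == 0)
        free(dst->data);

    *dst = *src;
    dst->data->users++;
    return dst;     /* same descriptor the caller passed in */
}
```

    After a morph the caller still holds the descriptor it started with,
    which is the property the reassembly code needs to avoid double
    pointers.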
     
  • This patch creates a new function __copy_skb_header to merge the common
    code between copy_skb_header and skb_clone. Having two functions which
    are largely the same is a source of wasted labour as well as confusion.

    In fact the tc_verd stuff is almost certainly a bug since it's treated
    differently in skb_clone compared to the callers of copy_skb_header
    (skb_copy/pskb_copy/skb_copy_expand).

    I've kept that difference intact with a comment added asking for
    clarification.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • * git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
    NFSv4: Fix a typo in nfs_inode_reclaim_delegation
    NFS: Add a boot parameter to disable 64 bit inode numbers
    NFS: nfs_refresh_inode should clear cache_validity flags on success
    NFS: Fix a connectathon regression in NFSv3 and NFSv4
    NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
    SUNRPC: Don't call xprt_release in call refresh
    SUNRPC: Don't call xprt_release() if call_allocate fails
    SUNRPC: Fix buggy UDP transmission
    [23/37] Clean up duplicate includes in
    [2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
    SUNRPC: Use correct type in buffer length calculations
    SUNRPC: Fix default hostname created in rpc_create()
    nfs: add server port to rpc_pipe info file
    NFS: Get rid of some obsolete macros
    NFS: Simplify filehandle revalidation
    NFS: Ensure that nfs_link() returns a hashed dentry
    NFS: Be strict about dentry revalidation when doing exclusive create
    NFS: Don't zap the readdir caches upon error
    NFS: Remove the redundant nfs_reval_fsid()
    NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
    ...

    Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
    net/sunrpc/xprtsock.c

    Linus Torvalds
     

15 Oct, 2007

6 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (140 commits)
    sched: sync wakeups preempt too
    sched: affine sync wakeups
    sched: guest CPU accounting: maintain guest state in KVM
    sched: guest CPU accounting: maintain stats in account_system_time()
    sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
    sched: guest CPU accounting: add guest-CPU /proc/stat field
    sched: domain sysctl fixes: add terminator comment
    sched: domain sysctl fixes: do not crash on allocation failure
    sched: domain sysctl fixes: unregister the sysctl table before domains
    sched: domain sysctl fixes: use for_each_online_cpu()
    sched: domain sysctl fixes: use kcalloc()
    Make scheduler debug file operations const
    sched: enable wake-idle on CONFIG_SCHED_MC=y
    sched: reintroduce topology.h tunings
    sched: allow the immediate migration of cache-cold tasks
    sched: debug, improve migration statistics
    sched: debug: increase width of debug line
    sched: activate task_hot() only on fair-scheduled tasks
    sched: reintroduce cache-hot affinity
    sched: speed up context-switches a bit
    ...

    Linus Torvalds
     
  • * 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux:
    knfsd: query filesystem for NFSv4 getattr of FATTR4_MAXNAME
    knfsd: nfsv4 delegation recall should take reference on client
    knfsd: don't shutdown callbacks until nfsv4 client is freed
    knfsd: let nfsd manage timing out its own leases
    knfsd: Add source address to sunrpc svc errors
    knfsd: 64 bit ino support for NFS server
    svcgss: move init code into separate function
    knfsd: remove code duplication in nfsd4_setclientid()
    nfsd warning fix
    knfsd: fix callback rpc cred
    knfsd: move nfsv4 slab creation/destruction to module init/exit
    knfsd: spawn kernel thread to probe callback channel
    knfsd: nfs4 name->id mapping not correctly parsing negative downcall
    knfsd: demote some printk()s to dprintk()s
    knfsd: cleanup of nfsd4 cmp_* functions
    knfsd: delete code made redundant by map_new_errors
    nfsd: fix horrible indentation in nfsd_setattr
    nfsd: remove unused cache_for_each macro
    nfsd: tone down inaccurate dprintk

    Linus Torvalds
     
  • make sync wakeups affine for cache-cold tasks: if a cache-cold task
    is woken up by a sync wakeup then use the opportunity to migrate it
    straight away. (the two tasks are 'related' because they communicate)

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • all uses of and almost all assignments to lro_desc->tcp_ack assume that it's
    net-endian; one converts net-endian to host-endian and sticks it in
    lro_desc->tcp_ack.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
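
    A small sketch of the convention the message above relies on: keep the
    field in network byte order everywhere and convert only at the point
    where arithmetic is done. The struct and helpers are hypothetical
    stand-ins, not the real LRO descriptor.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Toy descriptor; tcp_ack is always stored in network byte order. */
struct demo_lro_desc {
    uint32_t tcp_ack;
};

/* Store the value exactly as it appears in the TCP header. */
static void demo_store_ack(struct demo_lro_desc *d, uint32_t ack_be)
{
    d->tcp_ack = ack_be;    /* no ntohl() here: the field stays big-endian */
}

/* Convert to host order only where the numbers are compared. */
static int demo_ack_is_newer(const struct demo_lro_desc *d, uint32_t ack_be)
{
    uint32_t old_host = ntohl(d->tcp_ack);
    uint32_t new_host = ntohl(ack_be);

    /* Signed difference handles sequence-number wraparound. */
    return (int32_t)(new_host - old_host) > 0;
}
```

    Mixing the two conventions in a single assignment, as the message
    describes, would make every later comparison against the field wrong on
    little-endian machines.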
     
  • Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • copy_to_user() into on-stack array

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro