01 Nov, 2007

16 commits

  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Documentation updates for network interfaces.

    1. Add doc for netif_napi_add
    2. Remove doc for unused returns from netif_rx
    3. Add doc for netif_receive_skb

    [ Incorporated minor mods from Randy Dunlap -DaveM ]

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This cache is only required to create new namespaces,
    but we won't have them in CONFIG_NET_NS=n case.

    Hide it under the appropriate ifdef.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The setup_net is called for the init net namespace
    only (in the CONFIG_NET_NS=n case, of course) from the __init
    function, so mark it as __net_init so that it disappears with
    the caller after boot.

    Yet again, in a perfect world this would be under
    #ifdef CONFIG_NET_NS, but it isn't guaranteed that every
    subsystem is registered *after* the init_net_ns is set
    up. Once we are sure that we don't start registering
    them before the init net setup, we'll be able to move
    this code under the ifdef.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The namespace creation/destruction code is never called
    if the CONFIG_NET_NS is n, so it's OK to move it under
    appropriate ifdef.

    The copy_net_ns() in the "n" case checks the flags and
    returns -EINVAL when a new net ns is requested. In a perfect
    world this stub would be in net_namespace.h, but this
    function needs to know the CLONE_NEWNET value and thus
    requires sched.h. On the other hand, this header is
    included in almost every .c file in the networking code,
    and making all this code depend on sched.h would be
    suicidal.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
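
The stub behaviour described in this commit can be sketched in userspace roughly as follows. This is only an illustration: `struct demo_net`, `copy_net_ns_stub()` and `DEMO_CLONE_NEWNET` are hypothetical names, the flag value is defined locally as an assumption, and the error is returned as a plain int rather than through the kernel's pointer conventions.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed value for illustration; the kernel takes CLONE_NEWNET from sched.h. */
#define DEMO_CLONE_NEWNET 0x40000000UL

struct demo_net { int id; };

/* Hypothetical stand-in for the stub used when CONFIG_NET_NS=n. */
static int copy_net_ns_stub(unsigned long flags, struct demo_net *old_net,
                            struct demo_net **new_net)
{
    /* Without namespace support, a new net namespace cannot be created. */
    if (flags & DEMO_CLONE_NEWNET)
        return -EINVAL;

    /* Otherwise the caller simply keeps sharing the existing namespace. */
    *new_net = old_net;
    return 0;
}
```

Keeping the flag check in a .c file, as the message explains, avoids dragging the header that defines the flag into every user of the namespace header.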
     
  • When a new pernet something (subsys, device or operations) is
    being registered, the init callback is to be called for each
    namespace that currently exists in the system. During
    unregistration, the same is to be done with the exit callback.

    However, not every pernet something has both callbacks, yet the
    check that the appropriate pointer is not NULL is performed
    inside the for_each_net() loop.

    This is (at least) strange, so tune this.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
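
A minimal sketch of the idea in this commit, hoisting the NULL check out of the per-namespace loop. The types and function names below are hypothetical stand-ins rather than the kernel's, and error unwinding is left out.

```c
#include <stddef.h>

struct demo_net { int inited; };

struct demo_pernet_ops {
    int (*init)(struct demo_net *net);   /* may be NULL */
    void (*exit)(struct demo_net *net);  /* may be NULL */
};

/* Hypothetical registration helper: test the callback once, then loop. */
static int demo_register_ops(const struct demo_pernet_ops *ops,
                             struct demo_net *nets, size_t n_nets)
{
    size_t i;
    int err;

    if (ops->init == NULL)  /* nothing to call, so skip the whole loop */
        return 0;

    for (i = 0; i < n_nets; i++) {
        err = ops->init(&nets[i]);
        if (err)
            return err;     /* undoing earlier inits is omitted here */
    }
    return 0;
}

/* Example init callback used to exercise the helper. */
static int demo_init(struct demo_net *net)
{
    net->inited = 1;
    return 0;
}
```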
     
  • Finally, the zero_it argument can be completely removed from
    the callers and from the function prototype.

    Besides, fix the checkpatch.pl warnings about using the
    assignments inside if-s.

    This patch is rather big, and it is a part of the previous one.
    I split it to make the patches more readable. I hope
    this particular split helped.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • At this point nobody calls sk_alloc() with zero_it == 0,
    so remove the unneeded checks from it.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The sk_prot_alloc() already performs everything needed by
    sk_clone(). Besides, sk_prot_alloc() takes about half as many
    arguments as sk_alloc() does, so call sk_prot_alloc() instead,
    saving a bit of stack.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The security_sk_alloc() and the module_get are part of the
    object allocation - move them to the proper place.

    Note, that since we do not reset the newly allocated sock
    in the sk_alloc() (memset() is removed with the previous
    patch) we can safely do this.

    Also fix the error path in sk_prot_alloc() - release the security
    context if needed.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • We have a __GFP_ZERO flag that allocates a zeroed chunk of memory.
    Use it in the sk_alloc() and avoid a hand-made memset().

    This is a temporary patch that will help us in the nearest future :)

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
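
As a userspace analogy only (it does not use the kernel allocator or __GFP_ZERO), the difference between a hand-made memset() and asking the allocator for zeroed memory looks roughly like this. Both function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hand-made zeroing: allocate first, then clear in a second step. */
static void *demo_alloc_memset(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        memset(p, 0, size);
    return p;
}

/* Zeroing requested from the allocator itself. */
static void *demo_alloc_zeroed(size_t size)
{
    return calloc(1, size);
}
```

Both helpers return the same kind of zero-filled buffer; the second leaves the zeroing to the allocator, which is the shape the commit moves sk_alloc() towards.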
     
  • The sock object is allocated either from the generic cache with
    kmalloc, or from the prot->slab cache.

    Move this logic into an isolated set of helpers and make the
    sk_alloc/sk_free look a bit nicer.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The sock_copy() is supposed to just clone the socket. In a perfect
    world it would be just a memcpy, but we have to handle the security
    mark correctly. All the extra setup must be performed in the
    sk_clone() call, so move the get_net() to a more proper place.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The sock_copy() call is not used outside the sock.c file,
    so just move it into sock.c.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Similar to commit 3eec0047d9bdd, the point of this is to avoid
    skipping R-bit skbs.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • A DSACK inside another SACK block was missed if the start_seq of
    the DSACK was larger than the SACK block's, because sorting
    prioritizes full processing of the SACK block before the DSACK.
    After SACK block sorting the situation looks like this:

    SSSSSSSSS
        D
              SSSSSS
                       SSSSSSS

    Because write_queue is walked in-order, when the first SACK block
    has been processed, TCP is already past the skb for which the
    DSACK arrived and we haven't taught it to backtrack (nor should
    we), so TCP just continues processing by going to the next SACK
    block after the DSACK (if any).

    Whenever such a DSACK is present, do an embedded check while
    processing the previous SACK block.

    If the DSACK is below snd_una, there won't be overlapping SACK
    block, and thus no problem in that case. Also if start_seq of
    the DSACK is equal to the actual block, it will be processed
    first.

    Tested this by using netem to duplicate 15% of packets, and
    by printing SACK block when found_dup_sack is true and the
    selected skb in the dup_sack = 1 branch (if taken):

    SACK block 0: 4344-5792 (relative to snd_una 2019137317)
    SACK block 1: 4344-5792 (relative to snd_una 2019137317)

    equal start seqnos => next_dup = 0, dup_sack = 1 won't occur...

    SACK block 0: 5792-7240 (relative to snd_una 2019214061)
    SACK block 1: 2896-7240 (relative to snd_una 2019214061)
    DSACK skb match 5792-7240 (relative to snd_una)

    ...and next_dup = 1 case (after the not shown start_seq sort),
    went to dup_sack = 1 branch.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
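
The condition described in this commit, a DSACK that starts later than an enclosing SACK block, can be sketched as below. This is a hedged illustration and not the kernel's code: the struct and function names are hypothetical, and only the wraparound-safe comparison follows the style of the kernel's before() helper.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_sack_block {
    uint32_t start_seq;
    uint32_t end_seq;
};

/* Wraparound-safe "seq1 is before seq2" for 32-bit sequence numbers. */
static bool demo_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/*
 * True when the DSACK starts strictly after the SACK block's start and
 * does not extend past its end, i.e. the case that sorting by start_seq
 * would otherwise process too late.
 */
static bool demo_dsack_needs_embedded_check(const struct demo_sack_block *dsack,
                                            const struct demo_sack_block *sack)
{
    return demo_before(sack->start_seq, dsack->start_seq) &&
           !demo_before(sack->end_seq, dsack->end_seq);
}
```

With the relative sequence numbers from the test output in the message, the pair 5792-7240 inside 2896-7240 satisfies the condition, while the pair with equal start sequence numbers (4344-5792 twice) does not.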
     

31 Oct, 2007

7 commits

  • On PowerPC allmodconfig build we get this:

    net/key/af_key.c:400: warning: comparison is always false due to limited range of data type

    Signed-off-by: Stephen Rothwell
    Signed-off-by: David S. Miller

    Stephen Rothwell
     
  • This fixes scatterlist corruptions added by

    commit 68e3f5dd4db62619fdbe520d36c9ebf62e672256
    [CRYPTO] users: Fix up scatterlist conversion errors

    The issue is that the code calls sg_mark_end() which clobbers the
    sg_page() pointer of the final scatterlist entry.

    The first part of the fix makes skb_to_sgvec() do __sg_mark_end().

    After considering all skb_to_sgvec() call sites the most correct
    solution is to call __sg_mark_end() in skb_to_sgvec() since that is
    what all of the callers would end up doing anyways.

    I suspect this might have fixed some problems in virtio_net which is
    the sole non-crypto user of skb_to_sgvec().

    Other similar sg_mark_end() cases were converted over to
    __sg_mark_end() as well.

    Arguably sg_mark_end() is a poorly named function because it doesn't
    just "mark", it clears out the page pointer as a side effect, which is
    what led to these bugs in the first place.

    The one remaining plain sg_mark_end() call is in scsi_alloc_sgtable()
    and arguably it could be converted to __sg_mark_end() if only so that
    we can delete this confusing interface from linux/scatterlist.h

    Signed-off-by: David S. Miller

    David S. Miller
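
The kind of bug described in this commit, an end marker that wipes the page pointer as a side effect, can be illustrated with a toy model. The struct layout, the flag value and every name below are assumptions made for the sketch, not the definitions in linux/scatterlist.h.

```c
#include <stdint.h>

#define DEMO_SG_END 0x02UL  /* assumed flag bit for this toy model */

/* Toy entry: a page pointer with flag bits kept in its low bits. */
struct demo_sg { uintptr_t page_link; };

/* Clobbering variant: sets the end flag but throws away the pointer bits. */
static void demo_mark_end_clobber(struct demo_sg *sg)
{
    sg->page_link = DEMO_SG_END;
}

/* Preserving variant: ORs in the flag and leaves the pointer intact. */
static void demo_mark_end_keep(struct demo_sg *sg)
{
    sg->page_link |= DEMO_SG_END;
}

/* Recover the pointer part by masking off the low flag bits. */
static uintptr_t demo_sg_page(const struct demo_sg *sg)
{
    return sg->page_link & ~(uintptr_t)0x03;
}
```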
     
  • It's under CONFIG_IP_VS_LBLCR_DEBUG option which never existed.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     
  • The file /proc/net/if_inet6 is removed twice.
    The first time in:
    inet6_exit
    -> addrconf_cleanup
    And again a few lines later in:
    inet6_exit
    -> if6_proc_exit

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • When a network namespace reference is held by a network subsystem,
    and when this reference is decremented in a rcu update callback, we
    must ensure that there is no more outstanding rcu update before
    trying to free the network namespace.

    In the normal case, the rcu_barrier is called when the network namespace
    is exiting in the cleanup_net function.

    But when a network namespace creation fails, and the subsystems are
    undone (like the cleanup), the rcu_barrier is missing.

    This patch adds the missing rcu_barrier.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • Point 1:
    Unregistering a network device schedules a netdev_run_todo.
    This function calls dev->destructor when it is set and the
    destructor calls free_netdev.

    Point 2:
    In the case of an initialization of a network device the usual code
    is:
    * alloc_netdev
    * register_netdev
    -> if this one fails, call free_netdev and exit with error.

    Point 3:
    In the register_netdevice function, at a later stage when the device
    is in the registered state, a call to the netdevice_notifiers is made.
    If one of the notifications fails with an error, a rollback of the
    registered state is done using unregister_netdevice.

    Conclusion:
    When a network device fails to register during initialization because
    one network subsystem returned an error during a notification call
    chain, the network device is freed twice because of points 1 and 2.
    The second free_netdev will be done with an invalid pointer.

    Proposed solution:
    This patch moves all the code of unregister_netdevice *except*
    the call to net_set_todo into a new function, "rollback_registered".

    The following functions are changed in this way:
    * register_netdevice: calls rollback_registered when a notification fails
    * unregister_netdevice: calls rollback_registered + net_set_todo; the
      call to net_set_todo now comes last. Since it just adds an element
      to a list, that should not break anything.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
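
The double free and its fix can be modelled with a small userspace sketch. Everything here is a hypothetical stand-in (the struct, the counters and the demo_ functions); the point is only that the failure path of registration undoes the registration without also queueing the deferred free.

```c
#include <stdbool.h>

struct demo_netdev {
    bool registered;
    bool todo_queued;  /* stands in for net_set_todo() queueing a deferred free */
    int free_count;    /* how many times the device has been "freed" */
};

static void demo_free(struct demo_netdev *dev)
{
    dev->free_count++;
}

/* Undo the registration only; does not queue any deferred work. */
static void demo_rollback_registered(struct demo_netdev *dev)
{
    dev->registered = false;
}

/* Normal unregistration: roll back, then queue the deferred free last. */
static void demo_unregister(struct demo_netdev *dev)
{
    demo_rollback_registered(dev);
    dev->todo_queued = true;
}

/* notifier_fails simulates a subsystem rejecting the device. */
static int demo_register(struct demo_netdev *dev, bool notifier_fails)
{
    dev->registered = true;
    if (notifier_fails) {
        demo_rollback_registered(dev);  /* no deferred free is queued */
        return -1;
    }
    return 0;
}

/* Deferred work: frees the device if unregistration queued it. */
static void demo_run_todo(struct demo_netdev *dev)
{
    if (dev->todo_queued) {
        dev->todo_queued = false;
        demo_free(dev);
    }
}
```

A caller following the usual pattern frees the device itself when registration fails; with the rollback-only failure path, the deferred work has nothing queued and the device is freed exactly once.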
     
  • Fix links to files in Documentation/* in various Kconfig files

    Signed-off-by: Dirk Hohndel
    Signed-off-by: Linus Torvalds

    Dirk Hohndel
     

30 Oct, 2007

11 commits

  • Commit baa3a2a0d24ebcf1c451bec8e5bee3d3467f4cbb, by removing initialization
    of the ctl_name field, broke this conditional, preventing the display of
    rpc_tasks that you previously got when turning on rpc debugging.

    [akpm@linux-foundation.org: coding-style fixes]

    Signed-off-by: J. Bruce Fields
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    J. Bruce Fields
     
  • On systems with a very large amount of memory, the heuristics in
    alloc_large_system_hash() result in a very large TCP established hash
    table: 16 million entries for a 128 GB ia64 system. This makes
    reading from /proc/net/tcp pretty slow (well over a second) and as a
    result netstat is slow on these machines. I know that /proc/net/tcp is
    deprecated in favor of tcp_diag, however at the moment netstat only
    knows of the former.

    I am skeptical that such a large TCP established hash is often needed.
    Just because a system has a lot of memory doesn't imply that it will
    have several millions of concurrent TCP connections. Thus I believe
    that we should put an arbitrary high limit to the size of the TCP
    established hash by default. Users who really need a bigger hash can
    always use the thash_entries boot parameter to get more.

    I propose 2 million entries as the arbitrary high limit. This
    makes /proc/net/tcp reasonably fast on the system in question (0.2 s)
    while being still large enough for me to be confident that network
    performance won't suffer.

    This is just one way to limit the hash size; there are others, and I
    am not familiar enough with the TCP code to decide which is best.
    Thus, I would welcome proposals for alternatives.

    [ 2 million is still too large, thus I've modified the limit in the
    change to be '512 * 1024'. -DaveM ]

    Signed-off-by: Jean Delvare
    Signed-off-by: David S. Miller

    Jean Delvare
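
The capping policy discussed in this commit can be sketched as a small helper. The function and macro names are hypothetical; only the 512 * 1024 limit and the idea that an explicit thash_entries value overrides the default come from the message above.

```c
/* Default upper limit quoted in the maintainer's note. */
#define DEMO_TCP_EHASH_MAX (512UL * 1024UL)

/*
 * Hypothetical helper: pick the number of hash entries.
 * thash_entries != 0 means the user asked for a size explicitly.
 */
static unsigned long demo_ehash_entries(unsigned long heuristic,
                                        unsigned long thash_entries)
{
    if (thash_entries != 0)
        return thash_entries;

    /* Otherwise clamp whatever the memory-based heuristic suggested. */
    return heuristic > DEMO_TCP_EHASH_MAX ? DEMO_TCP_EHASH_MAX : heuristic;
}
```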
     
  • as some architectures have unsigned long for u64.

    net/sunrpc/xprtrdma/rpc_rdma.c: In function 'rpcrdma_create_chunks':
    net/sunrpc/xprtrdma/rpc_rdma.c:222: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
    net/sunrpc/xprtrdma/rpc_rdma.c:234: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
    net/sunrpc/xprtrdma/rpc_rdma.c: In function 'rpcrdma_count_chunks':
    net/sunrpc/xprtrdma/rpc_rdma.c:577: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'

    Noticed on PowerPC pseries_defconfig build.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: David S. Miller

    Stephen Rothwell
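
A common way to silence this class of warning is to cast the 64-bit value to unsigned long long at the call site so that it matches the %llx conversion on every architecture. The sketch below shows that pattern in userspace with a hypothetical helper; it is not taken from the patch itself.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value portably: uint64_t may be unsigned long on
 * some platforms, so cast explicitly to the type %llx expects.
 */
static int demo_format_u64(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "%llx", (unsigned long long)value);
}
```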
     
  • While displaying ICMP outgoing statistics as Out counters in
    /proc/net/snmp, the memory location for ICMP incoming statistics
    was referenced by mistake.

    Signed-off-by: Mitsuru Chinen
    Acked-by: David L Stevens
    Signed-off-by: David S. Miller

    Mitsuru Chinen
     
  • If either of the two sock_alloc_fd() calls fails, we
    forget to update 'err' and thus we'll erroneously
    return zero in these cases.

    Based upon a report and patch from Rich Paul, and
    commentary from Chuck Ebbert.

    Signed-off-by: David S. Miller

    David S. Miller
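
The bug pattern described in this commit, a failure path that jumps to the exit label without setting the error variable, can be sketched as follows. `demo_alloc_fd()` and `demo_socketpair()` are hypothetical stand-ins, not the kernel functions.

```c
#include <errno.h>

/* Hypothetical fd allocator: returns a negative errno on failure. */
static int demo_alloc_fd(int should_fail)
{
    return should_fail ? -EMFILE : 3;
}

static int demo_socketpair(int fail_first, int fail_second)
{
    int err = 0;
    int fd1, fd2;

    fd1 = demo_alloc_fd(fail_first);
    if (fd1 < 0) {
        err = fd1;  /* without this assignment the function would return 0 */
        goto out;
    }

    fd2 = demo_alloc_fd(fail_second);
    if (fd2 < 0) {
        err = fd2;  /* same for the second allocation */
        goto out;
    }

out:
    return err;
}
```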
     
  • This allocation is expected to fail, and we handle that by falling back to vmalloc().

    So don't scare people with nasty messages like
    http://bugzilla.kernel.org/show_bug.cgi?id=9190

    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Andrew Morton
     
  • netpoll_poll_lock() synchronizes the ->poll() invocation
    code paths, but once we have the lock we have to make
    sure that NAPI_STATE_SCHED is still set. Otherwise we
    get:

    cpu 0                       cpu 1

    net_rx_action()             poll_napi()
    netpoll_poll_lock()         ... spin on ->poll_lock
    ->poll()
      netif_rx_complete
    netpoll_poll_unlock()       acquire ->poll_lock()
                                ->poll()
                                  netif_rx_complete()
                                CRASH

    Based upon a bug report from Tina Yang.

    Signed-off-by: David S. Miller

    David S. Miller
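
The fix amounts to re-checking the scheduled state after the lock has been taken. A single-threaded toy model of that check is shown below; the struct and function names are hypothetical and the locking itself is omitted.

```c
#include <stdbool.h>

struct demo_napi {
    bool sched;      /* stands in for NAPI_STATE_SCHED */
    int poll_calls;  /* how many times ->poll() actually ran */
};

/* A ->poll() that finishes its work clears the scheduled state. */
static void demo_poll(struct demo_napi *n)
{
    n->poll_calls++;
    n->sched = false;
}

/*
 * Meant to be called with the poll lock held: only poll if the
 * instance is still scheduled, since another CPU may have completed
 * it while we were waiting for the lock.
 */
static bool demo_poll_if_scheduled(struct demo_napi *n)
{
    if (!n->sched)
        return false;
    demo_poll(n);
    return true;
}
```

In the sequence shown in the diagram, the second caller would find the state already cleared and skip its ->poll() call instead of completing twice.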
     
  • While reviewing the tcp_md5-related code further, I came across
    another two of these casts, which you probably missed. I don't
    actually think that they pose a problem right now, but as you said,
    we should remove them.

    Signed-off-by: Matthias M. Dellweg
    Signed-off-by: David S. Miller

    Matthias M. Dellweg
     
  • The TCP Vegas implementation has a bug in the way it disables
    slow-start with the gamma parameter. The bug may lead to extreme
    unfairness in the presence of early packet loss. See details in:
    http://www.cs.caltech.edu/~weixl/technical/ns2linux/known_linux/index.html#vegas

    Switch the order of the "if (tp->snd_cwnd <= tp->snd_ssthresh)"
    statement and the "if (diff > gamma)" statement to eliminate the problem.

    Signed-off-by: Xiaoliang (David) Wei
    Signed-off-by: David S. Miller

    Xiaoliang (David) Wei
     
  • Instead of using the default timeout of 3 minutes, this uses the timeout
    specific to the protocol used for the connection. The 3 minute timeout
    seems somewhat arbitrary (though I know it is used in other places in
    the ipvs code), and when failing over it would be much nicer to use one
    of the configured timeout values.

    Signed-off-by: Andy Gospodarek
    Acked-by: Simon Horman
    Signed-off-by: David S. Miller

    Andy Gospodarek
     
  • This bug was introduced by the commit
    d12af679bcf8995a237560bdf7a4d734f8df5dbb (sysctl: fix neighbour table
    sysctls).

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     

29 Oct, 2007

2 commits


27 Oct, 2007

4 commits

  • This patch fixes the errors made in the users of the crypto layer during
    the sg_init_table conversion. It also adds a few conversions that were
    missing altogether.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The pid namespace patches changed the semantics of
    find_task_by_pid without breaking the compile, resulting
    in get_net_ns_by_pid doing the wrong thing.

    So switch to using the intended find_task_by_vpid.

    Combined with Denis' earlier patch to make netlink traffic
    fully synchronous the inadvertent race I introduced with
    accessing current is actually removed.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • It is not safe to place struct pernet_operations in a special section.
    We need struct pernet_operations to last until we call
    unregister_pernet_subsys, which doesn't happen until module unload.

    So marking struct pernet_operations is a disaster for modules in two ways.
    - We discard it before we call the exit method it points to.
    - Because I keep struct pernet_operations on a linked list, discarding
      it for compiled-in code removes elements from the middle of a linked
      list and does horrible things to list insertion.

    So this looks safe assuming __exit_refok is not discarded
    for modules.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This patch fixes the following compile errors in some configurations:

    ...
    CC net/ipv4/esp4.o
    /home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv4/esp4.c: In function 'esp_output':
    /home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv4/esp4.c:113: error: implicit declaration of function 'sg_init_table'
    make[3]: *** [net/ipv4/esp4.o] Error 1
    ...
    /home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv6/esp6.c: In function 'esp6_output':
    /home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv6/esp6.c:112: error: implicit declaration of function 'sg_init_table'
    make[3]: *** [net/ipv6/esp6.o] Error 1

    Signed-off-by: Adrian Bunk
    Signed-off-by: David S. Miller

    Adrian Bunk