29 Jan, 2008

39 commits

  • Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • The caller must hold the RTNL so let's check it in unregister_netdevice.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • This cleanup shrinks the size of net/core/dst.o on i386 from 1299 to 1289 bytes.
    (This is because dev_hold()/dev_put() do atomic_inc()/atomic_dec() and
    force the compiler to re-evaluate memory contents.)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Just move the variable on the struct net and adjust
    its usage.

    Other sysctls from the sys.net.core table are more
    difficult to virtualize (i.e. make them per-namespace),
    but I'll look at them as well a bit later.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Some of the ctl variables are going to be on the struct
    net. Here's the way to adjust the ->data pointer on the
    ctl_table-s to point to the right variable.

    Since some pointers still point to the global variables,
    I keep turning the write bits off on such tables.

    This looks to become a common procedure for net sysctls,
    so later parts of this code may migrate to some more
    generic place.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Making them per-namespace is required for the following
    two reasons:

    First, some ctl values have a per-namespace meaning.
    Second, making them writable from the sub-namespace
    is an isolation hole.

    So I introduce the pernet operations to create these
    tables. For init_net I use the existing statically
    declared tables; for sub-namespaces they are duplicated
    and the write bits are removed from the mode.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
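
The duplicate-and-drop-write-bits step described above can be sketched in userland C. This is only an illustration under stated assumptions: struct ctl_entry, init_table, global_value and dup_table_readonly() are hypothetical stand-ins, not the kernel's struct ctl_table or the code from this commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the kernel's struct ctl_table. */
struct ctl_entry {
    const char *name;   /* a NULL name terminates the table */
    unsigned int mode;  /* permission bits, e.g. 0644 */
    void *data;         /* variable backing this entry */
};

static int global_value;  /* hypothetical global backing variable */

/* Statically declared table, as used for the initial namespace. */
static struct ctl_entry init_table[] = {
    { "example_ctl", 0644, &global_value },
    { NULL, 0, NULL },
};

/*
 * Copy a table of n entries for a sub-namespace and strip the write
 * bits so the copy is read-only there. Returns NULL on allocation
 * failure; the caller frees the copy.
 */
static struct ctl_entry *dup_table_readonly(const struct ctl_entry *src,
                                            size_t n)
{
    struct ctl_entry *copy;
    size_t i;

    assert(src != NULL);
    copy = malloc(n * sizeof(*copy));
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, n * sizeof(*copy));
    for (i = 0; i < n && copy[i].name != NULL; i++)
        copy[i].mode &= ~0222u;  /* clear owner/group/other write bits */
    return copy;
}
```

The original table keeps its mode, so the initial namespace stays writable while each copy is read-only.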
     
  • A single list_head variable initialized with LIST_HEAD_INIT can almost
    always be replaced with a LIST_HEAD declaration; this shrinks the code
    and looks better.

    Signed-off-by: Denis Cheng
    Signed-off-by: David S. Miller

    Denis Cheng
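
For readers unfamiliar with the two macros, here is a minimal userland sketch of the replacement. The macro bodies mirror the well-known definitions from the kernel's list header; the variable names and the list_is_empty() and check_both_empty() helpers are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Userland copies of the kernel's list head type and macros. */
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Old style: declaration plus explicit initializer. */
static struct list_head old_style_list = LIST_HEAD_INIT(old_style_list);

/* New style: one macro declares and initializes the list head. */
static LIST_HEAD(new_style_list);

/* An empty list head points at itself in both directions. */
static int list_is_empty(const struct list_head *head)
{
    return head->next == head && head->prev == head;
}

/* Both spellings produce the same empty list. */
static void check_both_empty(void)
{
    assert(list_is_empty(&old_style_list));
    assert(list_is_empty(&new_style_list));
}
```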
     
  • When the fib_rules initialization finishes, no return code is provided,
    so the caller has no way to know whether the initialization
    succeeded or failed. This patch fixes that.

    Signed-off-by: Daniel Lezcano
    Acked-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • Move dst entries to a namespace loopback to catch refcounting leaks.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • The previous move of the UDP inDatagrams counter caused each
    peek of the same packet to be counted separately. This may be
    undesirable.

    This patch fixes this by adding a bit to sk_buff to record whether
    this packet has already been seen through skb_recv_datagram. We
    then only increment the counter when the packet is seen for the
    first time.

    The only dodgy part is the fact that skb_recv_datagram doesn't have
    a good way of returning this new bit of information. So I've added
    a new function __skb_recv_datagram that does return this and made
    skb_recv_datagram a wrapper around it.

    The plan is to eventually replace all uses of skb_recv_datagram with
    this new function, at which time it can be renamed to its proper name.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
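
The count-once idea can be modelled in a few lines of userland C. This is a rough sketch, not the kernel code: struct packet, PEEK_FLAG, recv_packet_core(), recv_packet() and receive_and_count() are hypothetical names standing in for sk_buff, MSG_PEEK, __skb_recv_datagram, skb_recv_datagram and the UDP receive path.

```c
#include <assert.h>
#include <stddef.h>

#define PEEK_FLAG 0x2  /* hypothetical stand-in for MSG_PEEK */

struct packet {
    unsigned int peeked : 1;  /* set once a peek has seen the packet */
};

static unsigned long in_datagrams;  /* stand-in for inDatagrams */

/*
 * Hand the packet to the caller and report through *peeked whether
 * an earlier peek had already seen it.
 */
static struct packet *recv_packet_core(struct packet *pkt, int flags,
                                       int *peeked)
{
    assert(pkt != NULL && peeked != NULL);
    *peeked = pkt->peeked;
    if (flags & PEEK_FLAG)
        pkt->peeked = 1;
    return pkt;
}

/* Wrapper that keeps the old signature for existing callers. */
static struct packet *recv_packet(struct packet *pkt, int flags)
{
    int peeked;

    return recv_packet_core(pkt, flags, &peeked);
}

/* Receive path: bump the counter only on the first sighting. */
static void receive_and_count(struct packet *pkt, int flags)
{
    int peeked;

    recv_packet_core(pkt, flags, &peeked);
    if (!peeked)
        in_datagrams++;
}
```

Two peeks followed by a real receive of the same packet bump the counter once in this model.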
     
  • Currently it is possible for two processes to peek on the same socket
    and end up incrementing the error counter twice for the same packet.

    This patch fixes it by making skb_kill_datagram return whether it
    succeeded in unlinking the packet and only incrementing the counter
    if it did.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
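
A single-threaded model of the fix, assuming a simple "still queued" flag in place of the real receive queue and its lock. The names struct packet, kill_packet(), drop_bad_packet() and in_errors are hypothetical, and the -ENOENT return value is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct packet {
    int queued;  /* nonzero while the packet is on the receive queue */
};

static unsigned long in_errors;  /* stand-in for the error counter */

/*
 * Drop a bad packet that was only peeked at. Returns 0 if this
 * caller unlinked it, or -ENOENT if it was already gone.
 */
static int kill_packet(struct packet *pkt)
{
    assert(pkt != NULL);
    if (!pkt->queued)
        return -ENOENT;
    pkt->queued = 0;
    return 0;
}

/* Count the error only when this caller did the unlinking. */
static void drop_bad_packet(struct packet *pkt)
{
    if (kill_packet(pkt) == 0)
        in_errors++;
}
```

In the kernel the check and the unlink would have to happen under the queue lock; the model leaves locking out.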
     
  • Using ctl paths we can put all the stuff related to the net/core/
    sysctl table into one file and remove all the references to it.

    As a good side effect this hides the "core_table" name from
    the global scope :)

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This file is already compiled out when SYSCTL=n, so
    these ifdefs, which enclose the whole file, can be removed.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The appropriate path is prepared right inside this function. It
    is prepared similar to how the ctl tables were.

    Since the path is modified, it is put on the stack to avoid
    possible races between multiple calls to neigh_sysctl_register(): it
    is called by protocols and I didn't find any protection in this
    case. Did I overlook the rtnl lock?

    The stack growth of neigh_sysctl_register() is 40 bytes. I
    believe this is OK, since this is not that much and this function
    is not called with a deep stack (device/protocols register).

    The device's name is stored on the template to free it later.

    This will help with the net namespaces, as each namespace should
    have its own set of these ctls.

    Besides, this saves ~350 bytes from the neigh template :)

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This mainly removes the err variable, as this call always
    returns the same error code (-ENOBUFS).

    Besides, I moved the call to kmalloc() from the *t declaration
    into the code (it is confusing when a variable is initialized
    with the result of some call) and removed an unneeded comment near
    the error path.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This allows getting rid of the CONFIG_NETFILTER dependency of NET_ACT_NAT.
    This patch redefines the old names to keep the noise low; the next patch
    converts all users.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • The

        if (statement)
                WARN_ON(1);

    looks much better as

        WARN_ON(statement);

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Pavel Emelyanov
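
To see that the two spellings behave the same, here is a small userland model. The WARN_ON macro below is a simplified stand-in written for this sketch (it prints to stderr and counts warnings), not the kernel's definition; warn_on(), warnings, check_old() and check_new() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

static int warnings;  /* counts warnings, for demonstration only */

/* Report a warning when cond is true and hand cond back. */
static int warn_on(int cond, const char *file, int line)
{
    assert(file != NULL);
    if (cond) {
        warnings++;
        fprintf(stderr, "WARNING: at %s:%d\n", file, line);
    }
    return cond;
}

/* Simplified userland stand-in for the kernel macro. */
#define WARN_ON(cond) warn_on(!!(cond), __FILE__, __LINE__)

/* Old spelling: explicit test, then an unconditional warning. */
static void check_old(int refs)
{
    if (refs != 0)
        WARN_ON(1);
}

/* New spelling: the condition moves into the macro. */
static void check_new(int refs)
{
    WARN_ON(refs != 0);
}
```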
     
  • Currently this size is 16, but as the comment says, this
    is so only because all the chains (except one) have
    length 1. I think that some day this may change, so
    growing this hash will be much easier.

    Besides, symbolic names read better than magic constants.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • sock_wake_async() performs slightly different actions
    depending on the "how" argument. Unfortunately this argument
    only has numerical magic values.

    I propose to give names to these constants to help people
    reading this function's callers understand what's going on
    without looking into this function all the time.

    I suppose this is 2.6.25 material, but if it's not (or the
    naming seems poor/bad/awful), I can rework it against the
    current net-2.6 tree.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This function references sk->sk_prot->xxx many times.
    It turned out that there's so much code in it that gcc
    cannot always optimize access to sk->sk_prot's fields.

    After saving sk->sk_prot on the stack and comparing the
    disassembled code, it turned out that the function became
    ~10 bytes shorter and made fewer dereferences (on i386 and
    x86_64). Stack consumption didn't grow.

    Besides, this patch brings most of this function within the
    80-column limit.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This patch adds a separate workqueue for cleaning up a network
    namespace. If we use the keventd workqueue to execute cleanup_net(),
    there is a problem unregistering devices in IPv6. Indeed, the code
    that cleans up also schedules work in keventd: as long as cleanup_net()
    hasn't returned, dst_gc_task() cannot run, and as long as dst_gc_task() has
    not run, there are still some references pending on the net devices and
    cleanup_net() cannot unregister them and exit the keventd workqueue.

    Signed-off-by: Benjamin Thery
    Signed-off-by: Daniel Lezcano
    Acked-by: Denis V. Lunev
    Acked-by: Kirill Korotaev
    Signed-off-by: David S. Miller

    Benjamin Thery
     
  • This patch removes the following unused EXPORT_SYMBOL's:
    - reqsk_queue_alloc
    - __reqsk_queue_destroy
    - reqsk_queue_destroy

    Signed-off-by: Adrian Bunk
    Signed-off-by: David S. Miller

    Adrian Bunk
     
  • Signed-off-by: Denis V. Lunev
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • After the previous prep work this just consists of removing checks
    limiting the code to work in the initial network namespace, and
    updating rtmsg_ifinfo so we can generate events for devices in
    something other than the initial network namespace.

    Referring to other network devices, as the IFLA_LINK
    and IFLA_MASTER attributes do, gets interesting if those network
    devices happen to be in other network namespaces. Currently
    ifindex numbers are allocated globally, so I have taken the path
    of least resistance and still report the information even
    though the devices they are talking about are invisible.

    If applications start getting confused or when ifindex
    numbers become local to the network namespace we may need
    to do something different in the future.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • After this patch none of the netlink callbacks support anything
    except the initial network namespace, but the rtnetlink infrastructure
    now handles multiple network namespaces.

    Changes from v2:
    - IPv6 addrlabel processing

    Changes from v1:
    - no need for special rtnl_unlock handling
    - fixed IPv6 ndisc

    Signed-off-by: Denis V. Lunev
    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Before I can enable rtnetlink to work in all network namespaces I need
    to be certain that something won't break. So this patch deliberately
    disables all of the rtnetlink methods in everything except the
    initial network namespace. After the methods have been audited this
    extra check can be disabled.

    Changes from v1:
    - added IPv6 addrlabel protection

    Signed-off-by: Denis V. Lunev
    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller
    Signed-off-by: Herbert Xu

    Denis V. Lunev
     
  • The rx_flags variable is redundant. Turning rx on/off is done
    via setting the rx_np pointer.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • The local_mac is managed by the network device, no need to keep a
    spare copy and all the management problems that could cause.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Restructure code slightly to improve readability:
    * dereference device once
    * change obvious while() loop
    * let poll_napi() handle null list itself

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Use standard routine for flushing queue.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This patch adds a protocol/address family number, ARP hardware type,
    ethernet packet type, and a line discipline number for the SocketCAN
    implementation.

    Signed-off-by: Oliver Hartkopp
    Signed-off-by: Urs Thuermann
    Signed-off-by: David S. Miller

    Oliver Hartkopp
     
  • The sock_valbool_flag() helper is used in setsockopt to
    set or reset some flag on the sock. This helper is required
    in net/socket.c only, so move it there.

    Besides, patch two places in sys_setsockopt() that repeat
    this helper's functionality manually.

    Since this is not a bugfix, but a trivial cleanup, I
    prepared this patch against net-2.6.25, but it also
    applies (with a single offset) to the latest net-2.6.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
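
The helper's job, setting or clearing one flag bit depending on a boolean, can be sketched in userland C. struct sock_stub and the stub_* functions are hypothetical stand-ins for struct sock, sock_set_flag(), sock_reset_flag() and sock_valbool_flag().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock: just a flags word. */
struct sock_stub {
    unsigned long flags;
};

static void stub_set_flag(struct sock_stub *sk, int bit)
{
    sk->flags |= 1UL << bit;
}

static void stub_reset_flag(struct sock_stub *sk, int bit)
{
    sk->flags &= ~(1UL << bit);
}

/* Set the flag when valbool is nonzero, clear it otherwise. */
static void stub_valbool_flag(struct sock_stub *sk, int bit, int valbool)
{
    assert(sk != NULL);
    if (valbool)
        stub_set_flag(sk, bit);
    else
        stub_reset_flag(sk, bit);
}
```

Call sites that open-code the if/else around set and reset can then collapse to one call.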
     
  • We have a number of copies of dst_discard scattered around the place
    which all do the same thing, namely free a packet on the input or
    output paths.

    This patch deletes all of them except dst_discard and points all the
    users to it.

    The only non-trivial bit is decnet where it returns an error.
    However, conceptually this is identical to the blackhole functions
    used in IPv4 and IPv6 which do not return errors. So they should
    either all return errors or all return zero. For now I've stuck with
    the majority and picked zero as the return value.

    It doesn't really matter in practice since few if any drivers would
    react differently depending on a zero return value or NET_RX_DROP.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • A lot of code in the kernel initializes timer->function
    and timer->data together with calling init_timer(timer). There
    is already a helper for this. Use it for networking code.

    The patch is HUGE, but makes the code 130 lines shorter
    (98 insertions(+), 228 deletions(-)).

    Signed-off-by: Pavel Emelyanov
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Pavel Emelyanov
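
The shape of the conversion, three statements folded into one helper call, can be modelled as follows. struct timer_stub, stub_init_timer(), stub_setup_timer(), on_expire() and last_fired are hypothetical userland stand-ins; the real helper is not named in the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's timer structure. */
struct timer_stub {
    void (*function)(unsigned long);
    unsigned long data;
    int initialized;
};

static void stub_init_timer(struct timer_stub *t)
{
    t->initialized = 1;
}

/* One call instead of init plus two open-coded field assignments. */
static void stub_setup_timer(struct timer_stub *t,
                             void (*function)(unsigned long),
                             unsigned long data)
{
    assert(t != NULL && function != NULL);
    t->function = function;
    t->data = data;
    stub_init_timer(t);
}

static unsigned long last_fired;  /* records the callback argument */

/* Example expiry callback. */
static void on_expire(unsigned long data)
{
    last_fired = data;
}
```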
     
  • Add raw drops counter for IPv4 in /proc/net/raw.

    Signed-off-by: Wang Chen
    Signed-off-by: David S. Miller

    Wang Chen
     
  • Support for network splice receive.

    Signed-off-by: Jens Axboe
    Signed-off-by: David S. Miller

    Jens Axboe
     

26 Jan, 2008

1 commit

  • Replace all uses of lock_cpu_hotplug/unlock_cpu_hotplug in the kernel with
    get_online_cpus and put_online_cpus, as this highlights the
    refcount semantics in these operations.

    The new API guarantees protection against the cpu-hotplug operation, but
    it doesn't guarantee serialized access to any of the local data
    structures. Hence the changes need to be reviewed.

    In case of pseries_add_processor/pseries_remove_processor, use
    cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the
    cpu_present_map there.

    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Ingo Molnar

    Gautham R Shenoy
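
The refcount semantics mentioned above can be illustrated with a deliberately simplified, single-threaded model. It is only a sketch: the model_* functions and both counters are hypothetical, and where this model returns 0 a real implementation would make the hotplug side wait until the readers are gone.

```c
#include <assert.h>

static int hotplug_refcount;  /* number of active readers */
static int hotplug_active;    /* nonzero while a hotplug operation runs */

/* Reader side: take a reference; references may nest. */
static void model_get_online_cpus(void)
{
    assert(!hotplug_active);
    hotplug_refcount++;
}

/* Reader side: drop one reference. */
static void model_put_online_cpus(void)
{
    assert(hotplug_refcount > 0);
    hotplug_refcount--;
}

/* Hotplug side: may only start once every reference is dropped. */
static int model_hotplug_try_begin(void)
{
    if (hotplug_refcount > 0)
        return 0;
    hotplug_active = 1;
    return 1;
}

static void model_hotplug_done(void)
{
    hotplug_active = 0;
}
```

Note that the reference count only keeps hotplug out; as the message says, it does not serialize readers against each other.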