12 Jul, 2007

3 commits

  • Drivers need to validate the initial addresses in their netlink attribute
    validation function or manually reject them if they can't support this.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • All drivers need to unregister their devices in the module unload function.
    While doing so they must hold the rtnl and atomically unregister the
    rtnl_link ops as well. This makes the rtnl_link_unregister function that
    takes the rtnl itself completely useless.

    Provide default newlink/dellink functions, make __rtnl_link_unregister and
    rtnl_link_unregister unregister all devices with matching rtnl_link_ops and
    change the existing users to take advantage of that.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Keep netpoll/poll_napi from messing with the poll_list.
    Only net_rx_action is allowed to manipulate the list.

    Signed-off-by: Olaf Kirch
    Signed-off-by: David S. Miller

    Olaf Kirch
     

11 Jul, 2007

21 commits

  • As noticed by Jarek Poplawski, the timer removal in
    gen_kill_estimator races with the timer function rearming the timer.

    Check whether the timer list is empty before rearming the timer
    in the timer function to fix this.

    Signed-off-by: Patrick McHardy
    Acked-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Patrick McHardy
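
The rearm check described in the entry above can be sketched in user space as follows. This is a minimal model, not the kernel code: `est_entry`, `est_timer`, `est_timer_fn` and `est_kill_last` are hypothetical names, and the `armed` flag stands in for a pending kernel timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the estimator list and its timer. */
struct est_entry {
    struct est_entry *next;
};

struct est_timer {
    struct est_entry *list; /* estimators served by this timer */
    bool armed;             /* true while the timer is pending */
};

/* Timer callback: do the periodic work, then rearm only if work remains. */
static void est_timer_fn(struct est_timer *t)
{
    assert(t != NULL);
    t->armed = false;       /* the timer has just fired */
    /* ... update every estimator on t->list here ... */
    if (t->list != NULL)
        t->armed = true;    /* skipped once the list is empty */
}

/* Removal path for the last estimator: empty the list, cancel the timer. */
static void est_kill_last(struct est_timer *t)
{
    assert(t != NULL);
    t->list = NULL;
    t->armed = false;
}
```

With the check in place, a callback that runs after the last estimator was removed leaves the timer disarmed instead of rearming it behind the back of the removal path.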
     
  • 93ec2c723e3f8a216dde2899aeb85c648672bc6b applied excessive duct tape to
    the netpoll beast's netpoll_cleanup(), thus substituting one leak with
    another, and opening up a little buglet :-)

    net_device->npinfo (netpoll_info) is a shared and refcounted object and
    cannot simply be set NULL the first time netpoll_cleanup() is called.
    Otherwise, further netpoll_cleanup()'s see np->dev->npinfo == NULL and
    become no-ops, thus leaking. And it's a bug too: the first call to
    netpoll_cleanup() would thus (annoyingly) "disable" other (still alive)
    netpolls too. Maybe nobody noticed this because netconsole (only user
    of netpoll) never supported multiple netpoll objects earlier.

    This is a trivial and obvious one-line fixlet.

    Signed-off-by: Satyam Sharma
    Signed-off-by: David S. Miller

    Satyam Sharma
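
The refcounting rule from the entry above can be illustrated with a small user-space sketch. It is not the actual one-line fix; the struct and function names are hypothetical and only model a shared, refcounted `npinfo` that must stay attached to the device until its last user is gone.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified models of the objects named in the entry. */
struct npinfo_sketch { int refcnt; };
struct netdev_sketch { struct npinfo_sketch *npinfo; };
struct netpoll_sketch { struct netdev_sketch *dev; };

/* Drop one netpoll's reference; only the last user frees and clears. */
static void netpoll_cleanup_sketch(struct netpoll_sketch *np)
{
    struct netdev_sketch *dev;

    assert(np != NULL);
    dev = np->dev;
    if (dev == NULL)
        return;
    if (dev->npinfo != NULL && --dev->npinfo->refcnt == 0) {
        free(dev->npinfo);
        dev->npinfo = NULL; /* safe: nobody else is using it */
    }
    np->dev = NULL;
}
```

Clearing `dev->npinfo` on the first call instead would make the second cleanup a no-op (a leak) and would disable the other, still active netpoll.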
     
  • - save 4 bytes

    - it's read-mostly.

    Signed-off-by: Andrew Morton
    Acked-by: Vasily Averin
    Signed-off-by: David S. Miller

    Andrew Morton
     
  • This includes /proc/net/protocols, /proc/net/rxrpc_calls and
    /proc/net/rxrpc_connections files.

    All three need seq_list_start_head to show some header.

    Signed-off-by: Pavel Emelianov
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Pavel Emelianov
     
  • The TRACE target can be used to follow IP and IPv6 packets through
    the ruleset.

    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Jozsef Kadlecsik
     
  • Added transport mode ESP support for starters. I will send more of
    these modes and types once I have resolved the tunnel mode issues.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: Robert Olsson
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • By default all flows in pktgen are randomly selected.
    This patch introduces the ability to have all defined flows
    sent sequentially. Robert defined randomness to be the
    default behavior.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: Robert Olsson
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
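
The difference between the two selection modes in the entry above can be sketched as below. This is an illustration only: `flow_picker` and `pick_flow` are hypothetical names, and visiting one flow per call in round-robin order is an assumption about what "sequentially" means here, not pktgen's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flow selector; not pktgen's real data structure. */
struct flow_picker {
    unsigned nflows;  /* number of defined flows, must be > 0 */
    unsigned cur;     /* next flow to use in sequential mode */
    int sequential;   /* 0 = random (the default), 1 = sequential */
};

/* Return the index of the flow to use for the next packet. */
static unsigned pick_flow(struct flow_picker *p)
{
    assert(p != NULL && p->nflows > 0);
    if (p->sequential) {
        unsigned f = p->cur;
        p->cur = (p->cur + 1) % p->nflows; /* wrap after the last flow */
        return f;
    }
    return (unsigned)rand() % p->nflows;   /* default: random flow */
}
```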
     
  • Track the extra packet overhead for VLAN tags, MPLS, IPSEC etc.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: Robert Olsson
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • When a reference to an existing address is increased or decreased without
    hitting zero, the address count is incorrectly adjusted.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
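
The invariant behind the fix above is that the list's address count changes only when an entry is actually added or removed, not on every reference change. A minimal user-space sketch, assuming a singly linked list with a per-entry user count (all names are hypothetical, not the kernel's):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ADDR_LEN 6

struct addr_entry {
    struct addr_entry *next;
    unsigned char addr[ADDR_LEN];
    int users;                  /* references to this address */
};

struct addr_list {
    struct addr_entry *head;
    int count;                  /* number of distinct addresses */
};

/* Add a reference; the count grows only when a new entry is created. */
static int addr_add(struct addr_list *l, const unsigned char *a)
{
    struct addr_entry *e;

    assert(l != NULL && a != NULL);
    for (e = l->head; e != NULL; e = e->next) {
        if (memcmp(e->addr, a, ADDR_LEN) == 0) {
            e->users++;         /* existing address: count unchanged */
            return 0;
        }
    }
    e = calloc(1, sizeof(*e));
    if (e == NULL)
        return -1;
    memcpy(e->addr, a, ADDR_LEN);
    e->users = 1;
    e->next = l->head;
    l->head = e;
    l->count++;
    return 0;
}

/* Drop a reference; the count shrinks only when the entry is freed. */
static int addr_del(struct addr_list *l, const unsigned char *a)
{
    struct addr_entry **pp, *e;

    assert(l != NULL && a != NULL);
    for (pp = &l->head; (e = *pp) != NULL; pp = &e->next) {
        if (memcmp(e->addr, a, ADDR_LEN) != 0)
            continue;
        if (--e->users > 0)
            return 0;           /* still referenced: count unchanged */
        *pp = e->next;
        free(e);
        l->count--;
        return 0;
    }
    return -1;                  /* address not found */
}
```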
     
  • Add the multiqueue hardware device support API to the core network
    stack. Allow drivers to allocate multiple queues and manage them at
    the netdev level if they choose to do so.

    Added a new field to sk_buff, namely queue_mapping, for drivers to
    know which tx_ring to select based on OS classification of the flow.

    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Peter P Waskiewicz Jr
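
How a driver might use the new field is sketched below. Only the field name `queue_mapping` comes from the entry above; the structs, the ring count and the fallback to ring 0 are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RINGS 4

/* Hypothetical, reduced views of an skb and a multiqueue NIC. */
struct skb_sketch { unsigned short queue_mapping; };
struct tx_ring { unsigned long packets; };
struct nic_sketch {
    struct tx_ring rings[MAX_RINGS];
    unsigned num_rings;         /* rings actually in use */
};

/* Pick the TX ring the stack's classification chose for this packet. */
static struct tx_ring *select_tx_ring(struct nic_sketch *nic,
                                      const struct skb_sketch *skb)
{
    unsigned q;

    assert(nic != NULL && skb != NULL);
    assert(nic->num_rings > 0 && nic->num_rings <= MAX_RINGS);
    q = skb->queue_mapping;
    if (q >= nic->num_rings)
        q = 0;                  /* assumed fallback for out-of-range values */
    return &nic->rings[q];
}
```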
     
  • This patch fixes a boolean error in the new TX checksum check
    that causes bogus TSO packets to be generated.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Add support for configuring secondary unicast addresses on network
    devices. To support this, devices capable of filtering multiple
    unicast addresses need to change their set_multicast_list function
    to configure unicast filters as well and assign it to dev->set_rx_mode
    instead of dev->set_multicast_list. Other devices are put into promiscuous
    mode when secondary unicast addresses are present.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Use generic net_device address lists for multicast list handling.
    Some defines are used to keep drivers working.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Introduce struct dev_addr_list and list maintenance functions
    based on dev_mc_list and the related functions. This will be
    used by follow-up patches for both multicast and secondary
    unicast addresses.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • dev_mc_add/dev_mc_delete take care of uploading the list when
    necessary and that's the only interface other code should use.
    Also remove two incorrect calls in DECnet.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • The existing model for checksum offload does not correctly handle
    devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
    implies that the device can checksum any arbitrary protocol.

    This patch:
    * adds NETIF_F_IPV6_CSUM for those devices
    * fixes bnx2 and tg3 devices that need it
    * adds NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
    * fixes assumptions about NETIF_F_ALL_CSUM in nat
    * adjusts bridge union of checksumming computation

    Signed-off-by: David S. Miller

    Stephen Hemminger
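
The distinction drawn in the entry above, between a device that can checksum anything and one limited to IPv4 and/or IPv6, can be sketched as a capability check. The `F_*` bit values below are illustrative placeholders rather than the kernel's NETIF_F_* values, and `can_checksum` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative feature bits; the kernel's real values differ. */
#define F_IP_CSUM   (1u << 0)  /* IPv4 only */
#define F_HW_CSUM   (1u << 1)  /* any protocol */
#define F_IPV6_CSUM (1u << 2)  /* IPv6 only */

#define PROTO_IPV4 0x0800u     /* EtherType for IPv4 */
#define PROTO_IPV6 0x86DDu     /* EtherType for IPv6 */

/* Can a device with these features checksum a packet of this protocol? */
static bool can_checksum(unsigned features, unsigned proto)
{
    assert(proto <= 0xFFFFu);  /* EtherTypes are 16 bits wide */
    if (features & F_HW_CSUM)
        return true;
    if (proto == PROTO_IPV4)
        return (features & F_IP_CSUM) != 0;
    if (proto == PROTO_IPV6)
        return (features & F_IPV6_CSUM) != 0;
    return false;
}
```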
     
  • Sent the wrong patch previously.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Add a nested compat attribute type that can be used to convert
    attributes that contain a structure to nested attributes in a
    backwards compatible way.

    The attribute looks like this:

    struct {
            [ compat contents ]
            struct rtattr {
                    .rta_len  = total size,
                    .rta_type = type,
            } rta;
            struct old_structure struct;

            [ nested top-level attribute ]
            struct rtattr {
                    .rta_len  = nest size,
                    .rta_type = type,
            } nest_attr;

                    [ optional 0 .. n nested attributes ]
                    struct rtattr {
                            .rta_len  = private attribute len,
                            .rta_type = private attribute type,
                    } nested_attr;
                    struct nested_data data;
    };

    Since both userspace and kernel deal correctly with attributes that are
    larger than expected, old versions will just parse the compat part and
    ignore the rest.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
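
The layout described in the entry above can be made concrete by building such an attribute in a flat buffer. This is a user-space sketch under assumptions: `attr_hdr` mirrors the two 16-bit fields of `struct rtattr`, attributes are padded to 4 bytes, and `put_attr` and `put_nested_compat` are hypothetical helpers, not the kernel's API. The caller is assumed to pass a zeroed buffer that is large enough.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ATTR_ALIGN(len) (((size_t)(len) + 3) & ~(size_t)3)

/* Mirrors the rta_len/rta_type pair of struct rtattr. */
struct attr_hdr {
    uint16_t len;
    uint16_t type;
};

/* Write header + payload at offset off; return the next aligned offset. */
static size_t put_attr(unsigned char *buf, size_t off, uint16_t type,
                       const void *data, uint16_t dlen)
{
    struct attr_hdr h = { (uint16_t)(sizeof(h) + dlen), type };

    memcpy(buf + off, &h, sizeof(h));
    if (dlen)
        memcpy(buf + off + sizeof(h), data, dlen);
    return off + ATTR_ALIGN(sizeof(h) + dlen);
}

/* Build: [outer hdr][old struct][nest hdr][one nested attribute]. */
static size_t put_nested_compat(unsigned char *buf, uint16_t type,
                                const void *old, uint16_t old_len,
                                uint16_t ntype, const void *ndata,
                                uint16_t nlen)
{
    struct attr_hdr h;
    size_t nest_off, end;

    assert(buf != NULL);
    nest_off = put_attr(buf, 0, type, old, old_len);  /* compat part */
    end = put_attr(buf, nest_off, type, NULL, 0);     /* nest header */
    end = put_attr(buf, end, ntype, ndata, nlen);     /* nested attr */

    h.type = type;
    h.len = (uint16_t)(end - nest_off);               /* nest size */
    memcpy(buf + nest_off, &h, sizeof(h));
    h.len = (uint16_t)end;                            /* total size */
    memcpy(buf, &h, sizeof(h));
    return end;
}
```

A parser that only knows the old format reads the outer header, takes the first `old_len` payload bytes as the old structure and ignores the remainder, which is what keeps the format backwards compatible.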
     
  • Currently NAT (and others) that want to modify cloned skbs copy them,
    even if in the vast majority of cases it's not necessary because the
    skb is a clone made by TCP and the portion NAT wants to modify is
    actually writable because TCP releases the header reference before
    cloning.

    The problem is that there is no clean way for NAT to find out how
    long the writable header area is, so this patch introduces skb->hdr_len
    to hold this length. When a headerless skb is cloned skb->hdr_len
    is set to the current headroom, for regular clones it is copied from
    the original. A new function skb_clone_writable(skb, len) returns
    whether the skb is writable up to len bytes from skb->data. To avoid
    enlarging the skb, the mac_len field is reduced to 16 bits and the
    new hdr_len field is put in the remaining 16 bits.

    I've done a few rough benchmarks of NAT (not with this exact patch,
    but a very similar one). As expected it saves huge amounts of system
    time in case of sendfile, bringing it down to basically the same
    amount as without NAT, with sendmsg it only helps on loopback,
    probably because of the large MTU.

    Transmit a 1GB file using sendfile/sendmsg over eth0/lo with and
    without NAT:

    - sendfile eth0, no NAT: sys 0m0.388s
    - sendfile eth0, NAT: sys 0m1.835s
    - sendfile eth0, NAT + patch: sys 0m0.370s (~ -80%)

    - sendfile lo, no NAT: sys 0m0.258s
    - sendfile lo, NAT: sys 0m2.609s
    - sendfile lo, NAT + patch: sys 0m0.260s (~ -90%)

    - sendmsg eth0, no NAT: sys 0m2.508s
    - sendmsg eth0, NAT: sys 0m2.539s
    - sendmsg eth0, NAT + patch: sys 0m2.445s (no change)

    - sendmsg lo, no NAT: sys 0m2.151s
    - sendmsg lo, NAT: sys 0m3.557s
    - sendmsg lo, NAT + patch: sys 0m2.159s (~ -40%)

    I expect other users can see a similar performance improvement,
    packet mangling iptables targets, ipip and ip_gre come to mind ..

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
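
The writability test described in the entry above can be sketched as follows. The struct is a reduced, hypothetical model of an skb, and the comparison `headroom + len <= hdr_len` is a reading of the description (hdr_len records the headroom at clone time, so anything pushed into it later is private to the clone), not a copy of the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical model of the fields involved. */
struct skb_sketch {
    unsigned headroom;       /* bytes between buffer start and skb->data */
    unsigned short hdr_len;  /* writable header length of a clone */
    bool header_cloned;      /* someone else still references the header */
};

/* Is the clone writable for len bytes starting at skb->data? */
static bool clone_writable(const struct skb_sketch *skb, unsigned len)
{
    assert(skb != NULL);
    return !skb->header_cloned &&
           skb->headroom + len <= skb->hdr_len;
}
```

A caller such as NAT would copy the skb only when this check fails, which is where the system-time savings in the benchmarks come from.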
     
  • Add rtnetlink API for creating, changing and deleting software devices.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Split up rtnl_setlink into a function performing validation and a function
    performing the actual changes. This allows sharing the modification logic
    with rtnl_newlink, which is introduced by the next patch.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

06 Jul, 2007

3 commits


29 Jun, 2007

1 commit

  • #1
    Up to and including kernel 2.6.21, cancel_rearming_delayed_work()
    required that the work function always (unconditionally) rearm with
    delay > 0; otherwise it would loop endlessly. This patch replaces
    this function with cancel_delayed_work(). Later kernel versions don't
    require this, so here it's only for uniformity.

    #2
    After deleting a timer in cancel_[rearming_]delayed_work(), a last
    skb could remain queued in npinfo->txq, causing a memory leak after
    kfree(npinfo).

    Initial patch & testing by: Jason Wessel

    Signed-off-by: Jarek Poplawski
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

27 Jun, 2007

1 commit

  • If the sky2 device poll routine is called from netpoll_send_skb, it
    would deadlock: netpoll_send_skb holds the netif_tx_lock, and the poll
    routine may acquire it to clean up skbs. Other drivers might use the
    same locking model.

    The driver is correct; netpoll should not introduce more locking
    problems than it causes already. So change the code to drop the lock
    before calling the poll handler.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

24 Jun, 2007

3 commits


08 Jun, 2007

4 commits


04 Jun, 2007

1 commit

  • This isn't a bug just yet as only TCP uses sk_setup_caps for GSO.
    However, if and when UDP or something else starts using it this is
    likely to cause a problem if we forget to add software emulation
    for it at the same time.

    The problem is that right now we translate GSO emulation to the
    bitmask NETIF_F_GSO_MASK, which includes every protocol, even
    ones that we cannot emulate.

    This patch makes it provide only the ones that we can emulate.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
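
The masking change from the entry above can be sketched like this. The bit values, the set of protocols counted as emulatable (`GSO_SOFTWARE`) and the helper `socket_caps` are assumptions for illustration; only the idea, advertising just the emulatable GSO types instead of the whole mask, comes from the commit message.

```c
#include <assert.h>

/* Illustrative GSO type bits; not the kernel's real values. */
#define GSO_TCPV4   (1u << 0)
#define GSO_UDP     (1u << 1)
#define GSO_TCP_ECN (1u << 2)
#define GSO_TCPV6   (1u << 3)
#define GSO_ALL     (GSO_TCPV4 | GSO_UDP | GSO_TCP_ECN | GSO_TCPV6)

/* Assumed set of types that software emulation can handle. */
#define GSO_SOFTWARE (GSO_TCPV4 | GSO_TCP_ECN | GSO_TCPV6)

#define F_GSO (1u << 8)   /* software GSO emulation is enabled */

/* Compute the capabilities a socket may rely on for this device. */
static unsigned socket_caps(unsigned dev_features)
{
    unsigned caps = dev_features;

    assert((GSO_SOFTWARE & ~GSO_ALL) == 0);
    if (caps & F_GSO)
        caps |= GSO_SOFTWARE;   /* before the fix: caps |= GSO_ALL */
    return caps;
}
```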
     

31 May, 2007

2 commits

  • in4_pton converts a textual representation of an ip4 address
    into an integer representation. However, when the textual representation
    is in the form ip:port, e.g. 192.168.1.1:5060, and 'delim' is set to
    -1, the function bails out with an error when reading the colon.

    It makes sense to allow the colon as a delimiting character without
    explicitly having to set it through the 'delim' variable as there can be
    no ambiguity in the point where the ip address is completely parsed. This
    function is indeed called from nf_conntrack_sip.c in this way to parse
    textual ip:port combinations which fails due to the reason stated above.

    Signed-off-by: Jerome Borsboom
    Signed-off-by: David S. Miller

    Jerome Borsboom
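
The parsing behaviour argued for in the entry above can be sketched in user space. `parse_ip4` is a hypothetical function loosely shaped like the description (source, length, destination, delimiter, end pointer); it is not the kernel's in4_pton, and returning 1 on success and 0 on failure is this sketch's own convention.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Parse a dotted-quad IPv4 address from src[0..srclen).
 * Parsing stops at the end of input, at delim (if delim >= 0), or at ':'.
 * On success, stores 4 bytes in dst, points *end at the first unparsed
 * character and returns 1. Returns 0 on malformed input.
 */
static int parse_ip4(const char *src, int srclen, unsigned char dst[4],
                     int delim, const char **end)
{
    int i, octet = 0, digits = 0, nfields = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < srclen; i++) {
        char c = src[i];

        if (c >= '0' && c <= '9') {
            octet = octet * 10 + (c - '0');
            if (octet > 255 || ++digits > 3)
                return 0;
        } else if (c == '.' && digits > 0 && nfields < 3) {
            dst[nfields++] = (unsigned char)octet;
            octet = 0;
            digits = 0;
        } else if (c == ':' || (delim >= 0 && c == delim)) {
            break;              /* the address text ends here */
        } else {
            return 0;
        }
    }
    if (nfields != 3 || digits == 0)
        return 0;
    dst[3] = (unsigned char)octet;
    if (end != NULL)
        *end = src + i;
    return 1;
}
```

Because a colon can never appear inside a dotted-quad address, treating it as a terminator is unambiguous; the caller can continue parsing the port from `*end`.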
     
  • Signed-off-by: David S. Miller

    David S. Miller
     

25 May, 2007

1 commit

  • The current IPSEC rule resolution behavior we have does not work for a
    lot of people, even though technically it's an improvement from the
    -EAGAIN business we had before.

    Right now we'll block until the key manager resolves the route. That
    works for simple cases, but many folks would rather packets get
    silently dropped until the key manager resolves the IPSEC rules.

    We can't tell these folks to "set the socket non-blocking" because
    they don't have control over the non-block setting of things like the
    sockets used to resolve DNS deep inside of the resolver libraries in
    libc.

    With that in mind I coded up the patch below with some help from
    Herbert Xu which provides packet-drop behavior during larval state
    resolution, controllable via sysctl and off by default.

    This lays the framework to either:

    1) Make this default at some point or...

    2) Move this logic into xfrm{4,6}_policy.c and implement the
    ARP-like resolution queue we've all been dreaming of.
    The idea would be to queue packets to the policy, then
    once the larval state is resolved by the key manager we
    re-resolve the route and push the packets out. The
    packets would timeout if the rule didn't get resolved
    in a certain amount of time.

    Signed-off-by: David S. Miller

    David S. Miller
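
The choice the new sysctl adds in the entry above can be summarized as a small decision function. This is only a sketch: the enum, the function name and the assumption that a non-blocking socket gets -EAGAIN instead of waiting are illustrative and not taken from the patch.

```c
#include <assert.h>

/* What to do with a packet whose IPSEC state is still larval. */
enum larval_action {
    LARVAL_WAIT,    /* block until the key manager resolves it */
    LARVAL_EAGAIN,  /* assumed behavior for non-blocking sockets */
    LARVAL_DROP     /* silently drop; opt-in via the sysctl */
};

/* larval_drop models the sysctl (0 = off, the default; 1 = on). */
static enum larval_action larval_policy(int larval_drop, int nonblocking)
{
    assert(larval_drop == 0 || larval_drop == 1);
    if (larval_drop)
        return LARVAL_DROP;
    return nonblocking ? LARVAL_EAGAIN : LARVAL_WAIT;
}
```

With the sysctl on, the application's blocking mode no longer matters, which is the point for sockets buried inside resolver libraries.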