11 Apr, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
    net: Add support for SMSC LAN9530, LAN9730 and LAN89530
    mlx4_en: Restoring RX buffer pointer in case of failure
    mlx4: Sensing link type at device initialization
    ipv4: Fix "Set rt->rt_iif more sanely on output routes."
    MAINTAINERS: add entry for Xen network backend
    be2net: Fix suspend/resume operation
    be2net: Rename some struct members for clarity
    pppoe: drop PPPOX_ZOMBIEs in pppoe_flush_dev
    dsa/mv88e6131: add support for mv88e6085 switch
    ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2)
    be2net: Fix a potential crash during shutdown.
    bna: Fix for handling firmware heartbeat failure
    can: mcp251x: Allow pass IRQ flags through platform data.
    smsc911x: fix mac_lock acquision before calling smsc911x_mac_read
    iwlwifi: accept EEPROM version 0x423 for iwl6000
    rt2x00: fix cancelling uninitialized work
    rtlwifi: Fix some warnings/bugs
    p54usb: IDs for two new devices
    wl12xx: fix potential buffer overflow in testmode nvs push
    zd1211rw: reset rx idle timer from tasklet
    ...

    Linus Torvalds
     

08 Apr, 2011

2 commits


06 Apr, 2011

1 commit


04 Apr, 2011

2 commits

  • ipv6 fib lookup can set RT6_LOOKUP_F_IFACE flag to restrict search
    to an interface, but this flag cannot be set via struct flowi.

    Also, it cannot be set via ip6_route_output: this function uses the
    passed sock struct to determine if this flag is required
    (by testing for nonzero sk_bound_dev_if).

    Work around this by passing in an artificial struct sk in case
    'strict' argument is true.

    This is required to replace the rt6_lookup call in xt_addrtype.c with
    nf_afinfo->route().

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Signed-off-by: Patrick McHardy

    Florian Westphal
     
  • This is required to eventually replace the rt6_lookup call in
    xt_addrtype.c with nf_afinfo->route().

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Signed-off-by: Patrick McHardy

    Florian Westphal
     

02 Apr, 2011

1 commit

  • All callers are prepared for alloc failures anyway, so this error
    can safely be boomeranged to the callers domain without super
    bad consequences. ...At worst the connection might go into a state
    where each RTO tries to (unsuccessfully) re-fragment with such
    a mis-sized value and eventually dies.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     

31 Mar, 2011

2 commits


30 Mar, 2011

1 commit

  • My commit 6d55cb91a0020ac0 (gre: fix hard header destination
    address checking) broke multicast.

    The reason is that ip_gre used to get ipgre_header() calls with
    zero destination if we have NOARP or multicast destination. Instead
    the actual target was decided at ipgre_tunnel_xmit() time based on
    per-protocol dissection.

    Instead of allowing the "abuse" of ->header() calls with invalid
    destination, this creates multicast mappings for ip_gre. This also
    fixes "ip neigh show nud noarp" to display the proper multicast
    mappings used by the gre device.

    Reported-by: Doug Kehn
    Signed-off-by: Timo Teräs
    Acked-by: Doug Kehn
    Signed-off-by: David S. Miller

    Timo Teräs
     

29 Mar, 2011

1 commit


28 Mar, 2011

1 commit

  • The current handling of echoed IP timestamp options with prespecified
    addresses is rather broken since the 2.2.x kernels. As far as I understand
    it, it should behave like when originating packets.

    Currently it will only timestamp the next free slot if:
    - there is space for *two* timestamps
    - some random data from the echoed packet taken as an IP is *not* a local IP

    This first is caused by an off-by-one error. 'soffset' points to the next
    free slot and so we only need to have 'soffset + 7
    Signed-off-by: David S. Miller

    Jan Luebbe
     

26 Mar, 2011

1 commit


25 Mar, 2011

3 commits

  • Move the scope value out of the fib alias entries and into fib_info,
    so that we always use the correct scope when recomputing the nexthop
    cached source address.

    Reported-by: Julian Anastasov
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Any operation that:

    1) Brings up an interface
    2) Adds an IP address to an interface
    3) Deletes an IP address from an interface

    can potentially invalidate the nh_saddr value, requiring
    it to be recomputed.

    Perform the recomputation lazily using a generation ID.

    Reported-by: Julian Anastasov
    Signed-off-by: David S. Miller

    David S. Miller
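
A minimal sketch of lazy recomputation with a generation ID, as the message above describes. All names here (addr_genid, nexthop_sketch, select_saddr) are hypothetical and the address arithmetic is a stand-in; this is not the kernel code. Each address event only bumps a counter, and the cached value is refreshed the next time it is read with a stale generation.

```c
#include <stdint.h>

/* Hypothetical global generation counter. It starts at 1 so that a
 * zero-initialized cache entry is considered stale. */
static uint32_t addr_genid = 1;
static int recompute_count;          /* counts the expensive recomputations */

struct nexthop_sketch {
    uint32_t saddr;                  /* cached source address */
    uint32_t saddr_genid;            /* generation the cache was computed in */
};

/* Stand-in for the expensive source address selection (assumption). */
static uint32_t select_saddr(void)
{
    recompute_count++;
    return 0x0a000001u + addr_genid;
}

/* Called on interface up, address add, or address delete: cheap. */
static void address_changed(void)
{
    addr_genid++;
}

/* Readers pay for the recomputation only when the cache is stale. */
static uint32_t nexthop_saddr(struct nexthop_sketch *nh)
{
    if (nh->saddr_genid != addr_genid) {
        nh->saddr = select_saddr();
        nh->saddr_genid = addr_genid;
    }
    return nh->saddr;
}
```

Several address changes in a row cost one recomputation per cached entry, and entries that are never read again cost nothing.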
     
  • Alessandro Suardi reported that we could not change route metrics :

    ip ro change default .... advmss 1400

    This regression came with commit 9c150e82ac50 (Allocate fib metrics
    dynamically). fib_metrics is no longer an array, but a pointer to an
    array.

    Reported-by: Alessandro Suardi
    Signed-off-by: Eric Dumazet
    Tested-by: Alessandro Suardi
    Signed-off-by: David S. Miller

    Eric Dumazet
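
The message does not show the faulty code. One common way an array-to-pointer change causes a regression like this is a leftover sizeof on the member, which then measures the pointer rather than the array. The sketch below illustrates that pitfall under this assumption; the struct names and METRICS_MAX are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define METRICS_MAX 16   /* assumed number of metric slots */

struct info_embedded { uint32_t metrics[METRICS_MAX]; };  /* old layout: array   */
struct info_pointer  { uint32_t *metrics; };              /* new layout: pointer */

/* sizeof on an array member gives the size of the whole array. */
static size_t embedded_bytes(const struct info_embedded *fi)
{
    return sizeof(fi->metrics);
}

/* The same expression on a pointer member gives only the pointer size. */
static size_t pointer_bytes(const struct info_pointer *fi)
{
    return sizeof(fi->metrics);
}

/* Once the member is a pointer, the size has to be spelled out. */
static size_t wanted_bytes(void)
{
    return sizeof(uint32_t) * METRICS_MAX;
}
```

A comparison or copy sized with pointer_bytes() would cover only the first few bytes of the metrics.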
     

24 Mar, 2011

2 commits

  • commit 2c8cec5c10bc (Cache learned PMTU information in inetpeer) added
    an extra inet_putpeer() call in ip_rt_update_pmtu().

    This results in various problems, since we can free one inetpeer, while
    it is still in use.

    Ref: http://www.spinics.net/lists/netdev/msg159121.html

    Reported-by: Alexander Beregalov
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In commit 9435eb1cf0b76b323019cebf8d16762a50a12a19
    ("ipv4: Implement __ip_dev_find using new interface address hash.")
    we reimplemented __ip_dev_find() so that it doesn't have to
    do a full FIB table lookup.

    Instead, it consults a hash table of addresses configured to
    interfaces.

    This works identically to the old code in all except one case,
    and that is for loopback subnets.

    The old code would match the loopback device for any IP address
    that falls within a subnet configured to the loopback device.

    Handle this corner case by doing the FIB lookup.

    We could implement this via inet_addr_onlink() but:

    1) Someone could configure many addresses to loopback and
    inet_addr_onlink() is a simple list traversal.

    2) We know the old code works.

    Reported-by: Julian Anastasov
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    David S. Miller
     

23 Mar, 2011

2 commits

  • Signed-off-by: David S. Miller

    David S. Miller
     
  • In the current undo logic, cwnd is moderated after it was restored
    to the value prior to entering fast-recovery. It was moderated first
    in tcp_try_undo_recovery then again in tcp_complete_cwr.

    Since the undo indicates recovery was false, these moderations
    are not necessary. If the undo is triggered when most of the
    outstanding data have been acknowledged, the (restored) cwnd is
    falsely pulled down to a small value.

    This patch removes these cwnd moderations if cwnd is undone
    a) during fast-recovery
    b) by receiving DSACKs past fast-recovery

    Signed-off-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Yuchung Cheng
     

22 Mar, 2011

4 commits

  • Optimize the calling of fib_add_ifaddr for all
    secondary addresses after the promoted one to start from
    their place, not from the new place of the promoted
    secondary. It will save some CPU cycles because we
    are sure the promoted secondary was first for the subnet
    and all next secondaries do not change their place.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • The secondary address promotion relies on fib_sync_down_addr
    to remove all routes created for the secondary addresses when
    the old primary address is deleted. It does not happen for cases
    when the primary address is also in another subnet. Fix that
    by deleting local and broadcast routes for all secondaries while
    they are on device list and by faking that all addresses from
    this subnet are to be deleted. It relies on fib_del_ifaddr being
    able to ignore the IPs from the concerned subnet while checking
    for duplication.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • Alex Sidorenko reported problems with local
    routes left after IP addresses are deleted. It happens
    when the same IPs are used in more than one subnet for the
    device.

    Fix fib_del_ifaddr to restrict the checks for duplicate
    local and broadcast addresses only to the IFAs that use
    our primary IFA or another primary IFA with the same address.
    We also expect the prefsrc to be matched when the routes
    are deleted, because it is possible for them to differ only by
    prefsrc. This patch prevents local and broadcast routes
    from being leaked until their primary IP is finally deleted
    from the box.

    As the secondary address promotion needs to delete
    the routes for all secondaries that used the old primary IFA,
    add an option to ignore these secondaries from the checks and
    to assume they are already deleted, so that we can safely
    delete the route while these IFAs are still on the device list.

    Reported-by: Alex Sidorenko
    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • fib_table_delete forgets to match the routes by prefsrc.
    Callers can specify a known IP in fc_prefsrc and we should remove
    the exact route. This is needed for cases when the same local or
    broadcast addresses are used in different subnets and the
    routes differ only in prefsrc. All callers that do not provide
    fc_prefsrc will ignore the route prefsrc as before and will
    delete the first occurrence. That is how the ip route del default
    magic works.

    Current callers are:

    - ip_rt_ioctl where rtentry_to_fib_config provides fc_prefsrc only
    when the provided device name matches IP label with colon.

    - inet_rtm_delroute where RTA_PREFSRC is optional too

    - fib_magic which deals with routes when deleting addresses
    and where the fc_prefsrc is always set with the primary IP
    for the concerned IFA.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     

20 Mar, 2011

2 commits

  • 'buffer' string is copied from userspace. It is not checked whether it is
    zero terminated. This may lead to overflow inside of simple_strtoul().
    Changli Gao suggested to copy not more than user supplied 'size' bytes.

    It was introduced before the git epoch. Files "ipt_CLUSTERIP/*" are
    root writable only by default, however, on some setups permissions might be
    relaxed to e.g. network admin user.

    Signed-off-by: Vasiliy Kulikov
    Acked-by: Changli Gao
    Signed-off-by: Patrick McHardy

    Vasiliy Kulikov
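
A userspace sketch of the approach suggested above: copy no more than the caller-supplied 'size' bytes into a bounded buffer, terminate it, and only then parse. memcpy() and strtoul() stand in for the kernel's copy from userspace and simple_strtoul(); the function name and the buffer length are assumptions.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_BUFLEN 20   /* assumed upper bound for the textual number */

/* Parse at most 'size' bytes of a buffer that may lack a terminating NUL.
 * Returns 0 on success and -1 when the input is empty or too long. */
static int parse_bounded_ulong(const char *src, size_t size, unsigned long *out)
{
    char buf[SKETCH_BUFLEN];

    if (size == 0 || size >= sizeof(buf))
        return -1;                 /* keep room for the terminator */

    memcpy(buf, src, size);        /* copy only what the caller supplied */
    buf[size] = '\0';              /* the parser can no longer run past the end */

    *out = strtoul(buf, NULL, 10);
    return 0;
}
```

With this shape the parser never reads bytes the caller did not provide, whether or not the source was terminated.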
     
  • commit f3c5c1bfd4308 (make ip_tables reentrant) introduced a race in
    handling the stackptr restore, at the end of ipt_do_table()

    We should do it before the call to xt_info_rdunlock_bh(), or we allow
    cpu preemption and another cpu overwrites stackptr of original one.

    A second fix is to change the underflow test to check the origptr value
    instead of 0 to detect underflow, or else we allow a jump from different
    hooks.

    Signed-off-by: Eric Dumazet
    Cc: Jan Engelhardt
    Signed-off-by: Patrick McHardy

    Eric Dumazet
     

16 Mar, 2011

4 commits


15 Mar, 2011

9 commits

  • Structures ipt_replace, compat_ipt_replace, and xt_get_revision are
    copied from userspace. Fields of these structs that are
    zero-terminated strings are not checked. When they are used as argument
    to a format string containing "%s" in request_module(), some sensitive
    information is leaked to userspace via argument of spawned modprobe
    process.

    The first and the third bugs were introduced before the git epoch; the
    second was introduced in 2722971c (v2.6.17-rc1). To trigger the bug
    one should have CAP_NET_ADMIN.

    Signed-off-by: Vasiliy Kulikov
    Signed-off-by: Patrick McHardy

    Vasiliy Kulikov
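
For fixed-size string fields like the ones described above, a simple defence is to force a terminator into the last byte before the field is used as a "%s" argument. The sketch below assumes a hypothetical struct with a 32-byte name field; it is not the layout of ipt_replace or the other structures named in the message.

```c
#include <string.h>

/* Hypothetical structure copied in from an untrusted source. */
struct replace_sketch {
    char name[32];
};

/* Guarantee the field is a valid C string regardless of its contents. */
static void terminate_name(struct replace_sketch *r)
{
    r->name[sizeof(r->name) - 1] = '\0';
}

/* Length that a "%s" style consumer would now read, at most 31 bytes. */
static size_t name_length(const struct replace_sketch *r)
{
    return strlen(r->name);
}
```

Without the terminator, a consumer of the string could read past the field into adjacent memory, which is the leak the message describes.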
     
  • Structures ipt_replace, compat_ipt_replace, and xt_get_revision are
    copied from userspace. Fields of these structs that are
    zero-terminated strings are not checked. When they are used as argument
    to a format string containing "%s" in request_module(), some sensitive
    information is leaked to userspace via argument of spawned modprobe
    process.

    The first bug was introduced before the git epoch; the second is
    introduced by 6b7d31fc (v2.6.15-rc1); the third is introduced by
    6b7d31fc (v2.6.15-rc1). To trigger the bug one should have
    CAP_NET_ADMIN.

    Signed-off-by: Vasiliy Kulikov
    Signed-off-by: Patrick McHardy

    Vasiliy Kulikov
     
  • HyStart sets the initial exit point of slow start.
    Suppose that HyStart exits at 0.5BDP in a BDP network and no history exists.
    If the BDP of a network is large, CUBIC's initial cwnd growth may be
    too conservative to utilize the link.
    CUBIC increases the cwnd 20% per RTT in this case.

    Signed-off-by: Sangtae Ha
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Sangtae Ha
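
To give a feel for the 20% per RTT figure above, the sketch below counts how many round trips it takes to grow a window from a starting value to a target at that rate. The function name and the integer rounding are assumptions; this is not CUBIC's window update code.

```c
/* Count the RTTs needed for cwnd to reach 'target' when it grows by
 * one fifth (20%) each RTT. Integer division rounds the step down, and
 * a minimum step of 1 keeps the loop moving for very small windows. */
static unsigned int rtts_to_reach(unsigned int cwnd, unsigned int target)
{
    unsigned int rtts = 0;

    while (cwnd < target) {
        unsigned int step = cwnd / 5;

        cwnd += step ? step : 1;
        rtts++;
    }
    return rtts;
}
```

For the case in the message, leaving slow start at half the BDP, rtts_to_reach(500, 1000) steps through 600, 720, 864 and 1036, so the window covers the BDP after four round trips.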
     
  • Make HyStart less sensitive to abrupt delay variations due to buffer bloat.

    Signed-off-by: Sangtae Ha
    Acked-by: Stephen Hemminger
    Reported-by: Lucas Nussbaum
    Signed-off-by: David S. Miller

    Sangtae Ha
     
  • This is a refined version of an earlier patch by Lucas Nussbaum.
    Cubic needs RTT values in milliseconds. If HZ < 1000 then
    the values will be too coarse.

    Signed-off-by: Stephen Hemminger
    Reported-by: Lucas Nussbaum
    Signed-off-by: David S. Miller

    stephen hemminger
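
The coarseness mentioned above can be seen by rounding an RTT to whole clock ticks and converting it back to milliseconds. The helper below is a hypothetical illustration, with hz standing for the tick rate; it is not the conversion used in the patch.

```c
/* Return the RTT in milliseconds as it would appear after being
 * measured in whole ticks of a clock running at 'hz' ticks per second. */
static unsigned int rtt_ms_via_ticks(unsigned int rtt_ms, unsigned int hz)
{
    /* Round down to whole ticks, which is where the precision is lost. */
    unsigned long ticks = (unsigned long)rtt_ms * hz / 1000;

    return (unsigned int)(ticks * 1000 / hz);
}
```

At 100 ticks per second a 3 ms RTT reads as 0 ms and a 13 ms RTT reads as 10 ms, while at 1000 ticks per second both survive unchanged.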
     
  • The hystart code was written with assumption that HZ=1000.
    Replace the use of jiffies with bictcp_clock as a millisecond
    real time clock.

    Signed-off-by: Stephen Hemminger
    Reported-by: Lucas Nussbaum
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • Make the spacing between ACKs that indicates a train a tuneable
    value like other hystart values.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • Jiffies wraps around, so the correct way to compare is
    to cast the difference to a signed value.

    Note: cubic is not using full jiffies value on 64 bit arch
    because using full unsigned long makes struct bictcp grow too
    large for the available ca_priv area.

    Includes correction from Sangtae Ha to improve ack train detection.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
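
The signed-cast comparison described above can be sketched as follows for a 32-bit tick counter. The helper names are hypothetical, and the cast from an out-of-range unsigned value to int32_t relies on the usual two's complement behaviour of common compilers.

```c
#include <stdint.h>

/* True when timestamp a is earlier than b, even if the 32-bit counter
 * wrapped between the two samples. The unsigned subtraction wraps
 * modulo 2^32, and the signed cast is negative when a is behind b. */
static int stamp_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Naive comparison, shown for contrast: wrong across a wrap. */
static int stamp_before_naive(uint32_t a, uint32_t b)
{
    return a < b;
}
```

With a = 0xFFFFFFF0 taken just before the wrap and b = 0x10 taken just after it, stamp_before() reports that a is earlier, while the naive version does not. The comparison is only meaningful while the two stamps are less than 2^31 ticks apart.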
     
  • In the congestion control interface, the callback for each ACK
    includes an estimated round trip time in microseconds.
    Some algorithms need high resolution (Vegas style) but most only
    need jiffie resolution. If RTT is not accurate (like a retransmission)
    -1 is used as a flag value.

    When doing coarse resolution, if RTT is less than a jiffie
    then 0 should be returned rather than no estimate. Otherwise algorithms
    that expect good ack's to trigger slow start (like CUBIC Hystart)
    will be confused.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
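
A small sketch of the distinction drawn above between "no estimate" and "an RTT shorter than one tick". The function name and SKETCH_HZ are assumptions, and a negative input stands for a sample that could not be measured.

```c
#include <stdint.h>

#define SKETCH_HZ 250   /* assumed tick rate, ticks per second */

/* Convert an RTT sample from microseconds to coarse ticks.
 * A negative input means there is no valid sample and is passed on as
 * -1. A valid sample shorter than one tick yields 0, never -1. */
static int32_t rtt_us_to_coarse(int32_t rtt_us)
{
    if (rtt_us < 0)
        return -1;   /* flag value: no estimate, e.g. a retransmission */

    return (int32_t)(((int64_t)rtt_us * SKETCH_HZ) / 1000000);
}
```

At 250 ticks per second a 500 microsecond sample maps to 0, which an algorithm can still treat as a good ACK, whereas -1 stays reserved for the case where no measurement exists.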
     

14 Mar, 2011

1 commit