16 Jun, 2008

1 commit


13 Jun, 2008

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    tcp: Revert 'process defer accept as established' changes.
    ipv6: Fix duplicate initialization of rawv6_prot.destroy
    bnx2x: Updating the Maintainer
    net: Eliminate flush_scheduled_work() calls while RTNL is held.
    drivers/net/r6040.c: correct bad use of round_jiffies()
    fec_mpc52xx: MPC52xx_MESSAGES_DEFAULT: 2nd NETIF_MSG_IFDOWN => IFUP
    ipg: fix receivemode IPG_RM_RECEIVEMULTICAST{,HASH} in ipg_nic_set_multicast_list()
    netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info()
    netfilter: Make nflog quiet when no one listen in userspace.
    ipv6: Fail with appropriate error code when setting not-applicable sockopt.
    ipv6: Check IPV6_MULTICAST_LOOP option value.
    ipv6: Check the hop limit setting in ancillary data.
    ipv6 route: Fix route lifetime in netlink message.
    ipv6 mcast: Check address family of gf_group in getsockopt(MS_FILTER).
    dccp: Bug in initial acknowledgment number assignment
    dccp ccid-3: X truncated due to type conversion
    dccp ccid-3: TFRC reverse-lookup Bug-Fix
    dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets
    dccp: Fix sparse warnings
    dccp ccid-3: Bug-Fix - Zero RTT is possible

    Linus Torvalds
     
  • This reverts two changesets, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
    ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
    the follow-on bug fix 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38
    ("tcp: Fix slab corruption with ipv6 and tcp6fuzz").

    This change causes several problems, first reported by Ingo Molnar
    as a distcc-over-loopback regression where connections were getting
    stuck.

    Ilpo Järvinen first spotted the locking problems. The new function
    added by this code, tcp_defer_accept_check(), only has the
    child socket locked, yet it is modifying state of the parent
    listening socket.

    Fixing that is non-trivial at best, because we can't simply just grab
    the parent listening socket lock at this point, because it would
    create an ABBA deadlock. The normal ordering is parent listening
    socket --> child socket, but this code path would require the
    reverse lock ordering.

    Next is a problem noticed by Vitaliy Gusev, who noted:

    ----------------------------------------
    >--- a/net/ipv4/tcp_timer.c
    >+++ b/net/ipv4/tcp_timer.c
    >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
    > goto death;
    > }
    >
    >+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
    >+ tcp_send_active_reset(sk, GFP_ATOMIC);
    >+ goto death;

    Here socket sk is not attached to listening socket's request queue. tcp_done()
    will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
    release this sk) as socket is not DEAD. Therefore socket sk will be lost for
    freeing.
    ----------------------------------------

    Finally, Alexey Kuznetsov argues that there might not even be any
    real value or advantage to these new semantics even if we fix all
    of the bugs:

    ----------------------------------------
    Hiding from accept() sockets with only out-of-order data only
    is the only thing which is impossible with old approach. Is this really
    so valuable? My opinion: no, this is nothing but a new loophole
    to consume memory without control.
    ----------------------------------------

    So revert this thing for now.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • In changeset 22dd485022f3d0b162ceb5e67d85de7c3806aa20
    ("raw: Raw socket leak.") code was added so that we
    flush pending frames on raw sockets to avoid leaks.

    The ipv4 part was fine, but the ipv6 part was not
    done correctly. Unlike the ipv4 side, the ipv6 code
    already has a .destroy method for rawv6_prot.

    So now there were two assignments to this member, and
    what the compiler does is use the last one, effectively
    making the ipv6 parts of that changeset a NOP.

    Fix this by removing the:

    .destroy = inet6_destroy_sock,

    line, and adding an inet6_destroy_sock() call to the
    end of raw6_destroy().

    Noticed by Al Viro.

    Signed-off-by: David S. Miller
    Acked-by: YOSHIFUJI Hideaki

    David S. Miller
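
The duplicate-initializer pitfall in the rawv6_prot entry above can be reproduced in a few lines of standalone C. This is only a sketch: `proto_sketch`, `raw_destroy_sketch`, `generic_destroy_sketch` and the counter argument are hypothetical stand-ins, not the kernel's `struct proto` or its destroy callbacks. It shows that when one member is initialized twice, the later initializer silently wins, and that chaining the generic call from the end of the specific handler runs both.

```c
/* Hypothetical stand-in for a protocol ops table; not the kernel's struct proto. */
struct proto_sketch {
    void (*destroy)(int *calls);
};

/* Protocol-specific cleanup (stands in for the raw-socket flush). */
static void raw_destroy_sketch(int *calls)
{
    *calls += 1;
}

/* Generic cleanup (stands in for the family-wide destroy helper). */
static void generic_destroy_sketch(int *calls)
{
    *calls += 10;
}

/*
 * Same member initialized twice: C keeps only the last initializer,
 * so raw_destroy_sketch is never called through this table.
 * Compilers may warn about the override.
 */
static const struct proto_sketch dup_proto_sketch = {
    .destroy = raw_destroy_sketch,
    .destroy = generic_destroy_sketch,
};

/* The shape of the fix: one handler that ends by calling the generic one. */
static void raw_destroy_fixed_sketch(int *calls)
{
    raw_destroy_sketch(calls);
    generic_destroy_sketch(calls);
}

static const struct proto_sketch fixed_proto_sketch = {
    .destroy = raw_destroy_fixed_sketch,
};
```

Calling `dup_proto_sketch.destroy()` only runs the generic handler, while `fixed_proto_sketch.destroy()` runs both in order.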
     

12 Jun, 2008

9 commits


11 Jun, 2008

10 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
    net: Fix routing tables with id > 255 for legacy software
    sky2: Hold RTNL while calling dev_close()
    s2io iomem annotations
    atl1: fix suspend regression
    qeth: start dev queue after tx drop error
    qeth: Prepare-function to call s390dbf was wrong
    qeth: reduce number of kernel messages
    qeth: Use ccw_device_get_id().
    qeth: layer 3 Oops in ip event handler
    virtio: use callback on empty in virtio_net
    virtio: virtio_net free transmit skbs in a timer
    virtio: Fix typo in virtio_net_hdr comments
    virtio_net: Fix skb->csum_start computation
    ehea: set mac address fix
    sfc: Recover from RX queue flush failure
    add missing lance_* exports
    ixgbe: fix typo
    forcedeth: msi interrupts
    ipsec: pfkey should ignore events when no listeners
    pppoe: Unshare skb before anything else
    ...

    Linus Torvalds
     
  • Step 8.5 in RFC 4340 says for the newly cloned socket

    Initialize S.GAR := S.ISS,

    but what in fact the code (minisocks.c) does is

    Initialize S.GAR := S.ISR,

    which is wrong (typo?) -- fixed by the patch.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
     
  • This fixes a bug in computing the inter-packet-interval t_ipi = s/X:

    scaled_div32(a, b) uses u32 for b, but in "scaled_div32(s, X)" the type of the
    sending rate `X' is u64. Since X is scaled by 2^6, this truncates rates greater
    than 2^26 Bps (~537 Mbps).

    Using full 64-bit division now.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
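
The truncation described in the t_ipi entry above can be illustrated with a small standalone sketch. The helper names below are hypothetical and only loosely modeled on the description (a divisor parameter that is 32 bits wide versus a full 64-bit division); the packet size of 1460 bytes and the chosen rate are assumptions made for the example. X is the sending rate in bytes per second scaled by 2^6, and the result is in microseconds.

```c
#include <stdint.h>

/*
 * Hypothetical model of the buggy path: the divisor is narrowed to
 * 32 bits, so any scaled rate >= 2^32 loses its upper bits.
 * (A truncated divisor of exactly 0 would divide by zero; the example
 * rate below avoids that case.)
 */
static uint32_t t_ipi_truncated_sketch(uint32_t s, uint64_t x_scaled)
{
    uint32_t b = (uint32_t)x_scaled;               /* silent truncation */
    uint64_t a = ((uint64_t)s << 6) * 1000000u;    /* s scaled like X, in usec */

    return (uint32_t)(a / b);
}

/* Hypothetical model of the fixed path: full 64-bit division. */
static uint32_t t_ipi_full_sketch(uint32_t s, uint64_t x_scaled)
{
    uint64_t a = ((uint64_t)s << 6) * 1000000u;

    return (uint32_t)(a / x_scaled);
}
```

With s = 1460 and a scaled rate of 2^32 + 2^26 (a sending rate just above 2^26 bytes per second), the truncated variant sees a divisor of only 2^26 and returns an interval of 1392 microseconds, whereas the full division returns 21 microseconds.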
     
  • This fixes a bug in the reverse lookup of p: given a value f(p), instead of p,
    the function returned the smallest tabulated value f(p).

    The smallest tabulated value of

    10^6 * f(p) = sqrt(2*p/3) + 12 * sqrt(3*p/8) * (32 * p^3 + p)

    for p=0.0001 is 8172.

    Since this value is scaled by 10^6, the outcome of this bug is that a loss
    of 8172/10^6 = 0.8172% was reported whenever the input was below the table
    resolution of 0.01%.

    This means that the value was over 80 times too high, resulting in large spikes
    of the initial loss interval, thus unnecessarily reducing the throughput.

    Also corrected the printk format (%u for u32).

    Signed-off-by: Gerrit Renker

    Gerrit Renker
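
The reverse-lookup bug in the entry above comes down to returning the wrong column of a table for inputs below its first row. A minimal sketch follows, assuming a tiny hypothetical table; only the first row (p = 0.01%, f = 8172, both scaled by 10^6) comes from the commit message, the remaining rows are placeholder values and the function names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Both columns are scaled by 10^6. */
struct tfrc_entry_sketch {
    uint32_t p;   /* loss event rate */
    uint32_t f;   /* tabulated f(p) */
};

static const struct tfrc_entry_sketch table_sketch[] = {
    { 100, 8172 },    /* p = 0.01%: value quoted in the commit message */
    { 200, 12000 },   /* placeholder row, not a real table value */
    { 400, 17000 },   /* placeholder row, not a real table value */
};

#define TABLE_LEN_SKETCH (sizeof(table_sketch) / sizeof(table_sketch[0]))

/* Return p of the last row whose f does not exceed fvalue. */
static uint32_t lookup_in_range_sketch(uint32_t fvalue)
{
    size_t i = 0;

    while (i + 1 < TABLE_LEN_SKETCH && table_sketch[i + 1].f <= fvalue)
        i++;
    return table_sketch[i].p;
}

/* Buggy variant: below the table, hands back f(p) where p is expected. */
static uint32_t reverse_lookup_buggy_sketch(uint32_t fvalue)
{
    if (fvalue < table_sketch[0].f)
        return table_sketch[0].f;   /* 8172, i.e. 0.8172% */
    return lookup_in_range_sketch(fvalue);
}

/* Fixed variant: below the table, clamp to the smallest tabulated p. */
static uint32_t reverse_lookup_fixed_sketch(uint32_t fvalue)
{
    if (fvalue < table_sketch[0].f)
        return table_sketch[0].p;   /* 100, i.e. 0.01% */
    return lookup_in_range_sketch(fvalue);
}
```

For an input below the first row, the buggy variant reports 8172 (0.8172%) while the fixed one reports 100 (0.01%), which is the factor of roughly 80 mentioned above.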
     
  • This fixes an oversight from an earlier patch, ensuring that Ack Vectors
    are not processed on request sockets.

    The issue is that Ack Vectors must not be parsed on request sockets, since
    the Ack Vector feature depends on the selection of the (TX) CCID. During the
    initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies:

    "Using CCID-specific options and feature options during a negotiation
    for the corresponding CCID feature is NOT RECOMMENDED [...]"

    And it is not even possible: when the server receives the Request from the
    client, the CCID and Ack vector features are undefined; when the Ack finalising
    the 3-way handshake arrives, the request socket has not been cloned yet into a
    full socket. (This order is necessary, since otherwise the newly created socket
    would have to be destroyed whenever an option error occurred - a malicious
    hacker could simply send garbage options and exploit this.)

    Signed-off-by: Gerrit Renker

    Gerrit Renker
     
  • This patch fixes the following sparse warnings:
    * nested min(max()) expression:
    net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one
    net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one

    * Declaration of function prototypes in .c instead of .h file, resulting in
    "should it be static?" warnings.

    * Declared "struct dccpw" static (local to dccp_probe).

    * Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3
    ("Receivers SHOULD implement delayed acknowledgement timers ...").

    * Used a different local variable name to avoid
    net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one
    net/dccp/ackvec.c:238:33: originally declared here

    * Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
     
  • In commit $(825de27d9e40b3117b29a79d412b7a4b78c5d815) (from 27th May, commit
    message `dccp ccid-3: Fix "t_ipi explosion" bug'), the CCID-3 window counter
    computation was fixed to cope with RTTs < 4 microseconds.

    Such RTTs can be found e.g. when running CCID-3 over loopback. The fix removed
    a check against RTT < 4, but introduced a divide-by-zero bug.

    All steady-state RTTs in DCCP are filtered using dccp_sample_rtt(), which
    ensures non-zero samples. However, a zero RTT is possible on initialisation,
    when there is no RTT sample from the Request/Response exchange.

    The fix is to use the fallback-RTT from RFC 4340, 3.4.

    This is also better than just fixing update_win_count() since it allows other
    parts of the code to always assume that the RTT is non-zero during the time
    that the CCID is used.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
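
The zero-RTT entry above prefers fixing the value at initialisation over guarding each division. A minimal sketch of that design, with hypothetical names and assuming the fallback RTT of RFC 4340, 3.4 is 0.2 seconds:

```c
#include <stdint.h>

/* Assumption: RFC 4340, 3.4 fallback RTT of 0.2 s, expressed in microseconds. */
#define FALLBACK_RTT_US_SKETCH 200000u

/*
 * Called once at initialisation: if the Request/Response exchange gave
 * no RTT sample (0), substitute the fallback so the stored RTT is
 * never zero afterwards.
 */
static uint32_t init_rtt_us_sketch(uint32_t sample_us)
{
    return sample_us ? sample_us : FALLBACK_RTT_US_SKETCH;
}

/*
 * Later code may divide by the RTT without its own zero check.
 * Returns how many quarter-RTTs fit into delta_us.
 */
static uint32_t quarter_rtts_sketch(uint32_t delta_us, uint32_t rtt_us)
{
    return (uint32_t)((4 * (uint64_t)delta_us) / rtt_us);
}
```

Because the substitution happens once, every later consumer of the RTT can rely on it being non-zero rather than repeating the check.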
     
  • Most legacy software does not like tables > 255 as rtm_table is u8
    so tb_id is sent &0xff and it is possible to mismatch for example
    table 510 with table 254 (main).

    This patch introduces RT_TABLE_COMPAT=252 so the code uses it if
    tb_id > 255. It makes such old applications happy, new
    ones are still able to use RTA_TABLE to get a proper table id.

    Signed-off-by: Krzysztof Piotr Oledzki
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Krzysztof Piotr Oledzki
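
The mismatch described in the routing-table entry above is plain 8-bit truncation. The sketch below uses hypothetical function names; the constants 252 and 254 are the values named in the commit message.

```c
#include <stdint.h>

enum {
    RT_TABLE_COMPAT_SKETCH = 252,   /* value given in the commit message */
    RT_TABLE_MAIN_SKETCH   = 254,   /* "main", per the commit message */
};

/* Old behaviour: the 32-bit id is squeezed into the u8 field. */
static uint8_t rtm_table_old_sketch(uint32_t tb_id)
{
    return (uint8_t)(tb_id & 0xff);
}

/* New behaviour: ids that do not fit are reported as the compat table. */
static uint8_t rtm_table_new_sketch(uint32_t tb_id)
{
    return tb_id > 255 ? RT_TABLE_COMPAT_SKETCH : (uint8_t)tb_id;
}
```

Table 510 is reported as 254 by the old mapping, making it indistinguishable from main, and as 252 by the new one; ids up to 255 are unchanged.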
     
  • When pfkey has no km listeners, it still does a lot of work
    before finding out there aint nobody out there.
    If a tree falls in a forest and no one is around to hear it, does it make
    a sound? In this case it makes a lot of noise:
    With this short-circuit, adding tens of thousands of SAs via
    netlink is ~10% faster.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • Wei Yongjun noticed that we may call reqsk_free on request sock objects where
    the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc
    where we initialize ->opt to NULL and set ->pktopts to NULL in
    inet6_reqsk_alloc.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     

10 Jun, 2008

5 commits

  • The bindv6only is tuned via sysctl. It is already on a struct net
    and per-net sysctls allow for its modification (ipv6_sysctl_net_init).

    Despite this the value configured in the init net is used for the
    rest of them.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This patch adds a check to the set_channel flow. When attempting to change
    the channel while in IBSS mode, and the new channel does not support IBSS
    mode, the flow returns with an error value with no consequences on the
    mac80211 and driver state.

    Signed-off-by: Assaf Krauss
    Signed-off-by: Emmanuel Grumbach
    Signed-off-by: Tomas Winkler
    Signed-off-by: John W. Linville

    Assaf Krauss
     
  • Sufficient scans (at least 2 or 3) should have been done within 7
    seconds to find an existing IBSS to join. This should improve IBSS
    creation latency; and since IBSS merging is still in effect, shouldn't
    have detrimental effects on eventual IBSS convergence.

    Signed-off-by: Dan Williams
    Signed-off-by: John W. Linville

    Dan Williams
     
  • This patch fixes the issue of slow reconnection to an IBSS cell after
    disconnection from it. Now the interface's bssid is reset upon ifdown.

    ieee80211_sta_find_ibss:
    if (found && memcmp(ifsta->bssid, bssid, ETH_ALEN) != 0 &&
    (bss = ieee80211_rx_bss_get(dev, bssid,
    local->hw.conf.channel->center_freq,
    ifsta->ssid, ifsta->ssid_len)))

    Note:
    In general disconnection is still not handled properly in mac80211

    Signed-off-by: Assaf Krauss
    Signed-off-by: Tomas Winkler
    Signed-off-by: John W. Linville

    Assaf Krauss
     
  • Otherwise userspace has no idea the IBSS creation succeeded.

    Signed-off-by: Dan Williams
    Signed-off-by: John W. Linville

    Dan Williams
     

06 Jun, 2008

1 commit

  • - Don't trust a length which is greater than the working buffer.
    An invalid length could cause overflow when calculating buffer size
    for decoding oid.

    - An oid length of zero is invalid and allows for an off-by-one error when
    decoding oid because the first subid actually encodes first 2 subids.

    - A primitive encoding may not have an indefinite length.

    Thanks to Wei Wang from McAfee for report.

    Cc: Steven French
    Cc: stable@kernel.org
    Acked-by: Patrick McHardy
    Signed-off-by: Chris Wright
    Signed-off-by: Linus Torvalds

    Chris Wright
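
The three checks listed in the ASN.1 entry above can be sketched as a single validation step run before any OID decoding. Everything here is hypothetical (function name, parameters, and the way the header fields are passed in); it is not the decoder's actual interface, only an illustration of the order of the checks.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * pos/end delimit the unread part of the working buffer (pos <= end).
 * len is the decoded length field, primitive/definite describe the
 * encoding flags taken from the header.
 */
static bool oid_header_ok_sketch(const unsigned char *pos,
                                 const unsigned char *end,
                                 size_t len, bool primitive, bool definite)
{
    /* A primitive encoding may not have an indefinite length. */
    if (primitive && !definite)
        return false;

    /* Do not trust a length larger than what is left in the buffer. */
    if (len > (size_t)(end - pos))
        return false;

    /* An OID needs at least one octet: the first one encodes two subids. */
    if (len == 0)
        return false;

    return true;
}
```

Rejecting the header up front means the later buffer-size calculation only ever sees lengths that fit in the working buffer.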
     

05 Jun, 2008

11 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
    l2tp: Fix possible oops if transmitting or receiving when tunnel goes down
    tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits.
    tcp: Increment OUTRSTS in tcp_send_active_reset()
    raw: Raw socket leak.
    l2tp: Fix possible WARN_ON from socket code when UDP socket is closed
    USB ID for Philips CPWUA054/00 Wireless USB Adapter 11g
    ssb: Fix context assertion in ssb_pcicore_dev_irqvecs_enable
    libertas: fix command size for CMD_802_11_SUBSCRIBE_EVENT
    ipw2200: expire and use oldest BSS on adhoc create
    airo warning fix
    b43legacy: Fix controller restart crash
    sctp: Fix ECN markings for IPv6
    sctp: Flush the queue only once during fast retransmit.
    sctp: Start T3-RTX timer when fast retransmitting lowest TSN
    sctp: Correctly implement Fast Recovery cwnd manipulations.
    sctp: Move sctp_v4_dst_saddr out of loop
    sctp: retran_path update bug fix
    tcp: fix skb vs fack_count out-of-sync condition
    sunhme: Cleanup use of deprecated calls to save_and_cli and restore_flags.
    xfrm: xfrm_algo: correct usage of RIPEMD-160
    ...

    Linus Torvalds
     
  • skb_splice_bits temporarily drops the socket lock while iterating over
    the socket queue in order to break a reverse locking condition which
    happens with sendfile. This, however, opens a window of opportunity
    for tcp_collapse() to aggregate skbs and thus potentially free the
    current skb used in skb_splice_bits and tcp_read_sock.

    This patch fixes the problem by (re-)getting the same "logical skb"
    after the lock has been temporarily dropped.

    Based on idea and initial patch from Evgeniy Polyakov.

    Signed-off-by: Octavian Purdila
    Acked-by: Evgeniy Polyakov
    Signed-off-by: David S. Miller

    Octavian Purdila
     
  • TCP "resets sent" counter is not incremented when a TCP Reset is
    sent via tcp_send_active_reset().

    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Sridhar Samudrala
     
  • The program below just leaks the raw kernel socket

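    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    int main(void) {
            /* Raw sockets normally require root (CAP_NET_RAW). */
            int fd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP);
            struct sockaddr_in addr;

            memset(&addr, 0, sizeof(addr));
            inet_aton("127.0.0.1", &addr.sin_addr);
            addr.sin_family = AF_INET;
            addr.sin_port = htons(2048);
            /* MSG_MORE corks the packet; the program exits without flushing it. */
            sendto(fd, "a", 1, MSG_MORE, (struct sockaddr *)&addr, sizeof(addr));
            return 0;
    }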

    Corked packet is allocated via sock_wmalloc which holds the owner socket,
    so one should uncork it and flush all pending data on close. Do this in the
    same way as in UDP.

    Signed-off-by: Denis V. Lunev
    Acked-by: Alexey Kuznetsov
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Commit e9df2e8fd8fbc95c57dbd1d33dada66c4627b44c ("[IPV6]: Use
    appropriate sock tclass setting for routing lookup.") also changed the
    way that ECN capable transports mark this capability in IPv6. As a
    result, SCTP was not marking ECN capability because the traffic class
    was never set. This patch brings back the markings for IPv6 traffic.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • When fast retransmit is triggered by a sack, we should flush the queue
    only once so that only 1 retransmit happens. Also, since we could
    potentially have non-fast-rtx chunks on the retransmit queue, we need
    to make sure any chunks eligible for fast retransmit are sent first
    during fast retransmission.

    Signed-off-by: Vlad Yasevich
    Tested-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • When we are trying to fast retransmit the lowest outstanding TSN, we
    need to restart the T3-RTX timer, so that subsequent timeouts will
    correctly tag all the packets necessary for retransmissions.

    Signed-off-by: Vlad Yasevich
    Tested-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • Correctly keep track of Fast Recovery state and do not reduce
    congestion window multiple times during such state.

    Signed-off-by: Vlad Yasevich
    Tested-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • There's no need to execute sctp_v4_dst_saddr() for each
    iteration, just move it out of loop.

    Signed-off-by: Gui Jianfeng
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Gui Jianfeng
     
  • If the current retran_path is the only active one, it should
    be updated to the next inactive one.

    Signed-off-by: Gui Jianfeng
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Gui Jianfeng
     
  • David S. Miller