19 Oct, 2011

1 commit

  • The transparent socket option setting was not copied to the time wait
    socket when an inet socket was being replaced by a time wait socket. This
    broke the --transparent option of the socket match and may have caused
    FIN packets belonging to sockets in FIN_WAIT2 or TIME_WAIT state
    to be dropped by the packet filter.

    Signed-off-by: KOVACS Krisztian
    Signed-off-by: David S. Miller

    KOVACS Krisztian
     

05 Oct, 2011

2 commits

  • lost_skb_hint is used by tcp_mark_head_lost() to mark the first unhandled skb.
    lost_cnt_hint is the number of packets or sacked packets before the lost_skb_hint.
    When shifting a skb that is before the lost_skb_hint, if tcp_is_fack() is true,
    the skb has already been counted in the lost_cnt_hint; if tcp_is_fack() is false,
    tcp_sacktag_one() will increase the lost_cnt_hint. So tcp_shifted_skb() does not
    need to adjust the lost_cnt_hint by itself. When shifting a skb that is equal to
    lost_skb_hint, the shifted packets will not be counted by tcp_mark_head_lost().
    So tcp_shifted_skb() should adjust the lost_cnt_hint even if tcp_is_fack(tp) is true.

    Signed-off-by: Zheng Yan
    Signed-off-by: David S. Miller

    Yan, Zheng
     
  • tcp_v4_clear_md5_list() assumes that multiple tcp md5sig peers
    only hold one reference to md5sig_pool. However, tcp_v4_md5_do_add()
    increases the use count of md5sig_pool for each peer. This patch
    makes tcp_v4_md5_do_add() increase the use count only for the first
    tcp md5sig peer.

    Signed-off-by: Zheng Yan
    Signed-off-by: David S. Miller

    Yan, Zheng
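
    The reference-counting rule described above can be sketched in plain C.
    This is a userspace illustration only: the struct and function names
    below are hypothetical stand-ins, not the kernel's md5sig code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
struct sketch_pool {
    int users;              /* use count of the shared pool */
};

struct sketch_peer_list {
    int entries;            /* number of peers on this socket */
};

/* Add one peer. Only the first peer on the list takes a pool reference. */
static void sketch_add_peer(struct sketch_pool *pool,
                            struct sketch_peer_list *list)
{
    assert(pool != NULL && list != NULL);
    if (list->entries == 0)
        pool->users++;
    list->entries++;
}

/* Drop every peer. The list held exactly one reference, so release one. */
static void sketch_clear_peers(struct sketch_pool *pool,
                               struct sketch_peer_list *list)
{
    assert(pool != NULL && list != NULL);
    if (list->entries > 0) {
        list->entries = 0;
        pool->users--;
    }
}
```

    With this pairing, adding three peers and then clearing the list leaves
    the pool use count where it started, which is the balance the clear
    path assumes.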
     

19 Sep, 2011

1 commit

  • D-SACK is allowed to reside below snd_una. But the corresponding check
    in tcp_is_sackblock_valid() is the exact opposite. It looks like a typo.

    Signed-off-by: Zheng Yan
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Zheng Yan
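
    A minimal sketch of the kind of check involved, assuming
    wraparound-safe 32-bit sequence comparisons in the style of the
    kernel's before()/after() helpers. The function names here are
    hypothetical and the real tcp_is_sackblock_valid() performs further
    checks that are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe sequence number comparisons. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

static bool seq_after(uint32_t a, uint32_t b)
{
    return seq_before(b, a);
}

/*
 * Hypothetical helper: a D-SACK block that does not start inside the
 * outstanding window is acceptable only if it lies at or below snd_una.
 * Writing the test as !seq_after(...) would reject exactly the blocks
 * that are allowed, which is the inversion the commit describes.
 */
static bool sketch_dsack_below_una_ok(uint32_t start_seq, uint32_t end_seq,
                                      uint32_t snd_una)
{
    assert(seq_before(start_seq, end_seq));
    if (seq_after(end_seq, snd_una))
        return false;
    return true;
}
```

    The subtraction-based comparison keeps the check correct when the
    sequence space wraps past 2^32.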
     

17 Sep, 2011

1 commit


16 Sep, 2011

2 commits

  • David S. Miller
     
  • "Possible SYN flooding on port xxxx " messages can fill logs on servers.

    Change logic to log the message only once per listener, and add two new
    SNMP counters to track:

    TCPReqQFullDoCookies: number of times a SYNCOOKIE was sent in reply to a client

    TCPReqQFullDrop: number of times a SYN request was dropped because
    syncookies were not enabled.

    Based on a prior patch from Tom Herbert, and suggestions from David.

    Signed-off-by: Eric Dumazet
    CC: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
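
    The "warn once per listener, count every event" pattern can be sketched
    as follows. This is a userspace illustration under stated assumptions:
    the structs, field names, and message wording are hypothetical, and the
    counters merely stand in for the two SNMP counters named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-listener state. */
struct sketch_listener {
    unsigned int port;
    bool warned;            /* set after the first log message */
};

/* Hypothetical stand-ins for the two SNMP counters. */
struct sketch_counters {
    unsigned long req_q_full_do_cookies;
    unsigned long req_q_full_drop;
};

/*
 * Called when the request queue is full. Every call bumps a counter,
 * but the log message is printed only the first time for a listener.
 * Returns true when a syncookie should be sent.
 */
static bool sketch_syn_flood(struct sketch_listener *l,
                             struct sketch_counters *c,
                             bool syncookies_enabled)
{
    assert(l != NULL && c != NULL);

    if (syncookies_enabled)
        c->req_q_full_do_cookies++;
    else
        c->req_q_full_drop++;

    if (!l->warned) {
        l->warned = true;
        fprintf(stderr, "Possible SYN flooding on port %u. %s.\n",
                l->port,
                syncookies_enabled ? "Sending cookies" : "Dropping request");
    }
    return syncookies_enabled;
}
```

    Keeping the flag in the listener means a flood on one port does not
    suppress the first warning for another port.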
     

31 Aug, 2011

2 commits


30 Aug, 2011

1 commit

  • A userspace listener may send (bogus) NF_STOLEN verdict, which causes skb leak.

    This problem was previously fixed via
    64507fdbc29c3a622180378210ecea8659b14e40 (netfilter:
    nf_queue: fix NF_STOLEN skb leak) but this had to be reverted because
    NF_STOLEN can also be returned by a netfilter hook when iterating the
    rules in nf_reinject.

    Reject userspace NF_STOLEN verdict, as suggested by Michal Miroslaw.

    This is complementary to commit fad54440438a7c231a6ae347738423cbabc936d9
    (netfilter: avoid double free in nf_reinject).

    Cc: Julian Anastasov
    Cc: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: Patrick McHardy

    Florian Westphal
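
    A sketch of rejecting a stolen verdict at the point where it arrives
    from userspace. The constants are local copies defined for this
    example; their values are assumed to mirror the verdict codes in
    linux/netfilter.h, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies for illustration; values assumed to match linux/netfilter.h. */
enum {
    SK_NF_DROP   = 0,
    SK_NF_ACCEPT = 1,
    SK_NF_STOLEN = 2,
    SK_NF_QUEUE  = 3,
    SK_NF_REPEAT = 4,
    SK_NF_STOP   = 5,
    SK_NF_MAX_VERDICT = SK_NF_STOP
};

/*
 * Hypothetical validator for a verdict supplied by a userspace listener.
 * A hook inside the kernel may legitimately steal a packet, but a
 * userspace listener cannot take ownership of the skb, so accepting
 * STOLEN from it would leak the packet.
 */
static bool sketch_userspace_verdict_ok(unsigned int verdict)
{
    /* Sanity check on the local constants. */
    assert(SK_NF_STOLEN <= SK_NF_MAX_VERDICT);

    if (verdict > SK_NF_MAX_VERDICT)
        return false;
    if (verdict == SK_NF_STOLEN)
        return false;
    return true;
}
```

    Filtering at the userspace boundary leaves the reinject path free to
    handle a stolen verdict returned by an in-kernel hook.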
     

25 Aug, 2011

1 commit


11 Aug, 2011

2 commits

  • As rt_iif represents the input device even for packets
    coming from loopback with an output route, it is not a unique
    key specific to input routes. rt_route_iif now has that role
    (it was fl.iif in 2.6.38), so it is better to change the checks
    in some places to save CPU cycles and to restore 2.6.38 semantics.

    compare_keys:
    - input routes: only rt_route_iif matters, rt_iif is same
    - output routes: only rt_oif matters, rt_iif is not
    used for matching in __ip_route_output_key
    - now we are back to 2.6.38 state

    ip_route_input_common:
    - matching rt_route_iif implies input route
    - compared to 2.6.38 we eliminated one rth->fl.oif check
    because it was not needed even for 2.6.38

    compare_hash_inputs:
    The change here is the only one that is not an optimization; it
    affects only output routes. I assume I'm restoring
    the original intention to ignore oif, it was using fl.iif
    - now we are back to 2.6.38 state

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • With gcc 4.4.3, warnings are emitted for a possibly uninitialized use
    of ecn_ok.

    This can happen if cookie_check_timestamp() returns due to not having
    seen a timestamp. Defaulting to ecn off seems like a reasonable thing
    to do in this case, so initialize ecn_ok to false.

    Signed-off-by: Mike Waychison
    Signed-off-by: David S. Miller

    Mike Waychison
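
    The pattern behind the warning can be sketched with an out-parameter
    that an early return leaves untouched. The functions below are
    hypothetical stand-ins, and the bit used to carry the ECN setting is an
    assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical parser: writes *ecn_ok only when a timestamp was seen.
 * On the early return the caller's variable is left untouched.
 */
static bool sketch_check_timestamp(bool saw_timestamp, unsigned int tsval,
                                   bool *ecn_ok)
{
    assert(ecn_ok != NULL);
    if (!saw_timestamp)
        return true;                    /* *ecn_ok not written here */
    *ecn_ok = (tsval & 0x1u) != 0;      /* assumed bit layout, for illustration */
    return true;
}

static bool sketch_ecn_enabled(bool saw_timestamp, unsigned int tsval)
{
    bool ecn_ok = false;    /* default to ECN off, as the commit does */

    if (!sketch_check_timestamp(saw_timestamp, tsval, &ecn_ok))
        return false;
    return ecn_ok;
}
```

    Without the initializer, the early-return path would read an
    indeterminate value, which is what the compiler warning points at.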
     

08 Aug, 2011

5 commits

  • Make sure skb dst has reference when moving to
    another context. Currently, I don't see protocols that can
    hit it when sending broadcasts/multicasts to loopback using
    noref dsts, so it is just a precaution.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • Raw sockets can provide a source address for
    routing, but their privileges are not considered. Because such
    sockets can provide a non-local source address, make sure the
    FLOWI_FLAG_ANYSRC flag is set if the socket has privileges
    for this, i.e. based on the hdrincl (IP_HDRINCL) and
    transparent flags.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • TCP in some cases uses a different global (raw) socket
    to send RST and ACK. The transparent flag is not set there.
    Currently, this is a problem for rerouting after the previous
    change.

    Fix it by simplifying the checks in ip_route_me_harder
    and use FLOWI_FLAG_ANYSRC even for sockets. It looks safe
    because the initial routing allowed this source address to
    be used and now we just have to make sure the packet is rerouted.

    As a side effect this also allows rerouting for normal
    raw sockets that use spoofed source addresses which was not possible
    even before we eliminated the ip_route_input call.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • IP_PKTOPTIONS is broken for 32-bit applications running
    in COMPAT mode on 64-bit kernels.

    This happens because msghdr's msg_flags field is always
    set to zero. When running in COMPAT mode this should be
    set to MSG_CMSG_COMPAT instead.

    Signed-off-by: Tiberiu Szocs-Mihai
    Signed-off-by: Daniel Baluta
    Signed-off-by: David S. Miller

    Daniel Baluta
     
  • compare_keys and ip_route_input_common rely on
    rt_oif to distinguish input and output routes
    with the same key values. But sometimes an input route shares
    a hash chain (keyed by iif != 0) with output
    routes (keyed by orig_oif=0). The problem is visible when running
    with a small number of rhash_entries.

    Fix them to use rt_route_iif instead. This way an
    input route cannot be returned to users that request
    an output route.

    The patch fixes the ip_rt_bug errors that were
    reported in ip_local_out context, mostly for 255.255.255.255
    destinations.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     

07 Aug, 2011

1 commit

  • Computers have become a lot faster since we compromised on the
    partial MD4 hash which we use currently for performance reasons.

    MD5 is a much safer choice, and is in line with both RFC1948 and
    other ISS generators (OpenBSD, Solaris, etc.)

    Furthermore, only having 24-bits of the sequence number be truly
    unpredictable is a very serious limitation. So the periodic
    regeneration and 8-bit counter have been removed. We compute and
    use a full 32-bit sequence number.

    For ipv6, DCCP was found to use a 32-bit truncated initial sequence
    number (it needs 43-bits) and that is fixed here as well.

    Reported-by: Dan Kaminsky
    Tested-by: Willy Tarreau
    Signed-off-by: David S. Miller

    David S. Miller
     

03 Aug, 2011

1 commit

  • Gergely Kalman reported crashes in check_peer_redir().

    It appears commit f39925dbde778 (ipv4: Cache learned redirect
    information in inetpeer.) added a race, leading to possible NULL ptr
    dereference.

    Since we can now change dst neighbour, we should make sure a reader can
    safely use a neighbour.

    Add RCU protection to dst neighbour, and make sure check_peer_redir()
    can be called safely by different cpus in parallel.

    As neighbours are already freed after one RCU grace period, this patch
    should not add typical RCU penalty (cache cold effects)

    Many thanks to Gergely for providing a pretty report pointing to the
    bug.

    Reported-by: Gergely Kalman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Aug, 2011

1 commit

  • Convert array index from the loop bound to the loop index.

    A simplified version of the semantic patch that fixes this problem is as
    follows: (http://coccinelle.lip6.fr/)

    //
    @@
    expression e1,e2,ar;
    @@

    for(e1 = 0; e1 < e2; e1++) { }
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
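
    The class of bug being fixed, indexing an array with the loop bound
    instead of the loop index, looks like this in a small example. The
    function is hypothetical and is not taken from the patched code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example. The buggy form of the loop body would be
 *     ar[n] = 0;
 * which writes one element past the end on every pass and never
 * touches ar[0] .. ar[n - 1].
 */
static void sketch_clear_all(int *ar, size_t n)
{
    assert(ar != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        ar[i] = 0;          /* index with the loop variable, not the bound */
}
```

    Because both forms compile cleanly, a pattern-based search is a
    practical way to find this mistake.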
     

29 Jul, 2011

1 commit

  • ipq_build_packet_message() in net/ipv4/netfilter/ip_queue.c and
    net/ipv6/netfilter/ip6_queue.c contain a small potential mem leak as
    far as I can tell.

    We allocate memory for 'skb' with alloc_skb() and then call
    nlh = NLMSG_PUT(skb, 0, 0, IPQM_PACKET, size - sizeof(*nlh));

    NLMSG_PUT is a macro
    NLMSG_PUT(skb, pid, seq, type, len) \
    NLMSG_NEW(skb, pid, seq, type, len, 0)

    that expands to NLMSG_NEW, which is also a macro which expands to:
    NLMSG_NEW(skb, pid, seq, type, len, flags) \
    ({ if (unlikely(skb_tailroom(skb) < (int)NLMSG_SPACE(len))) \
    goto nlmsg_failure; \
    __nlmsg_put(skb, pid, seq, type, len, flags); })

    If we take the true branch of the 'if' statement and 'goto
    nlmsg_failure', then we'll, at that point, return from
    ipq_build_packet_message() without having assigned 'skb' to anything
    and we'll leak the memory we allocated for it when it goes out of
    scope.

    Fix this by placing a 'kfree(skb)' at 'nlmsg_failure'.

    I admit that I do not know how likely this is to actually happen or even
    if there's something that guarantees that it will never happen - I'm
    not that familiar with this code, but if that is so, I've not been
    able to spot it.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Patrick McHardy

    Jesper Juhl
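
    A userspace sketch of the same shape: a macro that hides a goto to a
    failure label, and a label that must release what was allocated before
    the macro ran. The buffer type, macro, and function names are
    hypothetical; malloc()/free() stand in for the kernel's skb allocation
    and release.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer standing in for an skb. */
struct sketch_buf {
    size_t cap;
    size_t len;
    char *data;
};

/* Like NLMSG_PUT, this macro jumps to a label when there is no room. */
#define SKETCH_PUT(buf, src, n)                             \
    do {                                                    \
        if ((buf)->cap - (buf)->len < (n))                  \
            goto put_failure;                               \
        memcpy((buf)->data + (buf)->len, (src), (n));       \
        (buf)->len += (n);                                  \
    } while (0)

static void sketch_free_message(struct sketch_buf *buf)
{
    if (buf != NULL) {
        free(buf->data);
        free(buf);
    }
}

static struct sketch_buf *sketch_build_message(const char *payload,
                                               size_t n, size_t cap)
{
    struct sketch_buf *buf;

    assert(payload != NULL);
    buf = malloc(sizeof(*buf));
    if (buf == NULL)
        return NULL;
    buf->data = malloc(cap ? cap : 1);
    if (buf->data == NULL) {
        free(buf);
        return NULL;
    }
    buf->cap = cap;
    buf->len = 0;

    SKETCH_PUT(buf, payload, n);    /* may jump to put_failure */
    return buf;

put_failure:
    /* Without this release the buffer would be leaked on the error path. */
    sketch_free_message(buf);
    return NULL;
}
```

    The hidden jump is what makes the leak easy to miss: nothing at the
    call site shows that control can leave the function there.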
     

28 Jul, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (32 commits)
    tg3: Remove 5719 jumbo frames and TSO blocks
    tg3: Break larger frags into 4k chunks for 5719
    tg3: Add tx BD budgeting code
    tg3: Consolidate code that calls tg3_tx_set_bd()
    tg3: Add partial fragment unmapping code
    tg3: Generalize tg3_skb_error_unmap()
    tg3: Remove short DMA check for 1st fragment
    tg3: Simplify tx bd assignments
    tg3: Reintroduce tg3_tx_ring_info
    ASIX: Use only 11 bits of header for data size
    ASIX: Simplify condition in rx_fixup()
    Fix cdc-phonet build
    bonding: reduce noise during init
    bonding: fix string comparison errors
    net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared
    net: add IFF_SKB_TX_SHARED flag to priv_flags
    net: sock_sendmsg_nosec() is static
    forcedeth: fix vlans
    gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
    gro: Only reset frag0 when skb can be pulled
    ...

    Linus Torvalds
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

26 Jul, 2011

1 commit


24 Jul, 2011

2 commits


22 Jul, 2011

6 commits


19 Jul, 2011

1 commit

  • Compiler is not smart enough to avoid double BSWAP instructions in
    ntohl(inet_make_mask(plen)).

    Let's cache this value in struct leaf_info (this fills a hole on 64bit arches).

    With route cache disabled, this saves ~2% of cpu in udpflood bench on
    x86_64 machine.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
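
    A userspace sketch of caching the host-order mask once instead of
    converting it on every lookup. The mask helper is modelled on what
    inet_make_mask() computes, and the struct, field, and function names
    are assumptions made for this example rather than the kernel's
    definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Network-order netmask for a prefix length (modelled on inet_make_mask). */
static uint32_t sketch_make_mask(int plen)
{
    assert(plen >= 0 && plen <= 32);
    if (plen)
        return htonl(~((UINT32_C(1) << (32 - plen)) - 1));
    return 0;
}

/* Hypothetical leaf entry that keeps the host-order mask alongside plen. */
struct sketch_leaf_info {
    int plen;
    uint32_t mask_plen;     /* ntohl(sketch_make_mask(plen)), computed once */
};

static void sketch_leaf_info_init(struct sketch_leaf_info *li, int plen)
{
    li->plen = plen;
    li->mask_plen = ntohl(sketch_make_mask(plen));
}

/* Lookup path: uses the cached mask, with no byte swaps per call. */
static int sketch_prefix_match(const struct sketch_leaf_info *li,
                               uint32_t key_host, uint32_t prefix_host)
{
    return (key_host & li->mask_plen) == prefix_host;
}
```

    The conversion cost moves from every lookup to the single point where
    the entry is created.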
     

18 Jul, 2011

3 commits


17 Jul, 2011

3 commits