18 Jun, 2006

10 commits


13 Jun, 2006

1 commit


12 Jun, 2006

1 commit

  • From: Aki M Nyrhinen

    IMHO the current fix to the problem (in_flight underflow in reno)
    is incorrect. it treats the symptons but ignores the problem. the
    problem is timing out packets other than the head packet when we
    don't have sack. i try to explain (sorry if explaining the obvious).

    with sack, scanning the retransmit queue for timed out packets is
    fine because we know which packets in our retransmit queue have been
    acked by the receiver.

    without sack, we know only how many packets in our retransmit queue the
    receiver has acknowledged, but no idea which packets.

    think of a "typical" slow-start overshoot case, where for example
    every third packet in a window get lost because a router buffer gets
    full.

    with sack, we check for timeouts on those every third packet (as the
    rest have been sacked). the packet counting works out and if there
    is no reordering, we'll retransmit exactly the packets that were
    lost.

    without sack, however, we check for timeout on every packet and end up
    retransmitting consecutive packets in the retransmit queue. in our
    slow-start example, 2/3 of those retransmissions are unnecessary. these
    unnecessary retransmissions eat the congestion window and evetually
    prevent fast recovery from continuing, if enough packets were lost.

    Signed-off-by: David S. Miller

    Aki M Nyrhinen
     

06 Jun, 2006

1 commit

  • Trimming the head of an skb by calling skb_pull can cause the packet
    to become unaligned if the length pulled is odd. Since the length is
    entirely arbitrary for a FIN packet carrying data, this is actually
    quite common.

    Unaligned data is not the end of the world, but we should avoid it if
    it's easily done. In this case it is trivial. Since we're discarding
    all of the head data it doesn't matter whether we move skb->data forward
    or back.

    However, it is still possible to have unaligned skb->data in general.
    So network drivers should be prepared to handle it instead of crashing.

    This patch also adds an unlikely marking on len < headlen since partial
    ACKs on head data are extremely rare in the wild. As the return value
    of __pskb_trim_head is no longer ever NULL that has been removed.

    Signed-off-by: Herbert Xu ~{PmV>HI~}
    Signed-off-by: David S. Miller

    Herbert Xu ~{PmVHI~}
     

03 Jun, 2006

1 commit


29 May, 2006

3 commits


24 May, 2006

3 commits

  • If kmalloc fails, error path leaks data allocated from asn1_oid_decode().

    Signed-off-by: Chris Wright
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Chris Wright
     
  • When parsing unknown sequence extensions the "son"-pointer points behind
    the last known extension for this type, don't try to interpret it.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • The condition "> H323_ERROR_STOP" can never be true since H323_ERROR_STOP
    is positive and is the highest possible return code, while real errors are
    negative, fix the checks. Also only abort on real errors in some spots
    that were just interpreting any return value != 0 as error.

    Fixes crashes caused by use of stale data after a parsing error occured:

    BUG: unable to handle kernel paging request at virtual address bfffffff
    printing eip:
    c01aa0f8
    *pde = 1a801067
    *pte = 00000000
    Oops: 0000 [#1]
    PREEMPT
    Modules linked in: ip_nat_h323 ip_conntrack_h323 nfsd exportfs sch_sfq sch_red cls_fw sch_hfsc xt_length ipt_owner xt_MARK iptable_mangle nfs lockd sunrpc pppoe pppoxx
    CPU: 0
    EIP: 0060:[] Not tainted VLI
    EFLAGS: 00210646 (2.6.17-rc4 #8)
    EIP is at memmove+0x19/0x22
    eax: d77264e9 ebx: d77264e9 ecx: e88d9b17 edx: d77264e9
    esi: bfffffff edi: bfffffff ebp: de6a7680 esp: c0349db8
    ds: 007b es: 007b ss: 0068
    Process asterisk (pid: 3765, threadinfo=c0349000 task=da068540)
    Stack: 00000006 c0349e5e d77264e3 e09a2b4e e09a38a0 d7726052 d7726124 00000491
    00000006 00000006 00000006 00000491 de6a7680 d772601e d7726032 c0349f74
    e09a2dc2 00000006 c0349e5e 00000006 00000000 d76dda28 00000491 c0349f74
    Call Trace:
    [] mangle_contents+0x62/0xfe [ip_nat]
    [] ip_nat_mangle_tcp_packet+0xa1/0x191 [ip_nat]
    [] set_addr+0x74/0x14c [ip_nat_h323]
    [] process_setup+0x11b/0x29e [ip_conntrack_h323]
    [] process_setup+0x14c/0x29e [ip_conntrack_h323]
    [] process_q931+0x3c/0x142 [ip_conntrack_h323]
    [] q931_help+0xe0/0x144 [ip_conntrack_h323]
    ...

    Found by the PROTOS c07-h2250v4 testsuite.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

23 May, 2006

2 commits

  • Fix memory corruption caused by snmp_trap_decode:

    - When snmp_trap_decode fails before the id and address are allocated,
    the pointers contain random memory, but are freed by the caller
    (snmp_parse_mangle).

    - When snmp_trap_decode fails after allocating just the ID, it tries
    to free both address and ID, but the address pointer still contains
    random memory. The caller frees both ID and random memory again.

    - When snmp_trap_decode fails after allocating both, it frees both,
    and the callers frees both again.

    The corruption can be triggered remotely when the ip_nat_snmp_basic
    module is loaded and traffic on port 161 or 162 is NATed.

    Found by multiple testcases of the trap-app and trap-enc groups of the
    PROTOS c06-snmpv1 testsuite.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

19 May, 2006

4 commits

  • Solar Designer found a race condition in do_add_counters(). The beginning
    of paddc is supposed to be the same as tmp which was sanity-checked
    above, but it might not be the same in reality. In case the integer
    overflow and/or the race condition are triggered, paddc->num_counters
    might not match the allocation size for paddc. If the check below
    (t->private->number != paddc->num_counters) nevertheless passes (perhaps
    this requires the race condition to be triggered), IPT_ENTRY_ITERATE()
    would read kernel memory beyond the allocation size, potentially causing
    an oops or leaking sensitive data (e.g., passwords from host system or
    from another VPS) via counter increments. This requires CAP_NET_ADMIN.

    Signed-off-by: Solar Designer
    Signed-off-by: Kirill Korotaev
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Solar Designer
     
  • GRE keys are 16 bit.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     
  • The prefix argument for nf_log_packet is a format specifier,
    so don't pass the user defined string directly to it.

    Signed-off-by: Philip Craig
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Philip Craig
     
  • The Coverity checker spotted that we may leak 'hold' in
    net/ipv4/netfilter/ipt_recent.c::checkentry() when the following
    is true:
    if (!curr_table->status_proc) {
    ...
    if(!curr_table) {
    ...
    return 0;
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Jesper Juhl
     

17 May, 2006

1 commit

  • From: "Angelo P. Castellani"

    Using NewReno, if a sk_buff is timed out and is accounted as lost_out,
    it should also be removed from the sacked_out.

    This is necessary because recovery using NewReno fast retransmit could
    take up to a lot RTTs and the sk_buff RTO can expire without actually
    being really lost.

    left_out = sacked_out + lost_out
    in_flight = packets_out - left_out + retrans_out

    Using NewReno without this patch, on very large network losses,
    left_out becames bigger than packets_out + retrans_out (!!).

    For this reason unsigned integer in_flight overflows to 2^32 - something.

    Signed-off-by: David S. Miller

    Angelo P. Castellani
     

10 May, 2006

1 commit


07 May, 2006

1 commit

  • This is another result from my likely profiling tool
    (dwalker@mvista.com just sent the patch of the profiling tool to
    linux-kernel mailing list, which is similar to what I use).

    On my system (not very busy, normal development machine within a
    VMWare workstation), I see a 6/5 miss/hit ratio for this "likely".

    Signed-off-by: Hua Zhong
    Signed-off-by: David S. Miller

    Hua Zhong
     

06 May, 2006

1 commit

  • Xiaoliang (David) Wei wrote:
    > Hi gurus,
    >
    > I am reading the code of tcp_highspeed.c in the kernel and have a
    > question on the hstcp_cong_avoid function, specifically the following
    > AI part (line 136~143 in net/ipv4/tcp_highspeed.c ):
    >
    > /* Do additive increase */
    > if (tp->snd_cwnd < tp->snd_cwnd_clamp) {
    > tp->snd_cwnd_cnt += ca->ai;
    > if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
    > tp->snd_cwnd++;
    > tp->snd_cwnd_cnt -= tp->snd_cwnd;
    > }
    > }
    >
    > In this part, when (tp->snd_cwnd_cnt == tp->snd_cwnd),
    > snd_cwnd_cnt will be -1... snd_cwnd_cnt is defined as u16, will this
    > small chance of getting -1 becomes a problem?
    > Shall we change it by reversing the order of the cwnd++ and cwnd_cnt -=
    > cwnd?

    Absolutely correct. Thanks.

    Signed-off-by: John Heffner
    Signed-off-by: David S. Miller

    John Heffner
     

04 May, 2006

6 commits

  • Calling sock_orphan inside bh_lock_sock in tcp_close can lead to dead
    locks. For example, the inet_diag code holds sk_callback_lock without
    disabling BH. If an inbound packet arrives during that admittedly tiny
    window, it will cause a dead lock on bh_lock_sock. Another possible
    path would be through sock_wfree if the network device driver frees the
    tx skb in process context with BH enabled.

    We can fix this by moving sock_orphan out of bh_lock_sock.

    The tricky bit is to work out when we need to destroy the socket
    ourselves and when it has already been destroyed by someone else.

    By moving sock_orphan before the release_sock we can solve this
    problem. This is because as long as we own the socket lock its
    state cannot change.

    So we simply record the socket state before the release_sock
    and then check the state again after we regain the socket lock.
    If the socket state has transitioned to TCP_CLOSE in the time being,
    we know that the socket has been destroyed. Otherwise the socket is
    still ours to keep.

    Note that I've also moved the increment on the orphan count forward.
    This may look like a problem as we're increasing it even if the socket
    is just about to be destroyed where it'll be decreased again. However,
    this simply enlarges a window that already exists. This also changes
    the orphan count test by one.

    Considering what the orphan count is meant to do this is no big deal.

    This problem was discoverd by Ingo Molnar using his lock validator.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Noticed by Linus Torvalds

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Jing Min Zhao
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Jing Min Zhao
     
  • net/ipv4/netfilter/ip_nat_standalone.c: In function 'ip_nat_out':
    net/ipv4/netfilter/ip_nat_standalone.c:223: warning: unused variable 'ctinfo'
    net/ipv4/netfilter/ip_nat_standalone.c:222: warning: unused variable 'ct'

    Surprisingly no complaints so far ..

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • When a Choice element contains an unsupported choice no error is returned
    and parsing continues normally, but the choice value is not set and
    contains data from the last parsed message. This may in turn lead to
    parsing of more stale data and following crashes.

    Fixes a crash triggered by testcase 0003243 from the PROTOS c07-h2250v4
    testsuite following random other testcases:

    CPU: 0
    EIP: 0060:[] Not tainted VLI
    EFLAGS: 00210646 (2.6.17-rc2 #3)
    EIP is at memmove+0x19/0x22
    eax: d7be0307 ebx: d7be0307 ecx: e841fcf9 edx: d7be0307
    esi: bfffffff edi: bfffffff ebp: da5eb980 esp: c0347e2c
    ds: 007b es: 007b ss: 0068
    Process events/0 (pid: 4, threadinfo=c0347000 task=dff86a90)
    Stack: 00000006 c0347ea6 d7be0301 e09a6b2c 00000006 da5eb980 d7be003e d7be0052
    c0347f6c e09a6d9c 00000006 c0347ea6 00000006 00000000 d7b9a548 00000000
    c0347f6c d7b9a548 00000004 e0a1a119 0000028f 00000006 c0347ea6 00000006
    Call Trace:
    [] mangle_contents+0x40/0xd8 [ip_nat]
    [] ip_nat_mangle_tcp_packet+0xa1/0x191 [ip_nat]
    [] set_addr+0x60/0x14d [ip_nat_h323]
    [] q931_help+0x2da/0x71a [ip_conntrack_h323]
    [] q931_help+0x30c/0x71a [ip_conntrack_h323]
    [] ip_conntrack_help+0x22/0x2f [ip_conntrack]
    [] nf_iterate+0x2e/0x5f
    [] xfrm4_output_finish+0x0/0x39f
    [] nf_hook_slow+0x42/0xb0
    [] xfrm4_output_finish+0x0/0x39f
    [] xfrm4_output+0x3c/0x4e
    [] xfrm4_output_finish+0x0/0x39f
    [] ip_forward+0x1c2/0x1fa
    [] ip_rcv+0x388/0x3b5
    [] netif_receive_skb+0x2bc/0x2ec
    [] process_backlog+0x6b/0xd0
    [] net_rx_action+0x4b/0xb7
    [] __do_softirq+0x35/0x7d
    [] do_softirq+0x38/0x3f

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • When the TPKT len included in the packet is below the lowest valid value
    of 4 an underflow occurs which results in an endless loop.

    Found by testcase 0000058 from the PROTOS c07-h2250v4 testsuite.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

03 May, 2006

1 commit

  • fix infinite loop in the SCTP-netfilter code: check SCTP chunk size to
    guarantee progress of for_each_sctp_chunk(). (all other uses of
    for_each_sctp_chunk() are preceded by do_basic_checks(), so this fix
    should be complete.)

    Based on patch from Ingo Molnar

    CVE-2006-1527

    Signed-off-by: Patrick McHardy
    Signed-off-by: Linus Torvalds

    Patrick McHardy
     

02 May, 2006

1 commit

  • When iptables userspace adds an ipt_standard_target, it calculates the size
    of the entire entry as:

    sizeof(struct ipt_entry) + XT_ALIGN(sizeof(struct ipt_standard_target))

    ipt_standard_target looks like this:

    struct xt_standard_target
    {
    struct xt_entry_target target;
    int verdict;
    };

    xt_entry_target contains a pointer, so when compiled for 64 bit the
    structure gets an extra 4 byte of padding at the end. On 32 bit
    architectures where iptables aligns to 8 byte it will also have 4
    byte padding at the end because it is only 36 bytes large.

    The compat_ipt_standard_fn in the kernel adjusts the offsets by

    sizeof(struct ipt_standard_target) - sizeof(struct compat_ipt_standard_target),

    which will always result in 4, even if the structure from userspace
    was already padded to a multiple of 8. On x86 this works out by
    accident because userspace only aligns to 4, on all other
    architectures this is broken and causes incorrect adjustments to
    the size and following offsets.

    Thanks to Linus for lots of debugging help and testing.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Linus Torvalds

    Patrick McHardy
     

30 Apr, 2006

2 commits

  • The following unlikely should be replaced by likely because the
    condition happens every time unless there is a hard error to transmit
    a packet.

    Signed-off-by: Hua Zhong
    Signed-off-by: David S. Miller

    Hua Zhong
     
  • I was looking through the xfrm input/output code in order to abstract
    out the address family specific encapsulation/decapsulation code. During
    that process I found this bug in the IP ID selection code in xfrm4_output.c.

    At that point dst is still the xfrm_dst for the current SA which
    represents an internal flow as far as the IPsec tunnel is concerned.
    Since the IP ID is going to sit on the outside of the encapsulated
    packet, we obviously want the external flow which is just dst->child.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu