02 Sep, 2010

1 commit


28 Aug, 2010

1 commit

  • The string clone is only used as a temporary copy of the argument val
    within the while loop, and so it should be freed before leaving the
    function. The call to strsep, however, modifies clone, so a pointer to the
    front of the string is kept in saved_clone, to make it possible to free it.

    The semantic match that finds this problem is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @r exists@
    local idexpression x;
    expression E;
    identifier l;
    statement S;
    @@

    *x= \(kasprintf\|kstrdup\)(...);
    ...
    if (x == NULL) S
    ... when != kfree(x)
    when != E = x
    if (...) {

    * return ...;
    }
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
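
The saved-pointer pattern described in this commit can be sketched in userspace C. This is only an illustration, not the kernel code: it uses strdup() and free() in place of kstrdup() and kfree(), and count_tokens() is a hypothetical helper. The names clone and saved_clone follow the commit message.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes strsep() and strdup() on glibc */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the non-empty comma-separated tokens in val
 * without modifying val. Returns -1 if the copy cannot be allocated. */
int count_tokens(const char *val)
{
    char *clone = strdup(val);   /* temporary working copy of the argument */
    char *saved_clone = clone;   /* strsep() advances clone, so keep the start */
    char *tok;
    int n = 0;

    if (clone == NULL)
        return -1;

    while ((tok = strsep(&clone, ",")) != NULL) {
        if (*tok != '\0')
            n++;
    }

    /* clone is NULL at this point; only saved_clone still points at the
     * allocation, so that is the pointer that must be freed. */
    free(saved_clone);
    return n;
}
```

Calling free(clone) instead would be a no-op here, because strsep() has set clone to NULL by the time the loop ends, and the copy would leak.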
     

26 Aug, 2010

2 commits

  • This issue comes from the Ruby language community. The test program
    below hangs, and only when run on Linux.

    % uname -mrsv
    Linux 2.6.26-2-486 #1 Sat Dec 26 08:37:39 UTC 2009 i686
    % ruby -rsocket -ve '
    BasicSocket.do_not_reverse_lookup = true
    serv = TCPServer.open("127.0.0.1", 0)
    s1 = TCPSocket.open("127.0.0.1", serv.addr[1])
    s2 = serv.accept
    s2.close
    s1.write("a") rescue p $!
    s1.write("a") rescue p $!
    Thread.new {
    s1.write("a")
    }.join'
    ruby 1.9.3dev (2010-07-06 trunk 28554) [i686-linux]
    #
    [Hang Here]

    FreeBSD, Solaris and Mac don't hang. Ruby's write() method calls
    select() internally, and tcp_poll has a bug.

    SUS defines 'ready for writing' for select() as follows.

    | A descriptor shall be considered ready for writing when a call to an output
    | function with O_NONBLOCK clear would not block, whether or not the function
    | would transfer data successfully.

    That is, the EPIPE situation clearly counts as 'ready for writing'.

    We don't have a read-side issue because tcp_poll() already handles
    read-side shutdown.

    | if (sk->sk_shutdown & RCV_SHUTDOWN)
    | mask |= POLLIN | POLLRDNORM | POLLRDHUP;

    So let's insert the same logic on the write side.

    - reference url
    http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31065
    http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31068

    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: David S. Miller

    KOSAKI Motohiro
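
The symmetry argued for above can be modelled in a few lines of C. This is a simplified sketch, not the tcp_poll() implementation: the MODEL_* constants and model_poll_mask() are hypothetical names, and the readable and has_write_space arguments stand in for the kernel's real queue and buffer checks.

```c
/* Illustrative flag values for this model only. */
enum { MODEL_RCV_SHUTDOWN = 1, MODEL_SEND_SHUTDOWN = 2 };
enum { MODEL_POLLIN = 0x001, MODEL_POLLOUT = 0x004 };

/* Compute a poll mask for a socket in a given shutdown state. */
unsigned model_poll_mask(unsigned shutdown, int readable, int has_write_space)
{
    unsigned mask = 0;

    /* Read side: a read after RCV_SHUTDOWN would not block,
     * so the socket is reported readable. */
    if (readable || (shutdown & MODEL_RCV_SHUTDOWN))
        mask |= MODEL_POLLIN;

    /* Write side, same reasoning: a write after SEND_SHUTDOWN fails
     * immediately (EPIPE) rather than blocking, so the socket is
     * reported writable even when there is no buffer space. */
    if (has_write_space || (shutdown & MODEL_SEND_SHUTDOWN))
        mask |= MODEL_POLLOUT;

    return mask;
}
```

Without the SEND_SHUTDOWN term, a caller that waits in select() for writability before each write, as the Ruby program does, would wait forever on a socket whose write can only fail.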
     
  • As discovered by Anton Blanchard, the current code to autotune
    tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and
    sysctl_max_syn_backlog makes little sense.

    The bigger a page is, the smaller tcp_max_orphans is: 4096 on a 512GB
    machine in Anton's case.

    (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
    is much bigger if spinlock debugging is on. It's wrong to select bigger
    limits in this case (where kernel structures are also bigger).

    bhash_size max is 65536, and we get this value even for small machines.

    A better basis is the size of the ehash table; this also makes the code
    shorter and more obvious.

    Based on a patch from Anton, and another from David.

    Reported-and-tested-by: Anton Blanchard
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Aug, 2010

1 commit

  • As reported by Anton Blanchard, when we use
    percpu_counter_read_positive() to make our orphan socket limit checks,
    the check can be off by up to num_online_cpus() * batch (which is 32
    by default), which on a 128 cpu machine can be as large as the default
    orphan limit itself.

    Fix this by doing the full expensive sum check if the optimized check
    triggers.

    Reported-by: Anton Blanchard
    Signed-off-by: David S. Miller
    Acked-by: Eric Dumazet

    David S. Miller
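
A small userspace model shows why the cheap read can be far from the true value and how the two-step check behaves. It is a sketch under simplifying assumptions (single-threaded, four simulated CPUs); struct approx_counter and the counter_* functions are hypothetical stand-ins for the kernel's percpu_counter API, not copies of it.

```c
#include <stdbool.h>

#define NCPUS 4
#define BATCH 32

struct approx_counter {
    long global;         /* cheap to read, but may lag behind */
    long local[NCPUS];   /* per-CPU deltas not yet folded into global */
};

/* Add delta on one CPU; fold into global only once the local delta
 * reaches the batch size. */
void counter_add(struct approx_counter *c, int cpu, long delta)
{
    c->local[cpu] += delta;
    if (c->local[cpu] >= BATCH || c->local[cpu] <= -BATCH) {
        c->global += c->local[cpu];
        c->local[cpu] = 0;
    }
}

/* Fast approximate read: off by up to NCPUS * BATCH. */
long counter_read_positive(const struct approx_counter *c)
{
    return c->global > 0 ? c->global : 0;
}

/* Exact but expensive read: walks every CPU's local delta. */
long counter_sum_positive(const struct approx_counter *c)
{
    long sum = c->global;
    for (int i = 0; i < NCPUS; i++)
        sum += c->local[i];
    return sum > 0 ? sum : 0;
}

/* Two-step limit check: pay for the exact sum only when the cheap
 * read already claims the limit is exceeded. */
bool too_many(const struct approx_counter *c, long limit)
{
    if (counter_read_positive(c) <= limit)
        return false;
    return counter_sum_positive(c) > limit;
}
```

With 128 CPUs and a batch of 32, the possible error is 128 * 32 = 4096, which is why the approximate value alone is not a safe basis for rejecting sockets.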
     

24 Aug, 2010

1 commit

  • commit f3c5c1bfd430858d3a05436f82c51e53104feb6b
    (netfilter: xtables: make ip_tables reentrant) forgot to
    also compute the jumpstack size in the compat handlers.

    Result is that "iptables -I INPUT -j userchain" turns into -j DROP.

    Reported by Sebastian Roesner on #netfilter, closes
    http://bugzilla.netfilter.org/show_bug.cgi?id=669.

    Note: arptables change is compile-tested only.

    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Tested-by: Mikael Pettersson
    Signed-off-by: David S. Miller

    Florian Westphal
     

18 Aug, 2010

1 commit

  • After commit 24b36f019 (netfilter: {ip,ip6,arp}_tables: dont block
    bottom half more than necessary), lockdep can raise a warning
    because we attempt to lock a spinlock with BH enabled, while
    the same lock is usually locked by another cpu in a softirq context.

    Disable BH again to avoid these lockdep warnings.

    Reported-by: Linus Torvalds
    Diagnosed-by: David S. Miller
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Aug, 2010

1 commit


03 Aug, 2010

4 commits


02 Aug, 2010

4 commits


31 Jul, 2010

1 commit

  • There is a bug in do_tcp_setsockopt() (net/ipv4/tcp.c), in the
    TCP_COOKIE_TRANSACTIONS case.
    In some cases (when tp->cookie_values == NULL) a new tcp_cookie_values
    structure can be allocated (at cvp) but not bound to
    tp->cookie_values, so a memory leak occurs.

    Signed-off-by: Dmitry Popov
    Signed-off-by: David S. Miller

    Dmitry Popov
     

23 Jul, 2010

4 commits


22 Jul, 2010

1 commit


21 Jul, 2010

1 commit


20 Jul, 2010

1 commit

  • It can happen that there are no packets in queue while calling
    tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
    NULL and that gets deref'ed to get sacked into a local var.

    There is no work to do if no packets are outstanding so we just
    exit early.

    This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out
    guard to make joining diff nicer).

    Signed-off-by: Ilpo Järvinen
    Reported-by: Lennart Schulte
    Tested-by: Lennart Schulte
    Signed-off-by: David S. Miller

    Ilpo Järvinen
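
The early-exit guard described in this commit can be illustrated with a toy queue. This is a sketch only: struct model_pkt, struct model_queue and model_retransmit_queue() are hypothetical, and the emptiness test on the head pointer stands in for whatever condition the real function checks.

```c
#include <stddef.h>

struct model_pkt {
    unsigned sacked;
    struct model_pkt *next;
};

struct model_queue {
    struct model_pkt *head;   /* NULL when no packets are queued */
};

/* Returns 0 if there was nothing to do, 1 otherwise. On success,
 * *first_sacked receives the sacked field of the head packet. */
int model_retransmit_queue(const struct model_queue *q, unsigned *first_sacked)
{
    /* Guard first: with an empty queue the head is NULL, and reading
     * head->sacked below would dereference a NULL pointer. */
    if (q->head == NULL)
        return 0;

    *first_sacked = q->head->sacked;
    return 1;
}
```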
     

16 Jul, 2010

1 commit

  • This was detected using two mcast router tables. The
    pimreg for the second interface did not have a specific
    mrule, so packets received by it were handled by the
    default table, which had nothing configured.

    This caused the ipmr_fib_lookup to fail, causing
    the memory leak.

    Signed-off-by: Ben Greear
    Signed-off-by: David S. Miller

    Ben Greear
     

15 Jul, 2010

1 commit


13 Jul, 2010

2 commits

  • A new boolean flag, no_autobind, is added to struct proto to avoid the
    autobind calls when the protocol is TCP. sock_rps_record_flow() is then
    called in TCP's sendmsg() and sendpage() paths.

    Signed-off-by: Changli Gao
    ----
    include/net/inet_common.h | 4 ++++
    include/net/sock.h | 1 +
    include/net/tcp.h | 8 ++++----
    net/ipv4/af_inet.c | 15 +++++++++------
    net/ipv4/tcp.c | 11 +++++------
    net/ipv4/tcp_ipv4.c | 3 +++
    net/ipv6/af_inet6.c | 8 ++++----
    net/ipv6/tcp_ipv6.c | 3 +++
    8 files changed, 33 insertions(+), 20 deletions(-)
    Signed-off-by: David S. Miller

    Changli Gao
     
  • CodingStyle cleanups

    EXPORT_SYMBOL should immediately follow the symbol declaration.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Jul, 2010

1 commit

  • This patch makes the IPv6 over IPv4 GRE tunnel propagate the traffic
    class field from the underlying IPv6 header to the IPv4 Type of Service
    field. Without the patch, all IPv6 packets in the tunnel look the same
    to QoS.

    This assumes that the IPv6 traffic class is exactly the same
    as IPv4 TOS. Not sure if that is always the case; some bits may
    need to be masked off.

    The mask and shift to get tclass is copied from ipv6/datagram.c

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
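
For reference, the mask and shift follow from the layout of the first 32 bits of an IPv6 header: 4 bits of version, 8 bits of Traffic Class, 20 bits of flow label. A minimal sketch, assuming the word has already been converted to host byte order (ipv6_tclass_from_word() is a hypothetical helper, not a kernel function):

```c
#include <stdint.h>

/* first_word_host: first 32 bits of the IPv6 header, host byte order.
 * Layout: version (4 bits) | traffic class (8 bits) | flow label (20 bits). */
uint8_t ipv6_tclass_from_word(uint32_t first_word_host)
{
    /* Shift the 20-bit flow label out, then keep the low 8 bits. */
    return (uint8_t)((first_word_host >> 20) & 0xff);
}
```

For example, a header word built from version 6, traffic class 0xb8 and flow label 0x12345 yields 0xb8, which is the value that would be copied into the outer IPv4 TOS byte.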
     

08 Jul, 2010

2 commits


06 Jul, 2010

1 commit


05 Jul, 2010

3 commits

  • We can avoid a pair of atomic ops in ipt_REJECT send_reset()

    Signed-off-by: Eric Dumazet
    Signed-off-by: Patrick McHardy

    Eric Dumazet
     
  • Postpone the checksum calculation; then, if the output NIC supports
    checksum offloading, we can utilize it. Even if the output NIC doesn't
    support checksum offloading, when we are going to mangle the packet this
    frees us from updating the checksum, as the checksum calculation occurs
    later.

    Signed-off-by: Changli Gao
    Signed-off-by: Patrick McHardy

    Changli Gao
     
  • While using the xfrm by MARK feature in 2.6.34 - 2.6.35 kernels, the
    mark is always cleared in the flowi structure via memset in
    _decode_session4 (net/ipv4/xfrm4_policy.c), so the policy lookup fails.
    The IPv6 code is affected by this bug too.

    Signed-off-by: Peter Kosyh
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Peter Kosyh
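
The ordering problem can be sketched with a toy structure. This is not the xfrm code: struct model_flow and model_decode_session() are hypothetical, and the sketch only shows that any field meant to survive must be written after the memset, not before it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the flow key used in policy lookup. */
struct model_flow {
    uint32_t mark;
    uint32_t saddr;
    uint32_t daddr;
};

/* Fill in the flow key for a packet. */
void model_decode_session(struct model_flow *fl, uint32_t pkt_mark,
                          uint32_t saddr, uint32_t daddr)
{
    memset(fl, 0, sizeof(*fl));   /* wipes every field, including mark */
    fl->mark = pkt_mark;          /* so the mark is set after the memset */
    fl->saddr = saddr;
    fl->daddr = daddr;
}
```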
     

03 Jul, 2010

1 commit


01 Jul, 2010

2 commits

  • Add a fast path for in-order fragments

    As fragments are sent in order by most OSes, such as Windows, Darwin and
    FreeBSD, new fragments most likely belong at the end of the
    inet_frag_queue. In the fast path, we check whether the skb at the end of
    the inet_frag_queue is the prev we expect.

    Signed-off-by: Changli Gao
    ----
    include/net/inet_frag.h | 1 +
    net/ipv4/ip_fragment.c | 12 ++++++++++++
    net/ipv6/reassembly.c | 11 +++++++++++
    3 files changed, 24 insertions(+)
    Signed-off-by: David S. Miller

    Changli Gao
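
The tail check can be sketched on a plain singly linked list ordered by offset. This is a simplified model, not the reassembly code: struct frag, struct frag_queue and frag_insert() are hypothetical, and overlap handling is left out entirely.

```c
#include <stddef.h>

struct frag {
    int offset;           /* byte offset of this fragment in the datagram */
    struct frag *next;
};

struct frag_queue {
    struct frag *head;
    struct frag *tail;    /* remembered so the fast path needs no walk */
};

/* Insert f in offset order. Returns 1 if the fast path was taken. */
int frag_insert(struct frag_queue *q, struct frag *f)
{
    struct frag *prev = NULL, *cur;

    f->next = NULL;

    /* Fast path: the new fragment goes after the current tail. */
    if (q->tail != NULL && q->tail->offset < f->offset) {
        q->tail->next = f;
        q->tail = f;
        return 1;
    }

    /* Slow path: walk the list to find the insertion point. */
    for (cur = q->head; cur != NULL; cur = cur->next) {
        if (cur->offset >= f->offset)
            break;
        prev = cur;
    }
    f->next = cur;
    if (prev != NULL)
        prev->next = f;
    else
        q->head = f;
    if (cur == NULL)
        q->tail = f;
    return 0;
}
```

When fragments arrive in order, every insert after the first takes the fast path, so the cost per fragment no longer grows with the queue length.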
     
  • /proc/net/snmp and /proc/net/netstat expose SNMP counters.

    The width of these counters is either 32 or 64 bits, depending on the
    size of "unsigned long" in the kernel.

    This means a user program parsing these files must already be prepared
    to deal with 64bit values, regardless of whether the user program is 32
    or 64 bit.

    This patch introduces 64bit snmp values for IPSTAT mib, where some
    counters can wrap pretty fast if they are 32bit wide.

    # netstat -s|egrep "InOctets|OutOctets"
    InOctets: 244068329096
    OutOctets: 244069348848

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
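
To see how quickly a 32-bit octet counter loses information, the InOctets value shown above can be truncated to 32 bits. The helpers below are hypothetical and only illustrate the arithmetic:

```c
#include <stdint.h>

/* What a 32-bit counter would report for a given true octet count. */
uint32_t as_32bit_counter(uint64_t octets)
{
    return (uint32_t)octets;   /* keeps only the low 32 bits */
}

/* How many times a 32-bit counter would have wrapped. */
uint64_t wraps_of_32bit_counter(uint64_t octets)
{
    return octets >> 32;
}
```

For InOctets = 244068329096, a 32-bit counter would read 3550160520 after wrapping 56 times, since 56 * 2^32 + 3550160520 = 244068329096.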
     

29 Jun, 2010

2 commits