Eric Lee / smarc-fsl-linux-kernel

02 Sep, 2010

1 commit

750e9fad8 ipv4: minor fix about RPF in help of Kconfig

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2010-09-02 05:29:36 +0800

28 Aug, 2010

1 commit

c34186ed0 net/ipv4: Eliminate kstrdup memory leak

The string clone is only used as a temporary copy of the argument val
within the while loop, and so it should be freed before leaving the
function. The call to strsep, however, modifies clone, so a pointer to the
front of the string is kept in saved_clone, to make it possible to free it.
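
A minimal userspace sketch of the same pattern. It assumes glibc's strdup() and strsep() as stand-ins for kstrdup() and the kernel's strsep(), and count_fields() is a hypothetical helper, not code from the patch. Because strsep() advances clone (leaving it NULL after the last token), only the saved pointer can be passed to free():

```c
#define _DEFAULT_SOURCE /* strsep() is a BSD/glibc extension */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the non-empty comma-separated fields in val. */
int count_fields(const char *val)
{
	char *clone, *saved_clone, *tok;
	int n = 0;

	assert(val != NULL);
	clone = strdup(val);            /* kstrdup() in the kernel */
	if (clone == NULL)
		return -1;
	saved_clone = clone;            /* strsep() will move clone ... */

	while ((tok = strsep(&clone, ",")) != NULL) {
		if (*tok != '\0')
			n++;
	}

	/* ... and clone is NULL here, so free(clone) would leak the copy. */
	free(saved_clone);
	return n;
}
```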

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

//
@r exists@
local idexpression x;
expression E;
identifier l;
statement S;
@@

*x= \(kasprintf\|kstrdup\)(...);
...
if (x == NULL) S
... when != kfree(x)
when != E = x
if (...) {

* return ...;
}
//

Signed-off-by: Julia Lawall
Signed-off-by: David S. Miller

Julia Lawall
2010-08-28 10:31:56 +0800

26 Aug, 2010

2 commits

d84ba638e tcp: select(writefds) don't hang up when a peer close connection

This issue comes from the Ruby language community. The test program below
hangs, but only when run on Linux.

% uname -mrsv
Linux 2.6.26-2-486 #1 Sat Dec 26 08:37:39 UTC 2009 i686
% ruby -rsocket -ve '
BasicSocket.do_not_reverse_lookup = true
serv = TCPServer.open("127.0.0.1", 0)
s1 = TCPSocket.open("127.0.0.1", serv.addr[1])
s2 = serv.accept
s2.close
s1.write("a") rescue p $!
s1.write("a") rescue p $!
Thread.new {
s1.write("a")
}.join'
ruby 1.9.3dev (2010-07-06 trunk 28554) [i686-linux]
#
[Hang Here]

FreeBSD, Solaris and Mac OS don't hang. Ruby's write() method calls
select() internally, and tcp_poll() has a bug.

SUS defines 'ready for writing' for select() as follows:

| A descriptor shall be considered ready for writing when a call to an output
| function with O_NONBLOCK clear would not block, whether or not the function
| would transfer data successfully.

That is, the EPIPE situation is clearly a case of 'ready for writing'.

There is no read-side issue because tcp_poll() already handles read-side
shutdown:

| if (sk->sk_shutdown & RCV_SHUTDOWN)
| mask |= POLLIN | POLLRDNORM | POLLRDHUP;

So, let's add the same logic on the write side.
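
The change can be pictured with a heavily simplified, hypothetical model of the readiness part of tcp_poll(). model_tcp_poll() and its wspace/min_wspace parameters are illustrative assumptions. RCV_SHUTDOWN and SEND_SHUTDOWN are local copies of the kernel's sk_shutdown bits, and only the portable POLLIN/POLLOUT bits are used:

```c
#include <poll.h>

/* Local copies of the kernel's sk_shutdown bits (include/net/sock.h). */
#define RCV_SHUTDOWN  1
#define SEND_SHUTDOWN 2

/* Hypothetical model; the real tcp_poll() checks many more conditions. */
unsigned int model_tcp_poll(unsigned int sk_shutdown, int wspace, int min_wspace)
{
	unsigned int mask = 0;

	if (sk_shutdown & RCV_SHUTDOWN)
		mask |= POLLIN;         /* read side: already handled before the fix */

	if (!(sk_shutdown & SEND_SHUTDOWN)) {
		if (wspace >= min_wspace)
			mask |= POLLOUT; /* enough room in the send buffer */
	} else {
		/* The fix: a write now fails with EPIPE instead of blocking,
		 * so by the SUS definition the socket is ready for writing. */
		mask |= POLLOUT;
	}
	return mask;
}
```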

- reference url
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31065
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31068

Signed-off-by: KOSAKI Motohiro
Signed-off-by: David S. Miller

KOSAKI Motohiro
2010-08-26 14:02:48 +0800
c5ed63d66 tcp: fix three tcp sysctls tuning

As discovered by Anton Blanchard, current code to autotune
tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and
sysctl_max_syn_backlog makes little sense.

The bigger the page size, the smaller tcp_max_orphans becomes: 4096 on a
512GB machine in Anton's case.

(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
is much bigger if spinlock debugging is on. It's wrong to select bigger
limits in this case (where kernel structures are also bigger).

bhash_size max is 65536, and we get this value even for small machines.

A better basis is the size of the ehash table; this also makes the code
shorter and more obvious.
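
Here is a sketch of deriving all three limits from the ehash size. The divisors are our reading of the patch and should be treated as illustrative assumptions: half the buckets for TIME_WAIT buckets and for orphans, and buckets/256 with a floor of 128 for the SYN backlog. tcp_limits and limits_from_ehash() are hypothetical names:

```c
/* Hypothetical stand-in for the kernel's max() macro. */
static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Hypothetical container for the three autotuned sysctls. */
struct tcp_limits {
	unsigned int max_tw_buckets;
	unsigned int max_orphans;
	unsigned int max_syn_backlog;
};

/* ehash_buckets corresponds to ehash_mask + 1; divisors are illustrative. */
struct tcp_limits limits_from_ehash(unsigned int ehash_buckets)
{
	struct tcp_limits l;

	l.max_tw_buckets  = ehash_buckets / 2;
	l.max_orphans     = ehash_buckets / 2;
	l.max_syn_backlog = max_u(128, ehash_buckets / 256);
	return l;
}
```

Unlike the bhash-based formula, the result now depends on a table sized by available memory rather than by page size or debug options.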

Based on a patch from Anton, and another from David.

Reported-and-tested-by: Anton Blanchard
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-08-26 14:02:17 +0800

25 Aug, 2010

1 commit

ad1af0fed tcp: Combat per-cpu skew in orphan tests.

As reported by Anton Blanchard, when we use
percpu_counter_read_positive() to make our orphan socket limit checks,
the check can be off by up to num_online_cpus() * batch (batch is 32
by default), which on a 128 cpu machine can be as large as the default
orphan limit itself.

Fix this by doing the full expensive sum check if the optimized check
triggers.
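
With batch = 32 on 128 cpus, the fast read can be off by up to 128 * 32 = 4096. The check-then-verify pattern can be sketched with a hypothetical userspace model of a per-cpu counter. The struct and functions below are illustrative analogues of percpu_counter_read_positive() and percpu_counter_sum_positive(), not kernel code:

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4
#define MODEL_BATCH   32 /* the default batch cited in the commit */

/* Hypothetical model of a per-cpu counter. */
struct pcpu_counter_model {
	long count;                /* shared approximation, cheap to read */
	long local[MODEL_NR_CPUS]; /* per-cpu deltas not yet folded in */
};

void model_add(struct pcpu_counter_model *c, int cpu, long amount)
{
	long v = c->local[cpu] + amount;

	if (v >= MODEL_BATCH || v <= -MODEL_BATCH) {
		c->count += v;     /* fold the delta into the shared count */
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v; /* stays invisible to the fast read */
	}
}

/* Fast and possibly stale, like percpu_counter_read_positive(). */
long model_read_positive(const struct pcpu_counter_model *c)
{
	return c->count > 0 ? c->count : 0;
}

/* Exact but walks every cpu, like percpu_counter_sum_positive(). */
long model_sum_positive(const struct pcpu_counter_model *c)
{
	long sum = c->count;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum > 0 ? sum : 0;
}

/* Pay for the exact sum only when the cheap read trips the limit. */
bool too_many_orphans(const struct pcpu_counter_model *c, long limit)
{
	if (model_read_positive(c) > limit)
		return model_sum_positive(c) > limit;
	return false;
}
```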

Reported-by: Anton Blanchard
Signed-off-by: David S. Miller
Acked-by: Eric Dumazet

David S. Miller
2010-08-25 17:27:49 +0800

24 Aug, 2010

1 commit

cca77b7c8 netfilter: fix CONFIG_COMPAT support

commit f3c5c1bfd430858d3a05436f82c51e53104feb6b
(netfilter: xtables: make ip_tables reentrant) forgot to
also compute the jumpstack size in the compat handlers.

Result is that "iptables -I INPUT -j userchain" turns into -j DROP.

Reported by Sebastian Roesner on #netfilter, closes
http://bugzilla.netfilter.org/show_bug.cgi?id=669.

Note: arptables change is compile-tested only.

Signed-off-by: Florian Westphal
Acked-by: Eric Dumazet
Tested-by: Mikael Pettersson
Signed-off-by: David S. Miller

Florian Westphal
2010-08-24 05:41:22 +0800

18 Aug, 2010

1 commit

001389b95 netfilter: {ip,ip6,arp}_tables: avoid lockdep false positive

After commit 24b36f019 (netfilter: {ip,ip6,arp}_tables: dont block
bottom half more than necessary), lockdep can raise a warning
because we attempt to lock a spinlock with BH enabled, while
the same lock is usually locked by another cpu in a softirq context.

Disable BH again to avoid these lockdep warnings.

Reported-by: Linus Torvalds
Diagnosed-by: David S. Miller
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-08-18 06:12:14 +0800

08 Aug, 2010

1 commit

ba78e2ddc tcp: no md5sig option size check bug

tcp_parse_md5sig_option() doesn't check the length of the md5sig option
(TCPOPT_MD5SIG), but tcp_v[46]_inbound_md5_hash() assumes that it is at
least 16 bytes long.
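
Here is a minimal sketch of a length-checked option walk. It assumes the standard TCP option encoding (kind, length, data), with TCPOPT_MD5SIG = 19 and a total option length of 18 bytes (kind + length + 16-byte digest). find_md5sig() is a hypothetical helper, not the kernel's tcp_parse_md5sig_option(). It returns a pointer to the digest only when the option has exactly the expected size:

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL     0
#define OPT_NOP     1
#define OPT_MD5SIG  19
#define OLEN_MD5SIG 18 /* kind + length + 16-byte digest */

/* Hypothetical helper: return the 16-byte digest, or NULL if absent/malformed. */
const uint8_t *find_md5sig(const uint8_t *opt, size_t len)
{
	size_t i = 0;

	while (i < len) {
		uint8_t kind = opt[i];
		uint8_t size;

		if (kind == OPT_EOL)
			return NULL;
		if (kind == OPT_NOP) {
			i++;
			continue;
		}
		if (i + 1 >= len)
			return NULL;            /* no room for the length byte */
		size = opt[i + 1];
		if (size < 2 || size > len - i)
			return NULL;            /* malformed or truncated option */
		if (kind == OPT_MD5SIG)
			return size == OLEN_MD5SIG ? &opt[i + 2] : NULL;
		i += size;
	}
	return NULL;
}
```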

Signed-off-by: Dmitry Popov
Signed-off-by: David S. Miller

Dmitry Popov
2010-08-08 11:24:28 +0800

03 Aug, 2010

4 commits

00dad5e47 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/e1000e/hw.h
net/bridge/br_device.c
net/bridge/br_input.c

David S. Miller
2010-08-03 13:22:46 +0800
c893b8066 ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice

Commit 6c79bf0f2440fd250c8fce8d9b82fcf03d4e8350 already subtracts
PPPOE_SES_HLEN from the mtu at the start of ip_fragment(), so the later
subtraction should be removed. The MTU of 802.1q is also 1500, so the MTU
should not be changed for it.

Signed-off-by: Changli Gao
Signed-off-by: Bart De Schuymer
----
net/ipv4/ip_output.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
Signed-off-by: Bart De Schuymer
Signed-off-by: David S. Miller

Changli Gao
2010-08-03 08:25:07 +0800
3c0fef0b7 net: Add getsockopt support for TCP thin-streams

The initial TCP thin-stream commit did not add getsockopt() support for the
new socket options TCP_THIN_LINEAR_TIMEOUTS and TCP_THIN_DUPACK. This patch
adds it.

Signed-off-by: Josh Hunt
Tested-by: Andreas Petlund
Acked-by: Andreas Petlund
Signed-off-by: David S. Miller

Josh Hunt
2010-08-03 08:25:06 +0800
83bf2e408 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6

David S. Miller
2010-08-03 06:07:58 +0800

02 Aug, 2010

4 commits

2452a99dc netfilter: nf_nat: don't check if the tuple is unique when there isn't any other choice

The tuple obtained from unique_tuple() doesn't need to be truly unique, so
the uniqueness check is unnecessary when there isn't any other choice.
Eliminating the unnecessary nf_nat_used_tuple() call also saves some CPU
cycles.

Signed-off-by: Changli Gao
Signed-off-by: Patrick McHardy

Changli Gao
2010-08-02 23:35:49 +0800
f43dc98b3 netfilter: nf_nat: make unique_tuple return void

The only caller of unique_tuple(), get_unique_tuple(), doesn't care about
its return value, so make unique_tuple() return void.

Signed-off-by: Changli Gao
Signed-off-by: Patrick McHardy

Changli Gao
2010-08-02 23:20:54 +0800
794dbc1d7 netfilter: nf_nat: use local variable hdrlen

Use local variable hdrlen instead of ip_hdrlen(skb).

Signed-off-by: Changli Gao
Signed-off-by: Patrick McHardy

Changli Gao
2010-08-02 23:15:30 +0800
24b36f019 netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary

We currently disable BH for the whole duration of get_counters().

On machines with a lot of cpus and large tables, this might be too long.

We can disable preemption during the whole function, and disable BH only
while fetching counters for the current cpu.

Signed-off-by: Eric Dumazet
Signed-off-by: Patrick McHardy

Eric Dumazet
2010-08-02 22:49:01 +0800

31 Jul, 2010

1 commit

a3bdb549e tcp: cookie transactions setsockopt memory leak

There is a bug in the TCP_COOKIE_TRANSACTIONS case of
do_tcp_setsockopt() (net/ipv4/tcp.c).
In some cases (when tp->cookie_values == NULL), a new tcp_cookie_values
structure can be allocated (at cvp) but never bound to
tp->cookie_values, so a memory leak occurs.

Signed-off-by: Dmitry Popov
Signed-off-by: David S. Miller

Dmitry Popov
2010-07-31 14:04:07 +0800

23 Jul, 2010

4 commits

7df0884ce netfilter: iptables: use skb->len for accounting

Use skb->len for accounting as xt_quota does.

Signed-off-by: Changli Gao
Signed-off-by: Patrick McHardy

Changli Gao
2010-07-23 22:25:11 +0800
f667009ec netfilter: arptables: use arp_hdr_len()

Use arp_hdr_len().

Signed-off-by: Changli Gao
Signed-off-by: Patrick McHardy

Changli Gao
2010-07-23 19:40:53 +0800
c36952e52 netfilter: nf_nat_core: merge the same lines

proto->unique_tuple() is eventually called if the previous calls fail. This
patch instead checks the negated condition, !(range->flags & IP_NAT_RANGE_PROTO_RANDOM),
to avoid duplicating the proto->unique_tuple() call.

Signed-off-by: Changli Gao
Signed-off-by: Patrick McHardy

Changli Gao
2010-07-23 19:27:08 +0800
963bfeeee net: RTA_MARK addition

Add a new rt attribute, RTA_MARK, and use it in
rt_fill_info()/inet_rtm_getroute() to support the following commands:

ip route get 192.168.20.110 mark NUMBER
ip route get 192.168.20.108 from 192.168.20.110 iif eth1 mark NUMBER
ip route list cache [192.168.20.110] mark NUMBER

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-07-23 04:46:21 +0800

22 Jul, 2010

1 commit

3f30fc157 net: remove last uses of __attribute__((packed))

Network code uses the __packed macro instead of __attribute__((packed)).
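
For reference, the kernel's GCC compiler header defines __packed as shorthand for the attribute. The userspace sketch below shows the effect. The #ifndef guard and the struct names are our own, and the unpacked size depends on the ABI:

```c
#include <stdint.h>

/* Same spelling as the kernel macro; guarded in case a header defines it. */
#ifndef __packed
#define __packed __attribute__((packed))
#endif

struct wire_hdr_padded {
	uint8_t  type;
	uint32_t value; /* the compiler normally pads before this member */
};

struct wire_hdr_packed {
	uint8_t  type;
	uint32_t value; /* no padding: the struct is exactly 5 bytes */
} __packed;
```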

Signed-off-by: Gustavo F. Padovan
Signed-off-by: David S. Miller

Gustavo F. Padovan
2010-07-22 05:44:29 +0800

21 Jul, 2010

1 commit

11fe88393 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/vhost/net.c
net/bridge/br_device.c

Fix merge conflict in drivers/vhost/net.c with guidance from
Stephen Rothwell.

Revert the effects of net-2.6 commit 573201f36fd9c7c6d5218cdcd9948cee700b277d
since net-next-2.6 has fixes that make bridge netpoll work properly, so
we don't need it disabled.

Signed-off-by: David S. Miller

David S. Miller
2010-07-21 09:25:24 +0800

20 Jul, 2010

1 commit

45e77d314 tcp: fix crash in tcp_xmit_retransmit_queue

tcp_xmit_retransmit_queue() can be called when there are no packets in
the queue. tcp_write_queue_head() then returns NULL, which is
dereferenced to load the sacked field into a local variable.

There is no work to do if no packets are outstanding so we just
exit early.

This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out
guard to make joining diff nicer).

Signed-off-by: Ilpo Järvinen
Reported-by: Lennart Schulte
Tested-by: Lennart Schulte
Signed-off-by: David S. Miller

Ilpo Järvinen
2010-07-20 03:43:49 +0800

16 Jul, 2010

1 commit

e40dbc51f ipmr: Don't leak memory if fib lookup fails.

This was detected using two mcast router tables. The
pimreg for the second interface did not have a specific
mrule, so packets received by it were handled by the
default table, which had nothing configured.

This caused ipmr_fib_lookup() to fail, causing
the memory leak.

Signed-off-by: Ben Greear
Signed-off-by: David S. Miller

Ben Greear
2010-07-16 13:38:43 +0800

15 Jul, 2010

1 commit

3a047bf87 rfs: call sock_rps_record_flow() in tcp_splice_read()

Call sock_rps_record_flow() in tcp_splice_read(), so that applications using
splice(2) or sendfile(2) can make use of RFS.

Signed-off-by: Changli Gao
----
net/ipv4/tcp.c | 1 +
1 file changed, 1 insertion(+)
Signed-off-by: David S. Miller

Changli Gao
2010-07-15 05:45:15 +0800

13 Jul, 2010

2 commits

7ba429100 inet, inet6: make tcp_sendmsg() and tcp_sendpage() through inet_sendmsg() and inet_sendpage()

A new boolean flag, no_autobind, is added to struct proto to avoid the autobind
calls when the protocol is TCP. sock_rps_record_flow() is then called in
TCP's sendmsg() and sendpage() paths.

Signed-off-by: Changli Gao
----
include/net/inet_common.h | 4 ++++
include/net/sock.h | 1 +
include/net/tcp.h | 8 ++++----
net/ipv4/af_inet.c | 15 +++++++++------
net/ipv4/tcp.c | 11 +++++------
net/ipv4/tcp_ipv4.c | 3 +++
net/ipv6/af_inet6.c | 8 ++++----
net/ipv6/tcp_ipv6.c | 3 +++
8 files changed, 33 insertions(+), 20 deletions(-)
Signed-off-by: David S. Miller

Changli Gao
2010-07-13 11:21:46 +0800
4bc2f18ba net/ipv4: EXPORT_SYMBOL cleanups

CodingStyle cleanups

EXPORT_SYMBOL should immediately follow the symbol declaration.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-07-13 03:57:54 +0800

09 Jul, 2010

1 commit

dd4ba83dc gre: propagate ipv6 transport class

This patch makes an IPv6-over-IPv4 GRE tunnel propagate the traffic
class field from the inner IPv6 header to the outer IPv4 Type Of Service
field. Without the patch, all IPv6 packets in the tunnel look the same to QoS.

This assumes that the IPv6 traffic class is exactly the same
as the IPv4 TOS. I'm not sure that is always the case; some bits may
need to be masked off.

The mask and shift to get tclass are copied from ipv6/datagram.c.
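
The sketch below shows the mask and shift. It assumes the standard IPv6 header layout, where the first 32 bits hold the version (4 bits), traffic class (8 bits) and flow label (20 bits). ipv6_tclass() is a hypothetical helper. In the tunnel, the resulting byte would be copied into the outer IPv4 TOS field, subject to the caveat above:

```c
#include <arpa/inet.h> /* ntohl() */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: traffic class from the first 32 bits of an IPv6 header. */
uint8_t ipv6_tclass(const uint8_t *ipv6h)
{
	uint32_t word;

	memcpy(&word, ipv6h, sizeof(word)); /* avoid an unaligned load */
	/* version:4 | tclass:8 | flow label:20 */
	return (uint8_t)((ntohl(word) >> 20) & 0xff);
}
```

For example, a header starting with bytes 0x6b 0x80 is version 6 with traffic class 0xb8.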

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2010-07-09 12:35:58 +0800

08 Jul, 2010

2 commits

597e608a8 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

David S. Miller
2010-07-08 06:59:38 +0800
49085bd7d net/ipv4/ip_output.c: Removal of unused variable in ip_fragment()

Remove an unused integer variable from ip_fragment().

Signed-off-by: George Kadianakis
Signed-off-by: David S. Miller

George Kadianakis
2010-07-08 06:44:59 +0800

06 Jul, 2010

1 commit

fe76cda30 ipv4: use skb_dst_copy() in ip_copy_metadata()

Avoid touching dst refcount in ip_fragment().

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-07-06 09:50:56 +0800

05 Jul, 2010

3 commits

b13b7125e netfilter: ipt_REJECT: avoid touching dst ref

We can avoid a pair of atomic ops in ipt_REJECT send_reset().

Signed-off-by: Eric Dumazet
Signed-off-by: Patrick McHardy

Eric Dumazet
2010-07-05 16:40:09 +0800
98b0e84aa netfilter: ipt_REJECT: postpone the checksum calculation.

Postpone the checksum calculation, so that if the output NIC supports checksum
offloading, we can use it. Even if the output NIC doesn't support checksum
offloading, postponing still helps when we mangle the packet later: we don't
need to update the checksum, because it is only calculated afterwards.

Signed-off-by: Changli Gao
Signed-off-by: Patrick McHardy

Changli Gao
2010-07-05 16:39:17 +0800
44b451f16 xfrm: fix xfrm by MARK logic

When using the xfrm by MARK feature in 2.6.34 - 2.6.35 kernels, the
mark in the flowi structure is always cleared by the memset in
_decode_session4 (net/ipv4/xfrm4_policy.c), so the policy lookup
fails. The IPv6 code is affected by this bug too.
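
Here is a minimal sketch of the ordering problem. It uses hypothetical, trimmed-down stand-ins for struct flowi and the skb. One way to fix it, assumed here, is to copy the mark only after the memset, so the clearing cannot wipe it:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the real structures have many more fields. */
struct model_flowi {
	uint32_t mark;
	uint32_t saddr, daddr;
};

struct model_skb {
	uint32_t mark;
	uint32_t saddr, daddr;
};

void model_decode_session(const struct model_skb *skb, struct model_flowi *fl)
{
	memset(fl, 0, sizeof(*fl)); /* clears every field, including fl->mark */
	fl->mark = skb->mark;       /* so the mark must be set after the memset */
	fl->saddr = skb->saddr;
	fl->daddr = skb->daddr;
}
```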

Signed-off-by: Peter Kosyh
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Peter Kosyh
2010-07-05 02:46:07 +0800

03 Jul, 2010

1 commit

e490c1def Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6

David S. Miller
2010-07-03 13:42:06 +0800

01 Jul, 2010

2 commits

d6bebca92 fragment: add fast path for in-order fragments

As fragments are sent in order by most OSes, such as Windows, Darwin and
FreeBSD, a new fragment is likely to belong at the end of the inet_frag_queue.
In the fast path, we check whether the skb at the end of the inet_frag_queue
is the prev we expect.
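
The fast path can be sketched on a plain list sorted by offset. The types below are hypothetical: the kernel keeps the offset in the skb control block and the tail pointer in the queue. Overlap trimming, which real reassembly must also handle, is omitted:

```c
#include <stddef.h>

/* Hypothetical fragment and queue types. */
struct frag {
	int offset;
	struct frag *next;
};

struct frag_queue {
	struct frag *head; /* lowest offset first */
	struct frag *tail; /* last fragment: the fast-path candidate for prev */
};

/* Insert f in offset order. Returns 1 if the fast path was taken. */
int frag_insert(struct frag_queue *q, struct frag *f)
{
	struct frag *prev, *next;

	/* Fast path: in-order arrival means f goes right after the tail. */
	if (q->tail == NULL || q->tail->offset < f->offset) {
		f->next = NULL;
		if (q->tail)
			q->tail->next = f;
		else
			q->head = f;
		q->tail = f;
		return 1;
	}

	/* Slow path: walk to the first fragment at or beyond f's offset.
	 * One exists (at least the tail), so the tail never changes here. */
	prev = NULL;
	for (next = q->head; next != NULL; next = next->next) {
		if (next->offset >= f->offset)
			break;
		prev = next;
	}
	f->next = next;
	if (prev)
		prev->next = f;
	else
		q->head = f;
	return 0;
}
```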

Signed-off-by: Changli Gao
----
include/net/inet_frag.h | 1 +
net/ipv4/ip_fragment.c | 12 ++++++++++++
net/ipv6/reassembly.c | 11 +++++++++++
3 files changed, 24 insertions(+)
Signed-off-by: David S. Miller

Changli Gao
2010-07-01 04:44:29 +0800
4ce3c183f snmp: 64bit ipstats_mib for all arches

/proc/net/snmp and /proc/net/netstat expose SNMP counters.

The width of these counters is either 32 or 64 bits, depending on the size
of "unsigned long" in the kernel.

This means user programs parsing these files must already be prepared to
deal with 64-bit values, regardless of whether they are 32- or 64-bit programs.

This patch introduces 64-bit SNMP values for the IPSTAT MIB, where some
counters can wrap quite fast if they are 32 bits wide.

# netstat -s|egrep "InOctets|OutOctets"
InOctets: 244068329096
OutOctets: 244069348848
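
The numbers above show why 32 bits is not enough. The InOctets value exceeds 2^32 = 4294967296, so a 32-bit counter would have wrapped 56 times and would read 3550160520. At an illustrative 10 Gbit/s (1.25e9 bytes/s), a 32-bit octet counter wraps in about 3.4 seconds. The helpers below are only for illustration:

```c
#include <stdint.h>

/* What a 32-bit counter would show for the same total (total mod 2^32). */
uint32_t as_32bit_counter(uint64_t total)
{
	return (uint32_t)total;
}

/* Seconds until a 32-bit byte counter wraps at the given rate. */
double seconds_to_wrap(double bytes_per_second)
{
	return 4294967296.0 / bytes_per_second;
}
```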

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-07-01 04:31:19 +0800

29 Jun, 2010

2 commits

c4ead4c59 tcp: tso_fragment() might avoid GFP_ATOMIC

We can pass a gfp argument to tso_fragment() and avoid GFP_ATOMIC
allocations sometimes.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-06-29 14:24:31 +0800
7a9b2d595 net: use this_cpu_ptr()

Use this_cpu_ptr(p) instead of per_cpu_ptr(p, smp_processor_id()).

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-06-29 14:24:29 +0800