Doug / smarc-fsl-linux-kernel | Embedian Git Server

23 Apr, 2012

1 commit

d135c522f tcp: fix TCP_MAXSEG for established IPv6 passive sockets ... Browse Code »

Commit f5fff5d forgot to fix TCP_MAXSEG behavior IPv6 sockets, so IPv6
TCP server sockets that used TCP_MAXSEG would find that the advmss of
child sockets would be incorrect. This commit mirrors the advmss logic
from tcp_v4_syn_recv_sock in tcp_v6_syn_recv_sock. Eventually this
logic should probably be shared between IPv4 and IPv6, but this at
least fixes this issue.

Signed-off-by: Neal Cardwell
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Neal Cardwell
2012-04-23 05:09:35 +0800

11 Apr, 2012

1 commit

94fb175c0 Merge tag 'dmaengine-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine ... Browse Code »

Pull dmaengine fixes from Dan Williams:

1/ regression fix for Xen as it now trips over a broken assumption
about the dma address size on 32-bit builds

2/ new quirk for netdma to ignore dma channels that cannot meet
netdma alignment requirements

3/ fixes for two long standing issues in ioatdma (ring size overflow)
and iop-adma (potential stack corruption)

* tag 'dmaengine-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
netdma: adding alignment check for NETDMA ops
ioatdma: DMA copy alignment needed to address IOAT DMA silicon errata
ioat: ring size variables need to be 32bit to avoid overflow
iop-adma: Corrected array overflow in RAID6 Xscale(R) test.
ioat: fix size of 'completion' for Xen

Linus Torvalds
2012-04-11 06:30:16 +0800

06 Apr, 2012

1 commit

a2bd1140a netdma: adding alignment check for NETDMA ops ... Browse Code »

This is the fallout from adding memcpy alignment workaround for certain
IOATDMA hardware. NetDMA will only use DMA engine that can handle byte align
ops.

Acked-by: David S. Miller
Signed-off-by: Dave Jiang
Signed-off-by: Dan Williams

Dave Jiang
2012-04-06 06:27:12 +0800

13 Feb, 2012

1 commit

4c507d289 net: implement IP_RECVTOS for IP_PKTOPTIONS ... Browse Code »

Currently, it is not easily possible to get TOS/DSCP value of packets from
an incoming TCP stream. The mechanism is there, IP_PKTOPTIONS getsockopt
with IP_RECVTOS set, the same way as incoming TTL can be queried. This is
not actually implemented for TOS, though.

This patch adds this functionality, both for IPv4 (IP_PKTOPTIONS) and IPv6
(IPV6_2292PKTOPTIONS). For IPv4, like in the IP_RECVTTL case, the value of
the TOS field is stored from the other party's ACK.

This is needed for proxies which require DSCP transparency. One such example
is at http://zph.bratcheda.org/.

Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2012-02-13 13:46:41 +0800

02 Feb, 2012

1 commit

658ddaaf6 tcp: md5: RST: getting md5 key from listener ... Browse Code »

TCP RST mechanism is broken in TCP md5(RFC2385). When
connection is gone, md5 key is lost, sending RST
without md5 hash is deem to ignored by peer. This can
be a problem since RST help protocal like bgp to fast
recove from peer crash.

In most case, users of tcp md5, such as bgp and ldp,
have listener on both sides to accept connection from peer.
md5 keys for peers are saved in listening socket.

There are two cases in finding md5 key when connection is
lost:
1.Passive receive RST: The message is send to well known port,
tcp will associate it with listner. md5 key is gotten from
listener.

2.Active receive RST (no sock): The message is send to ative
side, there is no socket associated with the message. In this
case, finding listener from source port, then find md5 key from
listener.

we are not loosing sercuriy here:
packet is checked with md5 hash. No RST is generated
if md5 hash doesn't match or no md5 key can be found.

Signed-off-by: Shawn Lu
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Shawn Lu
2012-02-02 01:43:54 +0800

01 Feb, 2012

3 commits

a8afca032 tcp: md5: protects md5sig_info with RCU ... Browse Code »

This patch makes sure we use appropriate memory barriers before
publishing tp->md5sig_info, allowing tcp_md5_do_lookup() being used from
tcp_v4_send_reset() without holding socket lock (upcoming patch from
Shawn Lu)

Note we also need to respect rcu grace period before its freeing, since
we can free socket without this grace period thanks to
SLAB_DESTROY_BY_RCU

Signed-off-by: Eric Dumazet
Cc: Shawn Lu
Signed-off-by: David S. Miller

Eric Dumazet
2012-02-01 15:11:47 +0800
a915da9b6 tcp: md5: rcu conversion ... Browse Code »

In order to be able to support proper RST messages for TCP MD5 flows, we
need to allow access to MD5 keys without locking listener socket.

This conversion is a nice cleanup, and shrinks size of timewait sockets
by 80 bytes.

IPv6 code reuses generic code found in IPv4 instead of duplicating it.

Control path uses GFP_KERNEL allocations instead of GFP_ATOMIC.

Signed-off-by: Eric Dumazet
Cc: Shawn Lu
Signed-off-by: David S. Miller

Eric Dumazet
2012-02-01 01:14:00 +0800
a2d91241a tcp: md5: remove obsolete md5_add() method ... Browse Code »

We no longer use md5_add() method from struct tcp_sock_af_ops

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-02-01 01:13:59 +0800

23 Jan, 2012

1 commit

8a622e71f tcp: md5: using remote adress for md5 lookup in rst packet ... Browse Code »

md5 key is added in socket through remote address.
remote address should be used in finding md5 key when
sending out reset packet.

Signed-off-by: shawnlu
Signed-off-by: David S. Miller

shawnlu
2012-01-23 04:08:45 +0800

13 Dec, 2011

3 commits

3dc43e3e4 per-netns ipv4 sysctl_tcp_mem ... Browse Code »

This patch allows each namespace to independently set up
its levels for tcp memory pressure thresholds. This patch
alone does not buy much: we need to make this values
per group of process somehow. This is achieved in the
patches that follows in this patchset.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: David S. Miller
CC: Eric W. Biederman
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:11 +0800
d1a4c0b37 tcp memory pressure controls ... Browse Code »

This patch introduces memory pressure controls for the tcp
protocol. It uses the generic socket memory pressure code
introduced in earlier patches, and fills in the
necessary data in cg_proto struct.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: Eric W. Biederman
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:10 +0800
180d8cd94 foundations of per-cgroup memory pressure controlling. ... Browse Code »

This patch replaces all uses of struct sock fields' memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem to acessor
macros. Those macros can either receive a socket argument, or a mem_cgroup
argument, depending on the context they live in.

Since we're only doing a macro wrapping here, no performance impact at all is
expected in the case where we don't have cgroups disabled.

Signed-off-by: Glauber Costa
Reviewed-by: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
CC: Eric Dumazet
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:10 +0800

27 Nov, 2011

1 commit

6dec4ac4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
net/ipv4/inet_diag.c

David S. Miller
2011-11-27 03:47:03 +0800

24 Nov, 2011

1 commit

4d0fe50c7 ipv6: tcp: fix tcp_v6_conn_request() ... Browse Code »

Since linux 2.6.26 (commit c6aefafb7ec6 : Add IPv6 support to TCP SYN
cookies), we can drop a SYN packet reusing a TIME_WAIT socket.

(As a matter of fact we fail to send the SYNACK answer)

As the client resends its SYN packet after a one second timeout, we
accept it, because first packet removed the TIME_WAIT socket before
being dropped.

This probably explains why nobody ever noticed or complained.

Reported-by: Jesse Young
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-11-24 06:29:23 +0800

23 Nov, 2011

1 commit

4e3fd7a06 net: remove ipv6_addr_copy() ... Browse Code »

C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2011-11-23 05:43:32 +0800

02 Nov, 2011

1 commit

73cb88ecb net: make the tcp and udp file_operations for the /proc stuff const ... Browse Code »

the tcp and udp code creates a set of struct file_operations at runtime
while it can also be done at compile time, with the added benefit of then
having these file operations be const.

the trickiest part was to get the "THIS_MODULE" reference right; the naive
method of declaring a struct in the place of registration would not work
for this reason.

Signed-off-by: Arjan van de Ven
Signed-off-by: David S. Miller

Arjan van de Ven
2011-11-02 05:56:14 +0800

27 Oct, 2011

1 commit

b903d324b ipv6: tcp: fix TCLASS value in ACK messages sent from TIME_WAIT ... Browse Code »

commit 66b13d99d96a (ipv4: tcp: fix TOS value in ACK messages sent from
TIME_WAIT) fixed IPv4 only.

This part is for the IPv6 side, adding a tclass param to ip6_xmit()

We alias tw_tclass and tw_tos, if socket family is INET6.

[ if sockets is ipv4-mapped, only IP_TOS socket option is used to fill
TOS field, TCLASS is not taken into account ]

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-10-27 12:44:35 +0800

24 Oct, 2011

1 commit

318cf7aaa tcp: md5: add more const attributes ... Browse Code »

Now tcp_md5_hash_header() has a const tcphdr argument, we can add more
const attributes to callers.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-10-24 14:46:04 +0800

21 Oct, 2011

1 commit

cf533ea53 tcp: add const qualifiers where possible ... Browse Code »

Adding const qualifiers to pointers can ease code review, and spot some
bugs. It might allow compiler to optimize code further.

For example, is it legal to temporary write a null cksum into tcphdr
in tcp_md5_hash_header() ? I am afraid a sniffer could catch the
temporary null value...

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-10-21 17:22:42 +0800

08 Oct, 2011

1 commit

88c5100c2 Merge branch 'master' of github.com:davem330/net ... Browse Code »

Conflicts:
net/batman-adv/soft-interface.c

David S. Miller
2011-10-08 01:38:43 +0800

05 Oct, 2011

1 commit

260fcbeb1 tcp: properly handle md5sig_pool references ... Browse Code »

tcp_v4_clear_md5_list() assumes that multiple tcp md5sig peers
only hold one reference to md5sig_pool. but tcp_v4_md5_do_add()
increases use count of md5sig_pool for each peer. This patch
makes tcp_v4_md5_do_add() only increases use count for the first
tcp md5sig peer.

Signed-off-by: Zheng Yan
Signed-off-by: David S. Miller

Yan, Zheng
2011-10-05 11:31:24 +0800

29 Sep, 2011

1 commit

676a1184e ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket ... Browse Code »

ipv6_ac_list and ipv6_fl_list from listening socket are inadvertently
shared with new socket created for connection.

Signed-off-by: Zheng Yan
Signed-off-by: David S. Miller

Yan, Zheng
2011-09-29 12:32:10 +0800

27 Sep, 2011

1 commit

b82d1bb4f tcp: unalias tcp_skb_cb flags and ip_dsfield ... Browse Code »

struct tcp_skb_cb contains a "flags" field containing either tcp flags
or IP dsfield depending on context (input or output path)

Introduce ip_dsfield to make the difference clear and ease maintenance.
If later we want to save space, we can union flags/ip_dsfield

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-09-27 14:20:08 +0800

22 Sep, 2011

1 commit

8decf8687 Merge branch 'master' of github.com:davem330/net ... Browse Code »

Conflicts:
MAINTAINERS
drivers/net/Kconfig
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
drivers/net/ethernet/broadcom/tg3.c
drivers/net/wireless/iwlwifi/iwl-pci.c
drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
drivers/net/wireless/rt2x00/rt2800usb.c
drivers/net/wireless/wl12xx/main.c

David S. Miller
2011-09-22 15:23:13 +0800

16 Sep, 2011

1 commit

946cedccb tcp: Change possible SYN flooding messages ... Browse Code »

"Possible SYN flooding on port xxxx " messages can fill logs on servers.

Change logic to log the message only once per listener, and add two new
SNMP counters to track :

TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client

TCPReqQFullDrop : number of times a SYN request was dropped because
syncookies were not enabled.

Based on a prior patch from Tom Herbert, and suggestions from David.

Signed-off-by: Eric Dumazet
CC: Tom Herbert
Signed-off-by: David S. Miller

Eric Dumazet
2011-09-16 02:49:43 +0800

18 Aug, 2011

1 commit

bdeab9919 rps: Add flag to skb to indicate rxhash is based on L4 tuple ... Browse Code »

The l4_rxhash flag was added to the skb structure to indicate
that the rxhash value was computed over the 4 tuple for the
packet which includes the port information in the encapsulated
transport packet. This is used by the stack to preserve the
rxhash value in __skb_rx_tunnel.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2011-08-18 11:06:03 +0800

07 Aug, 2011

1 commit

6e5714eaf net: Compute protocol sequence numbers and fragment IDs using MD5. ... Browse Code »

Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.

MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)

Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation. So the periodic
regeneration and 8-bit counter have been removed. We compute and
use a full 32-bit sequence number.

For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.

Reported-by: Dan Kaminsky
Tested-by: Willy Tarreau
Signed-off-by: David S. Miller

David S. Miller
2011-08-07 09:33:19 +0800

21 Jun, 2011

1 commit

9f6ec8d69 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 ... Browse Code »

Conflicts:
drivers/net/wireless/iwlwifi/iwl-agn-rxon.c
drivers/net/wireless/rtlwifi/pci.c
net/netfilter/ipvs/ip_vs_core.c

David S. Miller
2011-06-21 13:29:08 +0800

18 Jun, 2011

1 commit

1eddceadb net: rfs: enable RFS before first data packet is received ... Browse Code »

Le jeudi 16 juin 2011 à 23:38 -0400, David Miller a écrit :
> From: Ben Hutchings
> Date: Fri, 17 Jun 2011 00:50:46 +0100
>
> > On Wed, 2011-06-15 at 04:15 +0200, Eric Dumazet wrote:
> >> @@ -1594,6 +1594,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
> >> goto discard;
> >>
> >> if (nsk != sk) {
> >> + sock_rps_save_rxhash(nsk, skb->rxhash);
> >> if (tcp_child_process(sk, nsk, skb)) {
> >> rsk = nsk;
> >> goto reset;
> >>
> >
> > I haven't tried this, but it looks reasonable to me.
> >
> > What about IPv6? The logic in tcp_v6_do_rcv() looks very similar.
>
> Indeed ipv6 side needs the same fix.
>
> Eric please add that part and resubmit. And in fact I might stick
> this into net-2.6 instead of net-next-2.6
>

OK, here is the net-2.6 based one then, thanks !

[PATCH v2] net: rfs: enable RFS before first data packet is received

First packet received on a passive tcp flow is not correctly RFS
steered.

One sock_rps_record_flow() call is missing in inet_accept()

But before that, we also must record rxhash when child socket is setup.

Signed-off-by: Eric Dumazet
CC: Tom Herbert
CC: Ben Hutchings
CC: Jamal Hadi Salim
Signed-off-by: David S. Miller

Eric Dumazet
2011-06-18 03:27:31 +0800

09 Jun, 2011

1 commit

9ad7c049f tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side ... Browse Code »

This patch lowers the default initRTO from 3secs to 1sec per
RFC2988bis. It falls back to 3secs if the SYN or SYN-ACK packet
has been retransmitted, AND the TCP timestamp option is not on.

It also adds support to take RTT sample during 3WHS on the passive
open side, just like its active open counterpart, and uses it, if
valid, to seed the initRTO for the data transmission phase.

The patch also resets ssthresh to its initial default at the
beginning of the data transmission phase, and reduces cwnd to 1 if
there has been MORE THAN ONE retransmission during 3WHS per RFC5681.

Signed-off-by: H.K. Jerry Chu
Signed-off-by: David S. Miller

Jerry Chu
2011-06-09 08:05:30 +0800

24 May, 2011

1 commit

71338aa7d net: convert %p usage to %pK ... Browse Code »

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs. If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges. Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".

The supporting code for kptr_restrict and %pK are currently in the -mm
tree. This patch converts users of %p in net/ to %pK. Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging and the reading of the syslog is
already optionally protected by the dmesg_restrict sysctl.

Signed-off-by: Dan Rosenberg
Cc: James Morris
Cc: Eric Dumazet
Cc: Thomas Graf
Cc: Eugene Teo
Cc: Kees Cook
Cc: Ingo Molnar
Cc: David S. Miller
Cc: Peter Zijlstra
Cc: Eric Paris
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller

Dan Rosenberg
2011-05-24 13:13:12 +0800

29 Apr, 2011

1 commit

f6d8bd051 inet: add RCU protection to inet->opt ... Browse Code »

We lack proper synchronization to manipulate inet->opt ip_options

Problem is ip_make_skb() calls ip_setup_cork() and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
without any protection against another thread manipulating inet->opt.

Another thread can change inet->opt pointer and free old one under us.

Use RCU to protect inet->opt (changed to inet->inet_opt).

Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.

We cant insert an rcu_head in struct ip_options since its included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.

Signed-off-by: Eric Dumazet
Cc: Herbert Xu
Signed-off-by: David S. Miller

Eric Dumazet
2011-04-29 04:16:35 +0800

23 Apr, 2011

1 commit

b71d1d426 inet: constify ip headers and in6_addr ... Browse Code »

Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-04-23 02:04:14 +0800

07 Apr, 2011

1 commit

47482f132 ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2) ... Browse Code »

properly record sk_rxhash in ipv6 sockets (v2)

Noticed while working on another project that flows to sockets which I had open
on a test systems weren't getting steered properly when I had RFS enabled.
Looking more closely I found that:

1) The affected sockets were all ipv6
2) They weren't getting steered because sk->sk_rxhash was never set from the
incomming skbs on that socket.

This was occuring because there are several points in the IPv4 tcp and udp code
which save the rxhash value when a new connection is established. Those calls
to sock_rps_save_rxhash were never added to the corresponding ipv6 code paths.
This patch adds those calls. Tested by myself to properly enable RFS
functionalty on ipv6.

Change notes:
v2:
Filtered UDP to only arm RFS on bound sockets (Eric Dumazet)

Signed-off-by: Neil Horman
Signed-off-by: David S. Miller

Neil Horman
2011-04-07 04:07:09 +0800

05 Apr, 2011

1 commit

738faca34 ipv6: Don't pass invalid dst_entry pointer to dst_release(). ... Browse Code »

Make sure dst_release() is not called with error pointer. This is
similar to commit 4910ac6c526d2868adcb5893e0c428473de862b5 ("ipv4:
Don't ip_rt_put() an error pointer in RAW sockets.").

Signed-off-by: Boris Ostrovsky
Signed-off-by: David S. Miller

Boris Ostrovsky
2011-04-05 04:07:26 +0800

13 Mar, 2011

4 commits

1958b856c net: Put fl6_* macros to struct flowi6 and use them again. ... Browse Code »

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:55 +0800
4c9483b2f ipv6: Convert to use flowi6 where applicable. ... Browse Code »

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:54 +0800
6281dcc94 net: Make flowi ports AF dependent. ... Browse Code »

Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*

This will let us to create AF optimal flow instances.

It will work because every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyways.

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:46 +0800
1d28f42c1 net: Put flowi_* prefix on AF independent members of struct flowi ... Browse Code »

I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:44 +0800

02 Mar, 2011

1 commit

68d0c6d34 ipv6: Consolidate route lookup sequences. ... Browse Code »

Route lookups follow a general pattern in the ipv6 code wherein
we first find the non-IPSEC route, potentially override the
flow destination address due to ipv6 options settings, and then
finally make an IPSEC search using either xfrm_lookup() or
__xfrm_lookup().

__xfrm_lookup() is used when we want to generate a blackhole route
if the key manager needs to resolve the IPSEC rules (in this case
-EREMOTE is returned and the original 'dst' is left unchanged).

Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC
resolution is necessary, we simply fail the lookup completely.

All of these cases are encapsulated into two routines,
ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter of which
handles unconnected UDP datagram sockets.

Signed-off-by: David S. Miller

David S. Miller
2011-03-02 05:19:07 +0800