Eric Lee / smarc-fsl-linux-kernel

15 Jan, 2017

1 commit

d36a1cb1e inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets ... Browse Code »

[ Upstream commit 39b2dd765e0711e1efd1d1df089473a8dd93ad48 ]

Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
the packet. For sockets that have transport headers pulled, transport
offset can be negative. Use signed comparison to avoid overflow.

Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Reported-by: Nisar Jagabar
Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Willem de Bruijn
2017-01-15 20:42:52 +0800

01 Dec, 2016

1 commit

0382a25af l2tp: lock socket before checking flags in connect() ... Browse Code »

Socket flags aren't updated atomically, so the socket must be locked
while reading the SOCK_ZAPPED flag.

This issue exists for both l2tp_ip and l2tp_ip6. For IPv6, this patch
also brings error handling for __ip6_datagram_connect() failures.

Signed-off-by: Guillaume Nault
Signed-off-by: David S. Miller

Guillaume Nault
2016-12-01 03:14:07 +0800

17 May, 2016

1 commit

2632616bc sock: propagate __sock_cmsg_send() error ... Browse Code »

__sock_cmsg_send() might return different error codes, not only -EINVAL.

Fixes: 24025c465f77 ("ipv4: process socket-level control messages in IPv4")
Fixes: ad1e46a83716 ("ipv6: process socket-level control messages in IPv6")
Signed-off-by: Eric Dumazet
Cc: Soheil Hassas Yeganeh
Acked-by: Soheil Hassas Yeganeh
Signed-off-by: David S. Miller

Eric Dumazet
2016-05-17 01:46:23 +0800

04 May, 2016

1 commit

26879da58 ipv6: add new struct ipcm6_cookie ... Browse Code »

In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local
variables like hlimits, tclass, opt and dontfrag and pass them to corresponding
functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames.
This is not a good practice and makes it hard to add new parameters.
This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in
ipv4 and include the above mentioned variables. And we only pass the
pointer to this structure to corresponding functions. This makes it easier
to add new parameters in the future and makes the function cleaner.

Signed-off-by: Wei Wang
Signed-off-by: David S. Miller

Wei Wang
2016-05-04 04:08:14 +0800

26 Apr, 2016

1 commit

960a26282 net: better drop monitoring in ip{6}_recv_error() ... Browse Code »

We should call consume_skb(skb) when skb is properly consumed,
or kfree_skb(skb) when skb must be dropped in error case.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2016-04-26 03:48:10 +0800

24 Apr, 2016

1 commit

1602f49b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts were two cases of simple overlapping changes,
nothing serious.

In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.

Signed-off-by: David S. Miller

David S. Miller
2016-04-24 06:51:33 +0800

15 Apr, 2016

4 commits

e646b657f ipv6: udp: Do a route lookup and update during release_cb ... Browse Code »

This patch adds a release_cb for UDPv6. It does a route lookup
and updates sk->sk_dst_cache if it is needed. It picks up the
left-over job from ip6_sk_update_pmtu() if the sk was owned
by user during the pmtu update.

It takes a rcu_read_lock to protect the __sk_dst_get() operations
because another thread may do ip6_dst_store() without taking the
sk lock (e.g. sendmsg).

Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Martin KaFai Lau
Reported-by: Wei Wang
Cc: Cong Wang
Cc: Eric Dumazet
Cc: Wei Wang
Signed-off-by: David S. Miller

Martin KaFai Lau
2016-04-15 04:29:53 +0800
33c162a98 ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update ... Browse Code »

There is a case in connected UDP socket such that
getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible
sequence could be the following:
1. Create a connected UDP socket
2. Send some datagrams out
3. Receive a ICMPV6_PKT_TOOBIG
4. No new outgoing datagrams to trigger the sk_dst_check()
logic to update the sk->sk_dst_cache.
5. getsockopt(IPV6_MTU) returns the mtu from the invalid
sk->sk_dst_cache instead of the newly created RTF_CACHE clone.

This patch updates the sk->sk_dst_cache for a connected datagram sk
during pmtu-update code path.

Note that the sk->sk_v6_daddr is used to do the route lookup
instead of skb->data (i.e. iph). It is because a UDP socket can become
connected after sending out some datagrams in un-connected state. or
It can be connected multiple times to different destinations. Hence,
iph may not be related to where sk is currently connected to.

It is done under '!sock_owned_by_user(sk)' condition because
the user may make another ip6_datagram_connect() (i.e changing
the sk->sk_v6_daddr) while dst lookup is happening in the pmtu-update
code path.

For the sock_owned_by_user(sk) == true case, the next patch will
introduce a release_cb() which will update the sk->sk_dst_cache.

Test:

Server (Connected UDP Socket):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Route Details:
[root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac'
2fac::/64 dev eth0 proto kernel metric 256 pref medium
2fac:face::/64 via 2fac::face dev eth0 metric 1024 pref medium

A simple python code to create a connected UDP socket:

import socket
import errno

HOST = '2fac::1'
PORT = 8080

s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
s.bind((HOST, PORT))
s.connect(('2fac:face::face', 53))
print("connected")
while True:
try:
data = s.recv(1024)
except socket.error as se:
if se.errno == errno.EMSGSIZE:
pmtu = s.getsockopt(41, 24)
print("PMTU:%d" % pmtu)
break
s.close()

Python program output after getting a ICMPV6_PKT_TOOBIG:
[root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py
connected
PMTU:1300

Cache routes after recieving TOOBIG:
[root@arch-fb-vm1 ~]# ip -6 r show table cache
2fac:face::face via 2fac::face dev eth0 metric 0
cache expires 463sec mtu 1300 pref medium

Client (Send the ICMPV6_PKT_TOOBIG):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scapy is used to generate the TOOBIG message. Here is the scapy script I have
used:

>>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac::
1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53)
>>> sendp(p, iface='qemubr0')

Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Martin KaFai Lau
Reported-by: Wei Wang
Cc: Cong Wang
Cc: Eric Dumazet
Cc: Wei Wang
Signed-off-by: David S. Miller

Martin KaFai Lau
2016-04-15 04:29:51 +0800
7e2040db1 ipv6: datagram: Refactor dst lookup and update codes to a new function ... Browse Code »

This patch moves the route lookup and update codes for connected
datagram sk to a newly created function ip6_datagram_dst_update()

It will be reused during the pmtu update in the later patch.

Signed-off-by: Martin KaFai Lau
Cc: Cong Wang
Cc: Eric Dumazet
Cc: Wei Wang
Signed-off-by: David S. Miller

Martin KaFai Lau
2016-04-15 04:29:50 +0800
80fbdb208 ipv6: datagram: Refactor flowi6 init codes to a new function ... Browse Code »

Move flowi6 init codes for connected datagram sk to a newly created
function ip6_datagram_flow_key_init().

Notes:
1. fl6_flowlabel is used instead of fl6.flowlabel in __ip6_datagram_connect
2. ipv6_addr_is_multicast(&fl6->daddr) is used instead of
(addr_type & IPV6_ADDR_MULTICAST) in ip6_datagram_flow_key_init()

This new function will be reused during pmtu update in the later patch.

Signed-off-by: Martin KaFai Lau
Cc: Cong Wang
Cc: Eric Dumazet
Cc: Wei Wang
Signed-off-by: David S. Miller

Martin KaFai Lau
2016-04-15 04:29:49 +0800

05 Apr, 2016

1 commit

ad1e46a83 ipv6: process socket-level control messages in IPv6 ... Browse Code »

Process socket-level control messages by invoking
__sock_cmsg_send in ip6_datagram_send_ctl for control messages on
the SOL_SOCKET layer.

This makes sure whenever ip6_datagram_send_ctl is called for
udp and raw, we also process socket-level control messages.

This is a bit uglier than IPv4, since IPv6 does not have
something like ipcm_cookie. Perhaps we can later create
a control message cookie for IPv6?

Note that this commit interprets new control messages that
were ignored before. As such, this commit does not change
the behavior of IPv6 control messages.

Signed-off-by: Soheil Hassas Yeganeh
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Soheil Hassas Yeganeh
2016-04-05 03:50:30 +0800

30 Jan, 2016

1 commit

1cdda9187 ipv6/udp: use sticky pktinfo egress ifindex on connect() ... Browse Code »

Currently, the egress interface index specified via IPV6_PKTINFO
is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
can be subverted when the user space application calls connect()
before sendmsg().
Fix it by initializing properly flowi6_oif in connect() before
performing the route lookup.

Signed-off-by: Paolo Abeni
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Paolo Abeni
2016-01-30 12:31:26 +0800

03 Dec, 2015

1 commit

45f6fad84 ipv6: add complete rcu protection around np->opt ... Browse Code »

This patch addresses multiple problems :

UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.

Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())

This patch adds full RCU protection to np->opt

Reported-by: Dmitry Vyukov
Signed-off-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Eric Dumazet
2015-12-03 12:37:16 +0800

26 Sep, 2015

1 commit

1c1e9d2b6 ipv6: constify ip6_xmit() sock argument ... Browse Code »

This is to document that socket lock might not be held at this point.

skb_set_owner_w() and ipv6_local_error() are using proper atomic ops
or spinlocks, so we promote the socket to non const when calling them.

netfilter hooks should never assume socket lock is held,
we also promote the socket to non const.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-09-26 04:00:38 +0800

30 Jul, 2015

1 commit

877d1f629 net: Set sk_txhash from a random number ... Browse Code »

This patch creates sk_set_txhash and eliminates protocol specific
inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
random number instead of performing flow dissection. sk_set_txash
is also allowed to be called multiple times for the same socket,
we'll need this when redoing the hash for negative routing advice.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2015-07-30 13:44:04 +0800

23 Jul, 2015

1 commit

c5e40ee28 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
net/bridge/br_mdb.c

br_mdb.c conflict was a function call being removed to fix a bug in
'net' but whose signature was changed in 'net-next'.

Signed-off-by: David S. Miller

David S. Miller
2015-07-23 15:41:16 +0800

16 Jul, 2015

1 commit

03645a11a ipv6: lock socket in ip6_datagram_connect() ... Browse Code »

ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.

This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.

Signed-off-by: Eric Dumazet
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Eric Dumazet
2015-07-16 08:25:51 +0800

10 Jul, 2015

1 commit

8b58a3984 ipv6: use flag instead of u16 for hop in inet6_skb_parm ... Browse Code »

Hop was always either 0 or sizeof(struct ipv6hdr).

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2015-07-10 06:06:59 +0800

24 Jun, 2015

1 commit

34b99df4e ip: report the original address of ICMP messages ... Browse Code »

ICMP messages can trigger ICMP and local errors. In this case
serr->port is 0 and starting from Linux 4.0 we do not return
the original target address to the error queue readers.
Add function to define which errors provide addr_offset.
With this fix my ping command is not silent anymore.

Fixes: c247f0534cc5 ("ip: fix error queue empty skb handling")
Signed-off-by: Julian Anastasov
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Julian Anastasov
2015-06-24 15:48:08 +0800

01 Apr, 2015

1 commit

63159f29b ipv6: coding style: comparison for equality with NULL ... Browse Code »

The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x == NULL and sometimes as !x. !x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.

No changes detected by objdiff.

Signed-off-by: Ian Morris
Signed-off-by: David S. Miller

Ian Morris
2015-04-01 01:51:54 +0800

09 Mar, 2015

1 commit

c247f0534 ip: fix error queue empty skb handling ... Browse Code »

When reading from the error queue, msg_name and msg_control are only
populated for some errors. A new exception for empty timestamp skbs
added a false positive on icmp errors without payload.

`traceroute -M udpconn` only displayed gateways that return payload
with the icmp error: the embedded network headers are pulled before
sock_queue_err_skb, leaving an skb with skb->len == 0 otherwise.

Fix this regression by refining when msg_name and msg_control
branches are taken. The solutions for the two fields are independent.

msg_name only makes sense for errors that configure serr->port and
serr->addr_offset. Test the first instead of skb->len. This also fixes
another issue. saddr could hold the wrong data, as serr->addr_offset
is not initialized in some code paths, pointing to the start of the
network header. It is only valid when serr->port is set (non-zero).

msg_control support differs between IPv4 and IPv6. IPv4 only honors
requests for ICMP and timestamps with SOF_TIMESTAMPING_OPT_CMSG. The
skb->len test can simply be removed, because skb->dev is also tested
and never true for empty skbs. IPv6 honors requests for all errors
aside from local errors and timestamps on empty skbs.

In both cases, make the policy more explicit by moving this logic to
a new function that decides whether to process msg_control and that
optionally prepares the necessary fields in skb->cb[]. After this
change, the IPv4 and IPv6 paths are more similar.

The last case is rxrpc. Here, simply refine to only match timestamps.

Fixes: 49ca0d8bfaf3 ("net-timestamp: no-payload option")

Reported-by: Jan Niehusmann
Signed-off-by: Willem de Bruijn

----

Changes
v1->v2
- fix local origin test inversion in ip6_datagram_support_cmsg
- make v4 and v6 code paths more similar by introducing analogous
ipv4_datagram_support_cmsg
- fix compile bug in rxrpc
Signed-off-by: David S. Miller

Willem de Bruijn
2015-03-09 11:01:54 +0800

03 Feb, 2015

1 commit

49ca0d8bf net-timestamp: no-payload option ... Browse Code »

Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
timestamps, this loops timestamps on top of empty packets.

Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
cmsg reception (aside from timestamps) are no longer possible. This
works together with a follow on patch that allows administrators to
only allow tx timestamping if it does not loop payload or metadata.

Signed-off-by: Willem de Bruijn

----

Changes (rfc -> v1)
- add documentation
- remove unnecessary skb->len test (thanks to Richard Cochran)
Signed-off-by: David S. Miller

Willem de Bruijn
2015-02-03 10:46:51 +0800

16 Jan, 2015

1 commit

f812116b1 ip: zero sockaddr returned on error queue ... Browse Code »

The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
structure is defined and allocated on the stack as

struct {
struct sock_extended_err ee;
struct sockaddr_in(6) offender;
} errhdr;

The second part is only initialized for certain SO_EE_ORIGIN values.
Always initialize it completely.

An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
would return uninitialized bytes.

Signed-off-by: Willem de Bruijn

----

Also verified that there is no padding between errhdr.ee and
errhdr.offender that could leak additional kernel data.
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Willem de Bruijn
2015-01-16 08:41:16 +0800

11 Dec, 2014

1 commit

f95b414ed net: introduce helper macro for_each_cmsghdr ... Browse Code »

Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating
cmsghdr from msghdr, just cleanup.

Signed-off-by: Gu Zheng
Signed-off-by: David S. Miller

Gu Zheng
2014-12-11 11:41:55 +0800

09 Dec, 2014

1 commit

829ae9d61 net-timestamp: allow reading recv cmsg on errqueue with origin tstamp ... Browse Code »

Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.

on AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.

on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. only change is to set ifindex, which is
not initialized for all error origins.

In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.

The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:

ipv4: if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6: if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {

At one time, this was the same test bar for the ICMP/ICMP6
distinction. This is no longer true.

Signed-off-by: Willem de Bruijn

----

Changes
v1 -> v2
large rewrite
- integrate with existing pktinfo cmsg generation code
- on ipv4: only send with new flag, to maintain legacy behavior
- on ipv6: send at most a single pktinfo cmsg
- on ipv6: initialize fields if not yet initialized

The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches

1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
(http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
cmsg from users without CAP_NET_RAW.
Signed-off-by: David S. Miller

Willem de Bruijn
2014-12-09 09:20:48 +0800

12 Nov, 2014

1 commit

ba7a46f16 net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited ... Browse Code »

Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.

All messages are still ratelimited.

Some KERN_ uses are changed to KERN_DEBUG.

This may have some negative impact on messages that were
emitted at KERN_INFO that are not not enabled at all unless
DEBUG is defined or dynamic_debug is enabled. Even so,
these messages are now _not_ emitted by default.

This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings". For backward compatibility,
the sysctl is not removed, but it has no function. The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.c

Miscellanea:

o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller

Joe Perches
2014-11-12 03:10:31 +0800

06 Nov, 2014

1 commit

51f3d02b9 net: Add and use skb_copy_datagram_msg() helper. ... Browse Code »

This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".

When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.

Having a helper like this means there will be less places to touch
during that transformation.

Based upon descriptions and patch from Al Viro.

Signed-off-by: David S. Miller

David S. Miller
2014-11-06 05:46:40 +0800

02 Sep, 2014

1 commit

364a9e932 sock: deduplicate errqueue dequeue ... Browse Code »

sk->sk_error_queue is dequeued in four locations. All share the
exact same logic. Deduplicate.

Also collapse the two critical sections for dequeue (at the top of
the recv handler) and signal (at the bottom).

This moves signal generation for the next packet forward, which should
be harmless.

It also changes the behavior if the recv handler exits early with an
error. Previously, a signal for follow-up packets on the errqueue
would then not be scheduled. The new behavior, to always signal, is
arguably a bug fix.

For rxrpc, the change causes the same function to be called repeatedly
for each queued packet (because the recv handler == sk_error_report).
It is likely that all packets will fail for the same reason (e.g.,
memory exhaustion).

This code runs without sk_lock held, so it is not safe to trust that
sk->sk_err is immutable inbetween releasing q->lock and the subsequent
test. Introduce int err just to avoid this potential race.

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2014-09-02 12:49:08 +0800

25 Aug, 2014

1 commit

67ba4152e ipv6: White-space cleansing : Line Layouts ... Browse Code »

This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

A number of items are addressed in this patch:
* Multiple spaces converted to tabs
* Spaces before tabs removed.
* Spaces in pointer typing cleansed (char *)foo etc.
* Remove space after sizeof
* Ensure spacing around comparators such as if statements.

Signed-off-by: Ian Morris
Signed-off-by: David S. Miller

Ian Morris
2014-08-25 13:37:52 +0800

08 Jul, 2014

1 commit

b73c3d0e4 net: Save TX flow hash in sock and set in skbuf on xmit ... Browse Code »

For a connected socket we can precompute the flow hash for setting
in skb->hash on output. This is a performance advantage over
calculating the skb->hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where thers is no socket (e.g. time-wait, syn-recv, syn-cookies).

This patch adds sk_txhash to the sock structure. inet_set_txhash and
ip6_set_txhash functions are added which are called from points in
TCP and UDP where socket moves to established state.

skb_set_hash_from_sk is a function which sets skb->hash from the
sock txhash value. This is called in UDP and TCP transmit path when
transmitting within the context of a socket.

Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash called on every TX packet to
create a UDP source port).

Before fix:

95.02% CPU utilization
154/256/505 90/95/99% latencies
1.13042e+06 tps

Time in functions:
0.28% skb_flow_dissect
0.21% __skb_get_hash

After fix:

94.95% CPU utilization
156/254/485 90/95/99% latencies
1.15447e+06

Neither __skb_get_hash nor skb_flow_dissect appear in perf

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2014-07-08 12:14:21 +0800

23 Jan, 2014

1 commit

7c90cc2d4 ipv6: enable anycast addresses as source addresses for datagrams ... Browse Code »

This change allows to consider an anycast address valid as source address
when given via an IPV6_PKTINFO or IPV6_2292PKTINFO ancillary data item.
So, when sending a datagram with ancillary data, the unicast and anycast
addresses are handled in the same way.

- Adds ipv6_chk_acast_addr_src() to check if an anycast address is link-local
on given interface or is global.
- Uses it in ip6_datagram_send_ctl().

Signed-off-by: Francois-Xavier Le Bail
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

FX Le Bail
2014-01-23 13:57:05 +0800

22 Jan, 2014

1 commit

82b276cd2 ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts ... Browse Code »

Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow
connecting and binding to them. sendmsg logic does already check msg->name
for this but must trust already connected sockets which could be set up
for connection to ipv4 address family.

Per-socket flag ipv6only is of no use here, as it is under users control
by setsockopt.

Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2014-01-22 08:59:19 +0800

20 Jan, 2014

1 commit

4b261c75a ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams ... Browse Code »

We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.

This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.

I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.

Reported-by: Gert Doering
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2014-01-20 11:53:18 +0800

19 Jan, 2014

1 commit

342dfc306 net: add build-time checks for msg->msg_name size ... Browse Code »

This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
handler msg_name and msg_namelen logic").

DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.

Signed-off-by: Steffen Hurrle
Suggested-by: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Steffen Hurrle
2014-01-19 15:04:16 +0800

20 Dec, 2013

1 commit

1669cb985 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next ... Browse Code »

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2013-12-19

1) Use the user supplied policy index instead of a generated one
if present. From Fan Du.

2) Make xfrm migration namespace aware. From Fan Du.

3) Make the xfrm state and policy locks namespace aware. From Fan Du.

4) Remove ancient sleeping when the SA is in acquire state,
we now queue packets to the policy instead. This replaces the
sleeping code.

5) Remove FLOWI_FLAG_CAN_SLEEP. This was used to notify xfrm about the
posibility to sleep. The sleeping code is gone, so remove it.

6) Check user specified spi for IPComp. Thr spi for IPcomp is only
16 bit wide, so check for a valid value. From Fan Du.

7) Export verify_userspi_info to check for valid user supplied spi ranges
with pfkey and netlink. From Fan Du.

8) RFC3173 states that if the total size of a compressed payload and the IPComp
header is not smaller than the size of the original payload, the IP datagram
must be sent in the original non-compressed form. These packets are dropped
by the inbound policy check because they are not transformed. Document the need
to set 'level use' for IPcomp to receive such packets anyway. From Fan Du.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller

David S. Miller
2013-12-20 07:37:49 +0800

11 Dec, 2013

1 commit

ce7a3bdf1 ipv6: do not erase dst address with flow label destination ... Browse Code »

This patch is following b579035ff766c9412e2b92abf5cab794bff102b6
"ipv6: remove old conditions on flow label sharing"

Since there is no reason to restrict a label to a
destination, we should not erase the destination value of a
socket with the value contained in the flow label storage.

This patch allows to really have the same flow label to more
than one destination.

Signed-off-by: Florent Fourcot
Reviewed-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Florent Fourcot
2013-12-11 11:51:00 +0800

06 Dec, 2013

1 commit

0e0d44ab4 net: Remove FLOWI_FLAG_CAN_SLEEP ... Browse Code »

FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the posibility
to sleep until the needed states are resolved. This code is gone,
so FLOWI_FLAG_CAN_SLEEP is not needed anymore.

Signed-off-by: Steffen Klassert

Steffen Klassert
2013-12-06 14:24:39 +0800

24 Nov, 2013

2 commits

1fa4c710b ipv6: fix leaking uninitialized port number of offender sockaddr ... Browse Code »

Offenders don't have port numbers, so set it to 0.

Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-11-24 06:46:23 +0800
85fbaa750 inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions ... Browse Code »

Commit bceaa90240b6019ed73b49965eac7d167610be69 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.

As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.

This broke traceroute and such.

Fixes: bceaa90240b6 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: Brad Spengler
Reported-by: Tom Labanowski
Cc: mpb
Cc: David S. Miller
Cc: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-11-24 06:46:23 +0800

09 Oct, 2013

1 commit

efe4208f4 ipv6: make lookups simpler and faster ... Browse Code »

TCP listener refactoring, part 4 :

To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common

Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.

Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).

inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6

This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.

inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.

We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2013-10-09 12:01:25 +0800