Eric Lee / smarc-fsl-linux-kernel

02 Dec, 2015

1 commit

9cd3e072b net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA

This patch is a cleanup to make the following patch easier to
review.

Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to a (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async()

To ease backports, we rename both constants.

Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int nr, struct sock *sk) are added so that
the following patch can change their implementation.
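
As a rough userspace illustration of the helper pattern described above (not the kernel code itself), the sketch below hides atomic bit operations behind two small functions, so callers never touch the flags word directly and the storage can later move without changing call sites. The `demo_sock` struct, the `DEMO_ASYNC_*` bit numbers and all `demo_*` names are hypothetical; only the `(int nr, struct sock *sk)` argument order is taken from the commit message.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the object that owns the flags word. */
struct demo_sock {
    atomic_ulong flags;
};

/* Hypothetical bit numbers, chosen only for this sketch. */
enum { DEMO_ASYNC_NOSPACE = 0, DEMO_ASYNC_WAITDATA = 1 };

/* Atomically set bit nr in the flags word. */
static inline void demo_set_bit(int nr, struct demo_sock *sk)
{
    atomic_fetch_or(&sk->flags, 1UL << nr);
}

/* Atomically clear bit nr in the flags word. */
static inline void demo_clear_bit(int nr, struct demo_sock *sk)
{
    atomic_fetch_and(&sk->flags, ~(1UL << nr));
}

/* Report whether bit nr is currently set. */
static inline bool demo_test_bit(int nr, struct demo_sock *sk)
{
    return (atomic_load(&sk->flags) >> nr) & 1UL;
}
```

Because every caller goes through the two helpers, a later change only has to touch the helper bodies.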

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-12-02 04:45:05 +0800

01 Dec, 2015

1 commit

9490f886b af-unix: passcred support for sendpage

sendpage did not care about credentials at all. This could lead to
situations in which because of fd passing between processes we could
append data to skbs with different scm data. It is illegal to splice those
skbs together. Instead we have to allocate a new skb and if requested
fill out the scm details.

Fixes: 869e7c62486ec ("net: af_unix: implement stream sendpage support")
Reported-by: Al Viro
Cc: Al Viro
Cc: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-12-01 04:16:06 +0800

24 Nov, 2015

1 commit

7d267278a unix: avoid use-after-free in ep_remove_wait_queue

Rainer Weikusat writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep even though none of the messages presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes again available, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed even though "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.

Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.

Signed-off-by: Rainer Weikusat
Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron
Signed-off-by: David S. Miller

Rainer Weikusat
2015-11-24 01:29:58 +0800

18 Nov, 2015

1 commit

a3a116e04 af_unix: take receive queue lock while appending new skb

Although in future we may not necessarily need to use
sk_buff_head.lock, removing it is a rather large change, as it affects the
af_unix fd garbage collector, diag and socket cleanups. This is too much
for a stable patch.

For the time being grab sk_buff_head.lock without disabling bh and irqs,
so don't use the locking variant skb_queue_tail.

Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
Cc: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Reported-by: Eric Dumazet
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-11-18 04:25:45 +0800

17 Nov, 2015

1 commit

8844f9723 af_unix: don't append consumed skbs to sk_receive_queue

In case multiple writes to a unix stream socket race we could end up in a
situation where we pre-allocate a new skb for use in unix_stream_sendpage
but have to free it again in the locked section because another skb
has been appended meanwhile, which we must use. Accidentally we didn't
clear the pointer after consuming it and so we touched freed memory
while appending it to the sk_receive_queue. So, clear the pointer after
consuming the skb.
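
The pattern behind this fix can be sketched in plain userspace C. This is an illustration under assumptions, not the kernel code: `demo_buf`, `demo_queue` and `demo_sendpage` are hypothetical stand-ins for the skb, the receive queue and the send path. The point is that once the pre-allocated buffer has been released, the pointer is reset so the later "append if we still own a buffer" check cannot touch freed memory.

```c
#include <stdlib.h>

/* Hypothetical buffer and queue types; not the kernel's sk_buff API. */
struct demo_buf {
    struct demo_buf *next;
};

struct demo_queue {
    struct demo_buf *head;
    struct demo_buf *tail;
    int len;
};

/* Link b at the tail of q. */
static void demo_enqueue(struct demo_queue *q, struct demo_buf *b)
{
    b->next = NULL;
    if (q->tail)
        q->tail->next = b;
    else
        q->head = b;
    q->tail = b;
    q->len++;
}

/*
 * Reuse the existing tail if there is one; otherwise queue the
 * pre-allocated buffer. Returns the queue length afterwards, or -1 if
 * the allocation failed.
 */
static int demo_sendpage(struct demo_queue *q)
{
    struct demo_buf *newbuf = malloc(sizeof(*newbuf)); /* pre-allocate */

    if (!newbuf)
        return -1;

    if (q->tail) {
        /* Another writer already queued a buffer: ours is not needed. */
        free(newbuf);
        newbuf = NULL; /* the step the fix adds: forget the freed pointer */
    }

    if (newbuf)
        demo_enqueue(q, newbuf);

    return q->len;
}
```

Without the `newbuf = NULL` line, the final check would pass and the freed buffer would be linked into the queue.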

This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.

Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
Reported-by: Dmitry Vyukov
Cc: Dmitry Vyukov
Cc: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-11-17 04:39:35 +0800

16 Nov, 2015

1 commit

73ed5d25d af-unix: fix use-after-free with concurrent readers while splicing

During splicing an af-unix socket to a pipe we have to drop all
af-unix socket locks. While doing so we allow another reader to enter
unix_stream_read_generic which can read, copy and finally free another
skb. If exactly this skb is just in process of being spliced we get a
use-after-free report by kasan.

First, we must make sure to not have a free while the skb is used during
the splice operation. We simply increment its use counter before unlocking
the reader lock.

Stream sockets have the nice characteristic that we don't care about
zero length writes and they never reach the peer socket's queue. That
said, we can take the UNIXCB.consumed field as the indicator if the
skb was already freed from the socket's receive queue. If the skb was
fully consumed after we locked the reader side again we know it has been
dropped by a second reader. We indicate a short read to user space and
abort the current splice operation.

This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.

Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
Reported-by: Dmitry Vyukov
Cc: Dmitry Vyukov
Cc: Eric Dumazet
Acked-by: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-11-16 02:16:34 +0800

25 Oct, 2015

1 commit

1586a5877 af_unix: do not report POLLOUT on listeners

poll(POLLOUT) on a listener should not report fd is ready for
a write().

This would break some applications using poll() and pfd.events = -1,
as they would not block in poll()
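
A small userspace probe for the situation described above might look like the following. It is a sketch under assumptions: `demo_poll_listener` is a hypothetical helper, the listener is bound with Linux autobind (passing only the address family), and the result depends on the running kernel, so no particular mask is claimed here.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Returns the revents mask poll() reports for a listening AF_UNIX socket
 * when every event is requested, or -1 on failure.
 */
static int demo_poll_listener(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    struct pollfd pfd;
    int fd, n;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Linux autobind: passing only the family picks an abstract address. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(sa_family_t)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }

    pfd.fd = fd;
    pfd.events = -1;      /* ask for every event, as in the report */
    pfd.revents = 0;
    n = poll(&pfd, 1, 0); /* zero timeout: just sample readiness */
    close(fd);

    return n < 0 ? -1 : pfd.revents;
}
```

On a kernel carrying this change, the returned mask would be expected not to contain POLLOUT for an idle listener; checking `demo_poll_listener() & POLLOUT` is one way to observe that.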

Signed-off-by: Eric Dumazet
Reported-by: Alan Burlison
Tested-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-10-25 21:37:45 +0800

05 Oct, 2015

1 commit

e9193d60d net/unix: fix logic about sk_peek_offset

Now recv with MSG_PEEK can return data from multiple SKBs.

Unfortunately we take into account the peek offset for each skb,
which is wrong. We need to apply the peek offset only once.

In addition, the peek offset should be used only if MSG_PEEK is set.
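
The "apply the offset only once" rule can be illustrated with a userspace sketch that walks a list of chunks. The names `demo_chunk` and `demo_peek` are hypothetical and the chunks merely stand in for queued skbs; the skip counter is consumed while walking, so it is never re-applied to later chunks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one queued buffer. */
struct demo_chunk {
    const char *data;
    size_t len;
};

/*
 * Copy up to outlen bytes from a list of chunks into out, skipping the
 * first `skip` bytes of the logical stream. The skip is used up as the
 * chunks are walked, so it applies once, not once per chunk.
 */
static size_t demo_peek(const struct demo_chunk *chunks, size_t n,
                        size_t skip, char *out, size_t outlen)
{
    size_t copied = 0;

    for (size_t i = 0; i < n && copied < outlen; i++) {
        size_t len = chunks[i].len;
        size_t take;

        if (skip >= len) {
            skip -= len; /* whole chunk lies before the offset */
            continue;
        }
        take = len - skip;
        if (take > outlen - copied)
            take = outlen - copied;
        memcpy(out + copied, chunks[i].data + skip, take);
        copied += take;
        skip = 0; /* offset fully consumed; later chunks start at 0 */
    }
    return copied;
}
```

With chunks "abc" and "def" and a skip of 2, the copy yields "cdef"; applying the skip to each chunk separately would instead yield "cf".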

Cc: "David S. Miller" (maintainer:NETWORKING
Cc: Eric Dumazet (commit_signer:1/14=7%)
Cc: Aaron Conole
Fixes: 9f389e35674f ("af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag")
Signed-off-by: Andrey Vagin
Tested-by: Aaron Conole
Signed-off-by: David S. Miller

Andrey Vagin
2015-10-05 21:33:09 +0800

30 Sep, 2015

1 commit

9f389e356 af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag

AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag
is set.

This is referenced in kernel bugzilla #12323 @
https://bugzilla.kernel.org/show_bug.cgi?id=12323

As described both in the BZ and lkml thread @
http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an
AF_UNIX socket only reads a single skb, where the desired effect is
to return as much skb data as has been queued, until hitting the recv
buffer size (whichever comes first).

The modified MSG_PEEK path will now move to the next skb in the tree
and jump to the again: label, rather than following the natural loop
structure. This requires duplicating some of the loop head actions.

This was tested using the Python socketpair code attached to
the bugzilla issue.
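
A C counterpart to that kind of socketpair test could look like the sketch below. It is not the script from the bugzilla entry; `demo_peek_two_writes` is a hypothetical helper that queues two separate writes and then peeks. How many bytes the peek returns depends on whether the running kernel includes this change.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Queue two separate writes on a stream socketpair, then peek.
 * Returns the number of bytes the peek saw, or -1 on error.
 */
static ssize_t demo_peek_two_writes(char *buf, size_t buflen)
{
    int sv[2];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Two send() calls, so the data may sit in two separate buffers. */
    if (send(sv[0], "abc", 3, 0) == 3 && send(sv[0], "def", 3, 0) == 3)
        n = recv(sv[1], buf, buflen, MSG_PEEK);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With a 16-byte buffer, a kernel that peeks only the first buffer would return 3 bytes ("abc"), while one that walks the queue as described would return all 6.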

Signed-off-by: Aaron Conole
Signed-off-by: David S. Miller

Aaron Conole
2015-09-30 04:47:08 +0800

11 Jun, 2015

1 commit

37a9a8df8 net/unix: support SCM_SECURITY for stream sockets

SCM_SECURITY was originally only implemented for datagram sockets,
not for stream sockets. However, SCM_CREDENTIALS is supported on
Unix stream sockets. For consistency, implement Unix stream support
for SCM_SECURITY as well. Also clean up the existing code and get
rid of the superfluous UNIXSID macro.

Motivated by https://bugzilla.redhat.com/show_bug.cgi?id=1224211,
where systemd was using SCM_CREDENTIALS and assumed wrongly that
SCM_SECURITY was also supported on Unix stream sockets.

Signed-off-by: Stephen Smalley
Acked-by: Paul Moore
Signed-off-by: David S. Miller

Stephen Smalley
2015-06-11 13:49:20 +0800

02 Jun, 2015

1 commit

dda922c83 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/phy/amd-xgbe-phy.c
drivers/net/wireless/iwlwifi/Kconfig
include/net/mac80211.h

iwlwifi/Kconfig and mac80211.h were both trivial overlapping
changes.

The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
the bug fix that happened on the 'net' side is already integrated
into the rest of the amd-xgbe driver.

Signed-off-by: David S. Miller

David S. Miller
2015-06-02 13:51:30 +0800

27 May, 2015

1 commit

b48732e4a unix/caif: sk_socket can disappear when state is unlocked

got a rare NULL pointer dereference in clear_bit

Signed-off-by: Mark Salyzyn
Acked-by: Hannes Frederic Sowa
----
v2: switch to sock_flag(sk, SOCK_DEAD) and added net/caif/caif_socket.c
v3: return -ECONNRESET in upstream caller of wait function for SOCK_DEAD
Signed-off-by: David S. Miller

Mark Salyzyn
2015-05-27 11:19:29 +0800

25 May, 2015

2 commits

2b514574f net: af_unix: implement splice for stream af_unix sockets

unix_stream_recvmsg is refactored to unix_stream_read_generic in this
patch and enhanced to deal with pipe splicing. The refactoring is
negligible; we mostly have to deal with a non-existing struct msghdr
argument.

Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-05-25 12:06:59 +0800

869e7c624 net: af_unix: implement stream sendpage support

This patch implements sendpage support for AF_UNIX SOCK_STREAM
sockets. This is also required for a complete splice implementation.

The implementation is a bit tricky because we append to already existing
skbs and so have to hold unix_sk->readlock to protect the reading side
from either advancing UNIXCB.consumed or freeing the skb at the socket
receive tail.

Signed-off-by: Hannes Frederic Sowa
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2015-05-25 12:06:58 +0800

11 May, 2015

1 commit

11aa9c28b net: Pass kern from net_proto_family.create to sk_alloc

In preparation for changing how struct net is refcounted
on kernel sockets pass the knowledge that we are creating
a kernel socket from sock_create_kern through to sk_alloc.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-05-11 22:50:17 +0800

28 Apr, 2015

1 commit

2decb2682 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) mlx4 doesn't check fully for supported valid RSS hash function, fix
from Amir Vadai

2) Off by one in ibmveth_change_mtu(), from David Gibson

3) Prevent altera chip from reporting false error interrupts in some
circumstances, from Chee Nouk Phoon

4) Get rid of that stupid endless loop trying to allocate a FIN packet
in TCP, and in the process kill deadlocks. From Eric Dumazet

5) Fix get_rps_cpus() crash due to wrong invalid-cpu value, also from
Eric Dumazet

6) Fix two bugs in async rhashtable resizing, from Thomas Graf

7) Fix topology server listener socket namespace bug in TIPC, from Ying
Xue

8) Add some missing HAS_DMA kconfig dependencies, from Geert
Uytterhoeven

9) bgmac driver intends to force re-polling but does so by returning
the wrong value from its ->poll() handler. Fix from Rafał Miłecki

10) When the creator of an rhashtable configures a max size for it,
don't bark in the logs and drop insertions when that is exceeded.
Fix from Johannes Berg

11) Recover from out of order packets in ppp mppe properly, from Sylvain
Rochet

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
bnx2x: really disable TPA if 'disable_tpa' option is set
net:treewide: Fix typo in drivers/net
net/mlx4_en: Prevent setting invalid RSS hash function
mdio-mux-gpio: use new gpiod_get_array and gpiod_put_array functions
netfilter; Add some missing default cases to switch statements in nft_reject.
ppp: mppe: discard late packet in stateless mode
ppp: mppe: sanity error path rework
net/bonding: Make DRV macros private
net: rfs: fix crash in get_rps_cpus()
altera tse: add support for fixed-links.
pxa168: fix double deallocation of managed resources
net: fix crash in build_skb()
net: eth: altera: Resolve false errors from MSGDMA to TSE
ehea: Fix memory hook reference counting crashes
net/tg3: Release IRQs on permanent error
net: mdio-gpio: support access that may sleep
inet: fix possible panic in reqsk_queue_unlink()
rhashtable: don't attempt to grow when at max_size
bgmac: fix requests for extra polling calls from NAPI
tcp: avoid looping in tcp_send_fin()
...

Linus Torvalds
2015-04-28 05:05:19 +0800

24 Apr, 2015

1 commit

d1ab39f17 net: unix: garbage: fixed several comment and whitespace style issues

fixed several comment and whitespace style issues

Signed-off-by: Jason Eastman
Signed-off-by: David S. Miller

Jason Eastman
2015-04-24 01:15:20 +0800

16 Apr, 2015

2 commits

a25b376bd VFS: net/unix: d_backing_inode() annotations

places where we are dealing with S_ISSOCK file creation/lookups.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2015-04-16 03:06:56 +0800

ee8ac4d61 VFS: AF_UNIX sockets should call mknod on the top layer only

AF_UNIX sockets should call mknod on the top layer only and should not attempt
to modify the lower layer in a layered filesystem such as overlayfs.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2015-04-16 03:06:54 +0800

03 Mar, 2015

1 commit

1b7841404 net: Remove iocb argument from sendmsg and recvmsg

Now that TIPC no longer depends on the iocb argument in its internal
implementations of the sendmsg() and recvmsg() hooks defined in the proto
structure, no user of the iocb argument remains.
We can therefore drop the redundant iocb argument completely from all
implementations of both sendmsg() and recvmsg() in the entire
networking stack.

Cc: Christoph Hellwig
Suggested-by: Al Viro
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller

Ying Xue
2015-03-03 02:06:31 +0800

29 Jan, 2015

1 commit

7cc056626 net: remove sock_iocb

The sock_iocb structure is allocated on the stack for each read/write-like
operation on sockets, and contains various fields of which only the
embedded msghdr and sometimes a pointer to the scm_cookie are ever used.
Get rid of the sock_iocb and put a msghdr directly on the stack and pass
the scm_cookie explicitly to netlink_mmap_sendmsg.

Signed-off-by: Christoph Hellwig
Signed-off-by: David S. Miller

Christoph Hellwig
2015-01-29 15:15:07 +0800

18 Jan, 2015

1 commit

053c095a8 netlink: make nlmsg_end() and genlmsg_end() void

Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

if (my_function(...))
/* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

- return nlmsg_end(...);
+ nlmsg_end(...);
+ return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for
Signed-off-by: David S. Miller

Johannes Berg
2015-01-18 14:03:45 +0800

10 Dec, 2014

1 commit

c0371da60 put iov_iter into msghdr

Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.

Signed-off-by: Al Viro

Al Viro
2014-12-10 05:29:03 +0800

24 Nov, 2014

1 commit

8feb2fb2b switch AF_PACKET and AF_UNIX to skb_copy_datagram_from_iter()

... and kill skb_copy_datagram_iovec()

Signed-off-by: Al Viro

Al Viro
2014-11-24 18:16:39 +0800

06 Nov, 2014

1 commit

51f3d02b9 net: Add and use skb_copy_datagram_msg() helper.

This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".

When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.

Having a helper like this means there will be less places to touch
during that transformation.

Based upon descriptions and patch from Al Viro.

Signed-off-by: David S. Miller

David S. Miller
2014-11-06 05:46:40 +0800

08 Oct, 2014

1 commit

505e907db af_unix: remove 0 assignment on static

static values are automatically initialized to 0
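
A two-line C illustration of that rule, using a hypothetical `demo_counter` variable: objects with static storage duration start out as zero without an explicit initializer.

```c
/* No "= 0" needed: objects with static storage duration start as zero. */
static int demo_counter;

/* Return the current value, then advance the counter. */
static int demo_next(void)
{
    return demo_counter++;
}
```

The first call to `demo_next()` therefore returns 0 even though the variable was never explicitly initialized.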

Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller

Fabian Frederick
2014-10-08 05:03:14 +0800

13 Jun, 2014

1 commit

f9da455b9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:

1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
Benniston.

3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
Mork.

4) BPF now has a "random" opcode, from Chema Gonzalez.

5) Add more BPF documentation and improve test framework, from Daniel
Borkmann.

6) Support TCP fastopen over ipv6, from Daniel Lee.

7) Add software TSO helper functions and use them to support software
TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.

8) Support software TSO in fec driver too, from Nimrod Andy.

9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
tcp: fixing TLP's FIN recovery
net: fec: Add software TSO support
net: fec: Add Scatter/gather support
net: fec: Increase buffer descriptor entry number
net: fec: Factorize feature setting
net: fec: Enable IP header hardware checksum
net: fec: Factorize the .xmit transmit function
bridge: fix compile error when compiling without IPv6 support
bridge: fix smatch warning / potential null pointer dereference
via-rhine: fix full-duplex with autoneg disable
bnx2x: Enlarge the dorq threshold for VFs
bnx2x: Check for UNDI in uncommon branch
bnx2x: Fix 1G-baseT link
bnx2x: Fix link for KR with swapped polarity lane
sctp: Fix sk_ack_backlog wrap-around problem
net/core: Add VF link state control policy
net/fsl: xgmac_mdio is dependent on OF_MDIO
net/fsl: Make xgmac_mdio read error message useful
net_sched: drr: warn when qdisc is not work conserving
...

Linus Torvalds
2014-06-13 05:27:40 +0800

17 May, 2014

1 commit

31ff6aa5c net: unix: Align send data_len up to PAGE_SIZE

Using the whole of the allocated pages reduces the requested skb->data size.
This is just a slightly more thrifty allocation.
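
The rounding named in the title can be sketched as ordinary power-of-two alignment. This is an illustration only: `DEMO_PAGE_SIZE` is an assumed 4096-byte page and `demo_page_align` is a hypothetical helper, not the kernel's macro.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u /* assumed page size, for illustration only */

/* Round len up to the next multiple of DEMO_PAGE_SIZE (a power of two). */
static size_t demo_page_align(size_t len)
{
    return (len + DEMO_PAGE_SIZE - 1) & ~((size_t)DEMO_PAGE_SIZE - 1);
}
```

Under this assumption a 4097-byte length rounds up to 8192, while lengths that are already page multiples are left unchanged.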

netperf shows no difference from the current performance.

Signed-off-by: Kirill Tkhai
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Kirill Tkhai
2014-05-17 04:04:03 +0800

18 Apr, 2014

1 commit

4e857c58e arch: Mass conversion of smp_mb__*()

Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra
Acked-by: Paul E. McKenney
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar

Peter Zijlstra
2014-04-18 20:20:48 +0800

12 Apr, 2014

1 commit

676d23690 net: Fix use after free by removing length arg from sk_data_ready callbacks.

Several spots in the kernel perform a sequence like:

skb_queue_tail(&sk->sk_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about its
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller

David S. Miller
2014-04-12 04:15:36 +0800

27 Mar, 2014

1 commit

de1443916 net: unix: non blocking recvmsg() should not return -EINTR

Some applications didn't expect recvmsg() on a non blocking socket
could return -EINTR. This possibility was added as a side effect
of commit b3ca9b02b00704 ("net: fix multithreaded signal handling in
unix recv routines").

To hit this bug, you need to be a bit unlucky, as the u->readlock
mutex is usually held for very small periods.
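
For reference, the error a non-blocking caller normally expects when no data is queued can be observed with a small userspace sketch. `demo_nonblocking_recv_errno` is a hypothetical helper; it uses MSG_DONTWAIT on an empty datagram socketpair and reports the errno value it saw.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Try a non-blocking receive on an empty datagram socketpair.
 * Returns the errno value observed, 0 if data was unexpectedly read,
 * or -1 if the socketpair could not be created.
 */
static int demo_nonblocking_recv_errno(void)
{
    int sv[2];
    char buf[8];
    int err = 0;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* Nothing has been sent, so a non-blocking receive cannot succeed. */
    if (recv(sv[1], buf, sizeof(buf), MSG_DONTWAIT) < 0)
        err = errno; /* save before close() can overwrite it */

    close(sv[0]);
    close(sv[1]);
    return err;
}
```

Applications written against this behaviour typically handle EAGAIN (or EWOULDBLOCK) and do not expect EINTR from such a call.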

Fixes: b3ca9b02b00704 ("net: fix multithreaded signal handling in unix recv routines")
Signed-off-by: Eric Dumazet
Cc: Rainer Weikusat
Signed-off-by: David S. Miller

Eric Dumazet
2014-03-27 05:05:40 +0800

07 Mar, 2014

1 commit

0a13404dd net: unix socket code abuses csum_partial

The unix socket code is using the result of csum_partial to
hash into a lookup table:

unix_hash_fold(csum_partial(sunaddr, len, 0));

csum_partial is only guaranteed to produce something that can be
folded into a checksum, as its prototype explains:

* returns a 32-bit number suitable for feeding into itself
* or csum_tcpudp_magic

The 32bit value should not be used directly.

Depending on the alignment, the ppc64 csum_partial will return
different 32bit partial checksums that will fold into the same
16bit checksum.

This difference causes the following testcase (courtesy of
Gustavo) to sometimes fail:

#include <sys/socket.h>
#include <stdio.h>

int main()
{
    int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);

    int i = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4);

    struct sockaddr addr;
    addr.sa_family = AF_LOCAL;
    bind(fd, &addr, 2);

    listen(fd, 128);

    struct sockaddr_storage ss;
    socklen_t sslen = (socklen_t)sizeof(ss);
    getsockname(fd, (struct sockaddr*)&ss, &sslen);

    fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);

    if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){
        perror(NULL);
        return 1;
    }
    printf("OK\n");
    return 0;
}

As suggested by davem, fix this by using csum_fold to fold the
partial 32bit checksum into a 16bit checksum before using it.
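
The folding step can be illustrated in portable C. This is a sketch of the standard one's-complement fold, not the architecture-specific kernel routine, and `demo_csum_fold` is a hypothetical name. It shows how two different 32-bit partial sums collapse to the same 16-bit value.

```c
#include <stdint.h>

/*
 * Fold a 32-bit partial checksum into 16 bits using one's-complement
 * addition (carries are added back in), then return the complement.
 */
static uint16_t demo_csum_fold(uint32_t sum)
{
    sum = (sum & 0xffffu) + (sum >> 16); /* add high half into low half */
    sum = (sum & 0xffffu) + (sum >> 16); /* absorb a possible carry */
    return (uint16_t)~sum;
}
```

For example, the partial sums 0x00010002 and 0x00020001 differ as 32-bit values but both fold to the same 16-bit result, which is why hashing on the folded value is stable while hashing on the raw 32-bit value is not.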

Signed-off-by: Anton Blanchard
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller

Anton Blanchard
2014-03-07 05:19:33 +0800

19 Jan, 2014

1 commit

342dfc306 net: add build-time checks for msg->msg_name size

This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
handler msg_name and msg_namelen logic").

DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.
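
A userspace analogue of such a build-time size check can be written with C11 `_Static_assert`. The macro `DEMO_DECLARE_SOCKADDR` and the function `demo_family_of` below are hypothetical and only mimic the idea; the kernel macro is implemented differently. The 128-byte limit is represented here by `struct sockaddr_storage`.

```c
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical analogue of a size-checked sockaddr cast: compilation
 * fails if the target type would not fit into sockaddr_storage.
 */
#define DEMO_DECLARE_SOCKADDR(type, dst, src)                           \
    _Static_assert(sizeof(*(type)0) <= sizeof(struct sockaddr_storage), \
                   "sockaddr type larger than sockaddr_storage");       \
    type dst = (type)(src)

/* Read the address family through a checked sockaddr_un view. */
static sa_family_t demo_family_of(struct sockaddr_storage *ss)
{
    DEMO_DECLARE_SOCKADDR(struct sockaddr_un *, un, ss);

    return un->sun_family;
}
```

The check costs nothing at run time; an oversized address structure is rejected when the file is compiled.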

Signed-off-by: Steffen Hurrle
Suggested-by: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Steffen Hurrle
2014-01-19 15:04:16 +0800

19 Dec, 2013

1 commit

143c90549 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/intel/i40e/i40e_main.c
drivers/net/macvtap.c

Both minor merge hassles, simple overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2013-12-19 05:42:06 +0800

18 Dec, 2013

1 commit

37ab4fa78 net: unix: allow bind to fail on mutex lock

This is similar to the set_peek_off patch where calling bind while the
socket is stuck in unix_dgram_recvmsg() will block and cause a hung task
spew after a while.

This is also the last place that did a straightforward mutex_lock(), so
there shouldn't be any more of these patches.

Signed-off-by: Sasha Levin
Signed-off-by: David S. Miller

Sasha Levin
2013-12-18 04:04:42 +0800

11 Dec, 2013

1 commit

12663bfc9 net: unix: allow set_peek_off to fail

unix_dgram_recvmsg() will hold the readlock of the socket until recv
is complete.

At the same time, we may try to setsockopt(SO_PEEK_OFF), which will hang until
unix_dgram_recvmsg() completes (which can take a while) without allowing
us to break out of it, triggering a hung task spew.

Instead, allow set_peek_off to fail, this way userspace will not hang.

Signed-off-by: Sasha Levin
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Sasha Levin
2013-12-11 10:45:15 +0800

07 Dec, 2013

1 commit

5cc208bec unix: convert printks to pr_<level>

use pr_<level> instead of printk(LEVEL)

Signed-off-by: Wang Weidong
Signed-off-by: David S. Miller

wangweidong
2013-12-07 05:35:58 +0800

21 Nov, 2013

1 commit

f3d334260 net: rework recvmsg handler msg_name and msg_namelen logic

This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size
Suggested-by: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-11-21 10:52:30 +0800

20 Oct, 2013

1 commit

90c6bd34f net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race

In the case of credentials passing in unix stream sockets (dgram
sockets seem not affected), we get a rather sparse race after
commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default").

We have a stream server on receiver side that requests credential
passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
on each spawned/accepted socket on server side to 1 first (as it's
not inherited), it can happen that in the time between accept() and
setsockopt() we get interrupted, the sender is being scheduled and
continues with passing data to our receiver. At that time SO_PASSCRED
is neither set on sender nor receiver side, hence in cmsg's
SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
(== overflow{u,g}id) instead of what we actually would like to see.

On the sender side, here nc -U, the tests in maybe_add_creds()
invoked through unix_stream_sendmsg() would fail, as at that exact
time, as mentioned, the sender has neither SO_PASSCRED on his side
nor sees it on the server side, and we have a valid 'other' socket
in place. Thus, sender believes it would just look like a normal
connection, not needing/requesting SO_PASSCRED at that time.

As reverting 16e5726 would not be an option due to the significant
performance regression reported when having creds always passed,
one way/trade-off to prevent that would be to set SO_PASSCRED on
the listener socket and allow inheriting these flags to the spawned
socket on server side in accept(). It seems also logical to do so
if we'd tell the listener socket to pass those flags onwards, and
would fix the race.

Before, strace:

recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}},
msg_flags=0}, 0) = 5

After, strace:

recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}},
msg_flags=0}, 0) = 5

Signed-off-by: Daniel Borkmann
Cc: Eric Dumazet
Cc: Eric W. Biederman
Signed-off-by: David S. Miller

Daniel Borkmann
2013-10-20 06:50:15 +0800

03 Oct, 2013

1 commit

6865d1e83 unix_diag: fix info leak

When filling the netlink message we fail to wipe the pad field,
therefore leak one byte of heap memory to userland. Fix this by
setting pad to 0.
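
The class of bug can be shown with a small userspace sketch. The `demo_diag_msg` layout and `demo_fill` function are hypothetical and only resemble a reply header with an explicit padding byte; the point is that every byte that will be copied out, including padding, is written before the structure leaves the function.

```c
#include <stdint.h>

/* Hypothetical reply layout with an explicit padding byte. */
struct demo_diag_msg {
    uint8_t family;
    uint8_t type;
    uint8_t state;
    uint8_t pad;
    uint32_t ino;
};

/* Fill a reply in caller-provided memory that may hold stale data. */
static void demo_fill(struct demo_diag_msg *rep, uint8_t family,
                      uint8_t type, uint8_t state, uint32_t ino)
{
    rep->family = family;
    rep->type = type;
    rep->state = state;
    rep->pad = 0; /* without this, a stale byte would be copied out */
    rep->ino = ino;
}
```

If the destination buffer was previously filled with other data, omitting the `pad` assignment would leave one byte of that data visible to the receiver.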

Signed-off-by: Mathias Krause
Signed-off-by: David S. Miller

Mathias Krause
2013-10-03 04:08:24 +0800