02 Dec, 2015
1 commit
-
This patch is a cleanup to make following patch easier to
review.Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to a (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async()To ease backports, we rename both constants.
Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int net, struct sock *sk) are added so that
following patch can change their implementation.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
01 Dec, 2015
1 commit
-
sendpage did not care about credentials at all. This could lead to
situations in which because of fd passing between processes we could
append data to skbs with different scm data. It is illegal to splice those
skbs together. Instead we have to allocate a new skb and if requested
fill out the scm details.Fixes: 869e7c62486ec ("net: af_unix: implement stream sendpage support")
Reported-by: Al Viro
Cc: Al Viro
Cc: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
24 Nov, 2015
1 commit
-
Rainer Weikusat writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep despite none of the message presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes again available, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed despite "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.Signed-off-by: Rainer Weikusat
Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron
Signed-off-by: David S. Miller
18 Nov, 2015
1 commit
-
While possibly in future we don't necessarily need to use
sk_buff_head.lock this is a rather larger change, as it affects the
af_unix fd garbage collector, diag and socket cleanups. This is too much
for a stable patch.For the time being grab sk_buff_head.lock without disabling bh and irqs,
so don't use locked skb_queue_tail.Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
Cc: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Reported-by: Eric Dumazet
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
17 Nov, 2015
1 commit
-
In case multiple writes to a unix stream socket race we could end up in a
situation where we pre-allocate a new skb for use in unix_stream_sendpage
but have to free it again in the locked section because another skb
has been appended meanwhile, which we must use. Accidentally we didn't
clear the pointer after consuming it and so we touched freed memory
while appending it to the sk_receive_queue. So, clear the pointer after
consuming the skb.This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
Reported-by: Dmitry Vyukov
Cc: Dmitry Vyukov
Cc: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
16 Nov, 2015
1 commit
-
During splicing an af-unix socket to a pipe we have to drop all
af-unix socket locks. While doing so we allow another reader to enter
unix_stream_read_generic which can read, copy and finally free another
skb. If exactly this skb is just in process of being spliced we get a
use-after-free report by kasan.First, we must make sure to not have a free while the skb is used during
the splice operation. We simply increment its use counter before unlocking
the reader lock.Stream sockets have the nice characteristic that we don't care about
zero length writes and they never reach the peer socket's queue. That
said, we can take the UNIXCB.consumed field as the indicator if the
skb was already freed from the socket's receive queue. If the skb was
fully consumed after we locked the reader side again we know it has been
dropped by a second reader. We indicate a short read to user space and
abort the current splice operation.This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
Reported-by: Dmitry Vyukov
Cc: Dmitry Vyukov
Cc: Eric Dumazet
Acked-by: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
25 Oct, 2015
1 commit
-
poll(POLLOUT) on a listener should not report fd is ready for
a write().This would break some applications using poll() and pfd.events = -1,
as they would not block in poll()Signed-off-by: Eric Dumazet
Reported-by: Alan Burlison
Tested-by: Eric Dumazet
Signed-off-by: David S. Miller
05 Oct, 2015
1 commit
-
Now send with MSG_PEEK can return data from multiple SKBs.
Unfortunately we take into account the peek offset for each skb,
that is wrong. We need to apply the peek offset only once.In addition, the peek offset should be used only if MSG_PEEK is set.
Cc: "David S. Miller" (maintainer:NETWORKING
Cc: Eric Dumazet (commit_signer:1/14=7%)
Cc: Aaron Conole
Fixes: 9f389e35674f ("af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag")
Signed-off-by: Andrey Vagin
Tested-by: Aaron Conole
Signed-off-by: David S. Miller
30 Sep, 2015
1 commit
-
AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag
is set.This is referenced in kernel bugzilla #12323 @
https://bugzilla.kernel.org/show_bug.cgi?id=12323As described both in the BZ and lkml thread @
http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an
AF_UNIX socket only reads a single skb, where the desired effect is
to return as much skb data has been queued, until hitting the recv
buffer size (whichever comes first).The modified MSG_PEEK path will now move to the next skb in the tree
and jump to the again: label, rather than following the natural loop
structure. This requires duplicating some of the loop head actions.This was tested using the python socketpair python code attached to
the bugzilla issue.Signed-off-by: Aaron Conole
Signed-off-by: David S. Miller
11 Jun, 2015
1 commit
-
SCM_SECURITY was originally only implemented for datagram sockets,
not for stream sockets. However, SCM_CREDENTIALS is supported on
Unix stream sockets. For consistency, implement Unix stream support
for SCM_SECURITY as well. Also clean up the existing code and get
rid of the superfluous UNIXSID macro.Motivated by https://bugzilla.redhat.com/show_bug.cgi?id=1224211,
where systemd was using SCM_CREDENTIALS and assumed wrongly that
SCM_SECURITY was also supported on Unix stream sockets.Signed-off-by: Stephen Smalley
Acked-by: Paul Moore
Signed-off-by: David S. Miller
02 Jun, 2015
1 commit
-
Conflicts:
drivers/net/phy/amd-xgbe-phy.c
drivers/net/wireless/iwlwifi/Kconfig
include/net/mac80211.hiwlwifi/Kconfig and mac80211.h were both trivial overlapping
changes.The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
the bug fix that happened on the 'net' side is already integrated
into the rest of the amd-xgbe driver.Signed-off-by: David S. Miller
27 May, 2015
1 commit
-
got a rare NULL pointer dereference in clear_bit
Signed-off-by: Mark Salyzyn
Acked-by: Hannes Frederic Sowa
----
v2: switch to sock_flag(sk, SOCK_DEAD) and added net/caif/caif_socket.c
v3: return -ECONNRESET in upstream caller of wait function for SOCK_DEAD
Signed-off-by: David S. Miller
25 May, 2015
2 commits
-
unix_stream_recvmsg is refactored to unix_stream_read_generic in this
patch and enhanced to deal with pipe splicing. The refactoring is
inneglible, we mostly have to deal with a non-existing struct msghdr
argument.Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller -
This patch implements sendpage support for AF_UNIX SOCK_STREAM
sockets. This is also required for a complete splice implementation.The implementation is a bit tricky because we append to already existing
skbs and so have to hold unix_sk->readlock to protect the reading side
from either advancing UNIXCB.consumed or freeing the skb at the socket
receive tail.Signed-off-by: Hannes Frederic Sowa
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
11 May, 2015
1 commit
-
In preparation for changing how struct net is refcounted
on kernel sockets pass the knowledge that we are creating
a kernel socket from sock_create_kern through to sk_alloc.Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
28 Apr, 2015
1 commit
-
Pull networking fixes from David Miller:
1) mlx4 doesn't check fully for supported valid RSS hash function, fix
from Amir Vadai2) Off by one in ibmveth_change_mtu(), from David Gibson
3) Prevent altera chip from reporting false error interrupts in some
circumstances, from Chee Nouk Phoon4) Get rid of that stupid endless loop trying to allocate a FIN packet
in TCP, and in the process kill deadlocks. From Eric Dumazet5) Fix get_rps_cpus() crash due to wrong invalid-cpu value, also from
Eric Dumazet6) Fix two bugs in async rhashtable resizing, from Thomas Graf
7) Fix topology server listener socket namespace bug in TIPC, from Ying
Xue8) Add some missing HAS_DMA kconfig dependencies, from Geert
Uytterhoeven9) bgmac driver intends to force re-polling but does so by returning
the wrong value from it's ->poll() handler. Fix from Rafał Miłecki10) When the creater of an rhashtable configures a max size for it,
don't bark in the logs and drop insertions when that is exceeded.
Fix from Johannes Berg11) Recover from out of order packets in ppp mppe properly, from Sylvain
Rochet* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
bnx2x: really disable TPA if 'disable_tpa' option is set
net:treewide: Fix typo in drivers/net
net/mlx4_en: Prevent setting invalid RSS hash function
mdio-mux-gpio: use new gpiod_get_array and gpiod_put_array functions
netfilter; Add some missing default cases to switch statements in nft_reject.
ppp: mppe: discard late packet in stateless mode
ppp: mppe: sanity error path rework
net/bonding: Make DRV macros private
net: rfs: fix crash in get_rps_cpus()
altera tse: add support for fixed-links.
pxa168: fix double deallocation of managed resources
net: fix crash in build_skb()
net: eth: altera: Resolve false errors from MSGDMA to TSE
ehea: Fix memory hook reference counting crashes
net/tg3: Release IRQs on permanent error
net: mdio-gpio: support access that may sleep
inet: fix possible panic in reqsk_queue_unlink()
rhashtable: don't attempt to grow when at max_size
bgmac: fix requests for extra polling calls from NAPI
tcp: avoid looping in tcp_send_fin()
...
24 Apr, 2015
1 commit
-
fixed several comment and whitespace style issues
Signed-off-by: Jason Eastman
Signed-off-by: David S. Miller
16 Apr, 2015
2 commits
-
places where we are dealing with S_ISSOCK file creation/lookups.
Signed-off-by: David Howells
Signed-off-by: Al Viro -
AF_UNIX sockets should call mknod on the top layer only and should not attempt
to modify the lower layer in a layered filesystem such as overlayfs.Signed-off-by: David Howells
Signed-off-by: Al Viro
03 Mar, 2015
1 commit
-
After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.Cc: Christoph Hellwig
Suggested-by: Al Viro
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller
29 Jan, 2015
1 commit
-
The sock_iocb structure is allocate on stack for each read/write-like
operation on sockets, and contains various fields of which only the
embedded msghdr and sometimes a pointer to the scm_cookie is ever used.
Get rid of the sock_iocb and put a msghdr directly on the stack and pass
the scm_cookie explicitly to netlink_mmap_sendmsg.Signed-off-by: Christoph Hellwig
Signed-off-by: David S. Miller
18 Jan, 2015
1 commit
-
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.This makes the very common pattern of
if (genlmsg_end(...) < 0) { ... }
be a whole bunch of dead code. Many places also simply do
return nlmsg_end(...);
and the caller is expected to deal with it.
This also commonly (at least for me) causes errors, because it is very
common to writeif (my_function(...))
/* error condition */and if my_function() does "return nlmsg_end()" this is of course wrong.
Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did- return nlmsg_end(...);
+ nlmsg_end(...);
+ return 0;I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with < 0 with no change in behaviour, so I opted for the more
efficient version.One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for
Signed-off-by: David S. Miller
10 Dec, 2014
1 commit
-
Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.Signed-off-by: Al Viro
24 Nov, 2014
1 commit
-
... and kill skb_copy_datagram_iovec()
Signed-off-by: Al Viro
06 Nov, 2014
1 commit
-
This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.Having a helper like this means there will be less places to touch
during that transformation.Based upon descriptions and patch from Al Viro.
Signed-off-by: David S. Miller
08 Oct, 2014
1 commit
-
static values are automatically initialized to 0
Signed-off-by: Fabian Frederick
Signed-off-by: David S. Miller
13 Jun, 2014
1 commit
-
Pull networking updates from David Miller:
1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.
2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
Benniston.3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
Mork.4) BPF now has a "random" opcode, from Chema Gonzalez.
5) Add more BPF documentation and improve test framework, from Daniel
Borkmann.6) Support TCP fastopen over ipv6, from Daniel Lee.
7) Add software TSO helper functions and use them to support software
TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.8) Support software TSO in fec driver too, from Nimrod Andy.
9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.
10) Handle broadcasts more gracefully over macvlan when there are large
numbers of interfaces configured, from Herbert Xu.11) Allow more control over fwmark used for non-socket based responses,
from Lorenzo Colitti.12) Do TCP congestion window limiting based upon measurements, from Neal
Cardwell.13) Support busy polling in SCTP, from Neal Horman.
14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.
15) Bridge promisc mode handling improvements from Vlad Yasevich.
16) Don't use inetpeer entries to implement ID generation any more, it
performs poorly, from Eric Dumazet.* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
tcp: fixing TLP's FIN recovery
net: fec: Add software TSO support
net: fec: Add Scatter/gather support
net: fec: Increase buffer descriptor entry number
net: fec: Factorize feature setting
net: fec: Enable IP header hardware checksum
net: fec: Factorize the .xmit transmit function
bridge: fix compile error when compiling without IPv6 support
bridge: fix smatch warning / potential null pointer dereference
via-rhine: fix full-duplex with autoneg disable
bnx2x: Enlarge the dorq threshold for VFs
bnx2x: Check for UNDI in uncommon branch
bnx2x: Fix 1G-baseT link
bnx2x: Fix link for KR with swapped polarity lane
sctp: Fix sk_ack_backlog wrap-around problem
net/core: Add VF link state control policy
net/fsl: xgmac_mdio is dependent on OF_MDIO
net/fsl: Make xgmac_mdio read error message useful
net_sched: drr: warn when qdisc is not work conserving
...
17 May, 2014
1 commit
-
Using whole of allocated pages reduces requested skb->data size.
This is just a little more thriftily allocation.netperf does not show difference with the current performance.
Signed-off-by: Kirill Tkhai
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
18 Apr, 2014
1 commit
-
Mostly scripted conversion of the smp_mb__* barriers.
Signed-off-by: Peter Zijlstra
Acked-by: Paul E. McKenney
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar
12 Apr, 2014
1 commit
-
Several spots in the kernel perform a sequence like:
skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.Signed-off-by: David S. Miller
27 Mar, 2014
1 commit
-
Some applications didn't expect recvmsg() on a non blocking socket
could return -EINTR. This possibility was added as a side effect
of commit b3ca9b02b00704 ("net: fix multithreaded signal handling in
unix recv routines").To hit this bug, you need to be a bit unlucky, as the u->readlock
mutex is usually held for very small periods.Fixes: b3ca9b02b00704 ("net: fix multithreaded signal handling in unix recv routines")
Signed-off-by: Eric Dumazet
Cc: Rainer Weikusat
Signed-off-by: David S. Miller
07 Mar, 2014
1 commit
-
The unix socket code is using the result of csum_partial to
hash into a lookup table:unix_hash_fold(csum_partial(sunaddr, len, 0));
csum_partial is only guaranteed to produce something that can be
folded into a checksum, as its prototype explains:* returns a 32-bit number suitable for feeding into itself
* or csum_tcpudp_magicThe 32bit value should not be used directly.
Depending on the alignment, the ppc64 csum_partial will return
different 32bit partial checksums that will fold into the same
16bit checksum.This difference causes the following testcase (courtesy of
Gustavo) to sometimes fail:#include
#includeint main()
{
int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);int i = 1;
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4);struct sockaddr addr;
addr.sa_family = AF_LOCAL;
bind(fd, &addr, 2);listen(fd, 128);
struct sockaddr_storage ss;
socklen_t sslen = (socklen_t)sizeof(ss);
getsockname(fd, (struct sockaddr*)&ss, &sslen);fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){
perror(NULL);
return 1;
}
printf("OK\n");
return 0;
}As suggested by davem, fix this by using csum_fold to fold the
partial 32bit checksum into a 16bit checksum before using it.Signed-off-by: Anton Blanchard
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller
19 Jan, 2014
1 commit
-
This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
handler msg_name and msg_namelen logic").DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.Signed-off-by: Steffen Hurrle
Suggested-by: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
19 Dec, 2013
1 commit
-
Conflicts:
drivers/net/ethernet/intel/i40e/i40e_main.c
drivers/net/macvtap.cBoth minor merge hassles, simple overlapping changes.
Signed-off-by: David S. Miller
18 Dec, 2013
1 commit
-
This is similar to the set_peek_off patch where calling bind while the
socket is stuck in unix_dgram_recvmsg() will block and cause a hung task
spew after a while.This is also the last place that did a straightforward mutex_lock(), so
there shouldn't be any more of these patches.Signed-off-by: Sasha Levin
Signed-off-by: David S. Miller
11 Dec, 2013
1 commit
-
unix_dgram_recvmsg() will hold the readlock of the socket until recv
is complete.In the same time, we may try to setsockopt(SO_PEEK_OFF) which will hang until
unix_dgram_recvmsg() will complete (which can take a while) without allowing
us to break out of it, triggering a hung task spew.Instead, allow set_peek_off to fail, this way userspace will not hang.
Signed-off-by: Sasha Levin
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller
07 Dec, 2013
1 commit
-
use pr_ instead of printk(LEVEL)
Signed-off-by: Wang Weidong
Signed-off-by: David S. Miller
21 Nov, 2013
1 commit
-
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size
Suggested-by: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
20 Oct, 2013
1 commit
-
In the case of credentials passing in unix stream sockets (dgram
sockets seem not affected), we get a rather sparse race after
commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default").We have a stream server on receiver side that requests credential
passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
on each spawned/accepted socket on server side to 1 first (as it's
not inherited), it can happen that in the time between accept() and
setsockopt() we get interrupted, the sender is being scheduled and
continues with passing data to our receiver. At that time SO_PASSCRED
is neither set on sender nor receiver side, hence in cmsg's
SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
(== overflow{u,g}id) instead of what we actually would like to see.On the sender side, here nc -U, the tests in maybe_add_creds()
invoked through unix_stream_sendmsg() would fail, as at that exact
time, as mentioned, the sender has neither SO_PASSCRED on his side
nor sees it on the server side, and we have a valid 'other' socket
in place. Thus, sender believes it would just look like a normal
connection, not needing/requesting SO_PASSCRED at that time.As reverting 16e5726 would not be an option due to the significant
performance regression reported when having creds always passed,
one way/trade-off to prevent that would be to set SO_PASSCRED on
the listener socket and allow inheriting these flags to the spawned
socket on server side in accept(). It seems also logical to do so
if we'd tell the listener socket to pass those flags onwards, and
would fix the race.Before, strace:
recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}},
msg_flags=0}, 0) = 5After, strace:
recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}},
msg_flags=0}, 0) = 5Signed-off-by: Daniel Borkmann
Cc: Eric Dumazet
Cc: Eric W. Biederman
Signed-off-by: David S. Miller
03 Oct, 2013
1 commit
-
When filling the netlink message we miss to wipe the pad field,
therefore leak one byte of heap memory to userland. Fix this by
setting pad to 0.Signed-off-by: Mathias Krause
Signed-off-by: David S. Miller