02 Dec, 2015

1 commit

  • This patch is a cleanup to make following patch easier to
    review.

    Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
    from (struct socket)->flags to a (struct socket_wq)->flags
    to benefit from RCU protection in sock_wake_async()

    To ease backports, we rename both constants.

    Two new helpers, sk_set_bit(int nr, struct sock *sk)
    and sk_clear_bit(int nr, struct sock *sk) are added so that
    the following patch can change their implementation.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Dec, 2015

1 commit

  • sendpage did not care about credentials at all. This could lead to
    situations in which because of fd passing between processes we could
    append data to skbs with different scm data. It is illegal to splice those
    skbs together. Instead we have to allocate a new skb and if requested
    fill out the scm details.

    Fixes: 869e7c62486ec ("net: af_unix: implement stream sendpage support")
    Reported-by: Al Viro
    Cc: Al Viro
    Cc: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

24 Nov, 2015

1 commit

  • Rainer Weikusat writes:
    An AF_UNIX datagram socket being the client in an n:1 association with
    some server socket is only allowed to send messages to the server if the
    receive queue of this socket contains at most sk_max_ack_backlog
    datagrams. This implies that prospective writers might be forced to go
    to sleep despite none of the messages presently enqueued on the server
    receive queue were sent by them. In order to ensure that these will be
    woken up once space becomes again available, the present unix_dgram_poll
    routine does a second sock_poll_wait call with the peer_wait wait queue
    of the server socket as queue argument (unix_dgram_recvmsg does a wake
    up on this queue after a datagram was received). This is inherently
    problematic because the server socket is only guaranteed to remain alive
    for as long as the client still holds a reference to it. In case the
    connection is dissolved via connect or by the dead peer detection logic
    in unix_dgram_sendmsg, the server socket may be freed despite "the
    polling mechanism" (in particular, epoll) still has a pointer to the
    corresponding peer_wait queue. There's no way to forcibly deregister a
    wait queue with epoll.

    Based on an idea by Jason Baron, the patch below changes the code such
    that a wait_queue_t belonging to the client socket is enqueued on the
    peer_wait queue of the server whenever the peer receive queue full
    condition is detected by either a sendmsg or a poll. A wake up on the
    peer queue is then relayed to the ordinary wait queue of the client
    socket via wake function. The connection to the peer wait queue is again
    dissolved if either a wake up is about to be relayed or the client
    socket reconnects or a dead peer is detected or the client socket is
    itself closed. This enables removing the second sock_poll_wait from
    unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
    that no blocked writer sleeps forever.

    Signed-off-by: Rainer Weikusat
    Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
    Reviewed-by: Jason Baron
    Signed-off-by: David S. Miller

    Rainer Weikusat
     

18 Nov, 2015

1 commit

  • While in the future we may not need to use sk_buff_head.lock at all,
    removing it is a rather large change, as it affects the af_unix fd
    garbage collector, diag and socket cleanups. This is too much for a
    stable patch.

    For the time being grab sk_buff_head.lock without disabling bh and irqs,
    so don't use locked skb_queue_tail.

    Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
    Cc: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Reported-by: Eric Dumazet
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

17 Nov, 2015

1 commit

  • In case multiple writes to a unix stream socket race we could end up in a
    situation where we pre-allocate a new skb for use in unix_stream_sendpage
    but have to free it again in the locked section because another skb
    has been appended meanwhile, which we must use. Accidentally we didn't
    clear the pointer after consuming it and so we touched freed memory
    while appending it to the sk_receive_queue. So, clear the pointer after
    consuming the skb.

    This bug has been found with syzkaller
    (http://github.com/google/syzkaller) by Dmitry Vyukov.

    Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
    Reported-by: Dmitry Vyukov
    Cc: Dmitry Vyukov
    Cc: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

16 Nov, 2015

1 commit

  • During splicing an af-unix socket to a pipe we have to drop all
    af-unix socket locks. While doing so we allow another reader to enter
    unix_stream_read_generic which can read, copy and finally free another
    skb. If exactly this skb is in the process of being spliced we get a
    use-after-free report by kasan.

    First, we must make sure to not have a free while the skb is used during
    the splice operation. We simply increment its use counter before unlocking
    the reader lock.

    Stream sockets have the nice characteristic that we don't care about
    zero length writes and they never reach the peer socket's queue. That
    said, we can take the UNIXCB.consumed field as the indicator if the
    skb was already freed from the socket's receive queue. If the skb was
    fully consumed after we locked the reader side again we know it has been
    dropped by a second reader. We indicate a short read to user space and
    abort the current splice operation.

    This bug has been found with syzkaller
    (http://github.com/google/syzkaller) by Dmitry Vyukov.

    Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
    Reported-by: Dmitry Vyukov
    Cc: Dmitry Vyukov
    Cc: Eric Dumazet
    Acked-by: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

25 Oct, 2015

1 commit

  • poll(POLLOUT) on a listener should not report fd is ready for
    a write().

    This would break some applications using poll() and pfd.events = -1,
    as they would not block in poll()
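
    The effect can be observed from user space. The sketch below is an
    illustration only and not part of the patch: the function name
    listener_revents is made up here, and it assumes Linux autobind
    (binding with only the address family) so that no filesystem path is
    needed. It polls a fresh listener with every event bit requested, as
    in the pfd.events = -1 case.

```c
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: returns the revents reported by poll() for an idle
 * listening AF_UNIX stream socket, or -1 if a system call failed. */
int listener_revents(void)
{
    struct sockaddr_un addr;
    struct pollfd pfd;
    int fd, rc;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Assumption: Linux autobind. Passing only the family makes the
     * kernel pick an abstract address for us. */
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(sa_family_t)) < 0 ||
        listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }

    pfd.fd = fd;
    pfd.events = -1;            /* request every event bit */
    pfd.revents = 0;
    rc = poll(&pfd, 1, 0);      /* zero timeout: report current state only */
    close(fd);
    return rc < 0 ? -1 : pfd.revents;
}
```

    On a kernel with this change, POLLOUT should not be among the
    returned bits, so a caller using a blocking timeout would sleep until
    a connection arrives.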

    Signed-off-by: Eric Dumazet
    Reported-by: Alan Burlison
    Tested-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

05 Oct, 2015

1 commit

  • Now recv with MSG_PEEK can return data from multiple SKBs.

    Unfortunately we take into account the peek offset for each skb,
    that is wrong. We need to apply the peek offset only once.

    In addition, the peek offset should be used only if MSG_PEEK is set.

    Cc: David S. Miller
    Cc: Eric Dumazet
    Cc: Aaron Conole
    Fixes: 9f389e35674f ("af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag")
    Signed-off-by: Andrey Vagin
    Tested-by: Aaron Conole
    Signed-off-by: David S. Miller

    Andrey Vagin
     

30 Sep, 2015

1 commit

  • AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag
    is set.

    This is referenced in kernel bugzilla #12323 @
    https://bugzilla.kernel.org/show_bug.cgi?id=12323

    As described both in the BZ and lkml thread @
    http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an
    AF_UNIX socket only reads a single skb, where the desired effect is
    to return as much skb data as has been queued, until hitting the recv
    buffer size (whichever comes first).

    The modified MSG_PEEK path will now move to the next skb in the tree
    and jump to the again: label, rather than following the natural loop
    structure. This requires duplicating some of the loop head actions.

    This was tested using the Python socketpair code attached to
    the bugzilla issue.
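
    The test attached to the bugzilla entry is not reproduced here. As a
    stand-in, a minimal C sketch of the same idea follows; the helper
    name peek_two_writes is made up for this illustration. It queues two
    separate writes on a stream socketpair and peeks with a buffer large
    enough for both.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sends "ab" and "cd" as two separate writes on an
 * AF_UNIX stream socketpair, then peeks up to `cap` bytes into `out`.
 * Returns the number of bytes peeked, or -1 on failure. */
ssize_t peek_two_writes(char *out, size_t cap)
{
    int sv[2];
    ssize_t n = -1;

    memset(out, 0, cap);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (send(sv[0], "ab", 2, 0) == 2 && send(sv[0], "cd", 2, 0) == 2)
        n = recv(sv[1], out, cap, MSG_PEEK);   /* data stays queued */

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

    A kernel without this change is expected to return only the first
    write (2 bytes); with it, the peek should cover both writes (4 bytes).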

    Signed-off-by: Aaron Conole
    Signed-off-by: David S. Miller

    Aaron Conole
     

11 Jun, 2015

1 commit

  • SCM_SECURITY was originally only implemented for datagram sockets,
    not for stream sockets. However, SCM_CREDENTIALS is supported on
    Unix stream sockets. For consistency, implement Unix stream support
    for SCM_SECURITY as well. Also clean up the existing code and get
    rid of the superfluous UNIXSID macro.

    Motivated by https://bugzilla.redhat.com/show_bug.cgi?id=1224211,
    where systemd was using SCM_CREDENTIALS and assumed wrongly that
    SCM_SECURITY was also supported on Unix stream sockets.

    Signed-off-by: Stephen Smalley
    Acked-by: Paul Moore
    Signed-off-by: David S. Miller

    Stephen Smalley
     

02 Jun, 2015

1 commit

  • Conflicts:
    drivers/net/phy/amd-xgbe-phy.c
    drivers/net/wireless/iwlwifi/Kconfig
    include/net/mac80211.h

    iwlwifi/Kconfig and mac80211.h were both trivial overlapping
    changes.

    The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
    the bug fix that happened on the 'net' side is already integrated
    into the rest of the amd-xgbe driver.

    Signed-off-by: David S. Miller

    David S. Miller
     

27 May, 2015

1 commit


25 May, 2015

2 commits

  • unix_stream_recvmsg is refactored to unix_stream_read_generic in this
    patch and enhanced to deal with pipe splicing. The refactoring is
    negligible, we mostly have to deal with a non-existing struct msghdr
    argument.
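
    From user space, the new capability looks roughly like the sketch
    below. It is an illustration under stated assumptions, not code from
    the patch: the helper name splice_unix_to_pipe is made up, and it
    assumes a Linux kernel that supports splice() from AF_UNIX stream
    sockets. Data written to one end of a socketpair is moved into a pipe
    without passing through a user buffer, then read back for checking.

```c
#define _GNU_SOURCE             /* for splice() */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sends `msg` over an AF_UNIX stream socketpair,
 * splices it from the receiving socket into a pipe, and reads it back
 * from the pipe into `out`. Returns bytes read, or -1 on failure. */
ssize_t splice_unix_to_pipe(const char *msg, char *out, size_t cap)
{
    int sv[2] = { -1, -1 }, p[2] = { -1, -1 };
    size_t len = strlen(msg);
    ssize_t moved, n = -1;

    assert(out != NULL && cap >= len);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (pipe(p) < 0)
        goto out;
    if (send(sv[0], msg, len, 0) != (ssize_t)len)
        goto out;

    /* Move the queued bytes from the socket into the pipe. */
    moved = splice(sv[1], NULL, p[1], NULL, len, 0);
    if (moved > 0)
        n = read(p[0], out, (size_t)moved);
out:
    close(sv[0]);
    close(sv[1]);
    if (p[0] >= 0) {
        close(p[0]);
        close(p[1]);
    }
    return n;
}
```

    A return of -1 here simply means one of the calls failed, for
    example on a kernel where splice() is not available for this socket
    type.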

    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • This patch implements sendpage support for AF_UNIX SOCK_STREAM
    sockets. This is also required for a complete splice implementation.

    The implementation is a bit tricky because we append to already existing
    skbs and so have to hold unix_sk->readlock to protect the reading side
    from either advancing UNIXCB.consumed or freeing the skb at the socket
    receive tail.

    Signed-off-by: Hannes Frederic Sowa
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

11 May, 2015

1 commit


28 Apr, 2015

1 commit

  • Pull networking fixes from David Miller:

    1) mlx4 doesn't check fully for supported valid RSS hash function, fix
    from Amir Vadai

    2) Off by one in ibmveth_change_mtu(), from David Gibson

    3) Prevent altera chip from reporting false error interrupts in some
    circumstances, from Chee Nouk Phoon

    4) Get rid of that stupid endless loop trying to allocate a FIN packet
    in TCP, and in the process kill deadlocks. From Eric Dumazet

    5) Fix get_rps_cpus() crash due to wrong invalid-cpu value, also from
    Eric Dumazet

    6) Fix two bugs in async rhashtable resizing, from Thomas Graf

    7) Fix topology server listener socket namespace bug in TIPC, from Ying
    Xue

    8) Add some missing HAS_DMA kconfig dependencies, from Geert
    Uytterhoeven

    9) bgmac driver intends to force re-polling but does so by returning
    the wrong value from its ->poll() handler. Fix from Rafał Miłecki

    10) When the creator of an rhashtable configures a max size for it,
    don't bark in the logs and drop insertions when that is exceeded.
    Fix from Johannes Berg

    11) Recover from out of order packets in ppp mppe properly, from Sylvain
    Rochet

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
    bnx2x: really disable TPA if 'disable_tpa' option is set
    net:treewide: Fix typo in drivers/net
    net/mlx4_en: Prevent setting invalid RSS hash function
    mdio-mux-gpio: use new gpiod_get_array and gpiod_put_array functions
    netfilter; Add some missing default cases to switch statements in nft_reject.
    ppp: mppe: discard late packet in stateless mode
    ppp: mppe: sanity error path rework
    net/bonding: Make DRV macros private
    net: rfs: fix crash in get_rps_cpus()
    altera tse: add support for fixed-links.
    pxa168: fix double deallocation of managed resources
    net: fix crash in build_skb()
    net: eth: altera: Resolve false errors from MSGDMA to TSE
    ehea: Fix memory hook reference counting crashes
    net/tg3: Release IRQs on permanent error
    net: mdio-gpio: support access that may sleep
    inet: fix possible panic in reqsk_queue_unlink()
    rhashtable: don't attempt to grow when at max_size
    bgmac: fix requests for extra polling calls from NAPI
    tcp: avoid looping in tcp_send_fin()
    ...

    Linus Torvalds
     

24 Apr, 2015

1 commit


16 Apr, 2015

2 commits


03 Mar, 2015

1 commit

  • Now that TIPC no longer depends on the iocb argument in its internal
    implementations of the sendmsg() and recvmsg() hooks defined in the
    proto structure, no user of that argument remains. We can therefore
    drop the redundant iocb argument completely from all implementations
    of both sendmsg() and recvmsg() in the entire networking stack.

    Cc: Christoph Hellwig
    Suggested-by: Al Viro
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

29 Jan, 2015

1 commit

  • The sock_iocb structure is allocated on the stack for each read/write-like
    operation on sockets, and contains various fields of which only the
    embedded msghdr and sometimes a pointer to the scm_cookie is ever used.
    Get rid of the sock_iocb and put a msghdr directly on the stack and pass
    the scm_cookie explicitly to netlink_mmap_sendmsg.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

18 Jan, 2015

1 commit

  • Contrary to common expectations for an "int" return, these functions
    return only a positive value -- if used correctly they cannot even
    return 0 because the message header will necessarily be in the skb.

    This makes the very common pattern of

    if (genlmsg_end(...) < 0) { ... }

    be a whole bunch of dead code. Many places also simply do

    return nlmsg_end(...);

    and the caller is expected to deal with it.

    This also commonly (at least for me) causes errors, because it is very
    common to write

    if (my_function(...))
    /* error condition */

    and if my_function() does "return nlmsg_end()" this is of course wrong.

    Additionally, there's not a single place in the kernel that actually
    needs the message length returned, and if anyone needs it later then
    it'll be very easy to just use skb->len there.

    Remove this, and make the functions void. This removes a bunch of dead
    code as described above. The patch adds lines because I did

    - return nlmsg_end(...);
    + nlmsg_end(...);
    + return 0;

    I could have preserved all the function's return values by returning
    skb->len, but instead I've audited all the places calling the affected
    functions and found that none cared. A few places actually compared
    the return value with < 0 with no change in behaviour, so I opted for the more
    efficient version.
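
    The pitfall can be modelled outside the kernel. Everything in the
    sketch below is a made-up stand-in (struct msg_buf, end_msg_old,
    end_msg_new, the header size of 16) and none of it is the netlink
    API; it only shows why a "non-zero means error" caller misfires when
    the fill function forwards a length, and why returning 0 after a void
    end function avoids that.

```c
#include <stddef.h>

/* Hypothetical stand-in for an skb: only tracks the message length. */
struct msg_buf {
    size_t len;
};

/* Old style: finalise the message and return its total length, which is
 * always positive because the header is part of the message. */
static int end_msg_old(struct msg_buf *m, size_t payload)
{
    m->len = 16 + payload;      /* 16 is an assumed header size */
    return (int)m->len;
}

/* New style: finalising cannot fail, so nothing is returned. */
static void end_msg_new(struct msg_buf *m, size_t payload)
{
    m->len = 16 + payload;
}

/* Fill function that forwards the old return value. */
int fill_old(struct msg_buf *m)
{
    return end_msg_old(m, 4);
}

/* Fill function after the conversion: call, then return 0 explicitly. */
int fill_new(struct msg_buf *m)
{
    end_msg_new(m, 4);
    return 0;
}

/* A caller written in the common "non-zero means error" style.
 * Returns 1 if it would take the error path, 0 otherwise. */
int caller_sees_error(int (*fill)(struct msg_buf *))
{
    struct msg_buf m = { 0 };

    return fill(&m) != 0;
}
```

    Here caller_sees_error(fill_old) takes the error path even though
    nothing failed, while caller_sees_error(fill_new) does not.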

    One instance of the error I've made numerous times now is also present
    in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
    check for
    Signed-off-by: David S. Miller

    Johannes Berg
     

10 Dec, 2014

1 commit

  • Note that the code _using_ ->msg_iter at that point will be very
    unhappy with anything other than unshifted iovec-backed iov_iter.
    We still need to convert users to proper primitives.

    Signed-off-by: Al Viro

    Al Viro
     

24 Nov, 2014

1 commit


06 Nov, 2014

1 commit

  • This encapsulates all of the skb_copy_datagram_iovec() callers
    with call argument signature "skb, offset, msghdr->msg_iov, length".

    When we move to iov_iters in the networking, the iov_iter object will
    sit in the msghdr.

    Having a helper like this means there will be fewer places to touch
    during that transformation.

    Based upon descriptions and patch from Al Viro.

    Signed-off-by: David S. Miller

    David S. Miller
     

08 Oct, 2014

1 commit


13 Jun, 2014

1 commit

  • Pull networking updates from David Miller:

    1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

    2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

    3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

    4) BPF now has a "random" opcode, from Chema Gonzalez.

    5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

    6) Support TCP fastopen over ipv6, from Daniel Lee.

    7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.

    8) Support software TSO in fec driver too, from Nimrod Andy.

    9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

    10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

    11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

    12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

    13) Support busy polling in SCTP, from Neal Horman.

    14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

    15) Bridge promisc mode handling improvements from Vlad Yasevich.

    16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
    rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
    tcp: fixing TLP's FIN recovery
    net: fec: Add software TSO support
    net: fec: Add Scatter/gather support
    net: fec: Increase buffer descriptor entry number
    net: fec: Factorize feature setting
    net: fec: Enable IP header hardware checksum
    net: fec: Factorize the .xmit transmit function
    bridge: fix compile error when compiling without IPv6 support
    bridge: fix smatch warning / potential null pointer dereference
    via-rhine: fix full-duplex with autoneg disable
    bnx2x: Enlarge the dorq threshold for VFs
    bnx2x: Check for UNDI in uncommon branch
    bnx2x: Fix 1G-baseT link
    bnx2x: Fix link for KR with swapped polarity lane
    sctp: Fix sk_ack_backlog wrap-around problem
    net/core: Add VF link state control policy
    net/fsl: xgmac_mdio is dependent on OF_MDIO
    net/fsl: Make xgmac_mdio read error message useful
    net_sched: drr: warn when qdisc is not work conserving
    ...

    Linus Torvalds
     

17 May, 2014

1 commit


18 Apr, 2014

1 commit

  • Mostly scripted conversion of the smp_mb__* barriers.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->sk_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up. So this skb->len access is potentially
    to freed up memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no actual implementation of this callback actually uses
    the length argument. And since nobody actually cared about its
    value, lots of call sites pass arbitrary values in such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
     

27 Mar, 2014

1 commit

  • Some applications didn't expect recvmsg() on a non blocking socket
    could return -EINTR. This possibility was added as a side effect
    of commit b3ca9b02b00704 ("net: fix multithreaded signal handling in
    unix recv routines").

    To hit this bug, you need to be a bit unlucky, as the u->readlock
    mutex is usually held for very small periods.

    Fixes: b3ca9b02b00704 ("net: fix multithreaded signal handling in unix recv routines")
    Signed-off-by: Eric Dumazet
    Cc: Rainer Weikusat
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Mar, 2014

1 commit

  • The unix socket code is using the result of csum_partial to
    hash into a lookup table:

    unix_hash_fold(csum_partial(sunaddr, len, 0));

    csum_partial is only guaranteed to produce something that can be
    folded into a checksum, as its prototype explains:

    * returns a 32-bit number suitable for feeding into itself
    * or csum_tcpudp_magic

    The 32bit value should not be used directly.

    Depending on the alignment, the ppc64 csum_partial will return
    different 32bit partial checksums that will fold into the same
    16bit checksum.

    This difference causes the following testcase (courtesy of
    Gustavo) to sometimes fail:

    #include <stdio.h>
    #include <sys/socket.h>

    int main()
    {
        int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);

        int i = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4);

        /* Bind with only the family set, so the kernel autobinds. */
        struct sockaddr addr;
        addr.sa_family = AF_LOCAL;
        bind(fd, &addr, 2);

        listen(fd, 128);

        /* Fetch the address the kernel chose for the listener. */
        struct sockaddr_storage ss;
        socklen_t sslen = (socklen_t)sizeof(ss);
        getsockname(fd, (struct sockaddr*)&ss, &sslen);

        fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);

        /* Connecting to that address is the lookup that sometimes fails. */
        if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){
            perror(NULL);
            return 1;
        }
        printf("OK\n");
        return 0;
    }

    As suggested by davem, fix this by using csum_fold to fold the
    partial 32bit checksum into a 16bit checksum before using it.
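
    To make the problem concrete, the sketch below folds a 32-bit
    one's-complement partial sum into 16 bits in portable C. It is a
    generic model of the folding step, not the kernel's csum_fold source,
    and the names fold16 and check_fold_collision are made up here. Two
    different 32-bit partial sums can represent the same checksum, which
    is why the raw 32-bit value is unsuitable as a hash input.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement partial sum into 16 bits and complement
 * the result. Generic model; two folding rounds absorb any carry. */
uint16_t fold16(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Two distinct partial sums that fold to the same 16-bit value:
 * 0x0001 + 0x0001 and 0x0000 + 0x0002 both add up to 0x0002. */
void check_fold_collision(void)
{
    uint32_t a = 0x00010001u;
    uint32_t b = 0x00000002u;

    assert(a != b);
    assert(fold16(a) == fold16(b));
}
```

    Hashing the folded value means that such equivalent partial sums land
    in the same bucket regardless of how the architecture accumulated
    them.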

    Signed-off-by: Anton Blanchard
    Cc: stable@vger.kernel.org
    Signed-off-by: David S. Miller

    Anton Blanchard
     

19 Jan, 2014

1 commit

  • This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
    handler msg_name and msg_namelen logic").

    DECLARE_SOCKADDR validates that the structure we use for writing the
    name information to is not larger than the buffer which is reserved
    for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
    consistently in sendmsg code paths.

    Signed-off-by: Steffen Hurrle
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Steffen Hurrle
     

19 Dec, 2013

1 commit


18 Dec, 2013

1 commit

  • This is similar to the set_peek_off patch where calling bind while the
    socket is stuck in unix_dgram_recvmsg() will block and cause a hung task
    spew after a while.

    This is also the last place that did a straightforward mutex_lock(), so
    there shouldn't be any more of these patches.

    Signed-off-by: Sasha Levin
    Signed-off-by: David S. Miller

    Sasha Levin
     

11 Dec, 2013

1 commit

  • unix_dgram_recvmsg() will hold the readlock of the socket until recv
    is complete.

    At the same time, we may try to setsockopt(SO_PEEK_OFF), which will hang until
    unix_dgram_recvmsg() completes (which can take a while) without allowing
    us to break out of it, triggering a hung task spew.

    Instead, allow set_peek_off to fail, this way userspace will not hang.

    Signed-off-by: Sasha Levin
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Sasha Levin
     

07 Dec, 2013

1 commit


21 Nov, 2013

1 commit


20 Oct, 2013

1 commit

  • In the case of credentials passing in unix stream sockets (dgram
    sockets seem not affected), we get a rather sparse race after
    commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default").

    We have a stream server on receiver side that requests credential
    passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
    on each spawned/accepted socket on server side to 1 first (as it's
    not inherited), it can happen that in the time between accept() and
    setsockopt() we get interrupted, the sender is being scheduled and
    continues with passing data to our receiver. At that time SO_PASSCRED
    is neither set on sender nor receiver side, hence in cmsg's
    SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
    (== overflow{u,g}id) instead of what we actually would like to see.

    On the sender side, here nc -U, the tests in maybe_add_creds()
    invoked through unix_stream_sendmsg() would fail, as at that exact
    time, as mentioned, the sender has neither SO_PASSCRED on his side
    nor sees it on the server side, and we have a valid 'other' socket
    in place. Thus, sender believes it would just look like a normal
    connection, not needing/requesting SO_PASSCRED at that time.

    As reverting 16e5726 would not be an option due to the significant
    performance regression reported when having creds always passed,
    one way/trade-off to prevent that would be to set SO_PASSCRED on
    the listener socket and allow inheriting these flags to the spawned
    socket on server side in accept(). It seems also logical to do so
    if we'd tell the listener socket to pass those flags onwards, and
    would fix the race.

    Before, strace:

    recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
    msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
    cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}},
    msg_flags=0}, 0) = 5

    After, strace:

    recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
    msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
    cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}},
    msg_flags=0}, 0) = 5

    Signed-off-by: Daniel Borkmann
    Cc: Eric Dumazet
    Cc: Eric W. Biederman
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

03 Oct, 2013

1 commit

  • When filling the netlink message we fail to wipe the pad field,
    therefore leak one byte of heap memory to userland. Fix this by
    setting pad to 0.

    Signed-off-by: Mathias Krause
    Signed-off-by: David S. Miller

    Mathias Krause