04 Nov, 2018

1 commit

  • [ Upstream commit b6168562c8ce2bd5a30e213021650422e08764dc ]

    In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch
    statement to see whether it is necessary to pre-process the ethtool
    structure, because, as mentioned in the comment, the structure
    ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc'
    is allocated through compat_alloc_user_space(). One thing to note here is
    that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is
    partially determined by 'rule_cnt', which is actually acquired from the
    user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through
    get_user(). After 'rxnfc' is allocated, the data in the original user-space
    buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(),
    including the 'rule_cnt' field. However, after this copy, 'rxnfc->rule_cnt'
    is not checked again. A malicious user can therefore race to change the
    value of 'compat_rxnfc->rule_cnt' between these two copies. In this way,
    the attacker can bypass the previous check on 'rule_cnt' and inject
    malicious data, which can cause undefined kernel behavior and introduce a
    potential security risk.

    This patch avoids the above issue by copying the value acquired by
    get_user() to 'rxnfc->rule_cnt' if 'ethcmd' is ETHTOOL_GRXCLSRLALL.

    Signed-off-by: Wenwen Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Wenwen Wang
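The double-fetch fix in the ethtool_ioctl() commit above can be illustrated with a minimal userspace sketch. Everything here is hypothetical. `struct rxnfc_model` stands in for the compat ethtool_rxnfc layout, and `memcpy()` plays the roles of `get_user()` and `copy_in_user()`. The 1024 bound is illustrative only, and `simulate_racing_writer()` fakes a concurrent user-space write between the two fetches. The essential step is the last assignment, which writes the checked value back over the re-fetched one.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the compat ethtool_rxnfc layout. */
struct rxnfc_model {
	uint32_t cmd;
	uint32_t rule_cnt;
	uint32_t rule_locs[]; /* output array sized by rule_cnt */
};

/* Simulates another thread rewriting "user" memory between two fetches. */
typedef void (*race_hook)(struct rxnfc_model *user);

/* Hypothetical attacker: inflates rule_cnt after it was first read. */
void simulate_racing_writer(struct rxnfc_model *user)
{
	user->rule_cnt = 100000;
}

/* Returns a heap copy sized from a single read of rule_cnt (caller frees). */
struct rxnfc_model *copy_rxnfc_once(struct rxnfc_model *user, race_hook race)
{
	/* First fetch: read rule_cnt once (get_user() in the kernel). */
	uint32_t rule_cnt = user->rule_cnt;
	if (rule_cnt > 1024) /* illustrative bound, not the kernel's */
		return NULL;

	size_t size = sizeof(struct rxnfc_model) + rule_cnt * sizeof(uint32_t);
	struct rxnfc_model *copy = malloc(size);
	if (!copy)
		return NULL;

	if (race)
		race(user); /* the window the attacker exploits */

	/* Second fetch: copy the fixed header (copy_in_user() in the kernel). */
	memcpy(copy, user, sizeof(struct rxnfc_model));

	/* The fix: overwrite the re-fetched field with the value we sized by. */
	copy->rule_cnt = rule_cnt;
	return copy;
}
```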
     

06 Aug, 2018

1 commit

  • commit c8e8cd579bb4265651df8223730105341e61a2d1 upstream.

    'call' is a user-controlled value, so sanitize the array index after the
    bounds check to avoid speculating past the bounds of the 'nargs' array.

    Found with the help of Smatch:

    net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue
    'nargs' [r] (local cap)

    Cc: Josh Poimboeuf
    Cc: stable@vger.kernel.org
    Signed-off-by: Jeremy Cline
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jeremy Cline
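The following sketch illustrates the index-sanitizing idea. It assumes GCC or Clang, which use an arithmetic right shift for negative `long` values. `index_mask_nospec()` follows the shape of the kernel's generic `array_index_mask_nospec()`. `nargs_model` and `lookup_nargs()` are hypothetical stand-ins for the `nargs` table and the socketcall lookup. The mask is all ones for an in-bounds index and zero otherwise, so even a mispredicted bounds check can only speculatively read element 0.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG_MODEL (sizeof(long) * CHAR_BIT)

/*
 * ~0UL when index < size, 0 otherwise, computed without a branch.
 * Assumes size <= LONG_MAX. Unlike the kernel, this sketch does not hide
 * index from the optimizer (OPTIMIZER_HIDE_VAR()), so a compiler could
 * in principle fold the mask away.
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (BITS_PER_LONG_MODEL - 1));
}

/* Hypothetical argument-count table; call 0 is invalid. */
static const unsigned char nargs_model[] = { 0, 3, 3, 3, 2, 3 };
#define NCALLS (sizeof(nargs_model) / sizeof(nargs_model[0]))

/* Returns the argument count for call, or -1 if it is out of range. */
int lookup_nargs(unsigned long call)
{
	if (call < 1 || call >= NCALLS)
		return -1;
	/* If the branch above is mispredicted, the mask clamps call to 0. */
	call &= index_mask_nospec(call, NCALLS);
	return nargs_model[call];
}
```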
     

26 Jun, 2018

1 commit

  • [ Upstream commit 6d8c50dcb029872b298eea68cc6209c866fd3e14 ]

    fchownat() doesn't even hold a reference on the fd until it figures out
    that the fd is really needed (otherwise it is ignored), and it releases
    the reference as soon as it has resolved the path. This means sock_close()
    could race with sockfs_setattr(), which leads to a NULL pointer
    dereference since we typically set sock->sk to NULL in ->release().

    As pointed out by Al, this is unique to sockfs. So we can fix this
    in socket layer by acquiring inode_lock in sock_close() and
    checking against NULL in sockfs_setattr().

    sock_release() is called in many places, but only the sock_close()
    path matters here. Fortunately, this should not affect normal
    sock_close(), as it is only called when the last fd reference is gone.
    It only affects sock_close() with a parallel sockfs_setattr() in
    progress, which is not common.

    Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.")
    Reported-by: shankarapailoor
    Cc: Tetsuo Handa
    Cc: Lorenzo Colitti
    Cc: Al Viro
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
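Below is a simplified model of the locking described in the sockfs_setattr() race fix. A pthread mutex stands in for `inode_lock()`, and the structures and function names are hypothetical. In the kernel, the VFS already holds the inode lock around `->setattr()`. The explicit locking in `sockfs_setattr_model()` therefore represents that caller-side lock, while `sock_close_model()` takes the same lock before clearing `sk`.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, heavily simplified models of struct sock / struct socket. */
struct sock_model {
	unsigned int sk_uid;
};

struct socket_model {
	pthread_mutex_t inode_lock; /* stands in for the socket inode's lock */
	struct sock_model *sk;      /* set to NULL by ->release() */
};

/* Close path: clear sk while holding the inode lock. */
void sock_close_model(struct socket_model *sock)
{
	pthread_mutex_lock(&sock->inode_lock);
	sock->sk = NULL; /* the real ->release() also frees the sock */
	pthread_mutex_unlock(&sock->inode_lock);
}

/* setattr path: same lock, and tolerate a sock that was already released. */
int sockfs_setattr_model(struct socket_model *sock, unsigned int new_uid)
{
	int err = 0;

	pthread_mutex_lock(&sock->inode_lock);
	if (sock->sk)
		sock->sk->sk_uid = new_uid;
	else
		err = -ENOENT;
	pthread_mutex_unlock(&sock->inode_lock);
	return err;
}
```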
     

22 Feb, 2018

1 commit

  • commit 4950276672fce5c241857540f8561c440663673d upstream.

    Patch series "kmemcheck: kill kmemcheck", v2.

    As discussed at LSF/MM, kill kmemcheck.

    KASan is a replacement that is able to work without the limitations of
    kmemcheck (single CPU, slow). KASan is already upstream.

    We are also not aware of any users of kmemcheck (or users who don't
    consider KASan as a suitable replacement).

    The only objection was that since KASAN wasn't supported by all GCC
    versions provided by distros at that time we should hold off for 2
    years, and try again.

    Now that 2 years have passed, and all distros provide gcc that supports
    KASAN, kill kmemcheck again for the very same reasons.

    This patch (of 4):

    Remove kmemcheck annotations, and calls to kmemcheck from the kernel.

    [alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
    Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
    Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
    Signed-off-by: Sasha Levin
    Cc: Alexander Potapenko
    Cc: Eric W. Biederman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Steven Rostedt
    Cc: Tim Hansen
    Cc: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Levin, Alexander (Sasha Levin)
     

31 Jan, 2018

1 commit

  • [ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]

    The BPF interpreter has been used as part of the Spectre variant 2 attack (CVE-2017-5715).

    A quote from the Google Project Zero blog:
    "At this point, it would normally be necessary to locate gadgets in
    the host kernel code that can be used to actually leak data by reading
    from an attacker-controlled location, shifting and masking the result
    appropriately and then using the result of that as offset to an
    attacker-controlled address for a load. But piecing gadgets together
    and figuring out which ones work in a speculation context seems annoying.
    So instead, we decided to use the eBPF interpreter, which is built into
    the host kernel - while there is no legitimate way to invoke it from inside
    a VM, the presence of the code in the host kernel's text section is sufficient
    to make it usable for the attack, just like with ordinary ROP gadgets."

    To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON config
    option, which removes the interpreter from the kernel in favor of JIT-only mode.
    So far, the eBPF JIT is supported on:
    x64, arm64, arm32, sparc64, s390, powerpc64, mips64

    The start of the JITed program is randomized and its code page is marked read-only.
    In addition, "constant blinding" can be turned on with net.core.bpf_jit_harden.

    v2->v3:
    - move __bpf_prog_ret0 under ifdef (Daniel)

    v1->v2:
    - fix init order, test_bpf and cBPF (Daniel's feedback)
    - fix offloaded bpf (Jakub's feedback)
    - add 'return 0' dummy in case something can invoke prog->bpf_func
    - retarget bpf tree. For bpf-next the patch would need one extra hunk.
    It will be sent when the trees are merged back to net-next

    Considered doing:
    int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
    but it seems better to land the patch as-is and in bpf-next remove
    bpf_jit_enable global variable from all JITs, consolidate in one place
    and remove this jit_init() function.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
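The sketch below shows what JIT-only mode means for runtime selection, under stated assumptions. `struct prog_model`, `select_runtime_jit_only()` and the two demo JIT callbacks are hypothetical. The sketch also assumes that loading fails with the kernel-internal `ENOTSUPP` (524). With no interpreter compiled in, the entry point defaults to a stub that returns 0, mirroring the "return 0" dummy mentioned in the changelog. If JIT compilation fails, loading fails instead of falling back to interpretation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ENOTSUPP
#define ENOTSUPP 524 /* kernel-internal errno, not exported to userspace */
#endif

/* Hypothetical, simplified program descriptor. */
struct prog_model {
	bool jited;
	unsigned int (*bpf_func)(const void *ctx);
};

/* Safe default entry point when there is no interpreter. */
static unsigned int prog_ret0(const void *ctx)
{
	(void)ctx;
	return 0;
}

/* JIT-only selection: no JIT image means the program is rejected. */
int select_runtime_jit_only(struct prog_model *prog,
			    void (*jit_compile)(struct prog_model *))
{
	prog->jited = false;
	prog->bpf_func = prog_ret0; /* nothing else to fall back to */
	jit_compile(prog);
	if (!prog->jited)
		return -ENOTSUPP;
	return 0;
}

/* Demo callbacks for the sketch. */
static unsigned int jitted_image(const void *ctx)
{
	(void)ctx;
	return 1;
}

void jit_fails(struct prog_model *p) { (void)p; }

void jit_succeeds(struct prog_model *p)
{
	p->bpf_func = jitted_image;
	p->jited = true;
}
```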
     

17 Aug, 2017

1 commit

  • A couple of fixes to the new skb_send_sock infrastructure. However, no users
    currently exist for this code (users are added in the next handful of patches),
    so it should not be possible to trigger a panic with existing in-kernel
    code.

    Fixes: 306b13eb3cf9 ("proto_ops: Add locked held versions of sendmsg and sendpage")
    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     

02 Aug, 2017

2 commits


26 Jul, 2017

1 commit

  • The variable owned_by_user is always set, but only used
    when the kernel is configured with LOCKDEP enabled.

    Get rid of the warning by calling sock_owned_by_user() directly
    inside the RCU-protected dereference instead of via the variable.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

25 Jul, 2017

1 commit

  • Commit ffb07550c76f ("copy_msghdr_from_user(): get rid of
    field-by-field copyin") introduced a new sparse warning:

    net/socket.c:1919:27: warning: incorrect type in assignment (different address spaces)
    net/socket.c:1919:27: expected void *msg_control
    net/socket.c:1919:27: got void [noderef] *[addressable] msg_control

    as well as a line longer than 80 characters; fix them.

    Fixes: ffb07550c76f ("copy_msghdr_from_user(): get rid of field-by-field copyin")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
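For the sparse warning above, here is a sketch of how such an address-space warning is usually silenced. The `__user`/`__force` definitions follow the kernel's compiler headers: sparse (which defines `__CHECKER__`) sees the attributes, and a normal compiler sees nothing. The two structures and `copy_control_ptr()` are hypothetical, and this is not claimed to be the exact patch.

```c
/* Annotations as the kernel defines them: only sparse interprets them. */
#ifdef __CHECKER__
# define __user  __attribute__((noderef, address_space(1)))
# define __force __attribute__((force))
#else
# define __user
# define __force
#endif

/* Hypothetical user-space view of a message header. */
struct user_msghdr_model {
	void __user *msg_control;
};

/* Hypothetical kernel-side copy that stores a plain pointer. */
struct msghdr_model {
	void *msg_control;
};

void copy_control_ptr(struct msghdr_model *kmsg,
		      const struct user_msghdr_model *msg)
{
	/* The explicit __force cast tells sparse the address-space change
	 * is intentional, silencing "different address spaces". */
	kmsg->msg_control = (void __force *)msg->msg_control;
}
```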
     

16 Jul, 2017

1 commit


06 Jul, 2017

1 commit

  • Pull misc user access cleanups from Al Viro:
    "The first pile is assorted getting rid of cargo-culted access_ok(),
    cargo-culted set_fs() and field-by-field copyouts.

    The same description applies to a lot of stuff in other branches -
    this is just the stuff that didn't fit into a more specific topical
    branch"

    * 'work.misc-set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    Switch flock copyin/copyout primitives to copy_{from,to}_user()
    fs/fcntl: return -ESRCH in f_setown when pid/pgid can't be found
    fs/fcntl: f_setown, avoid undefined behaviour
    fs/fcntl: f_setown, allow returning error
    lpfc debugfs: get rid of pointless access_ok()
    adb: get rid of pointless access_ok()
    isdn: get rid of pointless access_ok()
    compat statfs: switch to copy_to_user()
    fs/locks: don't mess with the address limit in compat_fcntl64
    nfsd_readlink(): switch to vfs_get_link()
    drbd: ->sendpage() never needed set_fs()
    fs/locks: pass kernel struct flock to fcntl_getlk/setlk
    fs: locks: Fix some troubles at kernel-doc comments

    Linus Torvalds
     

05 Jul, 2017

1 commit


14 Jun, 2017

1 commit

  • Allow f_setown to return an error value. We will fail in the next patch
    with EINVAL for bad input to f_setown, so pave the way for the later
    patch.

    Signed-off-by: Jiri Slaby
    Reviewed-by: Jeff Layton
    Cc: Jeff Layton
    Cc: "J. Bruce Fields"
    Cc: Alexander Viro
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Jeff Layton

    Jiri Slaby
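The sketch below shows what an error-returning f_setown()-style helper makes possible. The `INT_MIN` check corresponds to the follow-up "f_setown, avoid undefined behaviour" change listed under 06 Jul, 2017. The owner types and function name are hypothetical stand-ins. A negative argument names a process group, and `INT_MIN` has to be rejected because negating it overflows, which is undefined behavior in C.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical owner descriptor. */
enum owner_type { OWNER_PID, OWNER_PGID };

struct owner_model {
	enum owner_type type;
	int id;
};

/* Returns 0 on success or -EINVAL for an argument that cannot be negated. */
int f_setown_model(struct owner_model *owner, int who)
{
	enum owner_type type = OWNER_PID;

	if (who < 0) {
		if (who == INT_MIN) /* -INT_MIN is not representable */
			return -EINVAL;
		type = OWNER_PGID;
		who = -who;
	}
	owner->type = type;
	owner->id = who;
	return 0;
}
```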
     

23 May, 2017

1 commit


22 May, 2017

2 commits

  • Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to
    be looped to the socket's error queue with a software timestamp even
    when a hardware transmit timestamp is expected to be provided by the
    driver.

    Applications using this option will receive two separate messages from
    the error queue, one with a software timestamp and the other with a
    hardware timestamp. As the hardware timestamp is saved to the shared skb
    info, which may happen before the first message with software timestamp
    is received by the application, the hardware timestamp is copied to the
    SCM_TIMESTAMPING control message only when the skb has no software
    timestamp or it is an incoming packet.

    While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as
    there are no other users.

    CC: Richard Cochran
    CC: Willem de Bruijn
    Signed-off-by: Miroslav Lichvar
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Miroslav Lichvar
     
  • Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message
    for incoming packets with hardware timestamps. It contains the index of
    the real interface which received the packet and the length of the
    packet at layer 2.

    The index is useful with bonding, bridges and other interfaces, where
    IP_PKTINFO doesn't allow applications to determine which PHC made the
    timestamp. With the L2 length (and link speed) it is possible to
    transpose preamble timestamps to trailer timestamps, which are used in
    the NTP protocol.

    While this information could be provided by two new socket options
    independently from timestamping, it doesn't look like they would be very
    useful. With this option any performance impact is limited to hardware
    timestamping.

    Use dev_get_by_napi_id() to get the device and its index. On kernels
    with CONFIG_NET_RX_BUSY_POLL disabled, or with drivers not using NAPI,
    a zero index will be returned in the control message.

    CC: Richard Cochran
    Acked-by: Willem de Bruijn
    Signed-off-by: Miroslav Lichvar
    Signed-off-by: David S. Miller

    Miroslav Lichvar
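To show the two timestamping options from the application side, the helpers below build a `SO_TIMESTAMPING` flag set and apply it with `setsockopt()`. The `SOF_TIMESTAMPING_*` constants come from `<linux/net_tstamp.h>`. OPT_TX_SWHW and OPT_PKTINFO require Linux 4.13 or later headers. The particular flag combination and the helper names are only an example. Hardware timestamps also require the NIC itself to be configured for timestamping, which is outside this sketch.

```c
#include <linux/net_tstamp.h>
#include <sys/socket.h>

/* Software + hardware TX timestamps on the same packet, RX hardware
 * timestamps, and PKTINFO for hardware-timestamped incoming packets. */
unsigned int tstamp_flags_swhw_pktinfo(void)
{
	return SOF_TIMESTAMPING_TX_SOFTWARE |
	       SOF_TIMESTAMPING_TX_HARDWARE |
	       SOF_TIMESTAMPING_RX_HARDWARE |
	       SOF_TIMESTAMPING_SOFTWARE |
	       SOF_TIMESTAMPING_RAW_HARDWARE |
	       SOF_TIMESTAMPING_OPT_TX_SWHW |
	       SOF_TIMESTAMPING_OPT_PKTINFO;
}

/* Returns 0 on success, -1 with errno set otherwise. */
int enable_tstamps(int fd)
{
	unsigned int flags = tstamp_flags_swhw_pktinfo();

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}
```

Transmit timestamps are then read with `recvmsg()` and `MSG_ERRQUEUE`. With OPT_PKTINFO, an `SCM_TIMESTAMPING_PKTINFO` control message carries a `struct scm_ts_pktinfo` with the `if_index` and `pkt_length` fields described above.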
     

18 Apr, 2017

1 commit

  • The MTU overhead calculation in L2TP device set-up
    merged via commit b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff
    needs to be adjusted to lock the tunnel socket while
    referencing the sub-data structures to derive the
    socket's IP overhead.

    Reported-by: Guillaume Nault
    Tested-by: Guillaume Nault
    Signed-off-by: R. Parameswaran
    Signed-off-by: David S. Miller

    R. Parameswaran
     

07 Apr, 2017

1 commit

  • A new function, kernel_sock_ip_overhead(), is provided
    to calculate the cumulative overhead imposed by the IP
    header and IP options, if any, on a socket's payload.
    The new function returns an overhead of zero for sockets
    that do not belong to the IPv4 or IPv6 address families.
    This is used in the L2TP code path to compute the
    total outer IP overhead on the L2TP tunnel socket when
    calculating the default MTU for Ethernet pseudowires.

    Signed-off-by: R. Parameswaran
    Signed-off-by: David S. Miller

    R. Parameswaran
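The following sketch shows the overhead rule that kernel_sock_ip_overhead() is described as computing: a fixed IP header plus any options, and zero for non-IP families. `struct sock_overhead_model` and the flattened option fields are hypothetical. The real helper reads the options from the socket's IPv4/IPv6 option structures under RCU, and the 18 Apr, 2017 follow-up locks the tunnel socket around the calculation.

```c
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical, flattened view of the socket state the helper inspects. */
struct sock_overhead_model {
	int family;              /* AF_INET, AF_INET6, ... */
	uint32_t ipv4_optlen;    /* IPv4 options length, if any */
	uint32_t ipv6_opt_flen;  /* IPv6 fragmentable extension headers */
	uint32_t ipv6_opt_nflen; /* IPv6 non-fragmentable extension headers */
};

enum { IPV4_HDR_LEN = 20, IPV6_HDR_LEN = 40 }; /* fixed header sizes */

/* Header plus options for IPv4/IPv6; 0 for any other family. */
uint32_t sock_ip_overhead_model(const struct sock_overhead_model *sk)
{
	if (!sk)
		return 0;

	switch (sk->family) {
	case AF_INET:
		return IPV4_HDR_LEN + sk->ipv4_optlen;
	case AF_INET6:
		return IPV6_HDR_LEN + sk->ipv6_opt_flen + sk->ipv6_opt_nflen;
	default:
		return 0;
	}
}
```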
     

22 Mar, 2017

2 commits

  • SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled
    while packets are collected on the error queue.
    So, checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags
    is not enough to safely assume that the skb contains
    OPT_STATS data.

    Add a bit in sock_exterr_skb to indicate whether the
    skb contains opt_stats data.

    Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
    Reported-by: JongHwan Kim
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: Eric Dumazet
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Soheil Hassas Yeganeh
     
  • __sock_recv_timestamp can be called for both normal skbs (for
    receive timestamps) and for skbs on the error queue (for transmit
    timestamps).

    Commit 1c885808e456
    (tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING)
    assumes that any skb passed to __sock_recv_timestamp is from
    the error queue, containing OPT_STATS in the content of the skb.
    This results in accessing invalid memory or generating junk
    data.

    To fix this, set skb->pkt_type to PACKET_OUTGOING for packets
    on the error queue. This is safe because on the receive path
    on local sockets skb->pkt_type is never set to PACKET_OUTGOING.
    With that, copy OPT_STATS from a packet only if its pkt_type
    is PACKET_OUTGOING.

    Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
    Reported-by: JongHwan Kim
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: Eric Dumazet
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Soheil Hassas Yeganeh
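The two OPT_STATS fixes above combine into a single decision. This sketch models it: stats are emitted only for error-queue packets (marked PACKET_OUTGOING) that carry data and were flagged as holding stats when they were queued. `struct skb_model` and both functions are hypothetical. The PACKET_* values match `<linux/if_packet.h>`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values match PACKET_HOST and PACKET_OUTGOING in <linux/if_packet.h>. */
enum { PKT_HOST = 0, PKT_OUTGOING = 4 };

/* Hypothetical, simplified skb plus error-queue metadata. */
struct skb_model {
	unsigned char pkt_type;
	size_t len;
	bool opt_stats; /* recorded when the skb was queued, not read later */
};

/* Error-queue skbs are marked PACKET_OUTGOING; receive-path skbs never are. */
static bool skb_is_err_queue_model(const struct skb_model *skb)
{
	return skb->pkt_type == PKT_OUTGOING;
}

/* Checking the socket's current tsflags alone is not enough, because the
 * option may have been toggled after the packet was queued. */
bool should_copy_opt_stats(const struct skb_model *skb)
{
	return skb_is_err_queue_model(skb) && skb->len && skb->opt_stats;
}
```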
     

10 Mar, 2017

2 commits

  • Lockdep issues a circular dependency warning when AFS issues an operation
    through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

    The theory lockdep comes up with is as follows:

    (1) If the pagefault handler decides it needs to read pages from AFS, it
    calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
    creating a call requires the socket lock:

    mmap_sem must be taken before sk_lock-AF_RXRPC

    (2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind()
    binds the underlying UDP socket whilst holding its socket lock.
    inet_bind() takes its own socket lock:

    sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

    (3) Reading from a TCP socket into a userspace buffer might cause a fault
    and thus cause the kernel to take the mmap_sem, but the TCP socket is
    locked whilst doing this:

    sk_lock-AF_INET must be taken before mmap_sem

    However, lockdep's theory is wrong in this instance because it deals only
    with lock classes and not individual locks. The AF_INET lock in (2) isn't
    really equivalent to the AF_INET lock in (3) as the former deals with a
    socket entirely internal to the kernel that never sees userspace. This is
    a limitation in the design of lockdep.

    Fix the general case by:

    (1) Double up all the locking keys used in sockets so that one set is
    used if the socket is created by userspace and the other set is used
    if the socket is created by the kernel.

    (2) Store the kern parameter passed to sk_alloc() in a variable in the
    sock struct (sk_kern_sock). This informs sock_lock_init(),
    sock_init_data() and sk_clone_lock() as to the lock keys to be used.

    Note that the child created by sk_clone_lock() inherits the parent's
    kern setting.

    (3) Add a 'kern' parameter to ->accept() that is analogous to the one
    passed in to ->create() that distinguishes whether kernel_accept() or
    sys_accept4() was the caller and can be passed to sk_alloc().

    Note that a lot of accept functions merely dequeue an already
    allocated socket. I haven't touched these as the new socket already
    exists before we get the parameter.

    Note also that there are a couple of places where I've made the accepted
    socket unconditionally kernel-based:

    irda_accept()
    rds_tcp_accept_one()
    tcp_accept_from_sock()

    because they follow a sock_create_kern() and accept off of that.

    Whilst creating this, I noticed that lustre and ocfs don't create sockets
    through sock_create_kern() and thus they aren't marked as for-kernel,
    though they appear to be internal. I wonder if these should do that so
    that they use the new set of lock keys.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • KMSAN reports a use of uninitialized memory in put_cmsg() because
    msg.msg_flags in recvfrom hasn't been initialized properly.
    The flag values don't affect the result on this path, but it's still a
    good idea to initialize them explicitly.

    Signed-off-by: Alexander Potapenko
    Signed-off-by: David S. Miller

    Alexander Potapenko
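Here is a sketch of the lock-key split from the lockdep commit above. Two parallel key arrays exist, one for user-created sockets and one for kernel sockets. The `kern` flag recorded at allocation selects between them, and clones inherit it. The array names echo the kernel's, but all types and functions here are hypothetical simplifications. A lock class key matters to lockdep only by its address.

```c
#include <stdbool.h>

#define NPROTO_MODEL 4 /* hypothetical number of address families */

/* Stand-in for struct lock_class_key: only its address matters. */
struct lock_class_key_model { char dummy; };

/* Two key sets: user-created sockets vs kernel-created sockets. */
static struct lock_class_key_model af_family_keys[NPROTO_MODEL];
static struct lock_class_key_model af_family_kern_keys[NPROTO_MODEL];

struct sock_lock_model {
	int family;     /* must be < NPROTO_MODEL */
	bool kern_sock; /* models sk_kern_sock, from sk_alloc()'s kern argument */
	const struct lock_class_key_model *key;
};

/* Models sock_lock_init(): choose the key set from the kern flag. */
void sock_lock_init_model(struct sock_lock_model *sk)
{
	sk->key = sk->kern_sock ? &af_family_kern_keys[sk->family]
				: &af_family_keys[sk->family];
}

/* Models sk_clone_lock(): the child inherits the parent's kern setting. */
void sk_clone_model(struct sock_lock_model *child,
		    const struct sock_lock_model *parent)
{
	child->family = parent->family;
	child->kern_sock = parent->kern_sock;
	sock_lock_init_model(child);
}
```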
     

22 Feb, 2017

1 commit

  • Commit 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path"),
    changed the exit path of recvmmsg to always return the datagrams
    variable and modified the error paths to set the variable to the error
    code returned by recvmsg if necessary.

    However, when sock_error() returned an error, the error code was
    then ignored, and recvmmsg returned 0.

    Change the error path of recvmmsg to correctly return the error code
    of sock_error.

    The bug was triggered by using recvmmsg on a CAN interface which was
    not up. Linux 4.6 and later return 0 in this case while earlier
    releases returned -ENETDOWN.

    Fixes: 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path")
    Signed-off-by: Maxime Jayat
    Signed-off-by: David S. Miller

    Maxime Jayat
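The corrected recvmmsg() exit logic can be modeled as follows. `recvmmsg_model()` is hypothetical. Its inputs are the value `sock_error()` would return and a list of per-message `recvmsg()` results. A pending socket error is returned directly rather than being turned into 0. A later error is reported only if nothing was received. Otherwise the partial count wins, and the kernel records most such errors in `sk_err` for the next call.

```c
#include <errno.h>

/*
 * pending_sock_err: what sock_error() would return (0 or -errno).
 * results[i]: return value of the i-th recvmsg() (bytes, or -errno).
 */
int recvmmsg_model(int pending_sock_err, const int *results, int vlen)
{
	int datagrams = 0;
	int err = pending_sock_err;

	if (err) {
		/* The fix: propagate the socket error instead of returning 0. */
		datagrams = err;
		goto out;
	}

	for (int i = 0; i < vlen; i++) {
		err = results[i];
		if (err < 0)
			break;
		datagrams++;
	}

	/* Report an error directly only when nothing was received. */
	if (err < 0 && datagrams == 0)
		datagrams = err;
out:
	return datagrams;
}
```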
     

12 Jan, 2017

1 commit


11 Jan, 2017

1 commit

  • Make sockfs_setattr() static as it is not used outside of net/socket.c

    This fixes the following GCC warning:
    net/socket.c:534:5: warning: no previous prototype for ‘sockfs_setattr’ [-Wmissing-prototypes]

    Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.")
    Cc: Lorenzo Colitti
    Signed-off-by: Tobias Klauser
    Acked-by: Lorenzo Colitti
    Signed-off-by: David S. Miller

    Tobias Klauser
     

10 Jan, 2017

1 commit


06 Jan, 2017

1 commit


05 Jan, 2017

1 commit


02 Jan, 2017

1 commit

  • ->setattr() was recently implemented for socket files to sync the socket
    inode's uid to the new 'sk_uid' member of struct sock. It does this by
    copying over the ia_uid member of struct iattr. However, ia_uid is
    actually only valid when ATTR_UID is set in ia_valid, indicating that
    the uid is being changed, e.g. by chown. Other metadata operations such
    as chmod or utimes leave ia_uid uninitialized. Therefore, sk_uid could
    be set to a "garbage" value from the stack.

    Fix this by only copying the uid over when ATTR_UID is set.

    Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.")
    Signed-off-by: Eric Biggers
    Tested-by: Lorenzo Colitti
    Acked-by: Lorenzo Colitti
    Signed-off-by: David S. Miller

    Eric Biggers
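The ATTR_UID fix above amounts to a single guard. This sketch models it with hypothetical structures. `ATTR_UID_MODEL` mirrors the kernel's `ATTR_UID` bit (1 << 1) in `ia_valid`, and `ia_uid` is treated as meaningful only when that bit is set, as described for chown versus chmod or utimes.

```c
/* Hypothetical bits mirroring ATTR_MODE / ATTR_UID in iattr's ia_valid. */
#define ATTR_MODE_MODEL (1u << 0)
#define ATTR_UID_MODEL  (1u << 1)

struct iattr_model {
	unsigned int ia_valid; /* which fields below are meaningful */
	unsigned int ia_uid;   /* uninitialized garbage unless ATTR_UID is set */
};

struct sk_uid_model {
	unsigned int sk_uid;
};

/* Copy the uid only when the caller says the uid is being changed. */
void sync_sk_uid(struct sk_uid_model *sk, const struct iattr_model *attr)
{
	if (attr->ia_valid & ATTR_UID_MODEL)
		sk->sk_uid = attr->ia_uid;
}
```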
     

26 Dec, 2016

1 commit

  • ktime is a union because the initial implementation stored the time in
    scalar nanoseconds on 64-bit machines and in an endianness-optimized timespec
    variant for 32-bit machines. The Y2038 cleanup removed the timespec variant
    and switched everything to scalar nanoseconds. The union remained, but
    became completely pointless.

    Get rid of the union and just keep ktime_t as a simple typedef of type s64.

    The conversion was done with coccinelle and some manual mopping up.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
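With ktime_t as a plain signed 64-bit nanosecond count, helpers reduce to ordinary arithmetic. This sketch uses `int64_t` in place of the kernel's `s64`. The `_model` helpers are simplified stand-ins: the kernel's `ktime_set()`, for example, also clamps to `KTIME_MAX` on overflow, which this version omits.

```c
#include <stdint.h>

/* After the cleanup: a scalar nanosecond count, no union. */
typedef int64_t ktime_t;

#define NSEC_PER_SEC_MODEL 1000000000LL

/* Simplified: no overflow clamping. */
static inline ktime_t ktime_set_model(int64_t secs, unsigned long nsecs)
{
	return secs * NSEC_PER_SEC_MODEL + (int64_t)nsecs;
}

/* Scalar arithmetic needs no per-architecture union handling. */
static inline ktime_t ktime_add_model(ktime_t a, ktime_t b) { return a + b; }
static inline int64_t ktime_to_ns_model(ktime_t kt) { return kt; }
```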
     

25 Dec, 2016

1 commit


11 Dec, 2016

1 commit


09 Dec, 2016

1 commit


30 Nov, 2016

1 commit

  • This patch exports the sender chronograph stats via the socket
    SO_TIMESTAMPING channel. Currently we can instrument how long a
    particular application unit of data was queued in TCP by tracking
    SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
    these sender chronograph stats exported simultaneously along with
    these timestamps allows further breaking down of the various sender
    limitations. For example, a video server can tell if a particular
    chunk of video on a connection takes a long time to deliver because
    TCP was experiencing a small receive window. Before this patch, this
    could not be determined without packet traces.

    To prepare these stats, the user needs to set
    SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
    while requesting other SOF_TIMESTAMPING TX timestamps. When the
    timestamps are available in the error queue, the stats are returned
    in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
    in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
    TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. The unit is microseconds.

    Signed-off-by: Francis Yan
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Soheil Hassas Yeganeh
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Francis Yan
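On the receiving side, the OPT_STATS payload is a run of `struct nlattr` TLVs. This sketch walks such a buffer and fetches one 64-bit attribute. It uses `struct nlattr`, `NLA_HDRLEN`, `NLA_ALIGN` and `NLA_TYPE_MASK` from `<linux/netlink.h>`. The helper name is hypothetical, and attribute values are assumed to be native-endian `u64`.

```c
#include <linux/netlink.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Find attribute `type` holding a u64; returns true and sets *out if found. */
bool opt_stats_get_u64(const void *data, size_t len, uint16_t type,
		       uint64_t *out)
{
	const unsigned char *p = data;

	while (len >= NLA_HDRLEN) {
		struct nlattr nla;

		memcpy(&nla, p, sizeof(nla)); /* avoid unaligned access */
		if (nla.nla_len < NLA_HDRLEN || nla.nla_len > len)
			return false; /* malformed TLV */

		if ((nla.nla_type & NLA_TYPE_MASK) == type &&
		    nla.nla_len - NLA_HDRLEN == sizeof(*out)) {
			memcpy(out, p + NLA_HDRLEN, sizeof(*out));
			return true;
		}

		size_t step = NLA_ALIGN(nla.nla_len);
		if (step >= len)
			break;
		p += step;
		len -= step;
	}
	return false;
}
```

In practice, look for the control message with `cmsg_level == SOL_SOCKET` and `cmsg_type == SCM_TIMESTAMPING_OPT_STATS`. Pass it `CMSG_DATA(cmsg)` and `cmsg->cmsg_len - CMSG_LEN(0)`. The attribute numbers come from the `TCP_NLA_*` enum in `<linux/tcp.h>`.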
     

23 Nov, 2016

1 commit

  • All conflicts were simple overlapping changes except perhaps
    for the Thunder driver.

    That driver has a change_mtu method explicitly for sending
    a message to the hardware. If that fails it returns an
    error.

    Normally a driver doesn't need an ndo_change_mtu method because those
    are usually just range changes, which are now handled generically.
    But since this extra operation is needed in the Thunder driver, it has
    to stay.

    However, if the message send fails we have to restore the original
    MTU before the change because the entire call chain expects that if
    an error is thrown by ndo_change_mtu then the MTU did not change.
    Therefore code is added to nicvf_change_mtu to remember the original
    MTU, and to restore it upon nicvf_update_hw_max_frs() failure.

    Signed-off-by: David S. Miller

    David S. Miller
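The ndo_change_mtu contract described in the Thunder merge above can be sketched with a hypothetical device model. If the hardware update fails, restore the original MTU before returning the error. `struct netdev_model`, `change_mtu_model()` and the demo callbacks are assumptions. The sketch also assumes that a stopped device is programmed when it is next opened.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical minimal device model. */
struct netdev_model {
	int mtu;
	bool running;
	/* Pushes the max frame size to hardware; returns 0 on success. */
	int (*update_hw_max_frs)(struct netdev_model *dev, int mtu);
};

/* On error the MTU must be unchanged; range checks are done by the core. */
int change_mtu_model(struct netdev_model *dev, int new_mtu)
{
	int orig_mtu = dev->mtu;

	dev->mtu = new_mtu;
	if (!dev->running)
		return 0; /* assumed: hardware is programmed on the next open */

	if (dev->update_hw_max_frs(dev, new_mtu)) {
		dev->mtu = orig_mtu; /* roll back before reporting failure */
		return -EINVAL;
	}
	return 0;
}

/* Demo callbacks for the sketch. */
int hw_update_ok(struct netdev_model *dev, int mtu)
{
	(void)dev; (void)mtu;
	return 0;
}

int hw_update_fails(struct netdev_model *dev, int mtu)
{
	(void)dev; (void)mtu;
	return -EIO;
}
```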
     

17 Nov, 2016

1 commit

  • The IOP_XATTR flag is set on sockfs because sockfs supports getting the
    "system.sockprotoname" xattr. Since commit 6c6ef9f2, this flag is checked for
    setxattr support as well. This is wrong on sockfs because security xattr
    support there is supposed to be provided by security_inode_setsecurity. The
    smack security module relies on socket labels (xattrs).

    Fix this by adding a security xattr handler on sockfs that returns
    -EAGAIN, and by checking for -EAGAIN in setxattr.

    We cannot simply check for -EOPNOTSUPP in setxattr because there are
    filesystems that neither have direct security xattr support nor support
    via security_inode_setsecurity. A more proper fix might be to move the
    call to security_inode_setsecurity into sockfs, but it's not clear to me
    if that is safe: we would end up calling security_inode_post_setxattr after
    that as well.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
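The following sketch models the -EAGAIN convention from the sockfs xattr fix as a dispatch function. The filesystem handler runs first. If it answers -EAGAIN, a "security." name falls back to the LSM hook (security_inode_setsecurity() in the kernel), and any other name gets -EOPNOTSUPP. The function signatures and demo callbacks are hypothetical. The demo LSM accepts names starting with SMACK64, as Smack's socket labels do.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical setter signature used for both the fs handler and the LSM. */
typedef int (*xattr_set_fn)(const char *name, const char *value);

int setxattr_model(const char *name, const char *value,
		   xattr_set_fn fs_handler, xattr_set_fn lsm_setsecurity)
{
	static const char prefix[] = "security.";
	bool issec = strncmp(name, prefix, sizeof(prefix) - 1) == 0;
	int error = fs_handler(name, value);

	if (error == -EAGAIN) {
		/* -EAGAIN means "not handled here, let the LSM try". */
		error = -EOPNOTSUPP;
		if (issec)
			error = lsm_setsecurity(name + sizeof(prefix) - 1, value);
	}
	return error;
}

/* Demo: a sockfs-like handler that defers every name to the LSM. */
int sockfs_handler_demo(const char *name, const char *value)
{
	(void)name; (void)value;
	return -EAGAIN;
}

/* Demo: an LSM hook that accepts SMACK64* suffixes. */
int lsm_demo(const char *suffix, const char *value)
{
	(void)value;
	return strncmp(suffix, "SMACK64", 7) == 0 ? 0 : -EOPNOTSUPP;
}
```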
     

15 Nov, 2016

1 commit


10 Nov, 2016

1 commit

  • Do not send the next message in sendmmsg for partial sendmsg
    invocations.

    sendmmsg assumes that it can continue sending the next message
    when the return value of the individual sendmsg invocations
    is positive. It results in corrupting the data for TCP,
    SCTP, and UNIX streams.

    For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
    of "aefgh" if the first sendmsg invocation sends only the first
    byte while the second sendmsg goes through.

    Datagram sockets either send the entire datagram or fail, so
    this patch affects only sockets of type SOCK_STREAM and
    SOCK_SEQPACKET.

    Fixes: 228e548e6020 ("net: Add sendmmsg socket system call")
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: Eric Dumazet
    Signed-off-by: Willem de Bruijn
    Signed-off-by: Neal Cardwell
    Acked-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller

    Soheil Hassas Yeganeh
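The sendmmsg() fix above can be modeled as a loop that stops at the first short write. Otherwise the next message would be spliced into a stream after a partial one, as in the "aefgh" example. `struct mmsg_model`, the `send_fn` type and the demo sender are hypothetical. The demo reproduces the changelog's scenario, where the first send manages only one byte.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical message descriptor, loosely after struct mmsghdr. */
struct mmsg_model {
	const char *buf;
	size_t len;
	size_t msg_len; /* bytes actually sent */
};

/* Hypothetical send primitive: bytes sent (may be short) or -errno. */
typedef ssize_t (*send_fn)(const char *buf, size_t len);

/* Returns the number of messages processed, or -errno if the first fails. */
int sendmmsg_model(struct mmsg_model *msgs, int vlen, send_fn send)
{
	int datagrams = 0;

	for (int i = 0; i < vlen; i++) {
		ssize_t n = send(msgs[i].buf, msgs[i].len);

		if (n < 0)
			return datagrams ? datagrams : (int)n;
		msgs[i].msg_len = (size_t)n;
		datagrams++;
		/* The fix: a partial send ends the batch. */
		if ((size_t)n < msgs[i].len)
			break;
	}
	return datagrams;
}

/* Demo: the first call sends one byte; later calls send everything. */
ssize_t send_short_then_full(const char *buf, size_t len)
{
	static int calls;

	(void)buf;
	return calls++ == 0 ? 1 : (ssize_t)len;
}
```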
     

05 Nov, 2016

1 commit

  • Protocol sockets (struct sock) don't have UIDs, but most of the
    time, they map 1:1 to userspace sockets (struct socket) which do.

    Various operations such as the iptables xt_owner match need
    access to the "UID of a socket", and do so by following the
    backpointer to the struct socket. This involves taking
    sk_callback_lock and doesn't work when there is no socket
    because userspace has already called close().

    Simplify this by adding a sk_uid field to struct sock whose value
    matches the UID of the corresponding struct socket. The semantics
    are as follows:

    1. Whenever sk_socket is non-null: sk_uid is the same as the UID
    in sk_socket, i.e., matches the return value of sock_i_uid.
    Specifically, the UID is set when userspace calls socket(),
    fchown(), or accept().
    2. When sk_socket is NULL, sk_uid is defined as follows:
    - For a socket that no longer has a sk_socket because
    userspace has called close(): the previous UID.
    - For a cloned socket (e.g., an incoming connection that is
    established but on which userspace has not yet called
    accept): the UID of the socket it was cloned from.
    - For a socket that has never had an sk_socket: UID 0 inside
    the user namespace corresponding to the network namespace
    the socket belongs to.

    Kernel sockets created by sock_create_kern are a special case
    of #1 and sk_uid is the user that created them. For kernel
    sockets created at network namespace creation time, such as the
    per-processor ICMP and TCP sockets, this is the user that created
    the network namespace.

    Signed-off-by: Lorenzo Colitti
    Signed-off-by: David S. Miller

    Lorenzo Colitti
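The following sketch encodes the sk_uid rules from the last commit. A socket with an sk_socket tracks the socket inode's UID. close() keeps the previous UID, and a clone inherits its parent's UID. A sock that never had an sk_socket gets UID 0 of the netns's user namespace, represented here by a plain `netns_root_uid` parameter. All structures and functions are hypothetical, and UIDs are plain integers rather than kuid_t.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical models; UIDs are plain integers here. */
struct socket_uid_model {
	unsigned int inode_uid;
};

struct sk_model {
	struct socket_uid_model *sk_socket; /* NULL when no userspace socket */
	unsigned int sk_uid;
};

/* Rule 1, or "never had a socket": UID 0 mapped into the netns user ns. */
void sk_init_uid(struct sk_model *sk, struct socket_uid_model *sock,
		 unsigned int netns_root_uid)
{
	sk->sk_socket = sock;
	sk->sk_uid = sock ? sock->inode_uid : netns_root_uid;
}

/* fchown()/accept(): keep sk_uid in step with the socket inode. */
void sk_set_owner(struct sk_model *sk, unsigned int uid)
{
	if (sk->sk_socket)
		sk->sk_socket->inode_uid = uid;
	sk->sk_uid = uid;
}

/* close(): detach from userspace but keep the previous UID. */
void sk_orphan(struct sk_model *sk)
{
	sk->sk_socket = NULL;
}

/* Clone (e.g. a connection not yet accepted): inherit the parent's UID. */
void sk_clone_uid(struct sk_model *child, const struct sk_model *parent)
{
	child->sk_socket = NULL;
	child->sk_uid = parent->sk_uid;
}
```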