06 Oct, 2020

1 commit


03 Oct, 2020

1 commit

  • If a page sent into kernel_sendpage() is a slab page or has no
    ref_count, it is improper to send via the zero-copy sendpage()
    method. Such a page might be unexpectedly released in the network code
    path and cause an unpredictable panic due to corruption of kernel
    memory management data structures.

    This patch adds a WARN_ON() on the page before sending it into the
    concrete zero-copy sendpage() method. If the page is improper for the
    zero-copy sendpage() method, a warning message can be observed before
    the consequent unpredictable kernel panic.

    This patch does not change existing kernel_sendpage() behavior for the
    improper page zero-copy send; it just provides a warning message as a
    hint for the potential panic that follows from kernel memory heap
    corruption.

    Signed-off-by: Coly Li
    Cc: Cong Wang
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Coly Li
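
    The check described above can be sketched as a small userspace model.
    This is only an illustration under stated assumptions: struct
    model_page and both model_* functions are hypothetical stand-ins, not
    the kernel's struct page or the code added by the patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the two page properties the commit names. */
struct model_page {
	bool is_slab;   /* page belongs to the slab allocator */
	int  ref_count; /* references currently held on the page */
};

/* Hypothetical helper: true when the page is safe for zero-copy send. */
static bool model_page_ok_for_sendpage(const struct model_page *page)
{
	return !page->is_slab && page->ref_count >= 1;
}

/*
 * Mirrors the described behaviour: report the improper page, but do not
 * change what gets sent. Returns the number of warnings raised.
 */
static int model_kernel_sendpage(const struct model_page *page)
{
	int warnings = 0;

	if (!model_page_ok_for_sendpage(page)) {
		fprintf(stderr, "warning: improper page for zero-copy send\n");
		warnings++;
	}
	/* The zero-copy send itself would follow here, unchanged. */
	return warnings;
}
```

    A page with a held reference that is not a slab page passes silently;
    a slab page or a page with a zero count only triggers the warning.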
     

05 Sep, 2020

1 commit

  • We got slightly different patches removing a double word
    in a comment in net/ipv4/raw.c - picked the version from net.

    Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached
    values instead of VNIC login response buffer (following what
    commit 507ebe6444a4 ("ibmvnic: Fix use-after-free of VNIC login
    response buffer") did).

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

27 Aug, 2020

1 commit


25 Aug, 2020

1 commit

  • For TCP tx zero-copy, the kernel notifies the process of completions by
    queuing completion notifications on the socket error queue. This patch
    allows reading these notifications via recvmsg to support TCP tx
    zero-copy.

    Ancillary data was originally disallowed due to privilege escalation
    via io_uring's offloading of sendmsg() onto a kernel thread with kernel
    credentials (https://crbug.com/project-zero/1975). So, we must ensure
    that the socket type is one where the ancillary data types that are
    delivered on recvmsg are plain data (no file descriptors or values that
    are translated based on the identity of the calling process).

    This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE
    with tx zero-copy enabled. Before this patch, we received -EINVAL from
    this specific code path. After this patch, we could read TCP tx
    zero-copy completion notifications from the MSG_ERRQUEUE.

    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: Arjun Roy
    Acked-by: Eric Dumazet
    Reviewed-by: Jann Horn
    Reviewed-by: Jens Axboe
    Signed-off-by: Luke Hsiao
    Signed-off-by: David S. Miller

    Luke Hsiao
     

11 Aug, 2020

1 commit

  • This reverts commits 6d04fe15f78acdf8e32329e208552e226f7a8ae6 and
    a31edb2059ed4e498f9aa8230c734b59d0ad797a.

    It turns out the idea to share a single pointer for both kernel and user
    space address causes various kinds of problems. So use the slightly less
    optimal version that uses an extra bit, but which is guaranteed to be safe
    everywhere.

    Fixes: 6d04fe15f78a ("net: optimize the sockptr_t for unified kernel/user address spaces")
    Reported-by: Eric Dumazet
    Reported-by: John Stultz
    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
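
    The "extra bit" variant can be pictured with the userspace model
    below. It is a sketch only: the model_* names are hypothetical, the
    kernel annotates the user member with __user, and memcpy() stands in
    for copy_from_user() on the user path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One pointer slot plus a flag that records which kind it holds. */
typedef struct {
	union {
		void *kernel;
		void *user;
	};
	bool is_kernel : 1; /* the extra bit */
} model_sockptr_t;

static model_sockptr_t model_kernel_sockptr(void *p)
{
	return (model_sockptr_t){ .kernel = p, .is_kernel = true };
}

static model_sockptr_t model_user_sockptr(void *p)
{
	return (model_sockptr_t){ .user = p }; /* is_kernel stays false */
}

/*
 * Copy from either kind of pointer. *used_kernel_path reports which
 * branch ran, purely so the model can be observed from outside.
 */
static int model_copy_from_sockptr(void *dst, model_sockptr_t src,
				   size_t size, bool *used_kernel_path)
{
	*used_kernel_path = src.is_kernel;
	if (src.is_kernel) {
		memcpy(dst, src.kernel, size); /* plain copy */
		return 0;
	}
	/* The kernel would call copy_from_user() here. */
	memcpy(dst, src.user, size);
	return 0;
}
```

    Because the flag is set by the constructor and never derived from the
    address, the choice of copy routine does not depend on how the
    architecture lays out its address spaces.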
     

09 Aug, 2020

4 commits


29 Jul, 2020

1 commit

  • Make sure not just the pointer itself but the whole range lies in
    the user address space. For that pass the length and then use
    the access_ok helper to do the check.

    Fixes: 6d04fe15f78a ("net: optimize the sockptr_t for unified kernel/user address spaces")
    Reported-by: David Laight
    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
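
    Why the length matters can be shown with a small model. The boundary
    constant and function names are hypothetical; the kernel's access_ok()
    is architecture specific and is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical boundary: addresses below this count as user space. */
#define MODEL_USER_LIMIT UINT64_C(0x0000800000000000)

/* Pointer-only check: looks at the start address alone. */
static bool model_start_ok(uint64_t addr)
{
	return addr < MODEL_USER_LIMIT;
}

/*
 * Range check: all of [addr, addr + size) must lie below the limit.
 * Written without computing addr + size, so it cannot overflow.
 */
static bool model_range_ok(uint64_t addr, uint64_t size)
{
	return size <= MODEL_USER_LIMIT && addr <= MODEL_USER_LIMIT - size;
}
```

    A buffer that starts just below the boundary passes the pointer-only
    check but fails the range check once its length crosses the limit.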
     

25 Jul, 2020

3 commits

  • For architectures like x86 and arm64 we don't need the separate bit to
    indicate that a pointer is a kernel pointer as the address spaces are
    unified. That way the sockptr_t can be reduced to a union of two
    pointers, which leads to nicer calling conventions.

    The only caveat is that we need to check that users don't pass in a
    kernel address and thus gain access to kernel memory. Thus the
    USER_SOCKPTR helper is replaced with an init_user_sockptr function that
    does this check and returns an error if it fails.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     
  • Rework the remaining setsockopt code to pass a sockptr_t instead of a
    plain user pointer. This removes the last remaining set_fs(KERNEL_DS)
    outside of architecture specific code.

    Signed-off-by: Christoph Hellwig
    Acked-by: Stefan Schmidt [ieee802154]
    Acked-by: Matthieu Baerts
    Signed-off-by: David S. Miller

    Christoph Hellwig
     
  • Pass a sockptr_t to prepare for set_fs-less handling of the kernel
    pointer from bpf-cgroup.

    Signed-off-by: Christoph Hellwig
    Acked-by: Matthieu Baerts
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

20 Jul, 2020

4 commits


14 Jul, 2020

1 commit


05 Jul, 2020

1 commit

  • setsockopt(mptcp_fd, SOL_SOCKET, ...)... appears to work (returns 0),
    but it has no effect -- this is because the MPTCP layer never has a
    chance to copy the settings to the subflow socket.

    Skip the generic handling for the mptcp case and call the
    mptcp specific handler for SOL_SOCKET too.

    Next patch adds more specific handling for SOL_SOCKET to mptcp.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

30 May, 2020

1 commit


28 May, 2020

1 commit


19 May, 2020

2 commits


12 May, 2020

1 commit

  • The msg_control field in struct msghdr can either contain a user
    pointer when used with the recvmsg system call, or a kernel pointer
    when used with sendmsg. To complicate things further kernel_recvmsg
    can stuff a kernel pointer in and then use set_fs to make the uaccess
    helpers accept it.

    Replace it with a union of a kernel pointer msg_control field and
    a user pointer msg_control_user one, and allow kernel_recvmsg to
    operate on a proper kernel pointer, using a bitfield to override the
    normal choice of a user pointer for recvmsg.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

31 Mar, 2020

1 commit

  • Pull io_uring updates from Jens Axboe:
    "Here are the io_uring changes for this merge window. Light on new
    features this time around (just splice + buffer selection), lots of
    cleanups, fixes, and improvements to existing support. In particular,
    this contains:

    - Cleanup fixed file update handling for stack fallback (Hillf)

    - Re-work of how pollable async IO is handled, we no longer require
    thread offload to handle that. Instead we rely on using poll to drive
    this, with task_work execution.

    - In conjunction with the above, allow expendable buffer selection,
    so that poll+recv (for example) no longer has to be a split
    operation.

    - Make sure we honor RLIMIT_FSIZE for buffered writes

    - Add support for splice (Pavel)

    - Linked work inheritance fixes and optimizations (Pavel)

    - Async work fixes and cleanups (Pavel)

    - Improve io-wq locking (Pavel)

    - Hashed link write improvements (Pavel)

    - SETUP_IOPOLL|SETUP_SQPOLL improvements (Xiaoguang)"

    * tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-block: (54 commits)
    io_uring: cleanup io_alloc_async_ctx()
    io_uring: fix missing 'return' in comment
    io-wq: handle hashed writes in chains
    io-uring: drop 'free_pfile' in struct io_file_put
    io-uring: drop completion when removing file
    io_uring: Fix ->data corruption on re-enqueue
    io-wq: close cancel gap for hashed linked work
    io_uring: make spdxcheck.py happy
    io_uring: honor original task RLIMIT_FSIZE
    io-wq: hash dependent work
    io-wq: split hashing and enqueueing
    io-wq: don't resched if there is no work
    io-wq: remove duplicated cancel code
    io_uring: fix truncated async read/readv and write/writev retry
    io_uring: dual license io_uring.h uapi header
    io_uring: io_uring_enter(2) don't poll while SETUP_IOPOLL|SETUP_SQPOLL enabled
    io_uring: Fix unused function warnings
    io_uring: add end-of-bits marker and build time verify it
    io_uring: provide means of removing buffers
    io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_RECVMSG
    ...

    Linus Torvalds
     

20 Mar, 2020

1 commit

  • Just like commit 4022e7af86be, this fixes the fact that
    IORING_OP_ACCEPT ends up using get_unused_fd_flags(), which checks
    current->signal->rlim[] for limits.

    Add an extra argument to __sys_accept4_file() that allows us to pass
    in the proper nofile limit, and grab it at request prep time.

    Acked-by: David S. Miller
    Signed-off-by: Jens Axboe

    Jens Axboe
     

10 Mar, 2020

1 commit


09 Jan, 2020

1 commit

  • When procfs is disabled, the fdinfo code causes a harmless
    warning:

    net/socket.c:1000:13: error: 'sock_show_fdinfo' defined but not used [-Werror=unused-function]
    static void sock_show_fdinfo(struct seq_file *m, struct file *f)

    Move the function definition up so we can use a single #ifdef
    around it.

    Fixes: b4653342b151 ("net: Allow to show socket-specific information in /proc/[pid]/fdinfo/[fd]")
    Suggested-by: Al Viro
    Acked-by: Kirill Tkhai
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
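
    The single-#ifdef arrangement could look roughly like the sketch
    below. MODEL_PROC_FS and the model_* names are hypothetical stand-ins
    for CONFIG_PROC_FS and the socket file operations; the body of the
    function is invented for illustration.

```c
#include <stddef.h>
#include <stdio.h>

#ifdef MODEL_PROC_FS
/* Defined only when the feature is enabled. */
static void model_show_fdinfo(FILE *m, int fd)
{
	fprintf(m, "fd:\t%d\n", fd);
}
#else
/* No definition at all, so nothing is left "defined but not used". */
#define model_show_fdinfo NULL
#endif

struct model_file_ops {
	void (*show_fdinfo)(FILE *m, int fd);
};

/* The user of the function needs no #ifdef of its own. */
static const struct model_file_ops model_socket_ops = {
	.show_fdinfo = model_show_fdinfo,
};
```

    Placing the definition above its only user lets one conditional block
    cover both the function and its fallback.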
     

23 Dec, 2019

1 commit


14 Dec, 2019

1 commit

  • Pull io_uring fixes from Jens Axboe:

    - A tweak to IOSQE_IO_LINK (also marked for stable) to allow links that
    don't sever if the result is < 0.

    This is mostly for linked timeouts, where if we ask for a pure
    timeout we always get -ETIME. This makes links useless for that case,
    hence allow a case where it works.

    - Five minor optimizations to fix and improve cases that regressed
    since v5.4.

    - An SQTHREAD locking fix.

    - A sendmsg/recvmsg iov assignment fix.

    - Net fix where read_iter/write_iter don't honor IOCB_NOWAIT, and
    subsequently ensuring that works for io_uring.

    - Fix a case where for an invalid opcode we might return -EBADF instead
    of -EINVAL, if the ->fd of that sqe was set to an invalid fd value.

    * tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block:
    io_uring: ensure we return -EINVAL on unknown opcode
    io_uring: add sockets to list of files that support non-blocking issue
    net: make socket read/write_iter() honor IOCB_NOWAIT
    io_uring: only hash regular files for async work execution
    io_uring: run next sqe inline if possible
    io_uring: don't dynamically allocate poll data
    io_uring: deferred send/recvmsg should assign iov
    io_uring: sqthread should grab ctx->uring_lock for submissions
    io-wq: briefly spin for new work after finishing work
    io-wq: remove worker->wait waitqueue
    io_uring: allow unbreakable links

    Linus Torvalds
     

13 Dec, 2019

1 commit


11 Dec, 2019

1 commit

  • The socket read/write helpers only look at the file O_NONBLOCK, not
    the iocb IOCB_NOWAIT flag. This breaks users like preadv2/pwritev2
    and io_uring that rely on not having the file itself marked nonblocking,
    but rather the iocb itself.

    Cc: netdev@vger.kernel.org
    Acked-by: David Miller
    Signed-off-by: Jens Axboe

    Jens Axboe
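
    The intended flag handling can be modelled in a few lines. This is a
    sketch: MODEL_IOCB_NOWAIT is a hypothetical stand-in (the kernel's
    IOCB_NOWAIT is not exposed to user space), and the function is not
    the kernel helper itself.

```c
#include <fcntl.h>
#include <sys/socket.h>

/* Hypothetical per-request flag, standing in for IOCB_NOWAIT. */
#define MODEL_IOCB_NOWAIT 0x1

/*
 * Return the msg_flags a socket read/write helper would use: the
 * request must not block if either the file or the iocb says so.
 */
static int model_sock_msg_flags(int f_flags, int ki_flags)
{
	if ((f_flags & O_NONBLOCK) || (ki_flags & MODEL_IOCB_NOWAIT))
		return MSG_DONTWAIT;
	return 0;
}
```

    With only the file flag consulted, a per-request nowait on a blocking
    socket would have been ignored; here either source is enough.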
     

09 Dec, 2019

1 commit

  • Pull networking fixes from David Miller:

    1) More jumbo frame fixes in r8169, from Heiner Kallweit.

    2) Fix bpf build in minimal configuration, from Alexei Starovoitov.

    3) Use after free in slcan driver, from Jouni Hogander.

    4) Flower classifier port ranges don't work properly in the HW offload
    case, from Yoshiki Komachi.

    5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin.

    6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk.

    7) Fix flow dissection in dsa TX path, from Alexander Lobakin.

    8) Stale syncookie timestamp fixes from Guillaume Nault.

    [ Did an evil merge to silence a warning introduced by this pull - Linus ]

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
    r8169: fix rtl_hw_jumbo_disable for RTL8168evl
    net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
    r8169: add missing RX enabling for WoL on RTL8125
    vhost/vsock: accept only packets with the right dst_cid
    net: phy: dp83867: fix hfs boot in rgmii mode
    net: ethernet: ti: cpsw: fix extra rx interrupt
    inet: protect against too small mtu values.
    gre: refetch erspan header from skb->data after pskb_may_pull()
    pppoe: remove redundant BUG_ON() check in pppoe_pernet
    tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
    tcp: tighten acceptance of ACKs not matching a child socket
    tcp: fix rejected syncookies due to stale timestamps
    lpc_eth: kernel BUG on remove
    tcp: md5: fix potential overestimation of TCP option space
    net: sched: allow indirect blocks to bind to clsact in TC
    net: core: rename indirect block ingress cb function
    net-sysfs: Call dev_hold always in netdev_queue_add_kobject
    net: dsa: fix flow dissection on Tx path
    net/tls: Fix return values to avoid ENOTSUPP
    net: avoid an indirect call in ____sys_recvmsg()
    ...

    Linus Torvalds
     

07 Dec, 2019

1 commit

  • CONFIG_RETPOLINE=y made indirect calls expensive.

    gcc seems to add an indirect call in ____sys_recvmsg().

    Rewriting the code slightly makes sure to avoid this indirection.

    Alternative would be to not call sock_recvmsg() and instead
    use security_socket_recvmsg() and sock_recvmsg_nosec(),
    but this is less readable IMO.

    Signed-off-by: Eric Dumazet
    Cc: Paolo Abeni
    Cc: David Laight
    Acked-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Eric Dumazet
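
    The kind of rewrite described can be illustrated as follows. This is
    a userspace sketch with hypothetical model_* functions; whether a
    given compiler emits an indirect call for the first form depends on
    its version and options, and the exact code in the patch may differ.

```c
#include <stdbool.h>

static int model_security_checks; /* counts calls to the checked path */

static int model_recvmsg_nosec(int len)
{
	return len;
}

static int model_recvmsg(int len)
{
	model_security_checks++; /* a security hook would run here */
	return len;
}

/* Selecting the callee with ?: may compile to an indirect call. */
static int model_recv_indirect(bool nosec, int len)
{
	return (nosec ? model_recvmsg_nosec : model_recvmsg)(len);
}

/* Two direct calls behind a branch; same behaviour. */
static int model_recv_direct(bool nosec, int len)
{
	if (nosec)
		return model_recvmsg_nosec(len);
	return model_recvmsg(len);
}
```

    Both forms return the same result; the second simply names each
    callee at its call site.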
     

03 Dec, 2019

2 commits


02 Dec, 2019

2 commits

  • Pull y2038 cleanups from Arnd Bergmann:
    "y2038 syscall implementation cleanups

    This is a series of cleanups for the y2038 work, mostly intended for
    namespace cleaning: the kernel defines the traditional time_t, timeval
    and timespec types that often lead to y2038-unsafe code. Even though
    the unsafe usage is mostly gone from the kernel, having the types and
    associated functions around means that we can still grow new users,
    and that we may be missing conversions to safe types that actually
    matter.

    There are still a number of driver specific patches needed to get the
    last users of these types removed, those have been submitted to the
    respective maintainers"

    Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/

    * tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits)
    y2038: alarm: fix half-second cut-off
    y2038: ipc: fix x32 ABI breakage
    y2038: fix typo in powerpc vdso "LOPART"
    y2038: allow disabling time32 system calls
    y2038: itimer: change implementation to timespec64
    y2038: move itimer reset into itimer.c
    y2038: use compat_{get,set}_itimer on alpha
    y2038: itimer: compat handling to itimer.c
    y2038: time: avoid timespec usage in settimeofday()
    y2038: timerfd: Use timespec64 internally
    y2038: elfcore: Use __kernel_old_timeval for process times
    y2038: make ns_to_compat_timeval use __kernel_old_timeval
    y2038: socket: use __kernel_old_timespec instead of timespec
    y2038: socket: remove timespec reference in timestamping
    y2038: syscalls: change remaining timeval to __kernel_old_timeval
    y2038: rusage: use __kernel_old_timeval
    y2038: uapi: change __kernel_time_t to __kernel_old_time_t
    y2038: stat: avoid 'time_t' in 'struct stat'
    y2038: ipc: remove __kernel_time_t reference from headers
    y2038: vdso: powerpc: avoid timespec references
    ...

    Linus Torvalds
     
  • Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
    "As part of the cleanup of some remaining y2038 issues, I came to
    fs/compat_ioctl.c, which still has a couple of commands that need
    support for time64_t.

    In completely unrelated work, I spent time on cleaning up parts of
    this file in the past, moving things out into drivers instead.

    After Al Viro reviewed an earlier version of this series and did a lot
    more of that cleanup, I decided to try to completely eliminate the
    rest of it and move it all into drivers.

    This series incorporates some of Al's work and many patches of my own,
    but in the end stops short of actually removing the last part, which
    is the scsi ioctl handlers. I have patches for those as well, but they
    need more testing or possibly a rewrite"

    * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
    scsi: sd: enable compat ioctls for sed-opal
    pktcdvd: add compat_ioctl handler
    compat_ioctl: move SG_GET_REQUEST_TABLE handling
    compat_ioctl: ppp: move simple commands into ppp_generic.c
    compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
    compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
    compat_ioctl: unify copy-in of ppp filters
    tty: handle compat PPP ioctls
    compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
    compat_ioctl: handle SIOCOUTQNSD
    af_unix: add compat_ioctl support
    compat_ioctl: reimplement SG_IO handling
    compat_ioctl: move WDIOC handling into wdt drivers
    fs: compat_ioctl: move FITRIM emulation into file systems
    gfs2: add compat_ioctl support
    compat_ioctl: remove unused convert_in_user macro
    compat_ioctl: remove last RAID handling code
    compat_ioctl: remove /dev/raw ioctl translation
    compat_ioctl: remove PCI ioctl translation
    compat_ioctl: remove joystick ioctl translation
    ...

    Linus Torvalds
     

27 Nov, 2019

1 commit

  • Only io_uring uses (and added) these, and we want to disallow the
    use of sendmsg/recvmsg for anything but regular data transfers.
    Use the newly added prep helper to split the msghdr copy out from
    the core function, to check for msg_control and msg_controllen
    settings. If either is set, we return -EINVAL.

    Acked-by: David S. Miller
    Signed-off-by: Jens Axboe

    Jens Axboe
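
    The check described above amounts to something like the following
    userspace sketch. The function name is hypothetical; it uses the
    userspace struct msghdr from <sys/socket.h> rather than the kernel's
    own structure.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Allow only plain data transfers: no ancillary data of any kind. */
static int model_reject_ancillary(const struct msghdr *msg)
{
	if (msg->msg_control || msg->msg_controllen)
		return -EINVAL;
	return 0;
}
```

    A header with both fields zeroed passes; setting either one is enough
    for the request to be refused.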