09 Jan, 2020

1 commit

  • When procfs is disabled, the fdinfo code causes a harmless
    warning:

    net/socket.c:1000:13: error: 'sock_show_fdinfo' defined but not used [-Werror=unused-function]
    static void sock_show_fdinfo(struct seq_file *m, struct file *f)

    Move the function definition up so we can use a single #ifdef
    around it.

    Fixes: b4653342b151 ("net: Allow to show socket-specific information in /proc/[pid]/fdinfo/[fd]")
    Suggested-by: Al Viro
    Acked-by: Kirill Tkhai
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
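
    The single-#ifdef arrangement described in the commit above can be
    sketched in plain C as follows. This is only an illustration: DEMO_PROC_FS,
    struct demo_file_ops and the demo_* functions are hypothetical stand-ins,
    not the kernel's identifiers. When the option is off, the handler name
    collapses to NULL, so no unused static function is left behind.

```c
#include <stddef.h>

struct demo_file;                       /* hypothetical stand-in for struct file */

struct demo_file_ops {
    /* NULL when there is nothing to show for this file type */
    void (*show_fdinfo)(struct demo_file *f);
};

/* Define DEMO_PROC_FS before this point to mimic a procfs-enabled build. */
#ifdef DEMO_PROC_FS
static void demo_show_fdinfo(struct demo_file *f)
{
    (void)f;                            /* a real handler would print details */
}
#else
#define demo_show_fdinfo NULL           /* nothing defined, nothing unused */
#endif

static const struct demo_file_ops demo_socket_ops = {
    .show_fdinfo = demo_show_fdinfo,
};

/* Returns 1 if an fdinfo handler was compiled in, 0 otherwise. */
int demo_has_fdinfo(void)
{
    return demo_socket_ops.show_fdinfo != NULL;
}
```

    Built without DEMO_PROC_FS, demo_has_fdinfo() reports that no handler is
    present and the compiler has no unused function to complain about.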
     

23 Dec, 2019

1 commit


14 Dec, 2019

1 commit

  • Pull io_uring fixes from Jens Axboe:

    - A tweak to IOSQE_IO_LINK (also marked for stable) to allow links that
    don't sever if the result is < 0.

    This is mostly for linked timeouts, where if we ask for a pure
    timeout we always get -ETIME. This makes links useless for that case,
    hence allow a case where it works.

    - Five minor optimizations to fix and improve cases that regressed
    since v5.4.

    - An SQTHREAD locking fix.

    - A sendmsg/recvmsg iov assignment fix.

    - Net fix where read_iter/write_iter don't honor IOCB_NOWAIT, and
    subsequently ensuring that works for io_uring.

    - Fix a case where for an invalid opcode we might return -EBADF instead
    of -EINVAL, if the ->fd of that sqe was set to an invalid fd value.

    * tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block:
    io_uring: ensure we return -EINVAL on unknown opcode
    io_uring: add sockets to list of files that support non-blocking issue
    net: make socket read/write_iter() honor IOCB_NOWAIT
    io_uring: only hash regular files for async work execution
    io_uring: run next sqe inline if possible
    io_uring: don't dynamically allocate poll data
    io_uring: deferred send/recvmsg should assign iov
    io_uring: sqthread should grab ctx->uring_lock for submissions
    io-wq: briefly spin for new work after finishing work
    io-wq: remove worker->wait waitqueue
    io_uring: allow unbreakable links

    Linus Torvalds
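
    The difference between an ordinary link and a link that does not sever
    on a negative result can be modelled with a small simulation. This is a
    sketch under stated assumptions, not io_uring code: struct demo_req,
    enum demo_link and demo_run_chain() are hypothetical names, and the
    model only counts how many requests in a chain get to run.

```c
#include <errno.h>
#include <stddef.h>

enum demo_link { DEMO_LINK_NONE, DEMO_LINK_SOFT, DEMO_LINK_HARD };

struct demo_req {
    int result;             /* what the request would complete with */
    enum demo_link link;    /* how it is tied to the next request */
};

/* Returns how many requests of the chain starting at reqs[0] are run. */
size_t demo_run_chain(const struct demo_req *reqs, size_t n)
{
    size_t ran = 0;

    for (size_t i = 0; i < n; i++) {
        ran++;
        if (reqs[i].link == DEMO_LINK_NONE)
            break;          /* last request of the chain */
        if (reqs[i].result < 0 && reqs[i].link == DEMO_LINK_SOFT)
            break;          /* an ordinary link severs on error */
        /* a hard link continues regardless of the result */
    }
    return ran;
}
```

    With a pure timeout that always completes with -ETIME, the ordinary link
    stops after the first request, while the hard-linked chain goes on to
    run the request that follows it.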
     

13 Dec, 2019

1 commit


11 Dec, 2019

1 commit

  • The socket read/write helpers only look at the file O_NONBLOCK, not
    the iocb IOCB_NOWAIT flag. This breaks users like preadv2/pwritev2
    and io_uring that rely on not having the file itself marked nonblocking,
    but rather the iocb itself.

    Cc: netdev@vger.kernel.org
    Acked-by: David Miller
    Signed-off-by: Jens Axboe

    Jens Axboe
     

09 Dec, 2019

1 commit

  • Pull networking fixes from David Miller:

    1) More jumbo frame fixes in r8169, from Heiner Kallweit.

    2) Fix bpf build in minimal configuration, from Alexei Starovoitov.

    3) Use after free in slcan driver, from Jouni Hogander.

    4) Flower classifier port ranges don't work properly in the HW offload
    case, from Yoshiki Komachi.

    5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin.

    6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk.

    7) Fix flow dissection in dsa TX path, from Alexander Lobakin.

    8) Stale syncookie timestamp fixes from Guillaume Nault.

    [ Did an evil merge to silence a warning introduced by this pull - Linus ]

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
    r8169: fix rtl_hw_jumbo_disable for RTL8168evl
    net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
    r8169: add missing RX enabling for WoL on RTL8125
    vhost/vsock: accept only packets with the right dst_cid
    net: phy: dp83867: fix hfs boot in rgmii mode
    net: ethernet: ti: cpsw: fix extra rx interrupt
    inet: protect against too small mtu values.
    gre: refetch erspan header from skb->data after pskb_may_pull()
    pppoe: remove redundant BUG_ON() check in pppoe_pernet
    tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
    tcp: tighten acceptance of ACKs not matching a child socket
    tcp: fix rejected syncookies due to stale timestamps
    lpc_eth: kernel BUG on remove
    tcp: md5: fix potential overestimation of TCP option space
    net: sched: allow indirect blocks to bind to clsact in TC
    net: core: rename indirect block ingress cb function
    net-sysfs: Call dev_hold always in netdev_queue_add_kobject
    net: dsa: fix flow dissection on Tx path
    net/tls: Fix return values to avoid ENOTSUPP
    net: avoid an indirect call in ____sys_recvmsg()
    ...

    Linus Torvalds
     

07 Dec, 2019

1 commit

  • CONFIG_RETPOLINE=y made indirect calls expensive.

    gcc seems to add an indirect call in ____sys_recvmsg().

    Rewriting the code slightly makes sure to avoid this indirection.

    Alternative would be to not call sock_recvmsg() and instead
    use security_socket_recvmsg() and sock_recvmsg_nosec(),
    but this is less readable IMO.

    Signed-off-by: Eric Dumazet
    Cc: Paolo Abeni
    Cc: David Laight
    Acked-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Eric Dumazet
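
    The kind of rewrite described in the commit above can be illustrated
    with two equivalent dispatchers. This is a minimal sketch, assuming the
    indirection came from selecting a function pointer; the demo_* functions
    are hypothetical stand-ins and whether a given compiler emits an
    indirect call for the first form depends on its optimizer.

```c
/* Hypothetical stand-ins for the checked and unchecked receive paths. */
int demo_recv_checked(int x) { return x + 1; }
int demo_recv_nosec(int x)   { return x + 2; }

/* Selecting a function pointer first may compile to an indirect call. */
int demo_dispatch_indirect(int nosec, int x)
{
    int (*fn)(int) = nosec ? demo_recv_nosec : demo_recv_checked;

    return fn(x);
}

/* Branching first leaves two direct calls, with the same result. */
int demo_dispatch_direct(int nosec, int x)
{
    if (nosec)
        return demo_recv_nosec(x);
    return demo_recv_checked(x);
}
```

    Both functions return the same values for every input; only the shape
    of the generated call differs, which is what matters when retpolines
    make indirect calls expensive.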
     

03 Dec, 2019

2 commits


02 Dec, 2019

2 commits

  • Pull y2038 cleanups from Arnd Bergmann:
    "y2038 syscall implementation cleanups

    This is a series of cleanups for the y2038 work, mostly intended for
    namespace cleaning: the kernel defines the traditional time_t, timeval
    and timespec types that often lead to y2038-unsafe code. Even though
    the unsafe usage is mostly gone from the kernel, having the types and
    associated functions around means that we can still grow new users,
    and that we may be missing conversions to safe types that actually
    matter.

    There are still a number of driver specific patches needed to get the
    last users of these types removed, those have been submitted to the
    respective maintainers"

    Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/

    * tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits)
    y2038: alarm: fix half-second cut-off
    y2038: ipc: fix x32 ABI breakage
    y2038: fix typo in powerpc vdso "LOPART"
    y2038: allow disabling time32 system calls
    y2038: itimer: change implementation to timespec64
    y2038: move itimer reset into itimer.c
    y2038: use compat_{get,set}_itimer on alpha
    y2038: itimer: compat handling to itimer.c
    y2038: time: avoid timespec usage in settimeofday()
    y2038: timerfd: Use timespec64 internally
    y2038: elfcore: Use __kernel_old_timeval for process times
    y2038: make ns_to_compat_timeval use __kernel_old_timeval
    y2038: socket: use __kernel_old_timespec instead of timespec
    y2038: socket: remove timespec reference in timestamping
    y2038: syscalls: change remaining timeval to __kernel_old_timeval
    y2038: rusage: use __kernel_old_timeval
    y2038: uapi: change __kernel_time_t to __kernel_old_time_t
    y2038: stat: avoid 'time_t' in 'struct stat'
    y2038: ipc: remove __kernel_time_t reference from headers
    y2038: vdso: powerpc: avoid timespec references
    ...

    Linus Torvalds
     
  • Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
    "As part of the cleanup of some remaining y2038 issues, I came to
    fs/compat_ioctl.c, which still has a couple of commands that need
    support for time64_t.

    In completely unrelated work, I spent time on cleaning up parts of
    this file in the past, moving things out into drivers instead.

    After Al Viro reviewed an earlier version of this series and did a lot
    more of that cleanup, I decided to try to completely eliminate the
    rest of it and move it all into drivers.

    This series incorporates some of Al's work and many patches of my own,
    but in the end stops short of actually removing the last part, which
    is the scsi ioctl handlers. I have patches for those as well, but they
    need more testing or possibly a rewrite"

    * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
    scsi: sd: enable compat ioctls for sed-opal
    pktcdvd: add compat_ioctl handler
    compat_ioctl: move SG_GET_REQUEST_TABLE handling
    compat_ioctl: ppp: move simple commands into ppp_generic.c
    compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
    compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
    compat_ioctl: unify copy-in of ppp filters
    tty: handle compat PPP ioctls
    compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
    compat_ioctl: handle SIOCOUTQNSD
    af_unix: add compat_ioctl support
    compat_ioctl: reimplement SG_IO handling
    compat_ioctl: move WDIOC handling into wdt drivers
    fs: compat_ioctl: move FITRIM emulation into file systems
    gfs2: add compat_ioctl support
    compat_ioctl: remove unused convert_in_user macro
    compat_ioctl: remove last RAID handling code
    compat_ioctl: remove /dev/raw ioctl translation
    compat_ioctl: remove PCI ioctl translation
    compat_ioctl: remove joystick ioctl translation
    ...

    Linus Torvalds
     

27 Nov, 2019

2 commits


26 Nov, 2019

3 commits

  • This is identical to __sys_connect(), except it takes a struct file
    instead of an fd, and it also allows passing in extra file->f_flags
    flags. The latter is done to support masking in O_NONBLOCK without
    manipulating the original file flags.

    No functional changes in this patch.

    Cc: netdev@vger.kernel.org
    Acked-by: David S. Miller
    Signed-off-by: Jens Axboe

    Jens Axboe
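
    The idea of passing extra flags alongside the file, instead of changing
    the file's own flags, can be sketched as below. struct demo_file and
    demo_effective_flags() are hypothetical; the sketch only shows that
    O_NONBLOCK can be masked in for one operation while the original flags
    stay untouched.

```c
#include <fcntl.h>      /* O_NONBLOCK, O_RDWR */

struct demo_file {
    unsigned int f_flags;   /* flags stored on the open file */
};

/*
 * Flags a single operation should act on: the file's own flags plus any
 * extras the caller wants for this call only. The file is taken as const,
 * so its stored flags cannot be modified here.
 */
unsigned int demo_effective_flags(const struct demo_file *file,
                                  unsigned int extra)
{
    return file->f_flags | extra;
}
```

    A caller that needs a non-blocking attempt passes O_NONBLOCK as the
    extra flag; other users of the same file keep seeing the flags it was
    opened with.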
     
  • Pull io_uring updates from Jens Axboe:
    "A lot of stuff has been going on this cycle, with improving the
    support for networked IO (and hence unbounded request completion
    times) being one of the major themes. There's been a set of fixes done
    this week, I'll send those out as well once we're certain we're fully
    happy with them.

    This contains:

    - Unification of the "normal" submit path and the SQPOLL path (Pavel)

    - Support for sparse (and bigger) file sets, and updating of those
    file sets without needing to unregister/register again.

    - Independently sized CQ ring, instead of just making it always 2x
    the SQ ring size. This makes it more flexible for networked
    applications.

    - Support for overflowed CQ ring, never dropping events but providing
    backpressure on submits.

    - Add support for absolute timeouts, not just relative ones.

    - Support for generic cancellations. This divorces io_uring from
    workqueues as well, which additionally gets us one step closer to
    generic async system call support.

    - With cancellations, we can support grabbing the process file table
    as well, just like we do mm context. This allows support for system
    calls that create file descriptors, like accept4() support that's
    built on top of that.

    - Support for io_uring tracing (Dmitrii)

    - Support for linked timeouts. These abort an operation if it isn't
    completed by the time noted in the linked timeout.

    - Speedup tracking of poll requests

    - Various cleanups making the code easier to follow (Jackie, Pavel,
    Bob, YueHaibing, me)

    - Update MAINTAINERS with new io_uring list"

    * tag 'for-5.5/io_uring-20191121' of git://git.kernel.dk/linux-block: (64 commits)
    io_uring: make POLL_ADD/POLL_REMOVE scale better
    io-wq: remove now redundant struct io_wq_nulls_list
    io_uring: Fix getting file for non-fd opcodes
    io_uring: introduce req_need_defer()
    io_uring: clean up io_uring_cancel_files()
    io-wq: ensure free/busy list browsing see all items
    io-wq: ensure we have a stable view of ->cur_work for cancellations
    io_wq: add get/put_work handlers to io_wq_create()
    io_uring: check for validity of ->rings in teardown
    io_uring: fix potential deadlock in io_poll_wake()
    io_uring: use correct "is IO worker" helper
    io_uring: fix -ENOENT issue with linked timer with short timeout
    io_uring: don't do flush cancel under inflight_lock
    io_uring: flag SQPOLL busy condition to userspace
    io_uring: make ASYNC_CANCEL work with poll and timeout
    io_uring: provide fallback request for OOM situations
    io_uring: convert accept4() -ERESTARTSYS into -EINTR
    io_uring: fix error clear of ->file_table in io_sqe_files_register()
    io_uring: separate the io_free_req and io_free_req_find_next interface
    io_uring: keep io_put_req only responsible for release and put req
    ...

    Linus Torvalds
     
  • In commit 3975b097e577 ("convert stream-like files -> stream_open, even
    if they use noop_llseek") Kirill used a coccinelle script to change
    "nonseekable_open()" to "stream_open()", which changed the trivial cases
    of stream-like file descriptors to the new model with FMODE_STREAM.

    However, the two big cases - sockets and pipes - don't actually have
    that trivial pattern at all, and were thus never converted to
    FMODE_STREAM even though it makes lots of sense to do so.

    That's particularly true when looking forward to the next change:
    getting rid of FMODE_ATOMIC_POS entirely, and just using FMODE_STREAM to
    decide whether f_pos updates are needed or not. And if they are, we'll
    always do them atomically.

    This came up because KCSAN (correctly) noted that the non-locked f_pos
    updates are data races: they are clearly benign for the case where we
    don't care, but it would be good to just not have that issue exist at
    all.

    Note that the reason we used FMODE_ATOMIC_POS originally is that only
    doing it for the minimal required case is "safer" in that it's possible
    that the f_pos locking can cause unnecessary serialization across the
    whole write() call. And in the worst case, that kind of serialization
    can cause deadlock issues: think writers that need readers to empty the
    state using the same file descriptor.

    [ Note that the locking is per-file descriptor - because it protects
    "f_pos", which is obviously per-file descriptor - so it only affects
    cases where you literally use the same file descriptor to both read
    and write.

    So a regular pipe that has separate reading and writing file
    descriptors doesn't really have this situation even though it's the
    obvious case of "reader empties what a big writer concurrently fills"

    But we want to mark pipes as being stream-like anyway, because we
    don't want the unnecessary overhead of locking, and because a named
    pipe can be (ab-)used by reading and writing to the same file
    descriptor. ]

    There are likely a lot of other cases that might want FMODE_STREAM, and
    looking for ".llseek = no_llseek" users and other cases that don't have
    an lseek file operation at all and making them use "stream_open()" might
    be a good idea. But pipes and sockets are likely to be the two main
    cases.

    Cc: Kirill Smelkov
    Cc: Eric Dumazet
    Cc: Al Viro
    Cc: Alan Stern
    Cc: Marco Elver
    Cc: Andrea Parri
    Cc: Paul McKenney
    Signed-off-by: Linus Torvalds

    Linus Torvalds
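
    From user space, the stream-like nature of pipes and sockets shows up
    as the absence of a file position: lseek() on them fails with ESPIPE.
    The helper below is a small sketch of that observable behaviour only;
    demo_is_stream() is a hypothetical name and the code does not inspect
    FMODE_STREAM or any other kernel-internal flag.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd has no file position (lseek fails with ESPIPE), else 0. */
int demo_is_stream(int fd)
{
    return lseek(fd, 0, SEEK_CUR) == (off_t)-1 && errno == ESPIPE;
}
```

    Both ends of a pipe() pair report as streams with this check, which is
    why there is no f_pos worth updating, let alone locking, for them.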
     

15 Nov, 2019

2 commits

  • The 'timespec' type definition and helpers like ktime_to_timespec()
    or timespec64_to_timespec() should no longer be used in the kernel so
    we can remove them and avoid introducing y2038 issues in new code.

    Change the socket code that needs to pass a timespec to user space for
    backward compatibility to use __kernel_old_timespec instead. This type
    has the same layout but with a clearer defined name.

    Slightly reformat tcp_recv_timestamp() for consistency after the removal
    of timespec64_to_timespec().

    Acked-by: Deepa Dinamani
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • The CONFIG_64BIT_TIME option is defined on all architectures, and can
    be removed for simplicity now.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

30 Oct, 2019

1 commit

  • This is identical to __sys_accept4(), except it takes a struct file
    instead of an fd, and it also allows passing in extra file->f_flags
    flags. The latter is done to support masking in O_NONBLOCK without
    manipulating the original file flags.

    No functional changes in this patch.

    Cc: netdev@vger.kernel.org
    Acked-by: David S. Miller
    Signed-off-by: Jens Axboe

    Jens Axboe
     

23 Oct, 2019

2 commits

  • All users of this call are in socket or tty code, so handling
    it there means we can avoid the table entry in fs/compat_ioctl.c.

    Reviewed-by: Greg Kroah-Hartman
    Cc: Eric Dumazet
    Cc: netdev@vger.kernel.org
    Cc: "David S. Miller"
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • Unlike the normal SIOCOUTQ, SIOCOUTQNSD was never handled in compat
    mode. Add it to the common socket compat handler along with similar
    ones.

    Fixes: 2f4e1b397097 ("tcp: ioctl type SIOCOUTQNSD returns amount of data not sent")
    Cc: Eric Dumazet
    Cc: netdev@vger.kernel.org
    Cc: "David S. Miller"
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

20 Jul, 2019

1 commit

  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     

14 Jul, 2019

1 commit

  • Pull io_uring updates from Jens Axboe:
    "This contains:

    - Support for recvmsg/sendmsg as first class opcodes.

    I don't envision going much further down this path, as there are
    plans in progress to support potentially any system call in an
    async fashion through io_uring. But I think it does make sense to
    have certain core ops available directly, especially those that can
    support a "try this non-blocking" flag/mode. (me)

    - Handle generic short reads automatically.

    This can happen fairly easily if parts of the buffered read is
    cached. Since the application needs to issue another request for
    the remainder, just do this internally and save kernel/user
    roundtrip while providing a nicer more robust API. (me)

    - Support for linked SQEs.

    This allows SQEs to depend on each other, enabling an application
    to eg queue a read-from-this-file,write-to-that-file pair. (me)

    - Fix race in stopping SQ thread (Jackie)"

    * tag 'for-5.3/io_uring-20190711' of git://git.kernel.dk/linux-block:
    io_uring: fix io_sq_thread_stop running in front of io_sq_thread
    io_uring: add support for recvmsg()
    io_uring: add support for sendmsg()
    io_uring: add support for sqe links
    io_uring: punt short reads to async context
    uio: make import_iovec()/compat_import_iovec() return bytes on success

    Linus Torvalds
     

10 Jul, 2019

2 commits

  • This is done through IORING_OP_RECVMSG. This opcode uses the same
    sqe->msg_flags that IORING_OP_SENDMSG added, and we pass in the
    msghdr struct in the sqe->addr field as well.

    We use MSG_DONTWAIT to force an inline fast path if recvmsg() doesn't
    block, and punt to async execution if it would have.

    Acked-by: David S. Miller
    Signed-off-by: Jens Axboe

    Jens Axboe
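
    The "try inline with MSG_DONTWAIT, punt if it would block" pattern
    from the commit above has a direct user-space analogue. This is a
    minimal sketch: DEMO_PUNT_ASYNC and demo_try_recvmsg_inline() are
    hypothetical, and the "punt" here is just a return code that a caller
    could use to hand the request to a worker.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical marker meaning "would block, hand off to async context". */
#define DEMO_PUNT_ASYNC (-2)

/*
 * Attempt the receive without blocking. Returns the byte count on the
 * fast path, DEMO_PUNT_ASYNC if the call would have blocked, or -1 on
 * any other error (errno is left set by recvmsg()).
 */
ssize_t demo_try_recvmsg_inline(int fd, void *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    ssize_t n = recvmsg(fd, &msg, MSG_DONTWAIT);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return DEMO_PUNT_ASYNC;
    return n;
}
```

    On an empty socket the helper asks for a punt; once data is queued the
    same call completes inline and returns the number of bytes read.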
     
  • This is done through IORING_OP_SENDMSG. There's a new sqe->msg_flags
    for the flags argument, and the msghdr struct is passed in the
    sqe->addr field.

    We use MSG_DONTWAIT to force an inline fast path if sendmsg() doesn't
    block, and punt to async execution if it would have.

    Acked-by: David S. Miller
    Signed-off-by: Jens Axboe

    Jens Axboe
     

09 Jul, 2019

2 commits

  • socket->wq is assign-once, set when we are initializing both
    struct socket it's in and struct socket_wq it points to. As the
    matter of fact, the only reason for separate allocation was the
    ability to RCU-delay freeing of socket_wq. RCU-delaying the
    freeing of socket itself gets rid of that need, so we can just
    fold struct socket_wq into the end of struct socket and simplify
    the life both for sock_alloc_inode() (one allocation instead of
    two) and for tun/tap oddballs, where we used to embed struct socket
    and struct socket_wq into the same structure (now - embedding just
    the struct socket).

    Note that reference to struct socket_wq in struct sock does remain
    a reference - that's unchanged.

    Signed-off-by: Al Viro
    Signed-off-by: David S. Miller

    Al Viro
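
    Folding one structure into another, as the commit above does for the
    wait queue, can be sketched with an embedded member. The demo_* types
    are hypothetical and far smaller than the kernel's; the point is that
    one allocation covers both objects and a pointer to the embedded part
    can be mapped back to its container.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_wq {
    int waiters;
};

struct demo_socket {
    int state;
    struct demo_wq wq;      /* embedded: allocated and freed with the socket */
};

/* One allocation now provides both the socket and its wait queue. */
struct demo_socket *demo_socket_alloc(void)
{
    return calloc(1, sizeof(struct demo_socket));
}

/* Map a pointer to the embedded wq back to the enclosing socket. */
struct demo_socket *demo_wq_to_socket(struct demo_wq *wq)
{
    return (struct demo_socket *)((char *)wq -
                                  offsetof(struct demo_socket, wq));
}
```

    Code that keeps only a pointer to the wait queue still works as
    before; it simply points into the socket allocation rather than at a
    separately allocated object.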
     
  • we do have an RCU-delayed part there already (freeing the wq),
    so it's not like the pipe situation; moreover, it might be
    worth considering coallocating wq with the rest of struct sock_alloc.
    ->sk_wq in struct sock would remain a pointer as it is, but
    the object it normally points to would be coallocated with
    struct socket...

    Signed-off-by: Al Viro
    Signed-off-by: David S. Miller

    Al Viro
     

05 Jul, 2019

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2019-07-03

    The following pull-request contains BPF updates for your *net-next* tree.

    There is a minor merge conflict in mlx5 due to 8960b38932be ("linux/dim:
    Rename externally used net_dim members") which has been pulled into your
    tree in the meantime, but resolution seems not that bad ... getting current
    bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed
    just so he's aware of the resolution below:

    ** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:

    <<<<<<< HEAD
    static int mlx5e_open_cq(struct mlx5e_channel *c,
    struct dim_cq_moder moder,
    struct mlx5e_cq_param *param,
    struct mlx5e_cq *cq)
    =======
    int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder,
    struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
    >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497

    Resolution is to take the second chunk and rename net_dim_cq_moder into
    dim_cq_moder. Also the signature for mlx5e_open_cq() in ...

    drivers/net/ethernet/mellanox/mlx5/core/en.h +977

    ... and in mlx5e_open_xsk() ...

    drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64

    ... needs the same rename from net_dim_cq_moder into dim_cq_moder.

    ** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:

    <<<<<<< HEAD
    int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
    struct dim_cq_moder icocq_moder = {0, 0};
    struct net_device *netdev = priv->netdev;
    struct mlx5e_channel *c;
    unsigned int irq;
    =======
    struct net_dim_cq_moder icocq_moder = {0, 0};
    >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497

    Take the second chunk and rename net_dim_cq_moder into dim_cq_moder
    as well.

    Let me know if you run into any issues. Anyway, the main changes are:

    1) Long-awaited AF_XDP support for mlx5e driver, from Maxim.

    2) Addition of two new per-cgroup BPF hooks for getsockopt and
    setsockopt along with a new sockopt program type which allows more
    fine-grained pass/reject settings for containers. Also add a sock_ops
    callback that can be selectively enabled on a per-socket basis and is
    executed for every RTT to help tracking TCP statistics, both features
    from Stanislav.

    3) Follow-up fix from loops in precision tracking which was not propagating
    precision marks and as a result verifier assumed that some branches were
    not taken and therefore wrongly removed as dead code, from Alexei.

    4) Fix BPF cgroup release synchronization race which could lead to a
    double-free if a leaf's cgroup_bpf object is released and a new BPF
    program is attached to the one of ancestor cgroups in parallel, from Roman.

    5) Support for bulking XDP_TX on veth devices which improves performance
    in some cases by around 9%, from Toshiaki.

    6) Allow for lookups into BPF devmap and improve feedback when calling into
    bpf_redirect_map() as lookup is now performed right away in the helper
    itself, from Toke.

    7) Add support for fq's Earliest Departure Time to the Host Bandwidth
    Manager (HBM) sample BPF program, from Lawrence.

    8) Various cleanups and minor fixes all over the place from many others.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

04 Jul, 2019

1 commit

  • After the previous patch we have ipv{6,4} variants for {recv,send}msg,
    we should use the generic _INET ICW variant to call into the proper
    built-in.

    This also allows dropping the now unused and rather ugly _INET4 ICW macro

    v1 -> v2:
    - use ICW macro to declare inet6_{recv,send}msg
    - fix a couple of checkpatch offenders in the code context

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

28 Jun, 2019

1 commit

  • Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
    BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.

    BPF_CGROUP_SETSOCKOPT can modify user setsockopt arguments before
    passing them down to the kernel or bypass kernel completely.
    BPF_CGROUP_GETSOCKOPT can inspect/modify getsockopt arguments that
    kernel returns.
    Both hooks reuse existing PTR_TO_PACKET{,_END} infrastructure.

    The buffer memory is pre-allocated (because I don't think there is
    a precedent for working with __user memory from bpf). This might be
    slow to do for each {s,g}etsockopt call, that's why I've added
    __cgroup_bpf_prog_array_is_empty that exits early if there is nothing
    attached to a cgroup. Note, however, that there is a race between
    __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
    program layout might have changed; this should not be a problem
    because in general there is a race between multiple calls to
    {s,g}etsockopt and user adding/removing bpf progs from a cgroup.

    The return code of the BPF program is handled as follows:
    * 0: EPERM
    * 1: success, continue with next BPF program in the cgroup chain

    v9:
    * allow overwriting setsockopt arguments (Alexei Starovoitov):
    * use set_fs (same as kernel_setsockopt)
    * buffer is always kzalloc'd (no small on-stack buffer)

    v8:
    * use s32 for optlen (Andrii Nakryiko)

    v7:
    * return only 0 or 1 (Alexei Starovoitov)
    * always run all progs (Alexei Starovoitov)
    * use optval=0 as kernel bypass in setsockopt (Alexei Starovoitov)
    (decided to use optval=-1 instead, optval=0 might be a valid input)
    * call getsockopt hook after kernel handlers (Alexei Starovoitov)

    v6:
    * rework cgroup chaining; stop as soon as bpf program returns
    0 or 2; see patch with the documentation for the details
    * drop Andrii's and Martin's Acked-by (not sure they are comfortable
    with the new state of things)

    v5:
    * skip copy_to_user() and put_user() when ret == 0 (Martin Lau)

    v4:
    * don't export bpf_sk_fullsock helper (Martin Lau)
    * size != sizeof(__u64) for uapi pointers (Martin Lau)
    * offsetof instead of bpf_ctx_range when checking ctx access (Martin Lau)

    v3:
    * typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
    * reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
    Nakryiko)
    * use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
    * use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
    * new CG_SOCKOPT_ACCESS macro to wrap repeated parts

    v2:
    * moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
    * aligned bpf_sockopt_kern->buf to 8 bytes (Martin Lau)
    * bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
    * added [0,2] return code check to verifier (Martin Lau)
    * dropped unused buf[64] from the stack (Martin Lau)
    * use PTR_TO_SOCKET for bpf_sockopt->sk (Martin Lau)
    * dropped bpf_target_off from ctx rewrites (Martin Lau)
    * use return code for kernel bypass (Martin Lau & Andrii Nakryiko)

    Cc: Andrii Nakryiko
    Cc: Martin Lau
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Alexei Starovoitov

    Stanislav Fomichev
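
    The return-code convention in the commit above (0 means EPERM, 1 means
    continue), together with the v7 note that all programs always run, can
    be modelled as below. This is a sketch, not the kernel's run-array
    macro: demo_prog_fn, demo_run_progs() and the two sample programs are
    hypothetical, and the context is just a call counter.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical program type: returns 0 (reject) or 1 (allow). */
typedef int (*demo_prog_fn)(void *ctx);

/* Sample programs; each bumps the call counter passed in as ctx. */
int demo_prog_allow(void *ctx) { ++*(int *)ctx; return 1; }
int demo_prog_deny(void *ctx)  { ++*(int *)ctx; return 0; }

/*
 * Run every program in the chain, even after one has rejected, and
 * combine the verdicts: 0 if all allowed, -EPERM if any returned 0.
 */
int demo_run_progs(const demo_prog_fn *progs, size_t n, void *ctx)
{
    int allow = 1;

    for (size_t i = 0; i < n; i++)
        allow &= progs[i](ctx);
    return allow ? 0 : -EPERM;
}
```

    A chain with one rejecting program yields -EPERM, yet the counter shows
    that the programs after it were still executed.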
     

08 Jun, 2019

1 commit


06 Jun, 2019

1 commit


01 Jun, 2019

1 commit


31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

26 May, 2019

2 commits

  • Convert the sockfs filesystem to the new internal mount API as the old
    one will be obsoleted and removed. This allows greater flexibility in
    communication of mount parameters between userspace, the VFS and the
    filesystem.

    See Documentation/filesystems/mount_api.txt for more information.

    Signed-off-by: David Howells
    cc: netdev@vger.kernel.org
    Signed-off-by: Al Viro

    David Howells
     
  • Once upon a time we used to set ->d_name of e.g. pipefs root
    so that d_path() on pipes would work. These days it's
    completely pointless - dentries of pipes are not even connected
    to pipefs root. However, mount_pseudo() had set the root
    dentry name (passed as the second argument) and callers
    kept inventing names to pass to it. Including those that
    didn't *have* any non-root dentries to start with...

    All of that had been pointless for about 8 years now; it's
    time to get rid of that cargo-culting...

    Signed-off-by: Al Viro

    Al Viro
     

20 May, 2019

1 commit

  • Fix kernel-doc warnings by moving the kernel-doc notation to be
    immediately above the functions that it describes.

    Fixes these warnings for sock_sendmsg() and sock_recvmsg():

    ../net/socket.c:658: warning: Excess function parameter 'sock' description in 'INDIRECT_CALLABLE_DECLARE'
    ../net/socket.c:658: warning: Excess function parameter 'msg' description in 'INDIRECT_CALLABLE_DECLARE'
    ../net/socket.c:889: warning: Excess function parameter 'sock' description in 'INDIRECT_CALLABLE_DECLARE'
    ../net/socket.c:889: warning: Excess function parameter 'msg' description in 'INDIRECT_CALLABLE_DECLARE'
    ../net/socket.c:889: warning: Excess function parameter 'flags' description in 'INDIRECT_CALLABLE_DECLARE'

    Signed-off-by: Randy Dunlap
    Signed-off-by: David S. Miller

    Randy Dunlap
     

06 May, 2019

1 commit


26 Apr, 2019

1 commit

  • Add missing break statement in order to prevent the code from falling
    through to cases SIOCGSTAMP_NEW and SIOCGSTAMPNS_NEW.

    This bug was found thanks to the ongoing efforts to enable
    -Wimplicit-fallthrough.

    Fixes: 0768e17073dc ("net: socket: implement 64-bit timestamps")
    Signed-off-by: Gustavo A. R. Silva
    Reported-by: Dan Carpenter
    Acked-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
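
    The effect of the missing break can be seen in a reduced switch. The
    command numbers and demo_stamp_width() are hypothetical placeholders,
    not the real ioctl codes; the sketch shows the corrected form, where
    each case ends before the next one begins.

```c
/* Placeholder command numbers, not the real ioctl values. */
enum { DEMO_CMD_STAMP_OLD = 1, DEMO_CMD_STAMP_NEW = 2 };

/* Returns the timestamp width a command selects, or -1 if unknown. */
int demo_stamp_width(int cmd)
{
    int width = -1;

    switch (cmd) {
    case DEMO_CMD_STAMP_OLD:
        width = 32;
        break;      /* without this, control falls into the NEW case */
    case DEMO_CMD_STAMP_NEW:
        width = 64;
        break;
    }
    return width;
}
```

    If the first break were dropped, the old command would silently be
    handled as the new one and report 64, which is the class of bug that
    -Wimplicit-fallthrough is meant to flag.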
     

20 Apr, 2019

1 commit

  • The 'timeval' and 'timespec' data structures used for socket timestamps
    are going to be redefined in user space based on 64-bit time_t in future
    versions of the C library to deal with the y2038 overflow problem,
    which breaks the ABI definition.

    Unlike many modern ioctl commands, SIOCGSTAMP and SIOCGSTAMPNS do not
    use the _IOR() macro to encode the size of the transferred data, so it
    remains ambiguous whether the application uses the old or new layout.

    The best workaround I could find is rather ugly: we redefine the command
    code based on the size of the respective data structure with a ternary
    operator. This lets it get evaluated as late as possible, hopefully after
    that structure is visible to the caller. We cannot use an #ifdef here,
    because linux/sockios.h might have been included before any libc header
    that could determine the size of time_t.

    The ioctl implementation now interprets the new command codes as always
    referring to the 64-bit structure on all architectures, while the old
    architecture specific command code still refers to the old architecture
    specific layout. The new command number is only used when they are
    actually different.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
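
    The ternary trick described in the commit above can be sketched as a
    macro that picks a command code from the size of a structure. All names
    and values here are hypothetical placeholders, not the definitions in
    the uapi header; the two demo structs stand in for the old and new
    user-space timeval layouts.

```c
#include <stdint.h>

/* Placeholder command numbers, not the real ioctl codes. */
#define DEMO_STAMP_OLD 1
#define DEMO_STAMP_NEW 2

/* Stand-ins for the two possible user-space layouts. */
struct demo_timeval32 { int32_t tv_sec; int32_t tv_usec; };   /* 8 bytes */
struct demo_timeval64 { int64_t tv_sec; int64_t tv_usec; };   /* 16 bytes */

/*
 * The sizeof is evaluated where the macro is expanded, not where it is
 * defined, so the structure only has to be complete at the point of use.
 * An #ifdef could not wait that long.
 */
#define DEMO_STAMP_CMD(tv_type) \
    (sizeof(tv_type) == sizeof(struct demo_timeval32) ? \
     DEMO_STAMP_OLD : DEMO_STAMP_NEW)
```

    Code built against the 8-byte layout ends up using the old command
    number, and code built against the 16-byte layout gets the new one,
    without either side having to name the variant explicitly.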