06 May, 2019

1 commit


26 Apr, 2019

1 commit

  • Add missing break statement in order to prevent the code from falling
    through to cases SIOCGSTAMP_NEW and SIOCGSTAMPNS_NEW.

    This bug was found thanks to the ongoing efforts to enable
    -Wimplicit-fallthrough.

    Fixes: 0768e17073dc ("net: socket: implement 64-bit timestamps")
    Signed-off-by: Gustavo A. R. Silva
    Reported-by: Dan Carpenter
    Acked-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
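
    As an illustration of the bug class fixed here, the following is a
    minimal userspace sketch, not the kernel code. The enum, the command
    values and the function names are hypothetical; it only shows how a
    missing break lets one case run the next case's code.

```c
/* Hypothetical command codes, not the kernel's real ioctl values. */
enum demo_cmd { DEMO_GET_NAME = 1, DEMO_GET_STAMP = 2 };

/* Returns the id of the handler whose result survives the switch. */
int demo_dispatch_buggy(enum demo_cmd cmd)
{
	int handled_by = 0;

	switch (cmd) {
	case DEMO_GET_NAME:
		handled_by = 1;
		/* No break here, so execution continues into DEMO_GET_STAMP. */
	case DEMO_GET_STAMP:
		handled_by = 2;
		break;
	}
	return handled_by;
}

/* Same dispatch with the missing break added. */
int demo_dispatch_fixed(enum demo_cmd cmd)
{
	int handled_by = 0;

	switch (cmd) {
	case DEMO_GET_NAME:
		handled_by = 1;
		break;
	case DEMO_GET_STAMP:
		handled_by = 2;
		break;
	}
	return handled_by;
}
```

    Building the buggy variant with -Wimplicit-fallthrough should flag
    the first case, which is how this kind of bug tends to be found.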
     

20 Apr, 2019

2 commits

  • The 'timeval' and 'timespec' data structures used for socket timestamps
    are going to be redefined in user space based on 64-bit time_t in future
    versions of the C library to deal with the y2038 overflow problem,
    which breaks the ABI definition.

    Unlike many modern ioctl commands, SIOCGSTAMP and SIOCGSTAMPNS do not
    use the _IOR() macro to encode the size of the transferred data, so it
    remains ambiguous whether the application uses the old or new layout.

    The best workaround I could find is rather ugly: we redefine the command
    code based on the size of the respective data structure with a ternary
    operator. This lets it get evaluated as late as possible, hopefully after
    that structure is visible to the caller. We cannot use an #ifdef here,
    because linux/sockios.h might have been included before any libc header
    that could determine the size of time_t.

    The ioctl implementation now interprets the new command codes as always
    referring to the 64-bit structure on all architectures, while the old
    architecture specific command code still refers to the old architecture
    specific layout. The new command number is only used when they are
    actually different.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
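
    The ternary-on-sizeof workaround for SIOCGSTAMP described above can
    be sketched roughly as follows. All DEMO_ names and numeric values
    are hypothetical stand-ins, not the real definitions from
    linux/sockios.h. The point is that the macro body is only evaluated
    where it is used, so the structure may be defined after the macro.

```c
#include <stdint.h>

/* Hypothetical command numbers; the real ones differ. */
#define DEMO_GET_STAMP_OLD 0x1001u
#define DEMO_GET_STAMP_NEW 0x2001u

/*
 * Defined first, like a kernel header that is included before any libc
 * header. sizeof() is not evaluated until the macro is expanded.
 */
#define DEMO_GET_STAMP \
	(sizeof(struct demo_timeval) == 8 ? DEMO_GET_STAMP_OLD \
					  : DEMO_GET_STAMP_NEW)

/*
 * Defined later, like the libc header that settles the width of time_t.
 * Two 64-bit fields give a 16-byte layout, so the new code is chosen.
 */
struct demo_timeval {
	int64_t tv_sec;
	int64_t tv_usec;
};

/* By the time this expands the macro, the structure is visible. */
unsigned int demo_pick_command(void)
{
	return DEMO_GET_STAMP;
}
```

    An #ifdef cannot make this choice, because the preprocessor runs
    before the structure size is known; the ternary defers the decision
    to the compiler at the point of use.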
     
  • The SIOCGSTAMP/SIOCGSTAMPNS ioctl commands are implemented by many
    socket protocol handlers, and all of those end up calling the same
    sock_get_timestamp()/sock_get_timestampns() helper functions, which
    results in a lot of duplicate code.

    With the introduction of 64-bit time_t on 32-bit architectures, this
    gets worse, as we then need four different ioctl commands in each
    socket protocol implementation.

    To simplify that, let's add a new .gettstamp() operation in
    struct proto_ops, and move ioctl implementation into the common
    sock_ioctl()/compat_sock_ioctl_trans() functions that these all go
    through.

    We can reuse the sock_get_timestamp() implementation, but generalize
    it so it can deal with both native and compat mode, as well as
    timeval and timespec structures.

    Acked-by: Stefan Schmidt
    Acked-by: Neil Horman
    Acked-by: Marc Kleine-Budde
    Link: https://lore.kernel.org/lkml/CAK8P3a038aDQQotzua_QtKGhq8O9n+rdiz2=WDCp82ys8eUT+A@mail.gmail.com/
    Signed-off-by: Arnd Bergmann
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

16 Mar, 2019

1 commit

  • Add missing Sphinx documentation to the functions in
    socket.c. Also fix some whitespace.

    I also changed the style of the older documentation in an
    effort to have a uniform documentation style.

    Signed-off-by: Pedro Tammela
    Signed-off-by: David S. Miller

    Pedro Tammela
     

03 Mar, 2019

1 commit


26 Feb, 2019

1 commit

  • Commit 9060cb719e61 ("net: crypto set sk to NULL when af_alg_release.")
    fixed a use-after-free in sockfs_setattr() when an AF_ALG socket is
    closed concurrently with fchownat(). However, it ignored that many
    other proto_ops::release() methods don't set sock->sk to NULL and
    therefore allow the same use-after-free:

    - base_sock_release
    - bnep_sock_release
    - cmtp_sock_release
    - data_sock_release
    - dn_release
    - hci_sock_release
    - hidp_sock_release
    - iucv_sock_release
    - l2cap_sock_release
    - llcp_sock_release
    - llc_ui_release
    - rawsock_release
    - rfcomm_sock_release
    - sco_sock_release
    - svc_release
    - vcc_release
    - x25_release

    Rather than fixing all these and relying on every socket type to get
    this right forever, just make __sock_release() set sock->sk to NULL
    itself after calling proto_ops::release().

    Reproducer that produces the KASAN splat when any of these socket types
    are configured into the kernel:

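    #include <pthread.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <unistd.h>

    pthread_t t;
    volatile int fd;

    /* Keep closing the shared fd to race with fchownat() in main(). */
    void *close_thread(void *arg)
    {
            for (;;) {
                    usleep(rand() % 100);
                    close(fd);
            }
    }

    int main(void)
    {
            pthread_create(&t, NULL, close_thread, NULL);
            for (;;) {
                    /* Try random address families and socket types. */
                    fd = socket(rand() % 50, rand() % 11, 0);
                    /* 0x1000 is AT_EMPTY_PATH: operate on fd itself. */
                    fchownat(fd, "", 1000, 1000, 0x1000);
                    close(fd);
            }
    }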

    Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.")
    Signed-off-by: Eric Biggers
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Eric Biggers
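
    The idea of clearing the pointer in one common place can be sketched
    as follows. This is a single-threaded userspace model with
    hypothetical demo_ types and functions; it does not reproduce the
    kernel's structures or the concurrency involved, only the principle
    that the common release path clears sk so that no protocol has to
    remember to.

```c
#include <stdlib.h>

struct demo_sk { int uid; };
struct demo_socket;

struct demo_proto_ops {
	void (*release)(struct demo_socket *sock);
};

struct demo_socket {
	struct demo_sk *sk;
	const struct demo_proto_ops *ops;
};

/* A protocol release hook that frees sk but forgets to clear it. */
void demo_forgetful_release(struct demo_socket *sock)
{
	free(sock->sk);
}

const struct demo_proto_ops demo_forgetful_ops = {
	.release = demo_forgetful_release,
};

/* Common release path: clear sk here, whatever the hook did. */
void demo_sock_release(struct demo_socket *sock)
{
	if (sock->ops) {
		sock->ops->release(sock);
		sock->sk = NULL;
		sock->ops = NULL;
	}
}

/* A later user checks sk before touching it; -1 means "gone". */
int demo_socket_uid(const struct demo_socket *sock)
{
	return sock->sk ? sock->sk->uid : -1;
}
```

    With the clearing done centrally, a later access sees NULL rather
    than a dangling pointer, even for the forgetful hook.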
     

09 Feb, 2019

1 commit


04 Feb, 2019

4 commits

  • Add SO_TIMESTAMPING_NEW variant of socket timestamp options.
    This is the y2038 safe version of SO_TIMESTAMPING_OLD
    for all architectures.

    Signed-off-by: Deepa Dinamani
    Acked-by: Willem de Bruijn
    Cc: chris@zankel.net
    Cc: fenghua.yu@intel.com
    Cc: rth@twiddle.net
    Cc: tglx@linutronix.de
    Cc: ubraun@linux.ibm.com
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-ia64@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-xtensa@linux-xtensa.org
    Cc: sparclinux@vger.kernel.org
    Signed-off-by: David S. Miller

    Deepa Dinamani
     
  • Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of
    socket timestamp options.
    These are the y2038 safe versions of the SO_TIMESTAMP_OLD
    and SO_TIMESTAMPNS_OLD for all architectures.

    Note that the format of scm_timestamping.ts[0] is not changed
    in this patch.

    Signed-off-by: Deepa Dinamani
    Acked-by: Willem de Bruijn
    Cc: jejb@parisc-linux.org
    Cc: ralf@linux-mips.org
    Cc: rth@twiddle.net
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linux-rdma@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: sparclinux@vger.kernel.org
    Signed-off-by: David S. Miller

    Deepa Dinamani
     
  • As part of the y2038 solution, all internal uses of
    struct timeval are replaced by struct __kernel_old_timeval
    and struct compat_timeval by struct old_timeval32.
    Make socket timestamps use these new types.

    This is mainly to be able to verify that the kernel build
    is y2038 safe when such non y2038 safe types are not
    supported anymore.

    Signed-off-by: Deepa Dinamani
    Acked-by: Willem de Bruijn
    Cc: isdn@linux-pingi.de
    Signed-off-by: David S. Miller

    Deepa Dinamani
     
  • SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the
    way they are currently defined, are not y2038 safe.
    Subsequent patches in the series add new y2038 safe versions
    of these options which provide 64 bit timestamps on all
    architectures uniformly.
    Hence, rename existing options with OLD tag suffixes.

    Also note that the kernel will not use the untagged SO_TIMESTAMP*
    and SCM_TIMESTAMP* options internally anymore.

    Signed-off-by: Deepa Dinamani
    Acked-by: Willem de Bruijn
    Cc: deller@gmx.de
    Cc: dhowells@redhat.com
    Cc: jejb@parisc-linux.org
    Cc: ralf@linux-mips.org
    Cc: rth@twiddle.net
    Cc: linux-afs@lists.infradead.org
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linux-rdma@vger.kernel.org
    Cc: sparclinux@vger.kernel.org
    Signed-off-by: David S. Miller

    Deepa Dinamani
     

31 Jan, 2019

4 commits

  • Same story as before, these use struct ifreq and thus need
    to be read with the shorter version to not cause faults.

    Cc: stable@vger.kernel.org
    Fixes: f92d4fc95341 ("kill bond_ioctl()")
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • As reported by Robert O'Callahan in
    https://bugzilla.kernel.org/show_bug.cgi?id=202273
    reverting the previous changes in this area broke
    the SIOCGIFNAME ioctl in compat again (I'd previously
    fixed it after his previous report of breakage in
    https://bugzilla.kernel.org/show_bug.cgi?id=199469).

    This is obviously because I fixed SIOCGIFNAME more or
    less by accident.

    Fix it explicitly now by making it pass through the
    restored compat translation code.

    Cc: stable@vger.kernel.org
    Fixes: 4cf808e7ac32 ("kill dev_ifname32()")
    Reported-by: Robert O'Callahan
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • This reverts commit bf4405737f9f ("kill dev_ifsioc()").

    This wasn't really unused as implied by the original commit,
    it still handles the copy to/from user differently, and the
    commit thus caused issues such as
    https://bugzilla.kernel.org/show_bug.cgi?id=199469
    and
    https://bugzilla.kernel.org/show_bug.cgi?id=202273

    However, deviating from a strict revert, rename dev_ifsioc()
    to compat_ifreq_ioctl() to be clearer as to its purpose and
    add a comment.

    Cc: stable@vger.kernel.org
    Fixes: bf4405737f9f ("kill dev_ifsioc()")
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • This reverts commit 1cebf8f143c2 ("socket: fix struct ifreq
    size in compat ioctl"), it's a bugfix for another commit that
    I'll revert next.

    This is not a 'perfect' revert, I'm keeping some coding style
    intact rather than revert to the state with indentation errors.

    Cc: stable@vger.kernel.org
    Fixes: 1cebf8f143c2 ("socket: fix struct ifreq size in compat ioctl")
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

29 Dec, 2018

1 commit

  • Pull y2038 updates from Arnd Bergmann:
    "More syscalls and cleanups

    This concludes the main part of the system call rework for 64-bit
    time_t, which has spread over most of year 2018, the last six system
    calls being

    - ppoll
    - pselect6
    - io_pgetevents
    - recvmmsg
    - futex
    - rt_sigtimedwait

    As before, nothing changes for 64-bit architectures, while 32-bit
    architectures gain another entry point that differs only in the layout
    of the timespec structure. Hopefully in the next release we can wire
    up all 22 of those system calls on all 32-bit architectures, which
    gives us a baseline version for glibc to start using them.

    This does not include the clock_adjtime, getrusage/waitid, and
    getitimer/setitimer system calls. I still plan to have new versions of
    those as well, but they are not required for correct operation of the
    C library since they can be emulated using the old 32-bit time_t based
    system calls.

    Aside from the system calls, there are also a few cleanups here,
    removing old kernel internal interfaces that have become unused after
    all references got removed. The arch/sh cleanups are part of this,
    these were posted several times over the past year without a reaction
    from the maintainers, while the corresponding changes made it into all
    other architectures"

    * tag 'y2038-for-4.21' of ssh://gitolite.kernel.org:/pub/scm/linux/kernel/git/arnd/playground:
    timekeeping: remove obsolete time accessors
    vfs: replace current_kernel_time64 with ktime equivalent
    timekeeping: remove timespec_add/timespec_del
    timekeeping: remove unused {read,update}_persistent_clock
    sh: remove board_time_init() callback
    sh: remove unused rtc_sh_get/set_time infrastructure
    sh: sh03: rtc: push down rtc class ops into driver
    sh: dreamcast: rtc: push down rtc class ops into driver
    y2038: signal: Add compat_sys_rt_sigtimedwait_time64
    y2038: signal: Add sys_rt_sigtimedwait_time32
    y2038: socket: Add compat_sys_recvmmsg_time64
    y2038: futex: Add support for __kernel_timespec
    y2038: futex: Move compat implementation into futex.c
    io_pgetevents: use __kernel_timespec
    pselect6: use __kernel_timespec
    ppoll: use __kernel_timespec
    signal: Add restore_user_sigmask()
    signal: Add set_user_sigmask()

    Linus Torvalds
     

18 Dec, 2018

1 commit

  • recvmmsg() takes two arguments that are pointers to structures that
    differ between 32-bit and 64-bit architectures: mmsghdr and timespec.

    For y2038 compatibility, we are changing the native system call from
    timespec to __kernel_timespec with a 64-bit time_t (in another patch),
    and use the existing compat system call on both 32-bit and 64-bit
    architectures for compatibility with traditional 32-bit user space.

    As we now have two variants of recvmmsg() for 32-bit tasks that are both
    different from the variant that we use on 64-bit tasks, this means we
    also require two compat system calls!

    The solution I picked is to flip things around: The existing
    compat_sys_recvmmsg() call gets moved from net/compat.c into net/socket.c
    and now handles the case for old user space on all architectures that
    have set CONFIG_COMPAT_32BIT_TIME. A new compat_sys_recvmmsg_time64()
    call gets added in the old place for 64-bit architectures only, this
    one handles the case of a compat mmsghdr structure combined with
    __kernel_timespec.

    In the indirect sys_socketcall(), we now need to call either
    do_sys_recvmmsg() or __compat_sys_recvmmsg(), depending on what kind of
    architecture we are on. For compat_sys_socketcall(), no such change is
    needed, we always call __compat_sys_recvmmsg().

    I decided to not add a new SYS_RECVMMSG_TIME64 socketcall: Any libc
    implementation for 64-bit time_t will need significant changes including
    an updated asm/unistd.h, and it seems better to consistently use the
    separate syscalls for that configuration, leaving the socketcall only for
    backward compatibility with 32-bit time_t based libc.

    The naming is asymmetric for the moment, so both existing syscall
    entry points keep their names, while the new ones are recvmmsg_time32
    and compat_recvmmsg_time64 respectively. I expect that we will rename
    the compat syscalls later as we start using generated syscall tables
    everywhere and add these entry points.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

18 Nov, 2018

1 commit


02 Nov, 2018

1 commit

  • Pull AFS updates from Al Viro:
    "AFS series, with some iov_iter bits included"

    * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
    missing bits of "iov_iter: Separate type from direction and use accessor functions"
    afs: Probe multiple fileservers simultaneously
    afs: Fix callback handling
    afs: Eliminate the address pointer from the address list cursor
    afs: Allow dumping of server cursor on operation failure
    afs: Implement YFS support in the fs client
    afs: Expand data structure fields to support YFS
    afs: Get the target vnode in afs_rmdir() and get a callback on it
    afs: Calc callback expiry in op reply delivery
    afs: Fix FS.FetchStatus delivery from updating wrong vnode
    afs: Implement the YFS cache manager service
    afs: Remove callback details from afs_callback_break struct
    afs: Commit the status on a new file/dir/symlink
    afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
    afs: Don't invoke the server to read data beyond EOF
    afs: Add a couple of tracepoints to log I/O errors
    afs: Handle EIO from delivery function
    afs: Fix TTL on VL server and address lists
    afs: Implement VL server rotation
    afs: Improve FS server rotation error handling
    ...

    Linus Torvalds
     

26 Oct, 2018

1 commit

  • Pull timekeeping updates from Thomas Gleixner:
    "The timers and timekeeping department provides:

    - Another large y2038 update with further preparations for providing
    the y2038 safe timespecs closer to the syscalls.

    - An overhaul of the SHCMT clocksource driver

    - SPDX license identifier updates

    - Small cleanups and fixes all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
    tick/sched : Remove redundant cpu_online() check
    clocksource/drivers/dw_apb: Add reset control
    clocksource: Remove obsolete CLOCKSOURCE_OF_DECLARE
    clocksource/drivers: Unify the names to timer-* format
    clocksource/drivers/sh_cmt: Add R-Car gen3 support
    dt-bindings: timer: renesas: cmt: document R-Car gen3 support
    clocksource/drivers/sh_cmt: Properly line-wrap sh_cmt_of_table[] initializer
    clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines
    clocksource/drivers/sh_cmt: Fixup for 64-bit machines
    clocksource/drivers/sh_tmu: Convert to SPDX identifiers
    clocksource/drivers/sh_mtu2: Convert to SPDX identifiers
    clocksource/drivers/sh_cmt: Convert to SPDX identifiers
    clocksource/drivers/renesas-ostm: Convert to SPDX identifiers
    clocksource: Convert to using %pOFn instead of device_node.name
    tick/broadcast: Remove redundant check
    RISC-V: Request newstat syscalls
    y2038: signal: Change rt_sigtimedwait to use __kernel_timespec
    y2038: socket: Change recvmmsg to use __kernel_timespec
    y2038: sched: Change sched_rr_get_interval to use __kernel_timespec
    y2038: utimes: Rework #ifdef guards for compat syscalls
    ...

    Linus Torvalds
     

24 Oct, 2018

1 commit

  • In the iov_iter struct, separate the iterator type from the iterator
    direction and use accessor functions to access them in most places.

    Convert a bunch of places to use switch-statements to access them rather
    than chains of bitwise-AND statements. This makes it easier to add further
    iterator types. Also, this can be more efficient: to implement a switch
    over small contiguous integers, the compiler can use ~50% fewer compare
    instructions than it would need bitwise-AND instructions.

    Further, cease passing the iterator type into the iterator setup function.
    The iterator function can set that itself. Only the direction is required.

    Signed-off-by: David Howells

    David Howells
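
    A rough sketch of the shape this change describes, using
    hypothetical demo_ names rather than the real iov_iter definitions:
    the type and the direction are kept apart, callers go through an
    accessor, the setup function takes only the direction, and
    type-dependent code uses a switch.

```c
/* Hypothetical iterator kinds and directions for illustration only. */
enum demo_iter_type {
	DEMO_ITER_IOVEC,
	DEMO_ITER_KVEC,
	DEMO_ITER_BVEC,
	DEMO_ITER_PIPE
};

enum demo_iter_dir { DEMO_ITER_READ, DEMO_ITER_WRITE };

struct demo_iter {
	enum demo_iter_type type;
	enum demo_iter_dir dir;
};

/* Accessors, so callers never test raw bits themselves. */
enum demo_iter_type demo_iter_get_type(const struct demo_iter *i)
{
	return i->type;
}

enum demo_iter_dir demo_iter_get_dir(const struct demo_iter *i)
{
	return i->dir;
}

/* The setup function sets the type itself; only the direction is passed. */
void demo_iter_kvec_init(struct demo_iter *i, enum demo_iter_dir dir)
{
	i->type = DEMO_ITER_KVEC;
	i->dir = dir;
}

/* A switch over small contiguous values instead of chained AND tests. */
int demo_iter_is_kernel_memory(const struct demo_iter *i)
{
	switch (demo_iter_get_type(i)) {
	case DEMO_ITER_KVEC:
	case DEMO_ITER_BVEC:
	case DEMO_ITER_PIPE:
		return 1;
	case DEMO_ITER_IOVEC:
		return 0;
	}
	return 0;
}
```

    Adding another iterator kind in this layout means one new enum value
    and one new case label, which is the maintainability gain the commit
    message points to.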
     

20 Oct, 2018

1 commit

  • net/sched/cls_api.c has overlapping changes to a call to
    nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL
    to the 5th argument, and another (from 'net-next') added cb->extack
    instead of NULL to the 6th argument.

    net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to
    code which moved (to mr_table_dump()) in 'net-next'. Thanks to David
    Ahern for the heads up.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Oct, 2018

1 commit

  • In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch
    statement to see whether it is necessary to pre-process the ethtool
    structure, because, as mentioned in the comment, the structure
    ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc'
    is allocated through compat_alloc_user_space(). One thing to note here is
    that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is
    partially determined by 'rule_cnt', which is actually acquired from the
    user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through
    get_user(). After 'rxnfc' is allocated, the data in the original user-space
    buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(),
    including the 'rule_cnt' field. However, after this copy, no check is
    re-enforced on 'rxnfc->rule_cnt'. So it is possible for a malicious user
    to race to change the value of 'compat_rxnfc->rule_cnt' between these two
    copies. In this way, the attacker can bypass the previous check on
    'rule_cnt' and inject malicious data. This can cause undefined behavior of
    the kernel and introduce potential security risk.

    This patch avoids the above issue by copying the value acquired by
    get_user() to 'rxnfc->rule_cnt', if 'ethcmd' is ETHTOOL_GRXCLSRLALL.

    Signed-off-by: Wenwen Wang
    Signed-off-by: David S. Miller

    Wenwen Wang
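
    The double-fetch pattern and its fix can be modelled in userspace as
    below. The demo_ structure, the limit and the function are
    hypothetical; the two input pointers stand for the same user buffer
    as seen at the time of the check and at the time of the bulk copy,
    which lets the race be shown deterministically.

```c
#include <stdint.h>

/* Hypothetical, much reduced stand-in for the ethtool structure. */
struct demo_rxnfc {
	uint32_t cmd;
	uint32_t rule_cnt;
};

#define DEMO_MAX_RULES 64u

/*
 * at_check: user memory as seen by the first fetch (the validated one).
 * at_copy:  the same memory as seen by the later bulk copy.
 * Returns 0 on success, -1 if the validated count is too large.
 */
int demo_copy_rxnfc(const struct demo_rxnfc *at_check,
		    const struct demo_rxnfc *at_copy,
		    struct demo_rxnfc *out)
{
	uint32_t rule_cnt = at_check->rule_cnt;	/* first fetch */

	if (rule_cnt > DEMO_MAX_RULES)
		return -1;

	*out = *at_copy;		/* second fetch, may differ */
	out->rule_cnt = rule_cnt;	/* the fix: keep the checked value */
	return 0;
}
```

    Without the last assignment, the count that was validated and the
    count that ends up in the kernel copy can disagree.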
     

06 Oct, 2018

1 commit


14 Sep, 2018

1 commit

  • As reported by Robert O'Callahan, since Viro's commit to kill
    dev_ifsioc() we attempt to copy too much data in compat mode,
    which may lead to EFAULT when the 32-bit version of struct ifreq
    sits at/near the end of a page boundary, and the next page isn't
    mapped.

    Fix this by passing the appropriate compat/non-compat size to copy
    and using that, as before the dev_ifsioc() removal. This works
    because only the embedded "struct ifmap" has different size, and
    this is only used in SIOCGIFMAP/SIOCSIFMAP which has a different
    handler. All other parts of the union are naturally compatible.

    This fixes https://bugzilla.kernel.org/show_bug.cgi?id=199469.

    Fixes: bf4405737f9f ("kill dev_ifsioc()")
    Reported-by: Robert O'Callahan
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

29 Aug, 2018

1 commit

  • This converts the recvmmsg() system call in all its variations to use
    'timespec64' internally for its timeout, and have a __kernel_timespec
    argument in the native entry point. This lets us change the type to use
    64-bit time_t at a later point while using the 32-bit compat system call
    emulation for existing user space.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

16 Aug, 2018

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    - Gustavo A. R. Silva keeps working on the implicit switch fallthru
    changes.

    - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
    Luca Coelho.

    - Re-enable ASPM in r8169, from Kai-Heng Feng.

    - Add virtual XFRM interfaces, which avoids all of the limitations of
    existing IPSEC tunnels. From Steffen Klassert.

    - Convert GRO over to use a hash table, so that when we have many
    flows active we don't traverse a long list during accumulation.

    - Many new self tests for routing, TC, tunnels, etc. Too many
    contributors to mention them all, but I'm really happy to keep
    seeing this stuff.

    - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.

    - Lots of cleanups and fixes in L2TP code from Guillaume Nault.

    - Add IPSEC offload support to netdevsim, from Shannon Nelson.

    - Add support for slotting with non-uniform distribution to netem
    packet scheduler, from Yousuk Seung.

    - Add UDP GSO support to mlx5e, from Boris Pismenny.

    - Support offloading of Team LAG in NFP, from John Hurley.

    - Allow to configure TX queue selection based upon RX queue, from
    Amritha Nambiar.

    - Support ethtool ring size configuration in aquantia, from Anton
    Mikaev.

    - Support DSCP and flowlabel per-transport in SCTP, from Xin Long.

    - Support list based batching and stack traversal of SKBs, this is
    very exciting work. From Edward Cree.

    - Busyloop optimizations in vhost_net, from Toshiaki Makita.

    - Introduce the ETF qdisc, which allows time based transmissions. IGB
    can offload this in hardware. From Vinicius Costa Gomes.

    - Add parameter support to devlink, from Moshe Shemesh.

    - Several multiplication and division optimizations for BPF JIT in
    nfp driver, from Jiong Wang.

    - Lots of preparatory work to make more of the packet scheduler layer
    lockless, when possible, from Vlad Buslov.

    - Add ACK filter and NAT awareness to sch_cake packet scheduler, from
    Toke Høiland-Jørgensen.

    - Support regions and region snapshots in devlink, from Alex Vesker.

    - Allow to attach XDP programs to both HW and SW at the same time on
    a given device, with initial support in nfp. From Jakub Kicinski.

    - Add TLS RX offload and support in mlx5, from Ilya Lesokhin.

    - Use PHYLIB in r8169 driver, from Heiner Kallweit.

    - All sorts of changes to support Spectrum 2 in mlxsw driver, from
    Ido Schimmel.

    - PTP support in mv88e6xxx DSA driver, from Andrew Lunn.

    - Make TCP_USER_TIMEOUT socket option more accurate, from Jon
    Maxwell.

    - Support for templates in packet scheduler classifier, from Jiri
    Pirko.

    - IPV6 support in RDS, from Ka-Cheong Poon.

    - Native tproxy support in nf_tables, from Máté Eckl.

    - Maintain IP fragment queue in an rbtree, but optimize properly for
    in-order frags. From Peter Oskolkov.

    - Improve handling of ACKs on hole repairs, from Yuchung Cheng"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
    bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
    hv/netvsc: Fix NULL dereference at single queue mode fallback
    net: filter: mark expected switch fall-through
    xen-netfront: fix warn message as irq device name has '/'
    cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
    net: dsa: mv88e6xxx: missing unlock on error path
    rds: fix building with IPV6=m
    inet/connection_sock: prefer _THIS_IP_ to current_text_addr
    net: dsa: mv88e6xxx: bitwise vs logical bug
    net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
    ieee802154: hwsim: using right kind of iteration
    net: hns3: Add vlan filter setting by ethtool command -K
    net: hns3: Set tx ring' tc info when netdev is up
    net: hns3: Remove tx ring BD len register in hns3_enet
    net: hns3: Fix desc num set to default when setting channel
    net: hns3: Fix for phy link issue when using marvell phy driver
    net: hns3: Fix for information of phydev lost problem when down/up
    net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
    net: hns3: Add support for serdes loopback selftest
    bnxt_en: take coredump_record structure off stack
    ...

    Linus Torvalds
     

15 Aug, 2018

1 commit

  • req->sdiag_family is a user-controlled value that's used as an array
    index. Sanitize it after the bounds check to avoid speculative
    out-of-bounds array access.

    This also protects the sock_is_registered() call, so this removes the
    sanitize call there.

    Fixes: e978de7a6d38 ("net: socket: Fix potential spectre v1 gadget in sock_is_registered")
    Cc: Josh Poimboeuf
    Cc: konrad.wilk@oracle.com
    Cc: jamie.iles@oracle.com
    Cc: liran.alon@oracle.com
    Cc: stable@vger.kernel.org
    Signed-off-by: Jeremy Cline
    Signed-off-by: David S. Miller

    Jeremy Cline
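
    The bounds-check-then-sanitize pattern can be illustrated with the
    userspace sketch below. In the kernel this job belongs to
    array_index_nospec(); the demo_ helpers here are hypothetical and
    give no real guarantee against speculation, since a compiler is free
    to turn the comparison into a branch. They only show the clamping
    arithmetic and where it sits relative to the bounds check.

```c
#include <stddef.h>

/*
 * Returns index when index < size, and 0 otherwise. The mask is all
 * ones in the first case and all zeroes in the second.
 */
size_t demo_index_clamp(size_t index, size_t size)
{
	size_t mask = (size_t)0 - (size_t)(index < size);

	return index & mask;
}

/*
 * Looks up table[index] after the bounds check and the clamp.
 * Returns 0 on success, -1 if index is out of range.
 */
int demo_lookup(const int *table, size_t size, size_t index, int *out)
{
	if (index >= size)
		return -1;

	/* Sanitize after the bounds check, before the array access. */
	index = demo_index_clamp(index, size);
	*out = table[index];
	return 0;
}
```

    Sanitizing once, right after the bounds check, covers every later
    use of the index, which is why the separate sanitize call mentioned
    in the message could be dropped.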
     

14 Aug, 2018

1 commit

  • Pull vfs open-related updates from Al Viro:

    - "do we need fput() or put_filp()" rules are gone - it's always fput()
    now. We keep track of that state where it belongs - in ->f_mode.

    - int *opened mess killed - in finish_open(), in ->atomic_open()
    instances and in fs/namei.c code around do_last()/lookup_open()/atomic_open().

    - alloc_file() wrappers with saner calling conventions are introduced
    (alloc_file_clone() and alloc_file_pseudo()); callers converted, with
    much simplification.

    - while we are at it, saner calling conventions for path_init() and
    link_path_walk(), simplifying things inside fs/namei.c (both on
    open-related paths and elsewhere).

    * 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
    few more cleanups of link_path_walk() callers
    allow link_path_walk() to take ERR_PTR()
    make path_init() unconditionally paired with terminate_walk()
    document alloc_file() changes
    make alloc_file() static
    do_shmat(): grab shp->shm_file earlier, switch to alloc_file_clone()
    new helper: alloc_file_clone()
    create_pipe_files(): switch the first allocation to alloc_file_pseudo()
    anon_inode_getfile(): switch to alloc_file_pseudo()
    hugetlb_file_setup(): switch to alloc_file_pseudo()
    ocxlflash_getfile(): switch to alloc_file_pseudo()
    cxl_getfile(): switch to alloc_file_pseudo()
    ... and switch shmem_file_setup() to alloc_file_pseudo()
    __shmem_file_setup(): reorder allocations
    new wrapper: alloc_file_pseudo()
    kill FILE_{CREATED,OPENED}
    switch atomic_open() and lookup_open() to returning 0 in all success cases
    document ->atomic_open() changes
    ->atomic_open(): return 0 in all success cases
    get rid of 'opened' in path_openat() and the helpers downstream
    ...

    Linus Torvalds
     

03 Aug, 2018

1 commit


01 Aug, 2018

1 commit

  • We never use RCU protection for it, just a lot of cargo-cult
    rcu_dereference_protected() calls.

    Note that we do keep the kfree_rcu call for it, as the references through
    struct sock are RCU protected and thus might require a grace period before
    freeing.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Eric Dumazet
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

31 Jul, 2018

2 commits


29 Jul, 2018

2 commits


12 Jul, 2018

2 commits


29 Jun, 2018

2 commits

  • The big aio poll revert broke various network protocols that don't
    implement ->poll, as a patch in the aio poll series removed sock_no_poll
    and made the common code handle this case.

    Reported-by: syzbot+57727883dbad76db2ef0@syzkaller.appspotmail.com
    Reported-by: syzbot+cdb0d3176b53d35ad454@syzkaller.appspotmail.com
    Reported-by: syzbot+2c7e8f74f8b2571c87e8@syzkaller.appspotmail.com
    Reported-by: Tetsuo Handa
    Fixes: a11e1d432b51 ("Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL")
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • The poll() changes were not well thought out, and completely
    unexplained. They also caused a huge performance regression, because
    "->poll()" was no longer a trivial file operation that just called down
    to the underlying file operations, but instead did at least two indirect
    calls.

    Indirect calls are sadly slow now with the Spectre mitigation, but the
    performance problem could at least be largely mitigated by changing the
    "->get_poll_head()" operation to just have a per-file-descriptor pointer
    to the poll head instead. That gets rid of one of the new indirections.

    But that doesn't fix the new complexity that is completely unwarranted
    for the regular case. The (undocumented) reason for the poll() changes
    was some alleged AIO poll race fixing, but we don't make the common case
    slower and more complex for some uncommon special case, so this all
    really needs way more explanations and most likely a fundamental
    redesign.

    [ This revert is a revert of about 30 different commits, not reverted
    individually because that would just be unnecessarily messy - Linus ]

    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Linus Torvalds