10 Jun, 2019

3 commits

  • [Resent to net instead of net-next - may clash with Anders Roxell's patch
    series addressing duplicate module names]

    Commit 31dd83b96641 ("net-next: phy: new Asix Electronics PHY driver")
    introduced a new PHY driver drivers/net/phy/asix.c that causes a module
    name conflict with a pre-existing driver (drivers/net/usb/asix.c).

    The PHY driver is used by the X-Surf 100 ethernet card driver, and loaded
    by that driver via its PHY ID. A rename of the driver looks unproblematic.

    Rename the PHY driver to ax88796b.c to resolve the name conflict.

    Signed-off-by: Michael Schmitz
    Tested-by: Michael Schmitz
    Fixes: 31dd83b96641 ("net-next: phy: new Asix Electronics PHY driver")
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Michael Schmitz
     
  • Before taking a refcount, make sure the object is not already
    scheduled for deletion.

    The same fix is needed in ipv6_flowlabel_opt().
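
    A minimal userspace sketch of the "take a reference only if the object
    is still live" pattern, using C11 atomics. The kernel has its own
    primitives for this (such as atomic_inc_not_zero()); the struct and
    function names below (flowlabel, fl_get_not_zero) are hypothetical and
    only illustrate the idea, not the actual flowlabel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object: users == 0 means it is already
 * scheduled for deletion (for example, waiting for an RCU grace period)
 * and must not be revived. */
struct flowlabel {
    atomic_int users;
};

/* Take a reference only if the count is still non-zero.
 * Returns true on success, false if the object is already dying. */
static bool fl_get_not_zero(struct flowlabel *fl)
{
    int old = atomic_load(&fl->users);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value and the
         * zero check is repeated before retrying. */
        if (atomic_compare_exchange_weak(&fl->users, &old, old + 1))
            return true;
    }
    return false;
}
```

    A plain increment would turn 0 back into 1 and hand out a pointer to
    an object that is about to be freed; the compare-and-swap loop refuses
    that transition.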

    Fixes: 18367681a10b ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
    Signed-off-by: Eric Dumazet
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • fix an uninitialized variable:

    CC net/ipv4/fib_semantics.o
    net/ipv4/fib_semantics.c: In function 'fib_check_nh_v4_gw':
    net/ipv4/fib_semantics.c:1027:12: warning: 'err' may be used uninitialized in this function [-Wmaybe-uninitialized]
    if (!tbl || err) {
    ^~
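
    A sketch of the code shape that triggers this warning, assuming the
    usual fix of initializing the variable at its declaration. The lookup
    helpers and check_gw() are hypothetical stand-ins, not the actual
    fib_semantics.c functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for a per-table lookup and the full fallback. */
static int table_lookup(int has_route) { return has_route ? 0 : -ENETUNREACH; }
static int full_lookup(int has_route)  { return has_route ? 0 : -EHOSTUNREACH; }

static int check_gw(int have_table, int table_route, int any_route)
{
    int err = 0; /* defined on every path, so there is nothing to warn about */

    if (have_table)
        err = table_lookup(table_route);

    /* On error, or if no table was given, do a full lookup. The '||'
     * short-circuits, so err is not actually read when !have_table, but
     * GCC's -Wmaybe-uninitialized analysis may fail to prove that. */
    if (!have_table || err)
        err = full_lookup(any_route);

    return err;
}
```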

    Signed-off-by: Enrico Weigelt
    Signed-off-by: David S. Miller

    Enrico Weigelt
     

08 Jun, 2019

5 commits

  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2019-06-07

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Fix several bugs in riscv64 JIT code emission which forgot to clear high
    32-bits for alu32 ops, from Björn and Luke with selftests covering all
    relevant BPF alu ops from Björn and Jiong.

    2) Two fixes for UDP BPF reuseport that avoid calling the program in case of
    __udp6_lib_err and UDP GRO which broke reuseport_select_sock() assumption
    that skb->data is pointing to transport header, from Martin.

    3) Two fixes for BPF sockmap: a use-after-free from sleep in psock's backlog
    workqueue, and a missing restore of sk_write_space when psock gets dropped,
    from Jakub and John.

    4) Fix unconnected UDP sendmsg hook API which is insufficient as-is since it
    breaks standard applications like DNS if reverse NAT is not performed upon
    receive, from Daniel.

    5) Fix an out-of-bounds read in __bpf_skc_lookup which in case of AF_INET6
    fails to verify that the length of the tuple is long enough, from Lorenz.

    6) Fix libbpf's libbpf__probe_raw_btf to return an fd instead of 0/1 (for
    {un,}successful probe) as that is expected to be propagated as an fd to
    load_sk_storage_btf() and thus closing the wrong descriptor otherwise,
    from Michal.

    7) Fix bpftool's JSON output for the case when a lookup fails, from Krzesimir.

    8) Minor misc fixes in docs, samples and selftests, from various others.
    ====================
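
    Item 5 describes a missing length check before an IPv6 tuple is read.
    A minimal userspace sketch of that check, using a local struct modeled
    loosely on the uapi struct bpf_sock_tuple (IPv4 and IPv6 variants in a
    union). The struct and tuple_len_ok() are hypothetical illustrations,
    not the kernel's __bpf_skc_lookup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Local model of a socket lookup tuple: IPv4 and IPv6 variants. */
struct sock_tuple {
    union {
        struct { uint32_t saddr, daddr; uint16_t sport, dport; } ipv4;
        struct { uint32_t saddr[4], daddr[4]; uint16_t sport, dport; } ipv6;
    };
};

/* The caller-supplied length must cover the variant that is about to be
 * read. Skipping the AF_INET6 check lets a short (IPv4-sized) buffer be
 * read as an IPv6 tuple, i.e. out of bounds. */
static bool tuple_len_ok(int family, size_t len)
{
    switch (family) {
    case AF_INET:
        return len == sizeof(((struct sock_tuple *)0)->ipv4);
    case AF_INET6:
        return len == sizeof(((struct sock_tuple *)0)->ipv6);
    default:
        return false;
    }
}
```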

    Signed-off-by: David S. Miller

    David S. Miller
     
  • …m/linux/kernel/git/kvalo/wireless-drivers

    Kalle Valo says:

    ====================
    wireless-drivers fixes for 5.2

    First set of fixes for 5.2. Most important here are buffer overflow
    fixes for mwifiex.

    rtw88

    * fix out of bounds compiler warning

    * fix rssi handling to get 4x more throughput

    * avoid circular locking

    rsi

    * fix uninitialised data warning, these are hopefully the last ones so
    that the warning can be enabled by default

    mwifiex

    * fix buffer overflows

    iwlwifi

    * remove not used debugfs file

    * various fixes
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Pull networking fixes from David Miller:

    1) Free AF_PACKET po->rollover properly, from Willem de Bruijn.

    2) Read SFP eeprom in max 16 byte increments to avoid problems with
    some SFP modules, from Russell King.

    3) Fix UDP socket lookup wrt. VRF, from Tim Beale.

    4) Handle route invalidation properly in s390 qeth driver, from Julian
    Wiedmann.

    5) Memory leak on unload in RDS, from Zhu Yanjun.

    6) sctp_process_init leak, from Neil Horman.

    7) Fix fib_rules rule insertion semantic change that broke Android,
    from Hangbin Liu.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
    pktgen: do not sleep with the thread lock held.
    net: mvpp2: Use strscpy to handle stat strings
    net: rds: fix memory leak in rds_ib_flush_mr_pool
    ipv6: fix EFAULT on sendto with icmpv6 and hdrincl
    ipv6: use READ_ONCE() for inet->hdrincl as in ipv4
    Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"
    net: aquantia: fix wol configuration not applied sometimes
    ethtool: fix potential userspace buffer overflow
    Fix memory leak in sctp_process_init
    net: rds: fix memory leak when unload rds_rdma
    ipv6: fix the check before getting the cookie in rt6_get_cookie
    ipv4: not do cache for local delivery if bc_forwarding is enabled
    s390/qeth: handle error when updating TX queue count
    s390/qeth: fix VLAN attribute in bridge_hostnotify udev event
    s390/qeth: check dst entry before use
    s390/qeth: handle limited IPv4 broadcast in L3 TX path
    net: fix indirect calls helpers for ptype list hooks.
    net: ipvlan: Fix ipvlan device tso disabled while NETIF_F_IP_CSUM is set
    udp: only choose unbound UDP socket for multicast when not in a VRF
    net/tls: replace the sleeping lock around RX resync with a bit lock
    ...

    Linus Torvalds
     
  • Pull rdma fixes from Jason Gunthorpe:
    "Things are looking pretty quiet here in RDMA, not too many bug fixes
    rolling in right now. The usual driver bug fixes and fixes for a
    couple of regressions introduced in 5.2:

    - Fix a race on bootup with RDMA device renaming and srp. SRP also
    needs to rename its internal sys files

    - Fix a memory leak in hns

    - Don't leak resources in efa on certain error unwinds

    - Don't panic in certain error unwinds in ib_register_device

    - Various small user visible bug fix patches for the hfi and efa
    drivers

    - Fix the 32 bit compilation break"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
    RDMA/efa: Remove MAYEXEC flag check from mmap flow
    mlx5: avoid 64-bit division
    IB/hfi1: Validate page aligned for a given virtual address
    IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value
    IB/hfi1: Insure freeze_work work_struct is canceled on shutdown
    IB/rdmavt: Fix alloc_qpn() WARN_ON()
    RDMA/core: Fix panic when port_data isn't initialized
    RDMA/uverbs: Pass udata on uverbs error unwind
    RDMA/core: Clear out the udata before error unwind
    RDMA/hns: Fix PD memory leak for internal allocation
    RDMA/srp: Rename SRP sysfs name after IB device rename trigger

    Linus Torvalds
     
  • Pull arm64 fixes from Will Deacon:
    "Another round of mostly-benign fixes, the exception being a boot crash
    on SVE2-capable CPUs (although I don't know where you'd find such a
    thing, so maybe it's benign too).

    We're in the process of resolving some big-endian ptrace breakage, so
    I'll probably have some more for you next week.

    Summary:

    - Fix boot crash on platforms with SVE2 due to missing register
    encoding

    - Fix architected timer accessors when CONFIG_OPTIMIZE_INLINING=y

    - Move cpu_logical_map into smp.h for use by upcoming irqchip drivers

    - Trivial typo fix in comment

    - Disable some useless, noisy warnings from GCC 9"

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: Silence gcc warnings about arch ABI drift
    ARM64: trivial: s/TIF_SECOMP/TIF_SECCOMP/ comment typo fix
    arm64: arch_timer: mark functions as __always_inline
    arm64: smp: Moved cpu_logical_map[] to smp.h
    arm64: cpufeature: Fix missing ZFR0 in __read_sysreg_by_encoding()

    Linus Torvalds
     

07 Jun, 2019

21 commits

  • Daniel Borkmann says:

    ====================
    Please refer to the patch 1/6 as the main patch with the details
    on the current sendmsg hook API limitations and proposal to fix
    it in order to work with basic applications like DNS. Remaining
    patches are the usual uapi and tooling updates as well as test
    cases. Thanks a lot!

    v2 -> v3:
    - Add attach types to test_section_names.c and libbpf (Andrey)
    - Added given Acks, rest as-is
    v1 -> v2:
    - Split off uapi header sync and bpftool bits (Martin, Alexei)
    - Added missing bpftool doc and bash completion as well
    ====================

    Signed-off-by: Alexei Starovoitov

    Alexei Starovoitov
     
  • Add cgroup/recvmsg{4,6} to test_section_names as well. Test run output:

    # ./test_section_names
    libbpf: failed to guess program type based on ELF section name 'InvAliD'
    libbpf: supported section(type) names are: [...]
    libbpf: failed to guess attach type based on ELF section name 'InvAliD'
    libbpf: attachable section(type) names are: [...]
    libbpf: failed to guess program type based on ELF section name 'cgroup'
    libbpf: supported section(type) names are: [...]
    libbpf: failed to guess attach type based on ELF section name 'cgroup'
    libbpf: attachable section(type) names are: [...]
    Summary: 38 PASSED, 0 FAILED

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
    Extend test_sock_addr for recvmsg test cases; bigger parts of the
    sendmsg code can be reused for this. Below is the strace view of
    the recvmsg rewrites; the sendmsg side does not have a BPF prog
    connected to it in the context of this test:

    IPv4 test case:

    [pid 4846] bpf(BPF_PROG_ATTACH, {target_fd=3, attach_bpf_fd=4, attach_type=0x13 /* BPF_??? */, attach_flags=BPF_F_ALLOW_OVERRIDE}, 112) = 0
    [pid 4846] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
    [pid 4846] bind(5, {sa_family=AF_INET, sin_port=htons(4444), sin_addr=inet_addr("127.0.0.1")}, 128) = 0
    [pid 4846] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 6
    [pid 4846] sendmsg(6, {msg_name={sa_family=AF_INET, sin_port=htons(4444), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128, msg_iov=[{iov_base="a", iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1
    [pid 4846] select(6, [5], NULL, NULL, {tv_sec=2, tv_usec=0}) = 1 (in [5], left {tv_sec=1, tv_usec=999995})
    [pid 4846] recvmsg(5, {msg_name={sa_family=AF_INET, sin_port=htons(4040), sin_addr=inet_addr("192.168.1.254")}, msg_namelen=128->16, msg_iov=[{iov_base="a", iov_len=64}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1
    [pid 4846] close(6) = 0
    [pid 4846] close(5) = 0
    [pid 4846] bpf(BPF_PROG_DETACH, {target_fd=3, attach_type=0x13 /* BPF_??? */}, 112) = 0

    IPv6 test case:

    [pid 4846] bpf(BPF_PROG_ATTACH, {target_fd=3, attach_bpf_fd=4, attach_type=0x14 /* BPF_??? */, attach_flags=BPF_F_ALLOW_OVERRIDE}, 112) = 0
    [pid 4846] socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 5
    [pid 4846] bind(5, {sa_family=AF_INET6, sin6_port=htons(6666), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 128) = 0
    [pid 4846] socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 6
    [pid 4846] sendmsg(6, {msg_name={sa_family=AF_INET6, sin6_port=htons(6666), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, msg_namelen=128, msg_iov=[{iov_base="a", iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1
    [pid 4846] select(6, [5], NULL, NULL, {tv_sec=2, tv_usec=0}) = 1 (in [5], left {tv_sec=1, tv_usec=999996})
    [pid 4846] recvmsg(5, {msg_name={sa_family=AF_INET6, sin6_port=htons(6060), inet_pton(AF_INET6, "face:b00c:1234:5678::abcd", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, msg_namelen=128->28, msg_iov=[{iov_base="a", iov_len=64}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1
    [pid 4846] close(6) = 0
    [pid 4846] close(5) = 0
    [pid 4846] bpf(BPF_PROG_DETACH, {target_fd=3, attach_type=0x14 /* BPF_??? */}, 112) = 0

    test_sock_addr run w/o strace view:

    # ./test_sock_addr.sh
    [...]
    Test case: recvmsg4: return code ok .. [PASS]
    Test case: recvmsg4: return code !ok .. [PASS]
    Test case: recvmsg6: return code ok .. [PASS]
    Test case: recvmsg6: return code !ok .. [PASS]
    Test case: recvmsg4: rewrite IP & port (asm) .. [PASS]
    Test case: recvmsg6: rewrite IP & port (asm) .. [PASS]
    [...]

    Signed-off-by: Daniel Borkmann
    Acked-by: Andrey Ignatov
    Acked-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • Trivial patch to bpftool in order to complete enabling attaching programs
    to BPF_CGROUP_UDP{4,6}_RECVMSG.

    Signed-off-by: Daniel Borkmann
    Acked-by: Andrey Ignatov
    Acked-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • Another trivial patch to libbpf in order to enable identifying and
    attaching programs to BPF_CGROUP_UDP{4,6}_RECVMSG by section name.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • Sync BPF uapi header in order to pull in BPF_CGROUP_UDP{4,6}_RECVMSG
    attach types. This is done and preferred as an extra patch in order
    to ease sync of libbpf.

    Signed-off-by: Daniel Borkmann
    Acked-by: Andrey Ignatov
    Acked-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
    The intention of the cgroup bind/connect/sendmsg BPF hooks is to act
    transparently to applications, as also stated in the original motivation
    in 7828f20e3779 ("Merge branch 'bpf-cgroup-bind-connect'"). When recently
    integrating the latter two hooks into Cilium to enable host-based load
    balancing with Kubernetes, I ran into the issue that pods couldn't start
    up because DNS was broken. Kubernetes typically sets up DNS as a service,
    which is thus subject to load balancing.

    Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API
    is currently insufficient and thus not usable as-is for standard applications
    shipped with most distros. To break down the issue we ran into with a simple
    example:

    # cat /etc/resolv.conf
    nameserver 147.75.207.207
    nameserver 147.75.207.208

    For the purpose of a simple test, we set up the above IPs as service IPs
    and transparently redirect traffic to a different DNS backend server for
    that node:

    # cilium service list
    ID Frontend Backend
    1 147.75.207.207:53 1 => 8.8.8.8:53
    2 147.75.207.208:53 1 => 8.8.8.8:53

    The attached BPF program is basically selecting one of the backends if the
    service IP/port matches on the cgroup hook. DNS breaks here, because the
    hooks are not transparent enough to applications which have built-in msg_name
    address checks:

    # nslookup 1.1.1.1
    ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
    ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
    ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
    [...]
    ;; connection timed out; no servers could be reached

    # dig 1.1.1.1
    ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
    ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
    ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
    [...]

    ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
    ;; global options: +cmd
    ;; connection timed out; no servers could be reached

    For comparison, if none of the service IPs is used, and we tell nslookup
    to use 8.8.8.8 directly it works just fine, of course:

    # nslookup 1.1.1.1 8.8.8.8
    1.1.1.1.in-addr.arpa name = one.one.one.one.

    In order to fix this and thus act more transparently toward the application,
    reverse translation is needed on the recvmsg() side. A minimal fix for this
    API is to add similar recvmsg() hooks behind the BPF cgroups static key
    such that the program can track state and replace the current sockaddr_in{,6}
    with the original service IP. On the BPF side, one example is to track the
    service tuple plus the socket cookie in an LRU map, from which the reverse
    NAT can then be retrieved via the map value. Side note: the BPF cgroups
    static key should be converted to a per-hook static key in the future.
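
    A minimal userspace simulation of the reverse-NAT idea described above:
    on sendmsg the service address is rewritten to a backend and the
    original is remembered, keyed by socket cookie; on recvmsg the backend
    source is translated back. The fixed-size table is a stand-in for the
    LRU map, and all names (nat_entry, on_sendmsg, on_recvmsg) are
    hypothetical; this is not Cilium's or the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IPv4 address + port, host byte order for simplicity. */
struct addr { uint32_t ip; uint16_t port; };

struct nat_entry {
    uint64_t cookie;     /* socket cookie, 0 = empty slot */
    struct addr service; /* destination the application asked for */
    struct addr backend; /* where the traffic was actually sent */
};

#define NAT_SLOTS 64
static struct nat_entry nat_table[NAT_SLOTS]; /* stand-in for an LRU map */

/* sendmsg side: rewrite service -> backend and remember the mapping. */
static void on_sendmsg(uint64_t cookie, struct addr *dst,
                       struct addr service, struct addr backend)
{
    if (dst->ip != service.ip || dst->port != service.port)
        return;
    struct nat_entry *e = &nat_table[cookie % NAT_SLOTS];
    /* Collisions simply overwrite, loosely mimicking LRU eviction. */
    e->cookie = cookie;
    e->service = service;
    e->backend = backend;
    *dst = backend;
}

/* recvmsg side: if the reply comes from the backend we redirected to,
 * rewrite msg_name back to the service address the app expects. */
static bool on_recvmsg(uint64_t cookie, struct addr *src)
{
    struct nat_entry *e = &nat_table[cookie % NAT_SLOTS];

    if (e->cookie != cookie)
        return false;
    if (src->ip != e->backend.ip || src->port != e->backend.port)
        return false;
    *src = e->service;
    return true;
}
```

    With this translation, a resolver that compares the reply source with
    the server it queried (147.75.207.207:53) sees the address it expects
    instead of 8.8.8.8:53.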

    Same example after this fix:

    # cilium service list
    ID Frontend Backend
    1 147.75.207.207:53 1 => 8.8.8.8:53
    2 147.75.207.208:53 1 => 8.8.8.8:53

    Lookups work fine now:

    # nslookup 1.1.1.1
    1.1.1.1.in-addr.arpa name = one.one.one.one.

    Authoritative answers can be found from:

    # dig 1.1.1.1

    ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- [...]
    [...] > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
    12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
    12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
    12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
    [...]

    In order to be flexible and to have the same semantics as in sendmsg BPF
    programs, we only allow return codes in the [1,1] range. In the sendmsg
    case, the program is called if msg->msg_name is present, which can be the
    case in both connected and unconnected UDP.

    The former only relies on the sockaddr_in{,6} passed via connect(2) if
    the passed msg->msg_name was NULL. Therefore, on the recvmsg side, we act
    in a similar way and call into the BPF program whenever a non-NULL
    msg->msg_name was passed, independent of sk->sk_state being
    TCP_ESTABLISHED or not. Note that for the TCP case, msg->msg_name is
    ignored in the regular recvmsg path and is therefore not relevant.

    For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE,
    the hook is not called. This is intentional as it aligns with the same
    semantics as in case of TCP cgroup BPF hooks right now. This might be
    better addressed in future through a different bpf_attach_type such
    that this case can be distinguished from the regular recvmsg paths,
    for example.

    Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg")
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrey Ignatov
    Acked-by: Martin KaFai Lau
    Acked-by: Martynas Pumputis
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • Pull parisc fixes from Helge Deller:

    - Fix crashes when accessing PCI devices on some machines like C240 and
    J5000. The crashes were triggered because we replaced cache flushes
    by nops in the alternative coding where we shouldn't for some
    machines.

    - Dave fixed a race in the usage of the sr1 space register when used to
    load the coherence index.

    - Use the hardware lpa instruction to load the physical address of
    kernel virtual addresses in the iommu driver code.

    - The kernel may fail to link when CONFIG_MLONGCALLS isn't set. Solve
    that by rearranging functions in the final vmlinux executable.

    - Some defconfig cleanups and removal of compiler warnings.

    * 'parisc-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Fix crash due alternative coding for NP iopdir_fdc bit
    parisc: Use lpa instruction to load physical addresses in driver code
    parisc: configs: Remove useless UEVENT_HELPER_PATH
    parisc: Use implicit space register selection for loading the coherence index of I/O pdirs
    parisc: Fix compiler warnings in float emulation code
    parisc/slab: cleanup after /proc/slab_allocators removal
    parisc: Allow building 64-bit kernel without -mlong-calls compiler option
    parisc: Kconfig: remove ARCH_DISCARD_MEMBLOCK

    Linus Torvalds
     
  • Pull crypto fixes from Herbert Xu:
    "This fixes a regression that breaks the jitterentropy RNG and a
    potential memory leak in hmac"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: hmac - fix memory leak in hmac_init_tfm()
    crypto: jitterentropy - change back to module_init()

    Linus Torvalds
     
  • Pull xfs fixes from Darrick Wong:
    "Here are a couple more bug fixes for 5.2. Changes since last update:

    - Fix some forgotten strings in a log debugging function

    - Fix incorrect unit conversion in online fsck code"

    * tag 'xfs-5.2-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    xfs: inode btree scrubber should calculate im_boffset correctly
    xfs: fix broken log reservation debugging

    Linus Torvalds
     
  • Pull gfs2 fix from Andreas Gruenbacher:
    "A revert for a patch that turned out to be broken"

    * tag 'gfs2-v5.2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    Revert "gfs2: Replace gl_revokes with a GLF flag"

    Linus Torvalds
     
  • Pull overlayfs fixes from Miklos Szeredi:
    "Here's one fix for a class of bugs triggered by syzkaller, and one
    that makes xfstests fail less"

    * tag 'ovl-fixes-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: doc: add non-standard corner cases
    ovl: detect overlapping layers
    ovl: support the FS_IOC_FS[SG]ETXATTR ioctls

    Linus Torvalds
     
  • Pull fuse fixes from Miklos Szeredi:
    "This fixes a leaked inode lock in an error cleanup path and a data
    consistency issue with copy_file_range().

    It also adds a new flag for the WRITE request that allows userspace
    filesystems to clear suid/sgid bits on the file if necessary"

    * tag 'fuse-fixes-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: extract helper for range writeback
    fuse: fix copy_file_range() in the writeback case
    fuse: add FUSE_WRITE_KILL_PRIV
    fuse: fallocate: fix return with locked inode

    Linus Torvalds
     
  • Pull NFS client fixes from Anna Schumaker:
    "These are mostly stable bugfixes found during testing, many during the
    recent NFS bake-a-thon.

    Stable bugfixes:
    - SUNRPC: Fix regression in umount of a secure mount
    - SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS credential
    - NFSv4.1: Again fix a race where CB_NOTIFY_LOCK fails to wake a waiter
    - NFSv4.1: Fix bug only first CB_NOTIFY_LOCK is handled

    Other bugfixes:
    - xprtrdma: Use struct_size() in kzalloc()"

    * tag 'nfs-for-5.2-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
    NFSv4.1: Fix bug only first CB_NOTIFY_LOCK is handled
    NFSv4.1: Again fix a race where CB_NOTIFY_LOCK fails to wake a waiter
    SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS credential
    SUNRPC fix regression in umount of a secure mount
    xprtrdma: Use struct_size() in kzalloc()

    Linus Torvalds
     
    Currently, the process issuing a "start" command on the pktgen procfs
    interface acquires the pktgen thread lock and never releases it until
    all pktgen threads have completed. This can indefinitely block any
    other pktgen command and any (even unrelated) netdevice removal, as
    the pktgen netdev notifier acquires the same lock.

    The issue is demonstrated by the following script, reported by Matteo:

    ip -b - </proc/net/pktgen/pgctrl
    {
    echo rem_device_all
    echo add_device dummy0
    } >/proc/net/pktgen/kpktgend_0
    echo count 0 >/proc/net/pktgen/dummy0
    echo start >/proc/net/pktgen/pgctrl &
    sleep 1
    rmmod veth

    Fix the above by releasing the thread lock around the sleep call.

    Additionally, we must prevent racing with a forceful rmmod, as the
    thread lock no longer protects against it. Instead, acquire a
    self-reference before waiting for any thread. As a side effect, running

    rmmod pktgen

    while some thread is running now fails with a "module in use" error;
    before this patch, such a command hung indefinitely.

    Note: the issue predates the commit reported in the fixes tag, but
    this fix can't be applied before the mentioned commit.
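
    A minimal single-threaded sketch of the locking pattern described
    above, using POSIX mutexes: the waiter drops the lock around each
    sleep so that other users can take it. Because the sketch has only one
    thread, the "other user" (a stand-in for another pktgen command or the
    netdev notifier) is simulated by a call made while the lock is
    dropped. All names are hypothetical; the module self-reference part
    of the fix is not modeled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t thread_lock = PTHREAD_MUTEX_INITIALIZER;
static int running = 3;        /* hypothetical: number of busy workers */
static int lock_free_in_sleep; /* how often another user got the lock */

/* Stand-in for another command or a netdevice notifier needing the lock. */
static void other_user(void)
{
    if (pthread_mutex_trylock(&thread_lock) == 0) {
        lock_free_in_sleep++;
        running--;             /* pretend a worker finished meanwhile */
        pthread_mutex_unlock(&thread_lock);
    }
}

/* Called with thread_lock held; returns with it held. */
static void wait_all_threads(void)
{
    while (running > 0) {
        /* Release the lock around the sleep instead of holding it. */
        pthread_mutex_unlock(&thread_lock);
        struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */
        nanosleep(&ts, NULL);
        other_user();          /* would be a different thread in reality */
        pthread_mutex_lock(&thread_lock);
    }
}
```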

    v1 -> v2:
    - no need to check for thread existence after flipping the lock,
    pktgen threads are freed only at net exit time

    Fixes: 6146e6a43b35 ("[PKTGEN]: Removes thread_{un,}lock() macros.")
    Reported-and-tested-by: Matteo Croce
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • Pull ADFS cleanups/fixes from Russell King:
    "As a result of some of Al Viro's great work, here are a few cleanups
    with fixes for adfs:

    - factor out filename comparison, so we can be sure that
    adfs_compare() (used for namei compare) and adfs_match() (used for
    lookup) have the same behaviour.

    - factor out filename lowering (which is not the same as tolower()
    which will lower top-bit-set characters) to ensure that we have the
    same behaviour when comparing filenames as when we hash them.

    - factor out the object fixups, so we are applying all fixups to
    directory objects in the same way, independent of the disk format.

    - factor out the object name fixup (into the previously factored out
    function) to ensure that filenames are appropriately translated -
    for example, adfs allows '/' in filenames, which, being the Unix
    path separator, needs to be translated to a different character,
    which is normally '.' (DOS 8.3 filenames represent the . as a / on
    adfs, so this is the expected reverse translation.)

    - remove filename truncation; Al asked about this and apparently the
    decision is to remove it. In any case, adfs's truncation was buggy,
    so this rids us of that bug by removing the truncation feature.

    - we now have only one location which adds the "filetype" suffix to
    the filename, so there's no point that code being out of line.

    - since we translate '/' into '.', an adfs filename of "/" or "//"
    would end up being translated to "." and ".." which have special
    meanings. In this case, change the first character to "^" to avoid
    these special directory names being abused"
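
    A minimal sketch of the last two fixups described above, assuming a
    NUL-terminated name that is modified in place: '/' becomes '.', and a
    result of "." or ".." has its first character changed to '^'. The
    function name is hypothetical, and the real adfs code also handles
    case lowering and the filetype suffix, which are omitted here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: translate an adfs object name for Unix. */
static void adfs_name_fixup(char *name)
{
    /* '/' is the Unix path separator, so map it to '.' */
    for (char *p = name; *p; p++)
        if (*p == '/')
            *p = '.';

    /* "/" and "//" would now read "." and "..": avoid those names */
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        name[0] = '^';
}
```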

    * tag 'for-rc-adfs' of git://git.armlinux.org.uk/~rmk/linux-arm:
    fs/adfs: fix filename fixup handling for "/" and "//" names
    fs/adfs: move append_filetype_suffix() into adfs_object_fixup()
    fs/adfs: remove truncated filename hashing
    fs/adfs: factor out filename fixup
    fs/adfs: factor out object fixups
    fs/adfs: factor out filename case lowering
    fs/adfs: factor out filename comparison

    Linus Torvalds
     
    Use a safe strscpy call to copy the ethtool stat strings into the
    relevant buffers, instead of a memcpy that accesses out-of-bounds
    data.
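
    strscpy() is a kernel API; the userspace approximation below mimics
    its documented semantics (copy at most size - 1 bytes, always
    NUL-terminate, return the copied length or -E2BIG on truncation) to
    show why it is safe where a fixed-length memcpy is not: memcpy of
    ETH_GSTRING_LEN (32) bytes from a short string literal reads past the
    end of the literal. my_strscpy and GSTRING_LEN are names chosen for
    this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define GSTRING_LEN 32 /* size of one ethtool stat name slot */

/* Userspace approximation of the kernel's strscpy(). */
static ssize_t my_strscpy(char *dst, const char *src, size_t size)
{
    size_t len;

    if (size == 0)
        return -E2BIG;
    len = strnlen(src, size);    /* stops at src's NUL, never beyond size */
    if (len == size) {           /* does not fit: truncate */
        memcpy(dst, src, size - 1);
        dst[size - 1] = '\0';
        return -E2BIG;
    }
    memcpy(dst, src, len + 1);   /* include the terminating NUL */
    return (ssize_t)len;
}
```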

    Fixes: 118d6298f6f0 ("net: mvpp2: add ethtool GOP statistics")
    Signed-off-by: Maxime Chevallier
    Signed-off-by: David S. Miller

    Maxime Chevallier
     
    When the following test runs for several hours, the problem below occurs.

    Server:
    rds-stress -r 1.1.1.16 -D 1M
    Client:
    rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30

    The following output is then seen:

    "
    Starting up....
    tsks tx/s rx/s tx+rx K/s mbi K/s mbo K/s tx us/c rtt us cpu %
    1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
    1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
    1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
    1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
    "
    From the vmcore, we can see that clean_list is NULL.

    From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
    Then rds_ib_mr_pool_flush_worker calls
    "
    rds_ib_flush_mr_pool(pool, 0, NULL);
    "
    Then in function
    "
    int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
    int free_all, struct rds_ib_mr **ibmr_ret)
    "
    ibmr_ret is NULL.

    In the source code,
    "
    ...
    list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
    if (ibmr_ret)
    *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);

    /* more than one entry in llist nodes */
    if (clean_nodes->next)
    llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
    ...
    "
    When ibmr_ret is NULL, the first node is not handed to the caller via
    llist_entry, yet only clean_nodes->next, rather than clean_nodes, is
    added back to clean_list. So the first node is discarded and can never
    be used again. Since the workqueue runs periodically, more and more
    nodes are discarded until clean_list is finally empty (NULL), at which
    point the problem above occurs.
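
    A minimal userspace model of the leak, assuming a plain singly linked
    list instead of the kernel's llist: the buggy flush re-adds only the
    nodes after the head even when the head was not handed to a caller,
    so one node is lost per flush. The fixed variant skips the head only
    when it is actually returned, which is one fix consistent with the
    description above. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next; };

static size_t list_len(const struct node *n)
{
    size_t len = 0;
    for (; n; n = n->next)
        len++;
    return len;
}

/* Push the chain first..last onto the front of *head (like llist_add_batch). */
static void add_batch(struct node *first, struct node *last, struct node **head)
{
    last->next = *head;
    *head = first;
}

/* Buggy: always re-adds from nodes->next, leaking the head if ret == NULL. */
static void flush_buggy(struct node *nodes, struct node *tail,
                        struct node **clean_list, struct node **ret)
{
    if (ret)
        *ret = nodes;
    if (nodes->next)                 /* "more than one entry" */
        add_batch(nodes->next, tail, clean_list);
}

/* Fixed: only skip the head when it was handed to the caller. */
static void flush_fixed(struct node *nodes, struct node *tail,
                        struct node **clean_list, struct node **ret)
{
    if (ret) {
        *ret = nodes;
        nodes = nodes->next;
    }
    if (nodes)
        add_batch(nodes, tail, clean_list);
}
```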

    Fixes: 1bc144b62524 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
    Signed-off-by: Zhu Yanjun
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Zhu Yanjun
     
  • Olivier Matz says:

    ====================
    ipv6: fix EFAULT on sendto with icmpv6 and hdrincl

    The following code returns EFAULT (Bad address):

    s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
    setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1);
    sendto(ipv6_icmp6_packet, addr); /* returns -1, errno = EFAULT */

    The problem is fixed in the second patch. The first one aligns the
    code to ipv4, to avoid a race condition in the second patch.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The following code returns EFAULT (Bad address):

    s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
    setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1);
    sendto(ipv6_icmp6_packet, addr); /* returns -1, errno = EFAULT */

    The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW
    instead of IPPROTO_ICMPV6.

    The failure happens because 2 bytes are eaten from the msghdr by
    rawv6_probe_proto_opt() starting from commit 19e3c66b52ca ("ipv6
    equivalent of "ipv4: Avoid reading user iov twice after
    raw_probe_proto_opt""), but at that time it was not a problem because
    IPV6_HDRINCL was not yet introduced.

    Only eat these 2 bytes if hdrincl == 0.
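
    A simplified model of why the probe broke IPV6_HDRINCL, under stated
    assumptions: the user buffer is a flat byte array read through a
    cursor, and the probe is modeled as consuming the first two bytes
    (ICMPv6 type and code). cursor, cursor_copy and send_model are
    hypothetical names; rawv6_probe_proto_opt() itself is not reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cursor over the user's payload (a stand-in for msg_iter). */
struct cursor {
    const unsigned char *buf;
    size_t len;
    size_t off;
};

/* Copy n bytes and advance; -EFAULT if fewer than n bytes remain. */
static int cursor_copy(struct cursor *c, void *dst, size_t n)
{
    if (c->len - c->off < n)
        return -EFAULT;
    memcpy(dst, c->buf + c->off, n);
    c->off += n;
    return 0;
}

/* Simplified send path; 'out' must hold at least c->len bytes. The probe
 * stashes 2 bytes and advances the cursor. The non-hdrincl path re-uses
 * the stash, but the hdrincl path copies the whole packet again. */
static int send_model(struct cursor *c, bool hdrincl, bool fixed,
                      unsigned char *out)
{
    size_t len = c->len;
    unsigned char hdr[2];
    int err;

    /* Buggy: always probe. Fixed: probe only when !hdrincl. */
    if (!hdrincl || !fixed) {
        err = cursor_copy(c, hdr, sizeof(hdr));
        if (err)
            return err;
    }

    if (hdrincl)                   /* user supplied the full IPv6 packet */
        return cursor_copy(c, out, len);

    memcpy(out, hdr, sizeof(hdr)); /* stashed bytes + rest of payload */
    return cursor_copy(c, out + sizeof(hdr), len - sizeof(hdr));
}
```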

    Fixes: 715f504b1189 ("ipv6: add IPV6_HDRINCL option for raw sockets")
    Signed-off-by: Olivier Matz
    Acked-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Olivier Matz
     
  • As it was done in commit 8f659a03a0ba ("net: ipv4: fix for a race
    condition in raw_sendmsg") and commit 20b50d79974e ("net: ipv4: emulate
    READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the
    value of inet->hdrincl in a local variable, to avoid introducing a race
    condition in the next commit.
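
    Copying the flag into a local variable means every later decision in
    the function sees the same value, even if setsockopt() flips it
    concurrently. A minimal C11 sketch, using a relaxed atomic load as a
    userspace analogue of READ_ONCE() (the kernel field is a bit-field,
    which is why the ipv4 commit cited above had to emulate READ_ONCE()).
    struct sock_model and build_packet are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sock_model {
    atomic_bool hdrincl; /* may be changed concurrently via setsockopt() */
};

/* Read the flag once and use the snapshot for every decision, so the
 * two steps can never disagree: the result is always 0 or 3. */
static int build_packet(struct sock_model *sk)
{
    bool hdrincl = atomic_load_explicit(&sk->hdrincl, memory_order_relaxed);
    int steps = 0;

    if (!hdrincl)
        steps |= 1; /* e.g. probe the protocol header */
    if (!hdrincl)
        steps |= 2; /* e.g. let the kernel build the IP header */
    return steps;
}
```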

    Signed-off-by: Olivier Matz
    Signed-off-by: David S. Miller

    Olivier Matz
     

06 Jun, 2019

11 commits

  • Commit 73118ca8baf7 introduced a glock reference counting bug in
    gfs2_trans_remove_revoke. Given that, replacing gl_revokes with a GLF flag is
    no longer useful, so revert that commit.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
    Since GCC 9, the compiler warns about evolution of the
    platform-specific ABI, in particular relating to the marshaling of
    certain structures involving bitfields.

    The kernel is a standalone binary, and of course nobody would be
    so stupid as to expose structs containing bitfields as function
    arguments in ABI. (Passing a pointer to such a struct, however
    inadvisable, should be unaffected by this change. perf and various
    drivers rely on that.)

    So these warnings do more harm than good: turn them off.

    We may miss warnings about future ABI drift, but that's too bad.
    Future ABI breaks of this class will have to be debugged and fixed
    the traditional way unless the compiler evolves finer-grained
    diagnostics.
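    For illustration only, a sketch of the two calling patterns the
    message distinguishes, using a hypothetical struct flags: passing a
    struct with bit-fields by value (the ABI-sensitive case GCC's -Wpsabi
    notes are about) versus passing a pointer to it (which the message
    says should be unaffected). Whether a given compiler and target
    actually warn depends on the exact field types, so treat this as a
    picture of the pattern, not a reproducer.

    ```c
    #include <assert.h>

    /* Hypothetical struct containing bit-fields. */
    struct flags {
        unsigned long long mode : 4;
        unsigned long long enabled : 1;
    };

    /* By value: the struct travels in registers/stack as the platform
     * ABI dictates, so an ABI change affects caller/callee agreement. */
    static unsigned by_value(struct flags f)
    {
        return (unsigned)(f.mode + f.enabled);
    }

    /* By pointer: only an address is passed; the callee reads the
     * struct from memory. */
    static unsigned by_pointer(const struct flags *f)
    {
        return (unsigned)(f->mode + f->enabled);
    }
    ```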

    Signed-off-by: Dave Martin
    Signed-off-by: Will Deacon

    Dave Martin
     
  • According to the documentation found, data cache flushes and sync
    instructions are needed on PCX-U+ (PA8200, e.g. C200/C240)
    platforms, while PCX-W (PA8500, e.g. C360) platforms apparently don't
    need those flushes when changing the IO PDIR data structures.

    We have no documentation for PCX-W+ (PA8600) and PCX-W2 (PA8700) CPUs,
    but Carlo Pisani reported that his C3600 machine (PA8600, PCX-W+) failed
    when the fdc instructions were removed. His firmware didn't set the NIOP
    bit, so one may assume it's a firmware bug, since other C3750 machines
    had the bit set.

    Even though the documentation (as mentioned above) states that PCX-W
    (PA8500, e.g. J5000) does not need fdc flushes, Sven showed that an
    Adaptec 29320A PCI-X SCSI controller reliably failed on a dd command
    within the first five minutes on his J5000 when fdc flushes were missing.

    Going forward, we will NOT replace the fdc and sync assembler
    instructions with NOPs if:
    a) the NP iopdir_fdc bit was set by firmware, or
    b) we find a CPU up to and including a PCX-W+ (PA8600).
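    Rules a) and b) reduce to a single predicate. A minimal sketch, assuming
    a hypothetical enum ordered from oldest to newest CPU (the names only
    mirror the families named above, not the kernel's own enum); older
    CPUs would sort before CPU_PCXU_PLUS.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical CPU ordering, oldest first. */
    enum pa_cpu {
        CPU_PCXU_PLUS,   /* PA8200, e.g. C200/C240 */
        CPU_PCXW,        /* PA8500, e.g. C360/J5000 */
        CPU_PCXW_PLUS,   /* PA8600, e.g. C3600 */
        CPU_PCXW2,       /* PA8700 */
    };

    /* Keep the fdc+sync instructions (do NOT patch them to NOPs) if the
     * firmware set the iopdir_fdc bit (rule a) or the CPU is a PCX-W+ or
     * older (rule b). */
    static bool keep_iopdir_flushes(bool fw_iopdir_fdc_bit, enum pa_cpu cpu)
    {
        return fw_iopdir_fdc_bit || cpu <= CPU_PCXW_PLUS;
    }
    ```

    With this ordering, the J5000 case from Sven's report keeps its
    flushes via rule b), while a PCX-W2 relies on the firmware bit.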

    This fixes the HPMC crashes on C240 and C36XX machines. For other
    machines we rely on the firmware to set the bit when needed.

    If HPMC issues show up, users can try booting their machines with
    the "no-alternatives" kernel option to turn off any alternative
    patching.

    Reported-by: Sven Schnelle
    Reported-by: Carlo Pisani
    Tested-by: Sven Schnelle
    Fixes: 3847dab77421 ("parisc: Add alternative coding infrastructure")
    Signed-off-by: Helge Deller
    Cc: stable@vger.kernel.org # 5.0+

    Helge Deller
     
  • Most I/O in the kernel is done using the kernel offset mapping.
    However, there is one API that uses aliased kernel address ranges:

    > The final category of APIs is for I/O to deliberately aliased address
    > ranges inside the kernel. Such aliases are set up by use of the
    > vmap/vmalloc API. Since kernel I/O goes via physical pages, the I/O
    > subsystem assumes that the user mapping and kernel offset mapping are
    > the only aliases. This isn't true for vmap aliases, so anything in
    > the kernel trying to do I/O to vmap areas must manually manage
    > coherency. It must do this by flushing the vmap range before doing
    > I/O and invalidating it after the I/O returns.

    For this reason, we should use the hardware lpa instruction to load the
    physical address of kernel virtual addresses in the driver code.

    I believe we only use the vmap/vmalloc API with old PA 1.x processors
    which don't have a sba, so we don't hit this problem.

    Tested on c3750, c8000 and rp3440.
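    Why a fixed offset is not enough can be shown with a toy model (all
    constants and names here are hypothetical): the kernel offset mapping
    is a linear shift, while vmap/vmalloc pages are mapped one by one and
    are usually not physically contiguous. toy_lpa() stands in for what
    the lpa (Load Physical Address) instruction does in hardware, namely
    a real translation through the page tables.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define TOY_PAGE_SHIFT  12
    #define TOY_PAGE_SIZE   (1UL << TOY_PAGE_SHIFT)
    #define TOY_PAGE_OFFSET 0x10000000UL   /* start of the offset mapping */
    #define TOY_VMAP_START  0x20000000UL   /* start of the vmap area */
    #define TOY_VMAP_PAGES  4

    /* Physical frame numbers backing the toy vmap area: deliberately
     * non-contiguous, as vmalloc memory usually is. */
    static const uintptr_t toy_vmap_pfn[TOY_VMAP_PAGES] = { 7, 3, 42, 9 };

    /* Valid only for the offset mapping: a plain subtraction. */
    static uintptr_t offset_virt_to_phys(uintptr_t va)
    {
        return va - TOY_PAGE_OFFSET;
    }

    /* Translate any mapped kernel address by consulting the "page table". */
    static uintptr_t toy_lpa(uintptr_t va)
    {
        if (va >= TOY_VMAP_START &&
            va < TOY_VMAP_START + TOY_VMAP_PAGES * TOY_PAGE_SIZE) {
            uintptr_t idx = (va - TOY_VMAP_START) >> TOY_PAGE_SHIFT;
            return (toy_vmap_pfn[idx] << TOY_PAGE_SHIFT) |
                   (va & (TOY_PAGE_SIZE - 1));
        }
        return offset_virt_to_phys(va);
    }
    ```

    For a vmap address, offset_virt_to_phys() yields an unrelated physical
    address, which is exactly the mistake a lookup-based translation avoids.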

    Signed-off-by: John David Anglin
    Signed-off-by: Helge Deller

    John David Anglin
     
  • Remove the CONFIG_UEVENT_HELPER_PATH because:
    1. It has been disabled since commit 1be01d4a5714 ("driver: base: Disable
    CONFIG_UEVENT_HELPER by default"), which made its dependency
    (UEVENT_HELPER) default to 'n',
    2. It is not recommended (help message: "This should not be used today
    [...] creates a high system load") and was kept only for ancient
    userland,
    3. Some userland explicitly requests that it be disabled (systemd
    README: "Legacy hotplug slows down the system and confuses udev").

    Signed-off-by: Krzysztof Kozlowski
    Acked-by: Geert Uytterhoeven
    Signed-off-by: Helge Deller

    Krzysztof Kozlowski
     
  • We only support I/O to kernel space. Using %sr1 to load the coherence
    index may be racy unless interrupts are disabled. This patch changes the
    code used to load the coherence index to use implicit space register
    selection. This saves one instruction and eliminates the race.

    Tested on rp3440, c8000 and c3750.

    Signed-off-by: John David Anglin
    Cc: stable@vger.kernel.org
    Signed-off-by: Helge Deller

    John David Anglin
     
  • Fix a s/TIF_SECOMP/TIF_SECCOMP/ comment typo

    Cc: Jiri Kosina
    Reviewed-by: Kees Cook
    Signed-off-by: Will Deacon

    George G. Davis
     
  • This reverts commit e9919a24d3022f72bcadc407e73a6ef17093a849.

    Nathan reported that the new behaviour breaks Android, as Android just
    adds new rules and deletes old ones.

    If we return 0 without adding duplicate rules, Android will remove the
    newly added rules, causing the system to soft-reboot.

    Fixes: e9919a24d302 ("fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied")
    Reported-by: Nathan Chancellor
    Reported-by: Yaro Slav
    Reported-by: Maciej Żenczykowski
    Signed-off-by: Hangbin Liu
    Reviewed-by: Nathan Chancellor
    Tested-by: Nathan Chancellor
    Signed-off-by: David S. Miller

    Hangbin Liu
     
  • WoL magic packet configuration sometimes does not work due to
    a couple of issues found.

    Mainly, there was a regression introduced during the readx_poll
    refactoring.

    Next, the FW request waiting time was too short. Sometimes that
    caused the sleep proxy config function to return with an error
    and skip the WoL configuration.
    Finally, WoL data were passed to the FW from an uncleared buffer.
    That could cause the FW to accept garbage as random configuration data.
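    The last point is the usual "clear before fill" rule for reused
    buffers. A sketch with a hypothetical request layout (the real FW
    interface differs): zeroing the buffer first guarantees that fields
    the driver does not set reach the FW as zero rather than as leftovers
    from an earlier request.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical WoL request layout. */
    struct toy_wol_req {
        uint32_t flags;
        uint8_t  mac[6];
        uint8_t  reserved[6];   /* fields this request does not set */
    };

    #define TOY_WOL_MAGIC_PACKET 0x1u

    /* Fill a request in a reused, possibly dirty buffer. */
    static void toy_fill_wol(struct toy_wol_req *req, const uint8_t mac[6])
    {
        memset(req, 0, sizeof(*req));   /* drop stale contents first */
        req->flags = TOY_WOL_MAGIC_PACKET;
        memcpy(req->mac, mac, sizeof(req->mac));
    }
    ```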

    Fixes: 6a7f2277313b ("net: aquantia: replace AQ_HW_WAIT_FOR with readx_poll_timeout_atomic")
    Signed-off-by: Nikita Danilov
    Signed-off-by: Igor Russkikh
    Signed-off-by: David S. Miller

    Nikita Danilov
     
  • ethtool_get_regs() allocates a buffer of size ops->get_regs_len(),
    and passes it to the kernel driver via ops->get_regs() for filling.

    There is no restriction about what the kernel drivers can or cannot do
    with the open ethtool_regs structure. They usually set regs->version
    and ignore regs->len or set it to the same size as ops->get_regs_len().

    But if userspace allocates a smaller buffer for the registers dump,
    we would cause a userspace buffer overflow in the final copy_to_user()
    call, which uses the regs.len value potentially reset by the driver.

    To fix this, make this case obvious and store regs.len before calling
    ops->get_regs(), to only copy as much data as requested by userspace,
    up to the value returned by ops->get_regs_len().

    While at it, remove the redundant check for non-null regbuf.
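    A sketch of the fix under simplifying assumptions (struct toy_regs and
    both functions are hypothetical, not the ethtool API): the requested
    length is capped to get_regs_len() and saved before the driver runs,
    so a driver that overwrites regs->len cannot enlarge the final copy.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Simplified stand-in for struct ethtool_regs. */
    struct toy_regs {
        unsigned int version;
        unsigned int len;   /* in: bytes userspace can take */
    };

    /* A "driver" that resets regs->len to its full dump size. */
    static void toy_get_regs(struct toy_regs *regs, unsigned char *buf,
                             size_t buf_size)
    {
        regs->version = 1;
        regs->len = (unsigned int)buf_size;
        memset(buf, 0x5A, buf_size);
    }

    /* Returns how many bytes would be copied back to userspace. */
    static size_t toy_regs_copy_len(unsigned int user_len, size_t regs_len,
                                    unsigned char *kbuf)
    {
        size_t want = user_len < regs_len ? user_len : regs_len;
        struct toy_regs regs = { .version = 0, .len = (unsigned int)want };

        toy_get_regs(&regs, kbuf, regs_len);   /* may clobber regs.len */
        return want;                           /* saved value, not regs.len */
    }
    ```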

    Signed-off-by: Vivien Didelot
    Reviewed-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • syzbot found the following leak in sctp_process_init:
    BUG: memory leak
    unreferenced object 0xffff88810ef68400 (size 1024):
    comm "syz-executor273", pid 7046, jiffies 4294945598 (age 28.770s)
    hex dump (first 32 bytes):
    1d de 28 8d de 0b 1b e3 b5 c2 f9 68 fd 1a 97 25 ..(........h...%
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
    slab_post_alloc_hook mm/slab.h:439 [inline]
    slab_alloc mm/slab.c:3326 [inline]
    __do_kmalloc mm/slab.c:3658 [inline]
    __kmalloc_track_caller+0x15d/0x2c0 mm/slab.c:3675
    kmemdup+0x27/0x60 mm/util.c:119
    kmemdup include/linux/string.h:432 [inline]
    sctp_process_init+0xa7e/0xc20 net/sctp/sm_make_chunk.c:2437
    sctp_cmd_process_init net/sctp/sm_sideeffect.c:682 [inline]
    sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1384 [inline]
    sctp_side_effects net/sctp/sm_sideeffect.c:1194 [inline]
    sctp_do_sm+0xbdc/0x1d60 net/sctp/sm_sideeffect.c:1165
    sctp_assoc_bh_rcv+0x13c/0x200 net/sctp/associola.c:1074
    sctp_inq_push+0x7f/0xb0 net/sctp/inqueue.c:95
    sctp_backlog_rcv+0x5e/0x2a0 net/sctp/input.c:354
    sk_backlog_rcv include/net/sock.h:950 [inline]
    __release_sock+0xab/0x110 net/core/sock.c:2418
    release_sock+0x37/0xd0 net/core/sock.c:2934
    sctp_sendmsg+0x2c0/0x990 net/sctp/socket.c:2122
    inet_sendmsg+0x64/0x120 net/ipv4/af_inet.c:802
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0x54/0x70 net/socket.c:671
    ___sys_sendmsg+0x393/0x3c0 net/socket.c:2292
    __sys_sendmsg+0x80/0xf0 net/socket.c:2330
    __do_sys_sendmsg net/socket.c:2339 [inline]
    __se_sys_sendmsg net/socket.c:2337 [inline]
    __x64_sys_sendmsg+0x23/0x30 net/socket.c:2337
    do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:3

    The problem was that the peer.cookie value points to an skb-allocated
    area on the first pass through this function, at which point it is
    overwritten with a heap-allocated copy. In certain cases, where a
    COOKIE_ECHO chunk is included in the packet, a second pass through
    sctp_process_init() is made, in which the cookie value is allocated
    again, leaking the first allocation.

    The fix is to always allocate the cookie value, and free it when we
    are done using it.
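    The ownership rule can be sketched in user space with hypothetical
    helpers (toy_memdup() plays the role of kmemdup(); struct toy_assoc is
    not the kernel's association). This assumes the fix releases any
    previous heap copy before storing a new one; an allocation counter
    shows that two passes followed by teardown leave nothing behind.

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    static int toy_live_allocs;   /* live heap copies, for leak checking */

    static void *toy_memdup(const void *src, size_t len)
    {
        void *p = malloc(len);
        if (p) {
            memcpy(p, src, len);
            toy_live_allocs++;
        }
        return p;
    }

    static void toy_free(void *p)
    {
        if (p) {
            free(p);
            toy_live_allocs--;
        }
    }

    /* Hypothetical association state: the cookie is always heap-owned. */
    struct toy_assoc {
        void  *peer_cookie;
        size_t peer_cookie_len;
    };

    /* May run more than once per association (e.g. COOKIE_ECHO bundled);
     * the previous copy is released, so a second pass cannot leak it. */
    static int toy_process_init(struct toy_assoc *a, const void *pkt_cookie,
                                size_t len)
    {
        void *copy = toy_memdup(pkt_cookie, len);
        if (!copy)
            return -1;            /* keep the old cookie on failure */
        toy_free(a->peer_cookie);
        a->peer_cookie = copy;
        a->peer_cookie_len = len;
        return 0;
    }

    /* Teardown frees the heap copy once it is no longer needed. */
    static void toy_assoc_free(struct toy_assoc *a)
    {
        toy_free(a->peer_cookie);
        a->peer_cookie = NULL;
        a->peer_cookie_len = 0;
    }
    ```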

    Signed-off-by: Neil Horman
    Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com
    CC: Marcelo Ricardo Leitner
    CC: "David S. Miller"
    CC: netdev@vger.kernel.org
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Neil Horman