26 Nov, 2020

1 commit

  • tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after
    calling tls_dev_del if TLX TX offload is also enabled. Clearing
    tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a
    time frame when tls_device_down may get called and call tls_dev_del for
    RX one extra time, confusing the driver, which may lead to a crash.

    This patch corrects this racy behavior by adding a flag to prevent
    tls_device_down from calling tls_dev_del the second time.

    Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
    Signed-off-by: Maxim Mikityanskiy
    Signed-off-by: Saeed Mahameed
    Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com
    Signed-off-by: Jakub Kicinski

    Maxim Mikityanskiy
     

21 Nov, 2020

1 commit

  • In case when tcp socket received FIN after some data and the
    parser haven't started before reading data caller will receive
    an empty buffer. This behavior differs from plain TCP socket and
    leads to special treating in user-space.
    The flow that triggers the race is simple. Server sends small
    amount of data right after the connection is configured to use TLS
    and closes the connection. In this case receiver sees TLS Handshake
    data, configures TLS socket right after Change Cipher Spec record.
    While the configuration is in process, TCP socket receives small
    Application Data record, Encrypted Alert record and FIN packet. So
    the TCP socket changes sk_shutdown to RCV_SHUTDOWN and sk_flag with
    SK_DONE bit set. The received data is not parsed upon arrival and is
    never sent to user-space.

    Patch unpauses parser directly if we have unparsed data in tcp
    receive queue.

    Fixes: fcf4793e278e ("tls: check RCV_SHUTDOWN in tls_wait_data")
    Signed-off-by: Vadim Fedorenko
    Link: https://lore.kernel.org/r/1605801588-12236-1-git-send-email-vfedorenko@novek.ru
    Signed-off-by: Jakub Kicinski

    Vadim Fedorenko
     

18 Nov, 2020

1 commit

  • In async_resync mode, we log the TCP seq of records until the async request
    is completed. Later, in case one of the logged seqs matches the resync
    request, we return it, together with its record serial number. Before this
    fix, we mistakenly returned the serial number of the current record
    instead.

    Fixes: ed9b7646b06a ("net/tls: Add asynchronous resync")
    Signed-off-by: Tariq Toukan
    Reviewed-by: Boris Pismenny
    Link: https://lore.kernel.org/r/20201115131448.2702-1-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski

    Tariq Toukan
     

17 Nov, 2020

1 commit

  • If tcp socket has more data than Encrypted Handshake Message then
    tls_sw_recvmsg will try to decrypt next record instead of returning
    full control message to userspace as mentioned in comment. The next
    message - usually Application Data - gets corrupted because it uses
    zero copy for decryption that's why the data is not stored in skb
    for next iteration. Revert check to not decrypt next record if
    current is not Application Data.

    Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records")
    Signed-off-by: Vadim Fedorenko
    Link: https://lore.kernel.org/r/1605413760-21153-1-git-send-email-vfedorenko@novek.ru
    Signed-off-by: Jakub Kicinski

    Vadim Fedorenko
     

16 Oct, 2020

1 commit


14 Oct, 2020

1 commit

  • Replace commas with semicolons. Commas introduce unnecessary
    variability in the code structure and are hard to see. What is done
    is essentially described by the following Coccinelle semantic patch
    (http://coccinelle.lip6.fr/):

    //
    @@ expression e1,e2; @@
    e1
    -,
    +;
    e2
    ... when any
    //

    Signed-off-by: Julia Lawall
    Link: https://lore.kernel.org/r/1602412498-32025-6-git-send-email-Julia.Lawall@inria.fr
    Signed-off-by: Jakub Kicinski

    Julia Lawall
     

10 Oct, 2020

1 commit

  • At first when sendpage gets called, if there is more data, 'more' in
    tls_push_data() gets set which later sets pending_open_record_frags, but
    when there is no more data in file left, and last time tls_push_data()
    gets called, pending_open_record_frags doesn't get reset. And later when
    2 bytes of encrypted alert comes as sendmsg, it first checks for
    pending_open_record_frags, and since this is set, it creates a record with
    0 data bytes to encrypt, meaning record length is prepend_size + tag_size
    only, which causes problem.
    We should set/reset pending_open_record_frags based on more bit.

    Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
    Signed-off-by: Rohit Maheshwari
    Signed-off-by: Jakub Kicinski

    Rohit Maheshwari
     

06 Oct, 2020

1 commit


25 Sep, 2020

1 commit

  • BUG: kernel NULL pointer dereference, address: 00000000000000b8
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 80000008b6fef067 P4D 80000008b6fef067 PUD 8b6fe6067 PMD 0
    Oops: 0000 [#1] SMP PTI
    CPU: 12 PID: 23871 Comm: kworker/12:80 Kdump: loaded Tainted: G S
    5.9.0-rc3+ #1
    Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.1 03/29/2018
    Workqueue: events tx_work_handler [tls]
    RIP: 0010:tx_work_handler+0x1b/0x70 [tls]
    Code: dc fe ff ff e8 16 d4 a3 f6 66 0f 1f 44 00 00 0f 1f 44 00 00 55 53 48 8b
    6f 58 48 8b bd a0 04 00 00 48 85 ff 74 1c 48 8b 47 28 8b 90 b8 00 00 00 83
    e2 02 75 0c f0 48 0f ba b0 b8 00 00 00 00
    RSP: 0018:ffffa44ace61fe88 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffff91da9e45cc30 RCX: dead000000000122
    RDX: 0000000000000001 RSI: ffff91da9e45cc38 RDI: ffff91d95efac200
    RBP: ffff91da133fd780 R08: 0000000000000000 R09: 000073746e657665
    R10: 8080808080808080 R11: 0000000000000000 R12: ffff91dad7d30700
    R13: ffff91dab6561080 R14: 0ffff91dad7d3070 R15: ffff91da9e45cc38
    FS: 0000000000000000(0000) GS:ffff91dad7d00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000b8 CR3: 0000000906478003 CR4: 00000000003706e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    process_one_work+0x1a7/0x370
    worker_thread+0x30/0x370
    ? process_one_work+0x370/0x370
    kthread+0x114/0x130
    ? kthread_park+0x80/0x80
    ret_from_fork+0x22/0x30

    tls_sw_release_resources_tx() waits for encrypt_pending, which
    can have race, so we need similar changes as in commit
    0cada33241d9de205522e3858b18e506ca5cce2c here as well.

    Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
    Signed-off-by: Rohit Maheshwari
    Acked-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Rohit Maheshwari
     

02 Sep, 2020

1 commit

  • Implement the getsockopt SOL_TLS TLS_RX which is currently missing. The
    primary usecase is to use it in conjunction with TCP_REPAIR to
    checkpoint/restore the TLS record layer state.

    TLS connection state usually exists on the user space library. So
    basically we can easily extract it from there, but when the TLS
    connections are delegated to the kTLS, it is not the case. We need to
    have a way to extract the TLS state from the kernel for both of TX and
    RX side.

    The new TLS_RX getsockopt copies the crypto_info to user in the same
    way as TLS_TX does.

    We have described use cases in our research work in Netdev 0x14
    Transport Workshop [1].

    Also, there is an TLS implementation called tlse [2] which supports
    TLS connection migration. They have support of kTLS and their code
    shows that they are expecting the future support of this option.

    [1] https://speakerdeck.com/yutarohayakawa/prism-proxies-without-the-pain
    [2] https://github.com/eduardsui/tlse

    Signed-off-by: Yutaro Hayakawa
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Yutaro Hayakawa
     

12 Aug, 2020

1 commit

  • When MSG_OOB is specified to tls_device_sendpage() the mapped page is
    never unmapped.

    Hold off mapping the page until after the flags are checked and the page
    is actually needed.

    Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
    Signed-off-by: Ira Weiny
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Ira Weiny
     

08 Aug, 2020

1 commit

  • Trying to use ktls on a system with 32-bit userspace and 64-bit kernel
    results in a EOPNOTSUPP message during sendmsg:

    setsockopt(3, SOL_TLS, TLS_TX, …, 40) = 0
    sendmsg(3, …, msg_flags=0}, 0) = -1 EOPNOTSUPP (Operation not supported)

    The tls_sw implementation does strict flag checking and does not allow
    the MSG_CMSG_COMPAT flag, which is set if the message comes in through
    the compat syscall.

    This patch adds MSG_CMSG_COMPAT to the flag check to allow the usage of
    the TLS SW implementation on systems using the compat syscall path.

    Note that the same check is present in the sendmsg path for the TLS
    device implementation, however the flag hasn't been added there for lack
    of testing hardware.

    Signed-off-by: Rouven Czerwinski
    Signed-off-by: David S. Miller

    Rouven Czerwinski
     

06 Aug, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.

    2) Support UDP segmentation in code TSO code, from Eric Dumazet.

    3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

    4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

    5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

    6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

    7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

    8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

    9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

    10) Switch qca8k driver over to phylink, from Jonathan McDowell.

    11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

    12) Several conversions over to generic power management, from Vaibhav
    Gupta.

    13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

    14) Various https url conversions, from Alexander A. Klimov.

    15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

    16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

    17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

    18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

    19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

    20) XDP support for xen-netfront, from Denis Kirjanov.

    21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

    22) Support EF100 chip in sfc driver, from Edward Cree.

    23) Add XDP support to mvpp2 driver, from Matteo Croce.

    24) Support MPTCP in sock_diag, from Paolo Abeni.

    25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

    26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

    27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

    28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

    29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

    30) Add rfc4884 support to icmp code, from Willem de Bruijn.

    31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

    32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

    33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

    34) Support TCP syncookies in MPTCP, from Flowian Westphal.

    35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
    Brivio.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
    net: thunderx: initialize VF's mailbox mutex before first usage
    usb: hso: remove bogus check for EINPROGRESS
    usb: hso: no complaint about kmalloc failure
    hso: fix bailout in error case of probe
    ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
    selftests/net: relax cpu affinity requirement in msg_zerocopy test
    mptcp: be careful on subflow creation
    selftests: rtnetlink: make kci_test_encap() return sub-test result
    selftests: rtnetlink: correct the final return value for the test
    net: dsa: sja1105: use detected device id instead of DT one on mismatch
    tipc: set ub->ifindex for local ipv6 address
    ipv6: add ipv6_dev_find()
    net: openvswitch: silence suspicious RCU usage warning
    Revert "vxlan: fix tos value before xmit"
    ptp: only allow phase values lower than 1 period
    farsync: switch from 'pci_' to 'dma_' API
    wan: wanxl: switch from 'pci_' to 'dma_' API
    hv_netvsc: do not use VF device if link is down
    dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
    net: macb: Properly handle phylink on at91sam9x
    ...

    Linus Torvalds
     

29 Jul, 2020

1 commit

  • sockptr_advance never properly worked. Replace it with _offset variants
    of copy_from_sockptr and copy_to_sockptr.

    Fixes: ba423fdaa589 ("net: add a new sockptr_t type")
    Reported-by: Jason A. Donenfeld
    Reported-by: Ido Schimmel
    Signed-off-by: Christoph Hellwig
    Acked-by: Jason A. Donenfeld
    Tested-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

25 Jul, 2020

1 commit

  • Rework the remaining setsockopt code to pass a sockptr_t instead of a
    plain user pointer. This removes the last remaining set_fs(KERNEL_DS)
    outside of architecture specific code.

    Signed-off-by: Christoph Hellwig
    Acked-by: Stefan Schmidt [ieee802154]
    Acked-by: Matthieu Baerts
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

17 Jul, 2020

1 commit

  • Using uninitialized_var() is dangerous as it papers over real bugs[1]
    (or can in the future), and suppresses unrelated compiler warnings
    (e.g. "unused variable"). If the compiler thinks it is uninitialized,
    either simply initialize the variable or make compiler changes.

    In preparation for removing[2] the[3] macro[4], remove all remaining
    needless uses with the following script:

    git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
    xargs perl -pi -e \
    's/\buninitialized_var\(([^\)]+)\)/\1/g;
    s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

    drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
    pathological white-space.

    No outstanding warnings were found building allmodconfig with GCC 9.3.0
    for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
    alpha, and m68k.

    [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
    [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
    [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
    [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

    Reviewed-by: Leon Romanovsky # drivers/infiniband and mlx4/mlx5
    Acked-by: Jason Gunthorpe # IB
    Acked-by: Kalle Valo # wireless drivers
    Reviewed-by: Chao Yu # erofs
    Signed-off-by: Kees Cook

    Kees Cook
     

28 Jun, 2020

2 commits

  • This patch adds support for asynchronous resynchronization in tls_device.
    Async resync follows two distinct stages:

    1. The NIC driver indicates that it would like to resync on some TLS
    record within the received packet (P), but the driver does not
    know (yet) which of the TLS records within the packet.
    At this stage, the NIC driver will query the device to find the exact
    TCP sequence for resync (tcpsn), however, the driver does not wait
    for the device to provide the response.

    2. Eventually, the device responds, and the driver provides the tcpsn
    within the resync packet to KTLS. Now, KTLS can check the tcpsn against
    any processed TLS records within packet P, and also against any record
    that is processed in the future within packet P.

    The asynchronous resync path simplifies the device driver, as it can
    save bits on the packet completion (32-bit TCP sequence), and pass this
    information on an asynchronous command instead.

    Signed-off-by: Boris Pismenny
    Signed-off-by: Tariq Toukan
    Signed-off-by: Saeed Mahameed

    Boris Pismenny
     
  • This reverts commit b3ae2459f89773adcbf16fef4b68deaaa3be1929.
    Revert the force resync API.
    Not in use. To be replaced by a better async resync API downstream.

    Signed-off-by: Boris Pismenny
    Signed-off-by: Tariq Toukan
    Reviewed-by: Maxim Mikityanskiy
    Signed-off-by: Saeed Mahameed

    Boris Pismenny
     

14 Jun, 2020

1 commit

  • Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
    '---help---'"), the number of '---help---' has been gradually
    decreasing, but there are still more than 2400 instances.

    This commit finishes the conversion. While I touched the lines,
    I also fixed the indentation.

    There are a variety of indentation styles found.

    a) 4 spaces + '---help---'
    b) 7 spaces + '---help---'
    c) 8 spaces + '---help---'
    d) 1 space + 1 tab + '---help---'
    e) 1 tab + '---help---' (correct indentation)
    f) 1 tab + 1 space + '---help---'
    g) 1 tab + 2 spaces + '---help---'

    In order to convert all of them to 1 tab + 'help', I ran the
    following commend:

    $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

11 Jun, 2020

1 commit

  • Pull READ/WRITE_ONCE rework from Will Deacon:
    "This the READ_ONCE rework I've been working on for a while, which
    bumps the minimum GCC version and improves code-gen on arm64 when
    stack protector is enabled"

    [ Side note: I'm _really_ tempted to raise the minimum gcc version to
    4.9, so that we can just say that we require _Generic() support.

    That would allow us to more cleanly handle a lot of the cases where we
    depend on very complex macros with 'sizeof' or __builtin_choose_expr()
    with __builtin_types_compatible_p() etc.

    This branch has a workaround for sparse not handling _Generic(),
    either, but that was already fixed in the sparse development branch,
    so it's really just gcc-4.9 that we'd require. - Linus ]

    * 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
    compiler_types.h: Use unoptimized __unqual_scalar_typeof for sparse
    compiler_types.h: Optimize __unqual_scalar_typeof compilation time
    compiler.h: Enforce that READ_ONCE_NOCHECK() access size is sizeof(long)
    compiler-types.h: Include naked type in __pick_integer_type() match
    READ_ONCE: Fix comment describing 2x32-bit atomicity
    gcov: Remove old GCC 3.4 support
    arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros
    locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros
    READ_ONCE: Drop pointer qualifiers when reading from scalar types
    READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses
    READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE()
    arm64: csum: Disable KASAN for do_csum()
    fault_inject: Don't rely on "return value" from WRITE_ONCE()
    net: tls: Avoid assigning 'const' pointer to non-const pointer
    netfilter: Avoid assigning 'const' pointer to non-const pointer
    compiler/gcc: Raise minimum GCC version for kernel builds to 4.8

    Linus Torvalds
     

02 Jun, 2020

1 commit

  • KTLS uses a stream parser to collect TLS messages and send them to
    the upper layer tls receive handler. This ensures the tls receiver
    has a full TLS header to parse when it is run. However, when a
    socket has BPF_SK_SKB_STREAM_VERDICT program attached before KTLS
    is enabled we end up with two stream parsers running on the same
    socket.

    The result is both try to run on the same socket. First the KTLS
    stream parser runs and calls read_sock() which will tcp_read_sock
    which in turn calls tcp_rcv_skb(). This dequeues the skb from the
    sk_receive_queue. When this is done KTLS code then data_ready()
    callback which because we stacked KTLS on top of the bpf stream
    verdict program has been replaced with sk_psock_start_strp(). This
    will in turn kick the stream parser again and eventually do the
    same thing KTLS did above calling into tcp_rcv_skb() and dequeuing
    a skb from the sk_receive_queue.

    At this point the data stream is broke. Part of the stream was
    handled by the KTLS side some other bytes may have been handled
    by the BPF side. Generally this results in either missing data
    or more likely a "Bad Message" complaint from the kTLS receive
    handler as the BPF program steals some bytes meant to be in a
    TLS header and/or the TLS header length is no longer correct.

    We've already broke the idealized model where we can stack ULPs
    in any order with generic callbacks on the TX side to handle this.
    So in this patch we do the same thing but for RX side. We add
    a sk_psock_strp_enabled() helper so TLS can learn a BPF verdict
    program is running and add a tls_sw_has_ctx_rx() helper so BPF
    side can learn there is a TLS ULP on the socket.

    Then on BPF side we omit calling our stream parser to avoid
    breaking the data stream for the KTLS receiver. Then on the
    KTLS side we call BPF_SK_SKB_STREAM_VERDICT once the KTLS
    receiver is done with the packet but before it posts the
    msg to userspace. This gives us symmetry between the TX and
    RX halfs and IMO makes it usable again. On the TX side we
    process packets in this order BPF -> TLS -> TCP and on
    the receive side in the reverse order TCP -> TLS -> BPF.

    Discovered while testing OpenSSL 3.0 Alpha2.0 release.

    Fixes: d829e9c4112b5 ("tls: convert to generic sk_msg interface")
    Signed-off-by: John Fastabend
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/159079361946.5745.605854335665044485.stgit@john-Precision-5820-Tower
    Signed-off-by: Alexei Starovoitov

    John Fastabend
     

01 Jun, 2020

1 commit

  • xdp_umem.c had overlapping changes between the 64-bit math fix
    for the calculation of npgs and the removal of the zerocopy
    memory type which got rid of the chunk_size_nohdr member.

    The mlx5 Kconfig conflict is a case where we just take the
    net-next copy of the Kconfig entry dependency as it takes on
    the ESWITCH dependency by one level of indirection which is
    what the 'net' conflicting change is trying to ensure.

    Signed-off-by: David S. Miller

    David S. Miller
     

28 May, 2020

1 commit

  • This patch adds a field to the tls rx offload context which enables
    drivers to force a send_resync call.

    This field can be used by drivers to request a resync at the next
    possible tls record. It is beneficial for hardware that provides the
    resync sequence number asynchronously. In such cases, the packet that
    triggered the resync does not contain the information required for a
    resync. Instead, the driver requests resync for all the following
    TLS record until the asynchronous notification with the resync request
    TCP sequence arrives.

    A following series for mlx5e ConnectX-6DX TLS RX offload support will
    use this mechanism.

    Signed-off-by: Boris Pismenny
    Signed-off-by: Tariq Toukan
    Reviewed-by: Maxim Mikityanskiy
    Reviewed-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Tariq Toukan
     

26 May, 2020

1 commit

  • tls_sw_recvmsg() and tls_decrypt_done() can be run concurrently.
    // tls_sw_recvmsg()
    if (atomic_read(&ctx->decrypt_pending))
    crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
    else
    reinit_completion(&ctx->async_wait.completion);

    //tls_decrypt_done()
    pending = atomic_dec_return(&ctx->decrypt_pending);

    if (!pending && READ_ONCE(ctx->async_notify))
    complete(&ctx->async_wait.completion);

    Consider the scenario tls_decrypt_done() is about to run complete()

    if (!pending && READ_ONCE(ctx->async_notify))

    and tls_sw_recvmsg() reads decrypt_pending == 0, does reinit_completion(),
    then tls_decrypt_done() runs complete(). This sequence of execution
    results in wrong completion. Consequently, for next decrypt request,
    it will not wait for completion, eventually on connection close, crypto
    resources freed, there is no way to handle pending decrypt response.

    This race condition can be avoided by having atomic_read() mutually
    exclusive with atomic_dec_return(),complete().Intoduced spin lock to
    ensure the mutual exclution.

    Addressed similar problem in tx direction.

    v1->v2:
    - More readable commit message.
    - Corrected the lock to fix new race scenario.
    - Removed barrier which is not needed now.

    Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
    Signed-off-by: Vinay Kumar Yadav
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Vinay Kumar Yadav
     

22 May, 2020

2 commits

  • We cannot free record on any transient error because it leads to
    losing previos data. Check socket error to know whether record must
    be freed or not.

    Fixes: d10523d0b3d7 ("net/tls: free the record on encryption error")
    Signed-off-by: Vadim Fedorenko
    Signed-off-by: David S. Miller

    Vadim Fedorenko
     
  • bpf_exec_tx_verdict() can return negative value for copied
    variable. In that case this value will be pushed back to caller
    and the real error code will be lost. Fix it using signed type and
    checking for positive value.

    Fixes: d10523d0b3d7 ("net/tls: free the record on encryption error")
    Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
    Signed-off-by: Vadim Fedorenko
    Signed-off-by: David S. Miller

    Vadim Fedorenko
     

28 Apr, 2020

2 commits

  • tls_data_ready() invokes sk_psock_get(), which returns a reference of
    the specified sk_psock object to "psock" with increased refcnt.

    When tls_data_ready() returns, local variable "psock" becomes invalid,
    so the refcount should be decreased to keep refcount balanced.

    The reference counting issue happens in one exception handling path of
    tls_data_ready(). When "psock->ingress_msg" is empty but "psock" is not
    NULL, the function forgets to decrease the refcnt increased by
    sk_psock_get(), causing a refcnt leak.

    Fix this issue by calling sk_psock_put() on all paths when "psock" is
    not NULL.

    Signed-off-by: Xiyu Yang
    Signed-off-by: Xin Tan
    Signed-off-by: David S. Miller

    Xiyu Yang
     
  • bpf_exec_tx_verdict() invokes sk_psock_get(), which returns a reference
    of the specified sk_psock object to "psock" with increased refcnt.

    When bpf_exec_tx_verdict() returns, local variable "psock" becomes
    invalid, so the refcount should be decreased to keep refcount balanced.

    The reference counting issue happens in one exception handling path of
    bpf_exec_tx_verdict(). When "policy" equals to NULL but "psock" is not
    NULL, the function forgets to decrease the refcnt increased by
    sk_psock_get(), causing a refcnt leak.

    Fix this issue by calling sk_psock_put() on this error path before
    bpf_exec_tx_verdict() returns.

    Signed-off-by: Xiyu Yang
    Signed-off-by: Xin Tan
    Signed-off-by: David S. Miller

    Xiyu Yang
     

16 Apr, 2020

1 commit

  • tls_build_proto() uses WRITE_ONCE() to assign a 'const' pointer to a
    'non-const' pointer. Cleanups to the implementation of WRITE_ONCE() mean
    that this will give rise to a compiler warning, just like a plain old
    assignment would do:

    | net/tls/tls_main.c: In function ‘tls_build_proto’:
    | ./include/linux/compiler.h:229:30: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
    | net/tls/tls_main.c:640:4: note: in expansion of macro ‘smp_store_release’
    | 640 | smp_store_release(&saved_tcpv6_prot, prot);
    | | ^~~~~~~~~~~~~~~~~

    Drop the const qualifier from the local 'prot' variable, as it isn't
    needed.

    Cc: Boris Pismenny
    Cc: Aviad Yehezkel
    Cc: John Fastabend
    Cc: Daniel Borkmann
    Signed-off-by: Will Deacon

    Will Deacon
     

09 Apr, 2020

1 commit

  • Building with some experimental patches, I came across a warning
    in the tls code:

    include/linux/compiler.h:215:30: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
    215 | *(volatile typeof(x) *)&(x) = (val); \
    | ^
    net/tls/tls_main.c:650:4: note: in expansion of macro 'smp_store_release'
    650 | smp_store_release(&saved_tcpv4_prot, prot);

    This appears to be a legitimate warning about assigning a const pointer
    into the non-const 'saved_tcpv4_prot' global. Annotate both the ipv4 and
    ipv6 pointers 'const' to make the code internally consistent.

    Fixes: 5bb4c45d466c ("net/tls: Read sk_prot once when building tls proto ops")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

22 Mar, 2020

3 commits

  • sockmap performs lockless writes to sk->sk_prot on the following paths:

    tcp_bpf_{recvmsg|sendmsg} / sock_map_unref
    sk_psock_put
    sk_psock_drop
    sk_psock_restore_proto
    WRITE_ONCE(sk->sk_prot, proto)

    To prevent load/store tearing [1], and to make tooling aware of intentional
    shared access [2], we need to annotate other sites that access sk_prot with
    READ_ONCE/WRITE_ONCE macros.

    Change done with Coccinelle with following semantic patch:

    @@
    expression E;
    identifier I;
    struct sock *sk;
    identifier sk_prot =~ "^sk_prot$";
    @@
    (
    E =
    -sk->sk_prot
    +READ_ONCE(sk->sk_prot)
    |
    -sk->sk_prot = E
    +WRITE_ONCE(sk->sk_prot, E)
    |
    -sk->sk_prot
    +READ_ONCE(sk->sk_prot)
    ->I
    )

    Signed-off-by: Jakub Sitnicki
    Signed-off-by: David S. Miller

    Jakub Sitnicki
     
  • Apart from being a "tremendous" win when it comes to generated machine
    code (see bloat-o-meter output for x86-64 below) this mainly prepares
    ground for annotating access to sk_prot with READ_ONCE, so that we don't
    pepper the code with access annotations and needlessly repeat loads.

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-46 (-46)
    Function old new delta
    tls_init 851 805 -46
    Total: Before=21063, After=21017, chg -0.22%

    Signed-off-by: Jakub Sitnicki
    Signed-off-by: David S. Miller

    Jakub Sitnicki
     
  • The helper that builds kTLS proto ops doesn't need to and should not modify
    the base proto ops. Annotate the parameter as read-only.

    Signed-off-by: Jakub Sitnicki
    Signed-off-by: David S. Miller

    Jakub Sitnicki
     

22 Feb, 2020

2 commits

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2020-02-21

    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 25 non-merge commits during the last 4 day(s) which contain
    a total of 33 files changed, 2433 insertions(+), 161 deletions(-).

    The main changes are:

    1) Allow for adding TCP listen sockets into sock_map/hash so they can be used
    with reuseport BPF programs, from Jakub Sitnicki.

    2) Add a new bpf_program__set_attach_target() helper for adding libbpf support
    to specify the tracepoint/function dynamically, from Eelco Chaudron.

    3) Add bpf_read_branch_records() BPF helper which helps use cases like profile
    guided optimizations, from Daniel Xu.

    4) Enable bpf_perf_event_read_value() in all tracing programs, from Song Liu.

    5) Relax BTF mandatory check if only used for libbpf itself e.g. to process
    BTF defined maps, from Andrii Nakryiko.

    6) Move BPF selftests -mcpu compilation attribute from 'probe' to 'v3' as it has
    been observed that former fails in envs with low memlock, from Yonghong Song.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • sk_msg and ULP frameworks override protocol callbacks pointer in
    sk->sk_prot, while tcp accesses it locklessly when cloning the listening
    socket, that is with neither sk_lock nor sk_callback_lock held.

    Once we enable use of listening sockets with sockmap (and hence sk_msg),
    there will be shared access to sk->sk_prot if socket is getting cloned
    while being inserted/deleted to/from the sockmap from another CPU:

    Read side:

    tcp_v4_rcv
    sk = __inet_lookup_skb(...)
    tcp_check_req(sk)
    inet_csk(sk)->icsk_af_ops->syn_recv_sock
    tcp_v4_syn_recv_sock
    tcp_create_openreq_child
    inet_csk_clone_lock
    sk_clone_lock
    READ_ONCE(sk->sk_prot)

    Write side:

    sock_map_ops->map_update_elem
    sock_map_update_elem
    sock_map_update_common
    sock_map_link_no_progs
    tcp_bpf_init
    tcp_bpf_update_sk_prot
    sk_psock_update_proto
    WRITE_ONCE(sk->sk_prot, ops)

    sock_map_ops->map_delete_elem
    sock_map_delete_elem
    __sock_map_delete
    sock_map_unref
    sk_psock_put
    sk_psock_drop
    sk_psock_restore_proto
    tcp_update_ulp
    WRITE_ONCE(sk->sk_prot, proto)

    Mark the shared access with READ_ONCE/WRITE_ONCE annotations.

    Signed-off-by: Jakub Sitnicki
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20200218171023.844439-2-jakub@cloudflare.com

    Jakub Sitnicki
     

20 Feb, 2020

1 commit

  • Current code doesn't check if tcp sequence number is starting from (/after)
    1st record's start sequnce number. It only checks if seq number is before
    1st record's end sequnce number. This problem will always be a possibility
    in re-transmit case. If a record which belongs to a requested seq number is
    already deleted, tls_get_record will start looking into list and as per the
    check it will look if seq number is before the end seq of 1st record, which
    will always be true and will return 1st record always, it should in fact
    return NULL.
    As part of the fix, start looking each record only if the sequence number
    lies in the list else return NULL.
    There is one more check added, driver look for the start marker record to
    handle tcp packets which are before the tls offload start sequence number,
    hence return 1st record if the record is tls start marker and seq number is
    before the 1st record's starting sequence number.

    Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
    Signed-off-by: Rohit Maheshwari
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Rohit Maheshwari
     

20 Jan, 2020

1 commit


16 Jan, 2020

3 commits

  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2020-01-15

    The following pull-request contains BPF updates for your *net* tree.

    We've added 12 non-merge commits during the last 9 day(s) which contain
    a total of 13 files changed, 95 insertions(+), 43 deletions(-).

    The main changes are:

    1) Fix refcount leak for TCP time wait and request sockets for socket lookup
    related BPF helpers, from Lorenz Bauer.

    2) Fix wrong verification of ARSH instruction under ALU32, from Daniel Borkmann.

    3) Batch of several sockmap and related TLS fixes found while operating
    more complex BPF programs with Cilium and OpenSSL, from John Fastabend.

    4) Fix sockmap to read psock's ingress_msg queue before regular sk_receive_queue()
    to avoid purging data upon teardown, from Lingpeng Chen.

    5) Fix printing incorrect pointer in bpftool's btf_dump_ptr() in order to properly
    dump a BPF map's value with BTF, from Martin KaFai Lau.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When user returns SK_DROP we need to reset the number of copied bytes
    to indicate to the user the bytes were dropped and not sent. If we
    don't reset the copied arg sendmsg will return as if those bytes were
    copied giving the user a positive return value.

    This works as expected today except in the case where the user also
    pops bytes. In the pop case the sg.size is reduced but we don't correctly
    account for this when copied bytes is reset. The popped bytes are not
    accounted for and we return a small positive value potentially confusing
    the user.

    The reason this happens is due to a typo where we do the wrong comparison
    when accounting for pop bytes. In this fix notice the if/else is not
    needed and that we have a similar problem if we push data except its not
    visible to the user because if delta is larger the sg.size we return a
    negative value so it appears as an error regardless.

    Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Jonathan Lemon
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20200111061206.8028-9-john.fastabend@gmail.com

    John Fastabend
     
  • Its possible through a set of push, pop, apply helper calls to construct
    a skmsg, which is just a ring of scatterlist elements, with the start
    value larger than the end value. For example,

    end start
    |_0_|_1_| ... |_n_|_n+1_|

    Where end points at 1 and start points and n so that valid elements is
    the set {n, n+1, 0, 1}.

    Currently, because we don't build the correct chain only {n, n+1} will
    be sent. This adds a check and sg_chain call to correctly submit the
    above to the crypto and tls send path.

    Fixes: d3b18ad31f93d ("tls: add bpf support to sk_msg handling")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Jonathan Lemon
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20200111061206.8028-8-john.fastabend@gmail.com

    John Fastabend