13 Jan, 2019

1 commit

  • [ Upstream commit 6c0563e442528733219afe15c749eb2cc365da3f ]

    create_ctx is called from both tls_init and tls_hw_prot,
    so initialize the function pointers in this common routine.

    Signed-off-by: Atul Gupta
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Atul Gupta
     

10 Jan, 2019

1 commit

  • [ Upstream commit c6ec179a0082e2e76e3a72050c2b99d3d0f3da3f ]

    create_ctx can be called from atomic context, hence use
    GFP_ATOMIC instead of GFP_KERNEL.

    [ 395.962599] BUG: sleeping function called from invalid context at mm/slab.h:421
    [ 395.979896] in_atomic(): 1, irqs_disabled(): 0, pid: 16254, name: openssl
    [ 395.996564] 2 locks held by openssl/16254:
    [ 396.010492] #0: 00000000347acb52 (sk_lock-AF_INET){+.+.}, at: do_tcp_setsockopt.isra.44+0x13b/0x9a0
    [ 396.029838] #1: 000000006c9552b5 (device_spinlock){+...}, at: tls_init+0x1d/0x280
    [ 396.047675] CPU: 5 PID: 16254 Comm: openssl Tainted: G O 4.20.0-rc6+ #25
    [ 396.066019] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0c 09/25/2017
    [ 396.083537] Call Trace:
    [ 396.096265] dump_stack+0x5e/0x8b
    [ 396.109876] ___might_sleep+0x216/0x250
    [ 396.123940] kmem_cache_alloc_trace+0x1b0/0x240
    [ 396.138800] create_ctx+0x1f/0x60
    [ 396.152504] tls_init+0xbd/0x280
    [ 396.166135] tcp_set_ulp+0x191/0x2d0
    [ 396.180035] ? tcp_set_ulp+0x2c/0x2d0
    [ 396.193960] do_tcp_setsockopt.isra.44+0x148/0x9a0
    [ 396.209013] __sys_setsockopt+0x7c/0xe0
    [ 396.223054] __x64_sys_setsockopt+0x20/0x30
    [ 396.237378] do_syscall_64+0x4a/0x180
    [ 396.251200] entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fixes: df9d4a178022 ("net/tls: sleeping function from invalid context")
    Signed-off-by: Ganesh Goudar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ganesh Goudar
     

17 Sep, 2018

1 commit

  • In kTLS, MSG_PEEK behavior is currently broken; strace example:

    [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
    [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
    [pid 2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    [pid 2430] listen(4, 10) = 0
    [pid 2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
    [pid 2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    [pid 2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
    [pid 2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
    [pid 2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
    [pid 2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
    [pid 2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
    [pid 2430] close(4) = 0
    [pid 2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
    [pid 2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
    [pid 2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64

    As can be seen from strace, two TLS records are sent,
    i) 'test_read_peek' and ii) '_mult_recs\0', yet we end up
    peeking 'test_read_peektest_read_peektest'. This is clearly
    wrong. What happens is that, since a peek cannot call into
    tls_sw_advance_skb() to unpause strparser and proceed with
    the next skb, we end up looping over the current one, copying
    'test_read_peek' over and over into the user-provided
    buffer.

    Here, we can only peek into the currently held skb (the current,
    full TLS record); otherwise we would have to hold
    all the original skb(s) (depending on the peek depth) in a
    separate queue when unpausing strparser to process the next
    records. The minimally intrusive fix is to return only up to the
    current record's size (which is likely what c46234ebb4d1
    ("tls: RX path for ktls") originally intended as well). Thus,
    after the patch we properly peek the first record:

    [pid 2046] wait4(2075,
    [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
    [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
    [pid 2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    [pid 2075] listen(4, 10) = 0
    [pid 2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
    [pid 2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    [pid 2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
    [pid 2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
    [pid 2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
    [pid 2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
    [pid 2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
    [pid 2075] close(4) = 0
    [pid 2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
    [pid 2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
    [pid 2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14
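The difference between the two strace runs can be modelled in a few lines of userspace C. This is only a sketch under simplifying assumptions, not the kTLS code: `struct record`, `peek_buggy()` and `peek_fixed()` are hypothetical names, and one `struct record` stands in for the fully decrypted TLS record that strparser currently holds.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the single decrypted TLS record in the current skb. */
struct record {
	const char *data;
	size_t len;
};

/*
 * Pre-fix behaviour (modelled): the loop tries to fill 'len' bytes, but a
 * peek may not advance to the next record, so the same record is copied
 * again and again.
 */
size_t peek_buggy(const struct record *cur, char *buf, size_t len)
{
	size_t copied = 0;

	while (copied < len && cur->len > 0) {
		size_t chunk = cur->len < len - copied ? cur->len : len - copied;

		memcpy(buf + copied, cur->data, chunk);
		copied += chunk;
	}
	return copied;
}

/* Post-fix behaviour (modelled): a peek returns at most the current record. */
size_t peek_fixed(const struct record *cur, char *buf, size_t len)
{
	size_t chunk = cur->len < len ? cur->len : len;

	memcpy(buf, cur->data, chunk);
	return chunk;
}
```

With the record "test_read_peek" and a 64-byte buffer, peek_buggy() fills all 64 bytes starting with "test_read_peektest_read_peektest", as in the first strace, while peek_fixed() returns 14.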

    Fixes: c46234ebb4d1 ("tls: RX path for ktls")
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

14 Sep, 2018

3 commits


09 Sep, 2018

1 commit

  • tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
    the function sk_alloc_sg(). If the number of SG entries hits
    MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
    the current SG index to '0'. This leads to tls_push_record() being called
    with 'sg_encrypted_num_elem = 0', which later causes a
    kernel crash. To fix this, set the number of SG elements to the number
    of elements in the plaintext/encrypted SG arrays when sk_alloc_sg()
    returns -ENOSPC.
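The caller-side fix can be sketched with a userspace model of the allocator. Everything here is hypothetical: `alloc_sg_model()` only imitates the behaviour described above (returning -ENOSPC and dropping the index to 0 when the array is full), `MAX_FRAGS` stands in for MAX_SKB_FRAGS, and the fix resets the count to the array size, as the message describes.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_FRAGS 4 /* hypothetical stand-in for MAX_SKB_FRAGS */

struct sg_entry { size_t len; };

/*
 * Hypothetical model of sk_alloc_sg(): append 'size' bytes as entries of at
 * most 'chunk' bytes starting at *num_elem. When the array is full it
 * returns -ENOSPC and, like the behaviour described above, leaves *num_elem
 * at 0 even though entries were filled.
 */
int alloc_sg_model(struct sg_entry *sg, int *num_elem, size_t size, size_t chunk)
{
	int i = *num_elem;

	while (size > 0) {
		if (i == MAX_FRAGS) {
			*num_elem = 0; /* the index is lost */
			return -ENOSPC;
		}
		size_t n = size < chunk ? size : chunk;
		sg[i++].len = n;
		size -= n;
	}
	*num_elem = i;
	return 0;
}

/* Caller-side fix: on -ENOSPC the array is full, so record its full size. */
int alloc_sg_fixed(struct sg_entry *sg, int *num_elem, size_t size, size_t chunk)
{
	int ret = alloc_sg_model(sg, num_elem, size, chunk);

	if (ret == -ENOSPC)
		*num_elem = MAX_FRAGS;
	return ret;
}
```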

    Fixes: 3c4d7559159b ("tls: kernel TLS support")
    Signed-off-by: Vakul Garg
    Signed-off-by: David S. Miller

    Vakul Garg
     

23 Aug, 2018

1 commit

  • Currently, the lower protocol's sk_write_space handler is not called if
    TLS is sending a scatterlist via tls_push_sg. However, tls_push_sg
    normally calls do_tcp_sendpage, which, under memory pressure,
    may in turn trigger a wait via sk_wait_event. Typically, this
    happens when the in-flight bytes exceed the sndbuf size. In the normal
    case, when enough ACKs are received, sk_write_space() is called and
    the sk_wait_event is woken up, allowing it to send more data
    and/or return to the user.

    But in the TLS case, because the sk_write_space() handler does not
    wake up the events, the above send waits until sndtimeo is
    exceeded. By default this is MAX_SCHEDULE_TIMEOUT, so it looks like a
    hang to the user (especially this impatient user). To fix this, pass
    the sk_write_space event to the lower layer's sk_write_space handler,
    which in the TCP case will wake any pending events.

    I observed the above while integrating sockmap and ktls. It
    initially appeared as test_sockmap (modified to use ktls) occasionally
    hanging. To reproduce this reliably, reduce the sndbuf size and stress
    the TLS layer by sending many 1B sends. This results in every byte
    needing a header and each byte individually being sent to the crypto
    layer.
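The general pattern (save the lower protocol's callback when the upper layer is installed, then call it from the upper layer's handler) can be sketched in userspace C. All names are hypothetical, and the `in_sendpages` flag is an assumption about how "TLS is currently blocked inside the TCP send path" might be tracked; the actual patch may differ in detail.

```c
/* Hypothetical, heavily reduced socket: only what the sketch needs. */
struct sock_model {
	void (*write_space)(struct sock_model *sk);
	void (*saved_write_space)(struct sock_model *sk); /* lower layer's handler */
	int in_sendpages; /* assumed flag: set while TLS is inside the TCP send path */
	int waiter_woken; /* records that a blocked sender would be woken */
};

/* Stands in for TCP's handler, which wakes tasks sleeping in sk_wait_event(). */
void tcp_write_space_model(struct sock_model *sk)
{
	sk->waiter_woken = 1;
}

/*
 * TLS-layer handler: while TLS itself is blocked in the lower layer's send
 * path, forward the event so that wait can finish instead of hanging.
 */
void tls_write_space_model(struct sock_model *sk)
{
	if (sk->in_sendpages) {
		sk->saved_write_space(sk);
		return;
	}
	/* ... otherwise TLS would try to push its own pending record ... */
}

/* Install the TLS handler while remembering the original one. */
void tls_install_model(struct sock_model *sk)
{
	sk->saved_write_space = sk->write_space;
	sk->write_space = tls_write_space_model;
}
```

Without the forwarding call, nothing sets `waiter_woken`, which mirrors the sender sleeping until sndtimeo expires.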

    Signed-off-by: John Fastabend
    Acked-by: Dave Watson
    Signed-off-by: Daniel Borkmann

    John Fastabend
     

19 Aug, 2018

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2018-08-18

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Fix a BPF selftest failure in test_cgroup_storage due to rlimit
    restrictions, from Yonghong.

    2) Fix a suspicious RCU rcu_dereference_check() warning triggered
    from removing a device's XDP memory allocator by using the correct
    rhashtable lookup function, from Tariq.

    3) A batch of BPF sockmap and ULP fixes, mainly fixing leaks and races
    as well as enforcing module aliases for ULPs. Another fix makes BPF
    map redirects work again with tail calls, from Daniel.

    4) Fix XDP BPF samples to unload their programs upon SIGTERM, from Jesper.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

17 Aug, 2018

1 commit

  • Let's not turn the TCP ULP lookup into an arbitrary module loader, as
    we only intend to load ULP modules through this mechanism, not other
    unrelated kernel modules:

    [root@bar]# cat foo.c
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <linux/tcp.h>

    int main(void)
    {
        int sock = socket(PF_INET, SOCK_STREAM, 0);
        setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp"));
        return 0;
    }

    [root@bar]# gcc foo.c -O2 -Wall
    [root@bar]# lsmod | grep sctp
    [root@bar]# ./a.out
    [root@bar]# lsmod | grep sctp
    sctp 1077248 4
    libcrc32c 16384 3 nf_conntrack,nf_nat,sctp
    [root@bar]#

    Fix it by adding module alias to TCP ULP modules, so probing module
    via request_module() will be limited to tcp-ulp-[name]. The existing
    modules like kTLS will load fine given tcp-ulp-tls alias, but others
    will fail to load:

    [root@bar]# lsmod | grep sctp
    [root@bar]# ./a.out
    [root@bar]# lsmod | grep sctp
    [root@bar]#

    Sockmap is not affected by this since it's either built-in or not.
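The effect of the prefix can be illustrated with a small userspace helper. `ulp_module_alias()` is a hypothetical function, not a kernel API; the sketch assumes, as described above, that the lookup only ever requests names of the form tcp-ulp-[name], so a request for "sctp" becomes "tcp-ulp-sctp", which no module provides as an alias.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the module name a ULP lookup would request,
 * assuming the "tcp-ulp-" prefix described above. Returns 0 on success,
 * -1 if the result does not fit into 'out'.
 */
int ulp_module_alias(char *out, size_t outlen, const char *ulp_name)
{
	int n = snprintf(out, outlen, "tcp-ulp-%s", ulp_name);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For "tls" this yields "tcp-ulp-tls", which kTLS advertises as its alias, so loading keeps working.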

    Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Acked-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

16 Aug, 2018

1 commit

  • Pull crypto updates from Herbert Xu:
    "API:
    - Fix dcache flushing crash in skcipher.
    - Add hash finup self-tests.
    - Reschedule during speed tests.

    Algorithms:
    - Remove insecure vmac and replace it with vmac64.
    - Add public key verification for DH/ECDH.

    Drivers:
    - Decrease priority of sha-mb on x86.
    - Improve NEON latency/throughput on ARM64.
    - Add md5/sha384/sha512/des/3des to inside-secure.
    - Support eip197d in inside-secure.
    - Only register algorithms supported by the host in virtio.
    - Add cts and remove incompatible cts1 from ccree.
    - Add hisilicon SEC security accelerator driver.
    - Replace msm hwrng driver with qcom pseudo rng driver.

    Misc:
    - Centralize CRC polynomials"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (121 commits)
    crypto: arm64/ghash-ce - implement 4-way aggregation
    crypto: arm64/ghash-ce - replace NEON yield check with block limit
    crypto: hisilicon - sec_send_request() can be static
    lib/mpi: remove redundant variable esign
    crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable
    crypto: arm64/aes-ce-gcm - implement 2-way aggregation
    crypto: arm64/aes-ce-gcm - operate on two input blocks at a time
    crypto: dh - make crypto_dh_encode_key() make robust
    crypto: dh - fix calculating encoded key size
    crypto: ccp - Check for NULL PSP pointer at module unload
    crypto: arm/chacha20 - always use vrev for 16-bit rotates
    crypto: ccree - allow bigger than sector XTS op
    crypto: ccree - zero all of request ctx before use
    crypto: ccree - remove cipher ivgen left overs
    crypto: ccree - drop useless type flag during reg
    crypto: ablkcipher - fix crash flushing dcache in error path
    crypto: blkcipher - fix crash flushing dcache in error path
    crypto: skcipher - fix crash flushing dcache in error path
    crypto: skcipher - remove unnecessary setting of walk->nbytes
    crypto: scatterwalk - remove scatterwalk_samebuf()
    ...

    Linus Torvalds
     

13 Aug, 2018

1 commit

  • To prepare a decryption request, several memory chunks are required
    (aead_req, sgin, sgout, iv, aad). To submit the decrypt request to
    an accelerator, the buffers read by the accelerator
    must be DMA-able and must not come from the stack. The buffers for
    aad and iv could each be kmalloc'ed separately, but that is inefficient.
    This patch does a combined allocation for preparing the decryption request
    and then segments it into aead_req || sgin || sgout || iv || aad.
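The single-allocation layout can be sketched in userspace C with malloc() standing in for kmalloc(). The struct names and all sizes are hypothetical; real sizes depend on the AEAD transform and the record, and the sketch assumes the scatterlist entry size is a multiple of the platform's maximum alignment.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sizes describing one decryption request. */
struct decrypt_layout {
	size_t aead_req_size;
	size_t n_sgin, n_sgout; /* scatterlist entry counts */
	size_t sg_entry_size;   /* assumed to be a multiple of max alignment */
	size_t iv_size, aad_size;
};

/* Pointers into the single buffer: aead_req || sgin || sgout || iv || aad. */
struct decrypt_bufs {
	void *mem; /* the only pointer that must be freed */
	void *aead_req;
	void *sgin, *sgout;
	uint8_t *iv, *aad;
};

static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) / a * a;
}

int alloc_decrypt_bufs(const struct decrypt_layout *l, struct decrypt_bufs *b)
{
	const size_t a = alignof(max_align_t);
	size_t off_sgin = align_up(l->aead_req_size, a);
	size_t off_sgout = off_sgin + l->n_sgin * l->sg_entry_size;
	size_t off_iv = off_sgout + l->n_sgout * l->sg_entry_size;
	size_t off_aad = off_iv + l->iv_size;

	b->mem = malloc(off_aad + l->aad_size); /* one allocation, not five */
	if (!b->mem)
		return -1;
	b->aead_req = b->mem;
	b->sgin = (char *)b->mem + off_sgin;
	b->sgout = (char *)b->mem + off_sgout;
	b->iv = (uint8_t *)b->mem + off_iv;
	b->aad = (uint8_t *)b->mem + off_aad;
	return 0;
}
```

Freeing `mem` releases every segment at once, which is part of why a combined allocation is cheaper than separate allocations for iv and aad.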

    Signed-off-by: Vakul Garg
    Signed-off-by: David S. Miller

    Vakul Garg
     

06 Aug, 2018

1 commit

  • Function zerocopy_from_iter() unmarks the 'end' in the input sgtable while
    adding new entries to it, and the last entry in the sgtable remains
    unmarked. This results in a KASAN error report when using APIs like
    sg_nents(). Before returning, the function needs to mark the 'end' in
    the last entry it adds.
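The invariant the fix restores can be sketched with a userspace model of a scatterlist. `struct sg_model`, `sg_nents_model()` and `sg_append_model()` are hypothetical; unlike the kernel's sg_nents(), the model takes an explicit bound so that a missing end mark shows up as -1 instead of a walk past the array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a scatterlist entry with an end marker. */
struct sg_model {
	size_t len;
	bool is_end;
};

/* Count entries up to and including the one marked as the end, or -1. */
int sg_nents_model(const struct sg_model *sg, int max)
{
	for (int i = 0; i < max; i++)
		if (sg[i].is_end)
			return i + 1;
	return -1;
}

/* Append 'n' entries after 'used' ones, moving the end mark to the new last. */
int sg_append_model(struct sg_model *sg, int used, int n, size_t len)
{
	if (used > 0)
		sg[used - 1].is_end = false; /* the old end is unmarked */
	for (int i = 0; i < n; i++) {
		sg[used + i].len = len;
		sg[used + i].is_end = false;
	}
	if (used + n > 0)
		sg[used + n - 1].is_end = true; /* the step the fix adds */
	return used + n;
}
```

Dropping the final marking line leaves no end mark after an append, which is the state that made sg_nents() run off the table.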

    Signed-off-by: Vakul Garg
    Acked-by: Dave Watson
    Signed-off-by: David S. Miller

    Vakul Garg
     

03 Aug, 2018

1 commit


02 Aug, 2018

1 commit


31 Jul, 2018

1 commit

  • On receipt of a complete TLS record, use the socket's saved data_ready
    callback instead of the state_change callback. In function tls_queue(),
    the TLS record is queued in encrypted state, but the decryption
    happens inline when tls_sw_recvmsg() or tls_sw_splice_read() gets invoked.
    So it should be OK to notify the waiting context about the availability
    of data as soon as we have collected a full TLS record. For new data
    availability notification, the sk_data_ready callback is more appropriate.
    It points to sock_def_readable(), which wakes up specifically for the
    EPOLLIN event. This is in contrast to the socket callback sk_state_change,
    which points to sock_def_wakeup(), which issues a wakeup unconditionally
    (without an event mask).

    Signed-off-by: Vakul Garg
    Signed-off-by: David S. Miller

    Vakul Garg
     

29 Jul, 2018

2 commits

  • The current code is problematic because the iov_iter is reverted and
    never advanced in the non-error case. This patch skips the revert in the
    non-error case. This patch also fixes the amount by which the iov_iter
    is reverted. Currently, iov_iter is reverted by size, which can be
    greater than the amount by which the iter was actually advanced.
    Instead, only revert by the amount that the iter was advanced.
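The corrected bookkeeping can be sketched with a toy iterator. `struct iter_model`, `zerocopy_model()` and the 4-byte "pages" are hypothetical simplifications of iov_iter; the point shown is that the iterator stays advanced on success and, on error, is reverted by exactly the amount advanced rather than by `size`.

```c
#include <stddef.h>

/* Hypothetical minimal iterator: a position and the bytes still available. */
struct iter_model {
	size_t pos;
	size_t count;
};

static size_t iter_advance_model(struct iter_model *it, size_t n)
{
	if (n > it->count)
		n = it->count;
	it->pos += n;
	it->count -= n;
	return n;
}

static void iter_revert_model(struct iter_model *it, size_t n)
{
	it->pos -= n;
	it->count += n;
}

/*
 * Take 'size' bytes in 4-byte "pages". 'fail_at' simulates an error once that
 * many bytes have been taken. On success the iterator stays advanced; on
 * error it is reverted by the amount actually advanced, never by 'size'.
 */
int zerocopy_model(struct iter_model *it, size_t size, size_t fail_at)
{
	size_t advanced = 0;

	while (advanced < size) {
		size_t want = size - advanced < 4 ? size - advanced : 4;
		size_t got;

		if (advanced >= fail_at)
			goto err;
		got = iter_advance_model(it, want);
		if (got == 0)
			goto err; /* iterator exhausted */
		advanced += got;
	}
	return 0;
err:
	iter_revert_model(it, advanced);
	return -1;
}
```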

    Fixes: 4718799817c5 ("tls: Fix zerocopy_from_iter iov handling")
    Signed-off-by: Doron Roberts-Kedes
    Signed-off-by: David S. Miller

    Doron Roberts-Kedes
     
  • tls_push_record either returns 0 on success or a negative value on failure.
    This patch removes code that would only be executed if tls_push_record
    were to return a positive value.

    Signed-off-by: Doron Roberts-Kedes
    Signed-off-by: David S. Miller

    Doron Roberts-Kedes
     

27 Jul, 2018

2 commits

  • The zerocopy path ultimately calls iov_iter_get_pages, which defines the
    step function for ITER_KVECs as simply returning -EFAULT. Taking the
    non-zerocopy path for ITER_KVECs avoids the unnecessary fallback.

    See https://lore.kernel.org/lkml/20150401023311.GL29656@ZenIV.linux.org.uk/T/#u
    for a discussion of why zerocopy for vmalloc data is not a good idea.

    Discovered while testing NBD traffic encrypted with ktls.

    Fixes: c46234ebb4d1 ("tls: RX path for ktls")
    Signed-off-by: Doron Roberts-Kedes
    Signed-off-by: David S. Miller

    Doron Roberts-Kedes
     
  • Removed checks against non-NULL before calling kfree_skb() and
    crypto_free_aead(). These functions are safe to be called with NULL
    as an argument.

    Signed-off-by: Vakul Garg
    Acked-by: Dave Watson
    Signed-off-by: David S. Miller

    Vakul Garg
     

25 Jul, 2018

1 commit


21 Jul, 2018

3 commits


17 Jul, 2018

1 commit

  • In the zerocopy sendmsg() path, there are error checks to revert
    the zerocopy if we get any error code. syzkaller has discovered
    that tls_push_record can return -ECONNRESET, which is fatal, and
    happens after the point at which it is safe to revert the iter,
    as we've already passed the memory to do_tcp_sendpages.

    Previously this code could return -ENOMEM and we would want to
    revert the iter, but AFAIK this no longer returns ENOMEM after
    a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"),
    so we fail for all error codes.

    Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com
    Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
    Signed-off-by: Dave Watson
    Fixes: 3c4d7559159b ("tls: kernel TLS support")
    Signed-off-by: David S. Miller

    Dave Watson
     

16 Jul, 2018

6 commits

  • zerocopy_from_iter iterates over the message, but it doesn't revert the
    updates made by the iov iteration. This patch fixes it. Now, the iov can
    be used after calling zerocopy_from_iter.

    Fixes: 3c4d75591 ("tls: kernel TLS support")
    Signed-off-by: Boris Pismenny
    Signed-off-by: David S. Miller

    Boris Pismenny
     
  • This patch completes the generic infrastructure to offload TLS crypto to a
    network device. It enables the kernel to skip decryption and
    authentication of some skbs marked as decrypted by the NIC. In the fast
    path, all packets received are decrypted by the NIC and the performance
    is comparable to plain TCP.

    This infrastructure doesn't require a TCP offload engine. Instead, the
    NIC only decrypts packets that contain the expected TCP sequence number.
    Out-Of-Order TCP packets are provided unmodified. As a result, in the
    worst case a received TLS record consists of both plaintext and ciphertext
    packets. These partially decrypted records must be re-encrypted,
    only to be decrypted again.

    The notable differences between SW KTLS Rx and this offload are as
    follows:
    1. Partial decryption - Software must handle the case of a TLS record
    that was only partially decrypted by HW. This can happen due to packet
    reordering.
    2. Resynchronization - tls_read_size calls the device driver to
    resynchronize HW after HW lost track of TLS record framing in
    the TCP stream.

    Signed-off-by: Boris Pismenny
    Signed-off-by: David S. Miller

    Boris Pismenny
     
  • This patch allows tls_set_sw_offload to fill the context in case it was
    already allocated previously.

    We will use it in TLS_DEVICE to fill the RX software context.

    Signed-off-by: Boris Pismenny
    Signed-off-by: David S. Miller

    Boris Pismenny
     
  • This patch splits tls_sw_release_resources_rx into two functions: one
    which releases all inner software TLS structures, and another that also
    frees the containing structure.

    In TLS_DEVICE we will need to release the software structures without
    freeing the containing structure, which contains other information.
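The shape of the split can be sketched in userspace C. The names are hypothetical and a single heap pointer stands in for all of the inner software TLS structures: the first function leaves the container usable, the second builds on it and also frees the container.

```c
#include <stdlib.h>

/* Hypothetical RX software context with one inner resource. */
struct sw_ctx_rx_model {
	void *inner; /* stands in for the inner software TLS structures */
};

/* Release only the inner structures; the container stays valid. */
void release_resources_rx_model(struct sw_ctx_rx_model *ctx)
{
	free(ctx->inner);
	ctx->inner = NULL;
}

/* Release the inner structures and then free the container itself. */
void free_resources_rx_model(struct sw_ctx_rx_model *ctx)
{
	release_resources_rx_model(ctx);
	free(ctx);
}
```

A device-offload path would call only the first function, because its container still holds other state.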

    Signed-off-by: Boris Pismenny
    Signed-off-by: David S. Miller

    Boris Pismenny
     
  • Previously, decrypt_skb also updated the TLS context.
    Now, decrypt_skb only decrypts the payload using the current context,
    while decrypt_skb_update also updates the state.

    Later, in the tls_device Rx flow, we will use decrypt_skb directly.

    Signed-off-by: Boris Pismenny
    Signed-off-by: David S. Miller

    Boris Pismenny
     
  • For symmetry, we rename tls_offload_context to
    tls_offload_context_tx before we add tls_offload_context_rx.

    Signed-off-by: Boris Pismenny
    Signed-off-by: David S. Miller

    Boris Pismenny
     

13 Jul, 2018

1 commit


03 Jul, 2018

2 commits

  • The current code does not inspect the return value of skb_to_sgvec. This
    can cause a NULL pointer dereference kernel panic when the malformed
    sgvec is passed into the crypto request.

    Checking the return value of skb_to_sgvec and skipping decryption if it
    is negative fixes this problem.
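The check amounts to propagating a negative return before building the crypto request. In this sketch, `to_sgvec_model()` is a hypothetical stand-in for skb_to_sgvec() that fails with -EMSGSIZE when the data needs more entries than are available; the caller skips decryption instead of using the unfilled sgvec.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for skb_to_sgvec(): entries filled, or negative errno. */
int to_sgvec_model(size_t data_len, size_t max_entries, size_t entry_len)
{
	size_t need = (data_len + entry_len - 1) / entry_len;

	return need > max_entries ? -EMSGSIZE : (int)need;
}

int decrypt_model(size_t data_len)
{
	int nsg = to_sgvec_model(data_len, 4, 16);

	if (nsg < 0)
		return nsg; /* skip decryption instead of using a malformed sgvec */
	/* ... build and submit the crypto request with 'nsg' entries ... */
	return 0;
}
```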

    Fixes: c46234ebb4d1 ("tls: RX path for ktls")
    Acked-by: Dave Watson
    Signed-off-by: Doron Roberts-Kedes
    Signed-off-by: David S. Miller

    Doron Roberts-Kedes
     
  • Simple overlapping changes in stmmac driver.

    Adjust skb_gro_flush_final_remcsum function signature to make GRO list
    changes in net-next, as per Stephen Rothwell's example merge
    resolution.

    Signed-off-by: David S. Miller

    David S. Miller
     

29 Jun, 2018

1 commit

  • The poll() changes were not well thought out, and completely
    unexplained. They also caused a huge performance regression, because
    "->poll()" was no longer a trivial file operation that just called down
    to the underlying file operations, but instead did at least two indirect
    calls.

    Indirect calls are sadly slow now with the Spectre mitigation, but the
    performance problem could at least be largely mitigated by changing the
    "->get_poll_head()" operation to just have a per-file-descriptor pointer
    to the poll head instead. That gets rid of one of the new indirections.

    But that doesn't fix the new complexity that is completely unwarranted
    for the regular case. The (undocumented) reason for the poll() changes
    was some alleged AIO poll race fixing, but we don't make the common case
    slower and more complex for some uncommon special case, so this all
    really needs way more explanations and most likely a fundamental
    redesign.

    [ This revert is a revert of about 30 different commits, not reverted
    individually because that would just be unnecessarily messy - Linus ]

    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

27 Jun, 2018

1 commit

  • It looks like the prior VLA removal, commit b16520f7493d ("net/tls: Remove
    VLA usage"), and a new VLA addition, commit c46234ebb4d1e ("tls: RX path
    for ktls"), passed in the night. This removes the newly added VLA, which
    happens to have its bounds based on the same max value.

    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
     

24 Jun, 2018

1 commit


16 Jun, 2018

2 commits

  • Current behavior in tls_sw_recvmsg() is to wait for incoming TLS
    messages and copy exactly len bytes of data, as requested by the user.
    This is problematic: if no packet is currently queued in strparser,
    we keep waiting until one has been processed and pushed into the TLS
    receive layer for tls_wait_data() to wake up and push the decrypted
    bits to user space. Given that after TLS decryption we're back to
    streaming data, use the sock_rcvlowat() hint from the TCP socket
    instead. Retain the current behavior with the MSG_WAITALL flag, and
    otherwise use the hint target for breaking the loop and returning to
    the application. This is done only if no ctx->recv_pkt is currently
    ready; otherwise we continue to process it from our strparser
    backlog.
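The target computation can be written as a small function. `recv_target()` is a hypothetical userspace name; its body follows the min(rcvlowat, len) rule described above, with MSG_WAITALL requesting the full length and a zero target bumped to 1 so a call still waits for at least some data.

```c
#include <stdbool.h>

/*
 * With MSG_WAITALL, wait for the full 'len'; otherwise stop once
 * min(rcvlowat, len) bytes have been copied. A zero target becomes 1.
 */
int recv_target(bool waitall, int rcvlowat, int len)
{
	int v = waitall ? len : (rcvlowat < len ? rcvlowat : len);

	return v ? v : 1;
}
```

With the default SO_RCVLOWAT of 1, a plain recv() returns as soon as any decrypted data is available instead of blocking until len bytes arrive.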

    Fixes: c46234ebb4d1 ("tls: RX path for ktls")
    Signed-off-by: Daniel Borkmann
    Acked-by: Dave Watson
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • syzkaller managed to trigger a use-after-free in tls like the
    following:

    BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
    Write of size 1 at addr ffff88037aa08000 by task a.out/2317

    CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
    Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
    Call Trace:
    dump_stack+0x71/0xab
    print_address_description+0x6a/0x280
    kasan_report+0x258/0x380
    ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
    tls_push_record.constprop.15+0x6a2/0x810 [tls]
    tls_sw_push_pending_record+0x2e/0x40 [tls]
    tls_sk_proto_close+0x3fe/0x710 [tls]
    ? tcp_check_oom+0x4c0/0x4c0
    ? tls_write_space+0x260/0x260 [tls]
    ? kmem_cache_free+0x88/0x1f0
    inet_release+0xd6/0x1b0
    __sock_release+0xc0/0x240
    sock_close+0x11/0x20
    __fput+0x22d/0x660
    task_work_run+0x114/0x1a0
    do_exit+0x71a/0x2780
    ? mm_update_next_owner+0x650/0x650
    ? handle_mm_fault+0x2f5/0x5f0
    ? __do_page_fault+0x44f/0xa50
    ? mm_fault_error+0x2d0/0x2d0
    do_group_exit+0xde/0x300
    __x64_sys_exit_group+0x3a/0x50
    do_syscall_64+0x9a/0x300
    ? page_fault+0x8/0x30
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    This happened through fault injection where aead_req allocation in
    tls_do_encryption() eventually failed and we returned -ENOMEM from
    the function. Turns out that the use-after-free is triggered from
    tls_sw_sendmsg() in the second tls_push_record(). The error then
    triggers a jump to waiting for memory in sk_stream_wait_memory()
    resp. returning immediately in case of MSG_DONTWAIT. What follows is
    the trim_both_sgl(sk, orig_size), which drops elements from the sg
    list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
    when the socket is being closed, where tls_sk_proto_close() callback
    is invoked. The tls_complete_pending_work() will figure that there's
    a pending closed tls record to be flushed and thus calls into the
    tls_push_pending_closed_record() from there. ctx->push_pending_record()
    is called from the latter, which is the tls_sw_push_pending_record()
    from sw path. This again calls into tls_push_record(). And here the
    tls_fill_prepend() will panic since the buffer address has been freed
    earlier via trim_both_sgl(). One way to fix it is to move the aead
    request allocation out of tls_do_encryption() and earlier into
    tls_push_record(). This means we no longer prep the TLS header and
    advance the state to TLS_PENDING_CLOSED_RECORD before the potentially
    failing allocation has happened. That fixes the issue on my side.
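The ordering principle (do the fallible allocation first, mutate state only afterwards) can be sketched in userspace C. The names are hypothetical: `header_written` stands in for the tls_fill_prepend() step and `REC_PENDING_CLOSED` for TLS_PENDING_CLOSED_RECORD. On an allocation failure the record is left exactly as it was, so a later close has nothing stale to flush.

```c
#include <errno.h>
#include <stdlib.h>

enum rec_state { REC_OPEN, REC_PENDING_CLOSED };

/* Hypothetical record context. */
struct rec_model {
	enum rec_state state;
	int header_written; /* stands in for the prepended TLS header */
	void *aead_req;
};

/* Order after the fix: allocate first, then prep the header and advance state. */
int push_record_model(struct rec_model *r, int simulate_oom)
{
	void *req = simulate_oom ? NULL : malloc(64);

	if (!req)
		return -ENOMEM; /* nothing has been mutated yet */
	r->header_written = 1;
	r->state = REC_PENDING_CLOSED;
	r->aead_req = req;
	/* ... encryption would happen here ... */
	return 0;
}
```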

    Fixes: 3c4d7559159b ("tls: kernel TLS support")
    Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com
    Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
    Signed-off-by: Daniel Borkmann
    Acked-by: Dave Watson
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

12 Jun, 2018

1 commit

  • While hacking on kTLS, I ran into the following panic from an
    unprivileged netserver / netperf TCP session:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0
    Oops: 0010 [#1] SMP KASAN PTI
    CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ #139
    Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
    RIP: 0010: (null)
    Code: Bad RIP value.
    RSP: 0018:ffff88036abcf740 EFLAGS: 00010246
    RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26
    RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280
    RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48
    R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280
    R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350
    FS: 00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    ? tls_sw_poll+0xa4/0x160 [tls]
    ? sock_poll+0x20a/0x680
    ? do_select+0x77b/0x11a0
    ? poll_schedule_timeout.constprop.12+0x130/0x130
    ? pick_link+0xb00/0xb00
    ? read_word_at_a_time+0x13/0x20
    ? vfs_poll+0x270/0x270
    ? deref_stack_reg+0xad/0xe0
    ? __read_once_size_nocheck.constprop.6+0x10/0x10
    [...]

    Debugging further, it turns out that calling into ctx->sk_poll() is
    invalid, since sk_poll itself, which was saved from the original TCP
    socket for tls_sw_poll() to invoke, is NULL.

    It looks like the recent conversion from the poll to the poll_mask
    callback, started in 152524231023 ("net: add support for ->poll_mask
    in proto_ops"), missed converting kTLS as well: TCP's ->poll was
    converted over to ->poll_mask in commit 2c7d3dacebd4 ("net/tcp: convert
    to ->poll_mask"), and therefore kTLS wrongly saved the old ->poll,
    which is now NULL.

    Convert kTLS over to use ->poll_mask instead. Also, instead of POLLIN |
    POLLRDNORM, use the proper EPOLLIN | EPOLLRDNORM bits, as is also the
    case in tcp_poll_mask(), whose mask is mangled here.

    Fixes: 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask")
    Signed-off-by: Daniel Borkmann
    Cc: Christoph Hellwig
    Cc: Dave Watson
    Tested-by: Dave Watson
    Signed-off-by: David S. Miller

    Daniel Borkmann