23 Jan, 2020

1 commit

  • commit 33bfe20dd7117dd81fd896a53f743a233e1ad64f upstream.

    When sockmap sock with TLS enabled is removed we cleanup bpf/psock state
    and call tcp_update_ulp() to push updates to TLS ULP on top. However, we
    don't push the write_space callback up and instead simply overwrite the
    op with the psock stored previous op. This may or may not be correct, so
    to ensure we don't overwrite the TLS write space hook, pass this field to
    the ULP and have it fix up the ctx.

    This completes a previous fix that pushed the ops through to the ULP
    but at the time missed doing this for write_space, presumably because
    write_space TLS hook was added around the same time.

    Fixes: 95fa145479fbc ("bpf: sockmap/tls, close can race with map free")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Jakub Sitnicki
    Acked-by: Jonathan Lemon
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    John Fastabend
     

18 Dec, 2019

1 commit

  • [ Upstream commit 4a5cdc604b9cf645e6fa24d8d9f055955c3c8516 ]

    ENOTSUPP is not available in userspace, for example:

    setsockopt failed, 524, Unknown error 524

    Signed-off-by: Valentin Vidic
    Acked-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Valentin Vidic
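
The "Unknown error 524" message above appears because userspace has no name for that code. A minimal sketch of one plausible remedy, assuming 524 is the kernel-internal ENOTSUPP value and that EOPNOTSUPP is an acceptable substitute; `to_user_errno` and `report_failure` are hypothetical helpers, not kernel or libc functions:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 524 is the kernel-internal ENOTSUPP value seen in the
 * "Unknown error 524" message; userspace <errno.h> does not define it. */
#define KERNEL_ENOTSUPP 524

/* Hypothetical helper: translate a kernel-internal code into one that
 * userspace can name and print. Other codes pass through unchanged. */
static int to_user_errno(int err)
{
	return err == KERNEL_ENOTSUPP ? EOPNOTSUPP : err;
}

/* Print a setsockopt-style failure using the translated code. */
static void report_failure(int err)
{
	int uerr = to_user_errno(err);

	printf("setsockopt failed, %d, %s\n", uerr, strerror(uerr));
}
```

Calling `report_failure(524)` would then print a named error instead of an unknown one; the exact text depends on the C library.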
     

05 Dec, 2019

1 commit

  • [ Upstream commit c5daa6cccdc2f94aca2c9b3fa5f94e4469997293 ]

    Partially sent record cleanup path increments an SG entry
    directly instead of using sg_next(). This should not be a
    problem today, as encrypted messages should be always
    allocated as arrays. But given this is a cleanup path, it's
    easy to miss were this ever to change. Use sg_next(), and
    simplify the code.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
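
Why the iterator matters can be sketched in userspace. The model below is an assumption-laden simplification: `struct demo_sg` and its helpers are hypothetical stand-ins, not the kernel's `struct scatterlist` API. It shows that stepping with a next-entry helper follows a chain link into a second array, which a plain pointer increment would not do.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a scatterlist entry (not the kernel struct). */
struct demo_sg {
	size_t length;          /* bytes covered by this entry */
	bool is_chain;          /* entry is a link, not data */
	bool is_last;           /* final data entry of the list */
	struct demo_sg *chain;  /* next array when is_chain is set */
};

/* Step to the next data entry, following a chain link if present. */
static struct demo_sg *demo_sg_next(struct demo_sg *sg)
{
	if (sg->is_last)
		return NULL;
	sg++;
	if (sg->is_chain)
		sg = sg->chain;
	return sg;
}

/* Sum the lengths of all data entries starting at sg. */
static size_t demo_sg_total(struct demo_sg *sg)
{
	size_t total = 0;

	for (; sg; sg = demo_sg_next(sg))
		total += sg->length;
	return total;
}

/* Build two arrays joined by a chain link and return the total length. */
static size_t demo_chained_total(void)
{
	struct demo_sg second[2] = {
		{ .length = 30 },
		{ .length = 40, .is_last = true },
	};
	struct demo_sg first[3] = {
		{ .length = 10 },
		{ .length = 20 },
		{ .is_chain = true, .chain = second },
	};

	return demo_sg_total(first);
}
```

With a flat array both approaches visit the same entries, which matches the commit's remark that the direct increment is not a problem today.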
     

20 Nov, 2019

1 commit

  • Bring back tls_sw_sendpage_locked. sk_msg redirection into a socket
    with TLS_TX takes the following path:

    tcp_bpf_sendmsg_redir
    tcp_bpf_push_locked
    tcp_bpf_push
    kernel_sendpage_locked
    sock->ops->sendpage_locked

    Also update the flags test in tls_sw_sendpage_locked to allow flag
    MSG_NO_SHARED_FRAGS. bpf_tcp_sendmsg sets this.

    Link: https://lore.kernel.org/netdev/CA+FuTSdaAawmZ2N8nfDDKu3XLpXBbMtcCT0q4FntDD2gn8ASUw@mail.gmail.com/T/#t
    Link: https://github.com/wdebruij/kerneltools/commits/icept.2
    Fixes: 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect through ULP")
    Fixes: f3de19af0f5b ("Revert \"net/tls: remove unused function tls_sw_sendpage_locked\"")
    Signed-off-by: Willem de Bruijn
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Willem de Bruijn
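
The flags test mentioned above is an allow-list mask check. A minimal sketch, assuming illustrative numeric values for the kernel-internal flags (they are not exported to userspace headers) and assuming EOPNOTSUPP as the rejection code; `demo_check_sendpage_flags` is a hypothetical name, not the kernel function:

```c
#include <errno.h>
#include <sys/socket.h>

/* Assumption: illustrative values for kernel-internal flags that
 * userspace headers do not define. */
#define DEMO_MSG_SENDPAGE_NOTLAST 0x20000
#define DEMO_MSG_NO_SHARED_FRAGS  0x80000

/* Every flag the sketch is willing to accept. */
#define DEMO_ALLOWED_FLAGS (MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | \
			    DEMO_MSG_SENDPAGE_NOTLAST | \
			    DEMO_MSG_NO_SHARED_FRAGS)

/* Reject the call if any flag outside the allow-list is set. */
static int demo_check_sendpage_flags(int flags)
{
	if (flags & ~DEMO_ALLOWED_FLAGS)
		return -EOPNOTSUPP;
	return 0;
}
```

Adding a flag to the mask is all it takes to stop a caller that sets it from being rejected.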
     

07 Nov, 2019

1 commit

  • TLS TX needs to release and re-acquire the socket lock if send buffer
    fills up.

    TLS SW TX path currently depends on only allowing one thread to enter
    the function by the abuse of sk_write_pending. If another writer is
    already waiting for memory no new ones are allowed in.

    This has two problems:
    - writers don't wake other threads up when they leave the kernel;
    meaning that this scheme works for single extra thread (second
    application thread or delayed work) because memory becoming
    available will send a wake up request, but as Mallesham and
    Pooja report with larger number of threads it leads to threads
    being put to sleep indefinitely;
    - the delayed work does not get _scheduled_ but it may _run_ when
    other writers are present leading to crashes as writers don't
    expect state to change under their feet (same records get pushed
    and freed multiple times); it's hard to reliably bail from the
    work, however, because the mere presence of a writer does not
    guarantee that the writer will push pending records before exiting.

    Ensuring wakeups always happen will make the code basically open
    code a mutex. Just use a mutex.

    The TLS HW TX path does not have any locking (not even the
    sk_write_pending hack), yet it uses a per-socket sg_tx_data
    array to push records.

    Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
    Reported-by: Mallesham Jatharakonda
    Reported-by: Pooja Trivedi
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

05 Sep, 2019

2 commits

  • TLS code has a number of #ifdefs which make the code a little
    harder to follow. Recent fixes removed the ifdef around the
    TLS_HW define, so we can switch to the often used pattern
    of defining tls_device functions as empty static inlines
    in the header when CONFIG_TLS_DEVICE=n.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: John Hurley
    Reviewed-by: Dirk van der Merwe
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Since we already have the pointer to the full original sk_proto
    stored use that instead of storing all individual callback
    pointers as well.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: John Hurley
    Reviewed-by: Dirk van der Merwe
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Jakub Kicinski
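
The "empty static inlines in the header" pattern from the first of the two commits above can be sketched as follows. `DEMO_CONFIG_DEVICE` and the `demo_*` functions are hypothetical names chosen for illustration; the real symbol is CONFIG_TLS_DEVICE and the real function list is not reproduced here.

```c
#include <stdbool.h>

/* DEMO_CONFIG_DEVICE stands in for a Kconfig symbol such as
 * CONFIG_TLS_DEVICE; it is left undefined here to model "=n". */
#ifdef DEMO_CONFIG_DEVICE
/* Real implementations would live in a separate source file. */
int demo_device_init(void);
void demo_device_cleanup(void);
bool demo_device_enabled(void);
#else
/* Empty stubs: callers compile unchanged and the calls fold away. */
static inline int demo_device_init(void) { return 0; }
static inline void demo_device_cleanup(void) { }
static inline bool demo_device_enabled(void) { return false; }
#endif

/* Callers need no #ifdef of their own. */
static int demo_module_init(void)
{
	return demo_device_init();
}

static void demo_module_exit(void)
{
	demo_device_cleanup();
}
```

The conditional lives in one place, the header, instead of being repeated at every call site.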
     

01 Sep, 2019

2 commits


16 Aug, 2019

1 commit

  • The ctx->sk_write_space pointer is only set when TLS tx mode is enabled.
    When running without TX mode it's a null pointer but we still set the
    sk sk_write_space pointer on close().

    Fix the close path to only overwrite sk->sk_write_space when the current
    pointer is to the tls_write_space function indicating the tls module should
    clean it up properly as well.

    Reported-by: Hillf Danton
    Cc: Ying Xue
    Cc: Andrey Konovalov
    Fixes: 57c722e932cfb ("net/tls: swap sk_write_space on close")
    Signed-off-by: John Fastabend
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    John Fastabend
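
The guarded restore described above can be modelled in userspace. This is a sketch under stated assumptions: `struct demo_sock`, `struct demo_ctx` and the `demo_*` functions are hypothetical stand-ins for the socket, the TLS context and the hooks, not the kernel definitions.

```c
#include <stddef.h>

struct demo_sock;
typedef void (*write_space_fn)(struct demo_sock *sk);

/* Stand-in for the TLS context. */
struct demo_ctx {
	write_space_fn saved_write_space; /* NULL unless TX mode was enabled */
};

/* Stand-in for the socket. */
struct demo_sock {
	write_space_fn sk_write_space;
	struct demo_ctx *ctx;
	int wakeups;
};

/* Original hook: just counts wakeups in this model. */
static void demo_default_write_space(struct demo_sock *sk)
{
	sk->wakeups++;
}

/* TLS hook: forwards to the hook that was saved when TX was enabled. */
static void demo_tls_write_space(struct demo_sock *sk)
{
	sk->ctx->saved_write_space(sk);
}

/* Enabling TX saves the current hook and installs the TLS one. */
static void demo_enable_tx(struct demo_sock *sk)
{
	sk->ctx->saved_write_space = sk->sk_write_space;
	sk->sk_write_space = demo_tls_write_space;
}

/* Close restores the hook only if the TLS hook is the one installed,
 * so a NULL saved pointer is never written back to the socket. */
static void demo_close(struct demo_sock *sk)
{
	if (sk->sk_write_space == demo_tls_write_space)
		sk->sk_write_space = sk->ctx->saved_write_space;
}
```

Closing a socket that never enabled TX leaves its hook untouched, which is the case the commit message calls out.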
     

10 Aug, 2019

1 commit

  • Now that we swap the original proto and clear the ULP pointer
    on close we have to make sure no callback will try to access
    the freed state. sk_write_space is not part of sk_prot, remember
    to swap it.

    Reported-by: syzbot+dcdc9deefaec44785f32@syzkaller.appspotmail.com
    Fixes: 95fa145479fb ("bpf: sockmap/tls, close can race with map free")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

06 Aug, 2019

1 commit

  • Looks like we were slightly overzealous with the shutdown()
    cleanup. Even though the sock->sk_state can reach CLOSED again,
    socket->state will not go back to SS_UNCONNECTED once
    the connection is ESTABLISHED. Meaning we will see EISCONN if
    we try to reconnect, and EINVAL if we try to listen.

    Only listen sockets can be shutdown() and reused, but since
    ESTABLISHED sockets can never be re-connected() or used for
    listen() we don't need to try to clean up the ULP state early.

    Fixes: 32857cf57f92 ("net/tls: fix transition through disconnect with close")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

22 Jul, 2019

6 commits

  • When a map free is called and in parallel a socket is closed we
    have two paths that can potentially reset the socket prot ops, the
    bpf close() path and the map free path. This creates a problem
    with which prot ops should be used from the socket closed side.

    If the map_free side completes first then we want to call the
    original lowest level ops. However, if the tls path runs first
    we want to call the sockmap ops. Additionally there was no locking
    around prot updates in TLS code paths so the prot ops could
    be changed multiple times once from TLS path and again from sockmap
    side potentially leaving ops pointed at either TLS or sockmap
    when psock and/or tls context have already been destroyed.

    To fix this race, first only update ops inside the callback lock
    so that TLS, sockmap and lowest level all agree on prot state.
    Second, add a ULP callback update() so that lower layers can
    inform the upper layer when they are being removed, allowing the
    upper layer to reset prot ops.

    This gets us close to allowing sockmap and tls to be stacked
    in arbitrary order but will save that patch for *next trees.

    v4:
    - make sure we don't free things for device;
    - remove the checks which swap the callbacks back
    only if TLS is at the top.

    Reported-by: syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com
    Fixes: 02c558b2d5d6 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
    Signed-off-by: John Fastabend
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: Daniel Borkmann

    John Fastabend
     
  • It is possible (via shutdown()) for TCP socks to go through TCP_CLOSE
    state via tcp_disconnect() without actually calling tcp_close which
    would then call the tls close callback. Because of this a user could
    disconnect a socket then put it in a LISTEN state which would break
    our assumptions about sockets always being ESTABLISHED state.

    More directly because close() can call unhash() and unhash is
    implemented by sockmap if a sockmap socket has TLS enabled we can
    incorrectly destroy the psock from unhash() and then call its close
    handler again. But because the psock (sockmap socket representation)
    is already destroyed we call close handler in sk->prot. However,
    in some cases (TLS BASE/BASE case) this will still point at the
    sockmap close handler resulting in a circular call and crash reported
    by syzbot.

    To fix both above issues implement the unhash() routine for TLS.

    v4:
    - add note about tls offload still needing the fix;
    - move sk_proto to the cold cache line;
    - split TX context free into "release" and "free",
    otherwise the GC work itself is in already freed
    memory;
    - move TX before RX for consistency;
    - reuse tls_ctx_free();
    - schedule the GC work after we're done with context
    to avoid UAF;
    - don't set the unhash in all modes, all modes "inherit"
    TLS_BASE's callbacks anyway;
    - disable the unhash hook for TLS_HW.

    Fixes: 3c4d7559159bf ("tls: kernel TLS support")
    Reported-by: Eric Dumazet
    Signed-off-by: John Fastabend
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Daniel Borkmann

    John Fastabend
     
  • The tls close() callback currently drops the sock lock to call
    strp_done(). Split up the RX cleanup into stopping the strparser
    and releasing most resources, syncing strparser and finally
    freeing the context.

    To avoid the need for a strp_done() call on the cleanup path
    of device offload make sure we don't arm the strparser until
    we are sure init will be successful.

    Signed-off-by: John Fastabend
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: Daniel Borkmann

    John Fastabend
     
  • The tls close() callback currently drops the sock lock, makes a
    cancel_delayed_work_sync() call, and then relocks the sock.

    By restructuring the code we can avoid dropping the lock and then
    reclaiming it. To simplify this we do the following:

    tls_sk_proto_close
    set_bit(CLOSING)
    set_bit(SCHEDULE)
    cancel_delayed_work_sync()

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: Daniel Borkmann

    John Fastabend
     
  • The deprecated TOE offload doesn't actually do anything in
    tls_sk_proto_close() - all TLS code is skipped and context
    not freed. Remove the callback to make it easier to refactor
    tls_sk_proto_close().

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • In tls_set_device_offload_rx() we prepare the software context
    for RX fallback and proceed to add the connection to the device.
    Unfortunately, software context prep includes arming strparser
    so in case of a later error we have to release the socket lock
    to call strp_done().

    In preparation for not releasing the socket lock half way through
    callbacks move arming strparser into a separate function.
    Following patches will make use of that.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     

02 Jul, 2019

1 commit

  • Commit 86029d10af18 ("tls: zero the crypto information from tls_context
    before freeing") added memzero_explicit() calls to clear the key material
    before freeing struct tls_context, but it missed tls_device.c has its
    own way of freeing this structure. Replace the missing free.

    Fixes: 86029d10af18 ("tls: zero the crypto information from tls_context before freeing")
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
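
The idea of wiping key material before freeing can be sketched in userspace. The code below is an illustration only: `demo_wipe` is a stand-in that uses a volatile pointer, not the kernel's memzero_explicit(), and `struct demo_ctx` is a hypothetical context.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical context holding key material. */
struct demo_ctx {
	unsigned char key[16];
	unsigned char iv[8];
};

/* Zero a buffer through a volatile pointer so the compiler does not
 * drop the stores as dead writes ahead of free(). */
static void demo_wipe(void *buf, size_t len)
{
	volatile unsigned char *p = buf;

	while (len--)
		*p++ = 0;
}

/* Single free routine, so no caller can forget the wipe. */
static void demo_ctx_free(struct demo_ctx *ctx)
{
	if (!ctx)
		return;
	demo_wipe(ctx, sizeof(*ctx));
	free(ctx);
}
```

Routing every free through one helper avoids the situation in the commit above, where a second code path freed the structure in its own way and skipped the wipe.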
     

24 Jun, 2019

1 commit

  • With commit 94850257cf0f ("tls: Fix tls_device handling of partial records")
    a new path was introduced to cleanup partial records during sk_proto_close.
    This path does not handle the SW KTLS tx_list cleanup.

    This is unnecessary though since the free_resources calls for both
    SW and offload paths will cleanup a partial record.

    The visible effect is the following warning, but this bug also causes
    a page double free.

    WARNING: CPU: 7 PID: 4000 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110
    RIP: 0010:sk_stream_kill_queues+0x103/0x110
    RSP: 0018:ffffb6df87e07bd0 EFLAGS: 00010206
    RAX: 0000000000000000 RBX: ffff8c21db4971c0 RCX: 0000000000000007
    RDX: ffffffffffffffa0 RSI: 000000000000001d RDI: ffff8c21db497270
    RBP: ffff8c21db497270 R08: ffff8c29f4748600 R09: 000000010020001a
    R10: ffffb6df87e07aa0 R11: ffffffff9a445600 R12: 0000000000000007
    R13: 0000000000000000 R14: ffff8c21f03f2900 R15: ffff8c21f03b8df0
    Call Trace:
    inet_csk_destroy_sock+0x55/0x100
    tcp_close+0x25d/0x400
    ? tcp_check_oom+0x120/0x120
    tls_sk_proto_close+0x127/0x1c0
    inet_release+0x3c/0x60
    __sock_release+0x3d/0xb0
    sock_close+0x11/0x20
    __fput+0xd8/0x210
    task_work_run+0x84/0xa0
    do_exit+0x2dc/0xb90
    ? release_sock+0x43/0x90
    do_group_exit+0x3a/0xa0
    get_signal+0x295/0x720
    do_signal+0x36/0x610
    ? SYSC_recvfrom+0x11d/0x130
    exit_to_usermode_loop+0x69/0xb0
    do_syscall_64+0x173/0x180
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x7fe9b9abc10d
    RSP: 002b:00007fe9b19a1d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
    RAX: fffffffffffffe00 RBX: 0000000000000006 RCX: 00007fe9b9abc10d
    RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007fe948003430
    RBP: 00007fe948003410 R08: 00007fe948003430 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00005603739d9080
    R13: 00007fe9b9ab9f90 R14: 00007fe948003430 R15: 0000000000000000

    Fixes: 94850257cf0f ("tls: Fix tls_device handling of partial records")
    Signed-off-by: Dirk van der Merwe
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Dirk van der Merwe
     

26 Apr, 2019

1 commit


21 Apr, 2019

1 commit

  • When device refuses the offload in tls_set_device_offload_rx()
    it calls tls_sw_free_resources_rx() to clean up software context
    state.

    Unfortunately, tls_sw_free_resources_rx() does not free all
    the state tls_set_sw_offload() allocated - it leaks IV and
    sequence number buffers. All other code paths which lead to
    tls_sw_release_resources_rx() (which tls_sw_free_resources_rx()
    calls) free those right before the call.

    Avoid the leak by moving freeing of iv and rec_seq into
    tls_sw_release_resources_rx().

    Fixes: 4799ac81e52a ("tls: Add rx inline crypto offload")
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

18 Apr, 2019

1 commit


11 Apr, 2019

2 commits

  • buildbot noticed that TLS_HW is not defined if CONFIG_TLS_DEVICE=n.
    Wrap the cleanup branch into an ifdef, tls_device_free_resources_tx()
    wouldn't be compiled either in this case.

    Fixes: 35b71a34ada6 ("net/tls: don't leak partially sent record in device mode")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • David reports that tls triggers warnings related to
    sk->sk_forward_alloc not being zero at destruction time:

    WARNING: CPU: 5 PID: 6831 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110
    WARNING: CPU: 5 PID: 6831 at net/ipv4/af_inet.c:160 inet_sock_destruct+0x15b/0x170

    When sender fills up the write buffer and dies from
    SIGPIPE. This is due to the device implementation
    not cleaning up the partially_sent_record.

    This is because commit a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
    moved the partial record cleanup to the SW-only path.

    Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
    Reported-by: David Beckett
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

21 Mar, 2019

1 commit

  • Added support for AES128-CCM based record encryption. AES128-CCM is
    similar to AES128-GCM. Both of them have same salt/iv/mac size. The
    notable difference between the two is that while invoking AES128-CCM
    operation, the salt||nonce (which is passed as IV) has to be prefixed
    with a hardcoded value '2'. Further, CCM implementation in kernel
    requires IV passed in crypto_aead_request() to be full '16' bytes.
    Therefore, the record structure 'struct tls_rec' has been modified to
    reserve '16' bytes for IV. This works for both GCM and CCM based cipher.

    Signed-off-by: Vakul Garg
    Signed-off-by: David S. Miller

    Vakul Garg
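
The IV layout described above can be sketched as follows. Assumptions: a 4-byte salt and 8-byte nonce (the same sizes as AES128-GCM, per the commit text), and the three trailing bytes left as zero in this model. The `DEMO_*` names and `demo_build_ccm_iv` are hypothetical, not kernel identifiers.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_CCM_IV_B0  2   /* hardcoded prefix byte named in the text */
#define DEMO_SALT_SIZE  4
#define DEMO_NONCE_SIZE 8
#define DEMO_IV_SIZE    16  /* full IV size expected by the CCM code */

/* Lay out the 16-byte IV as: prefix | salt | nonce | zero fill.
 * Assumption: the 3 trailing bytes stay zero in this sketch. */
static void demo_build_ccm_iv(uint8_t out[DEMO_IV_SIZE],
			      const uint8_t salt[DEMO_SALT_SIZE],
			      const uint8_t nonce[DEMO_NONCE_SIZE])
{
	memset(out, 0, DEMO_IV_SIZE);
	out[0] = DEMO_CCM_IV_B0;
	memcpy(out + 1, salt, DEMO_SALT_SIZE);
	memcpy(out + 1 + DEMO_SALT_SIZE, nonce, DEMO_NONCE_SIZE);
}
```

Reserving the full 16 bytes also covers the shorter GCM case, which is why one buffer size can serve both ciphers.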
     

14 Mar, 2019

1 commit

  • A previous fix ("tls: Fix write space handling") assumed that
    user space application gets informed about the socket send buffer
    availability when tls_push_sg() gets called. Inside tls_push_sg(), in
    case do_tcp_sendpages() returns 0, the function returns without calling
    ctx->sk_write_space. Further, the new function tls_sw_write_space()
    did not invoke ctx->sk_write_space. This leads to a situation where the
    user space application locks up, always waiting for the socket send
    buffer to become available.

    Rather than call ctx->sk_write_space from tls_push_sg(), it should be
    called from tls_write_space. So whenever tcp stack invokes
    sk->sk_write_space after freeing socket send buffer, we always declare
    the same to user space by the way of invoking ctx->sk_write_space.

    Fixes: 7463d3a2db0ef ("tls: Fix write space handling")
    Signed-off-by: Vakul Garg
    Reviewed-by: Boris Pismenny
    Signed-off-by: David S. Miller

    Vakul Garg
     

04 Mar, 2019

2 commits

  • TLS device cannot use the sw context. This patch returns the original
    tls device write space handler and moves the sw/device specific portions
    to the relevant files.

    Also, we remove the write_space call for the tls_sw flow, because it
    handles partial records in its delayed tx work handler.

    Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
    Signed-off-by: Boris Pismenny
    Reviewed-by: Eran Ben Elisha
    Signed-off-by: David S. Miller

    Boris Pismenny
     
  • Cleanup the handling of partial records while fixing a bug where the
    tls_push_pending_closed_record function is using the software tls
    context instead of the hardware context.

    The bug resulted in the following crash:
    [ 88.791229] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    [ 88.793271] #PF error: [normal kernel read fault]
    [ 88.794449] PGD 800000022a426067 P4D 800000022a426067 PUD 22a156067 PMD 0
    [ 88.795958] Oops: 0000 [#1] SMP PTI
    [ 88.796884] CPU: 2 PID: 4973 Comm: openssl Not tainted 5.0.0-rc4+ #3
    [ 88.798314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    [ 88.800067] RIP: 0010:tls_tx_records+0xef/0x1d0 [tls]
    [ 88.801256] Code: 00 02 48 89 43 08 e8 a0 0b 96 d9 48 89 df e8 48 dd
    4d d9 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
    c7 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
    [ 88.805179] RSP: 0018:ffffbd888186fca8 EFLAGS: 00010213
    [ 88.806458] RAX: ffff9af1ed657c98 RBX: ffff9af1e88a1980 RCX: 0000000000000000
    [ 88.808050] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9af1e88a1980
    [ 88.809724] RBP: ffff9af1e88a1980 R08: 0000000000000017 R09: ffff9af1ebeeb700
    [ 88.811294] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    [ 88.812917] R13: ffff9af1e88a1980 R14: ffff9af1ec13f800 R15: 0000000000000000
    [ 88.814506] FS: 00007fcad2240740(0000) GS:ffff9af1f7880000(0000) knlGS:0000000000000000
    [ 88.816337] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 88.817717] CR2: 0000000000000000 CR3: 0000000228b3e000 CR4: 00000000001406e0
    [ 88.819328] Call Trace:
    [ 88.820123] tls_push_data+0x628/0x6a0 [tls]
    [ 88.821283] ? remove_wait_queue+0x20/0x60
    [ 88.822383] ? n_tty_read+0x683/0x910
    [ 88.823363] tls_device_sendmsg+0x53/0xa0 [tls]
    [ 88.824505] sock_sendmsg+0x36/0x50
    [ 88.825492] sock_write_iter+0x87/0x100
    [ 88.826521] __vfs_write+0x127/0x1b0
    [ 88.827499] vfs_write+0xad/0x1b0
    [ 88.828454] ksys_write+0x52/0xc0
    [ 88.829378] do_syscall_64+0x5b/0x180
    [ 88.830369] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 88.831603] RIP: 0033:0x7fcad1451680

    [ 1248.470626] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    [ 1248.472564] #PF error: [normal kernel read fault]
    [ 1248.473790] PGD 0 P4D 0
    [ 1248.474642] Oops: 0000 [#1] SMP PTI
    [ 1248.475651] CPU: 3 PID: 7197 Comm: openssl Tainted: G OE 5.0.0-rc4+ #3
    [ 1248.477426] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    [ 1248.479310] RIP: 0010:tls_tx_records+0x110/0x1f0 [tls]
    [ 1248.480644] Code: 00 02 48 89 43 08 e8 4f cb 63 d7 48 89 df e8 f7 9c
    1b d7 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
    c7 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
    [ 1248.484825] RSP: 0018:ffffaa0a41543c08 EFLAGS: 00010213
    [ 1248.486154] RAX: ffff955a2755dc98 RBX: ffff955a36031980 RCX: 0000000000000006
    [ 1248.487855] RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000286
    [ 1248.489524] RBP: ffff955a36031980 R08: 0000000000000000 R09: 00000000000002b1
    [ 1248.491394] R10: 0000000000000003 R11: 00000000ad55ad55 R12: 0000000000000000
    [ 1248.493162] R13: 0000000000000000 R14: ffff955a2abe6c00 R15: 0000000000000000
    [ 1248.494923] FS: 0000000000000000(0000) GS:ffff955a378c0000(0000) knlGS:0000000000000000
    [ 1248.496847] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 1248.498357] CR2: 0000000000000000 CR3: 000000020c40e000 CR4: 00000000001406e0
    [ 1248.500136] Call Trace:
    [ 1248.500998] ? tcp_check_oom+0xd0/0xd0
    [ 1248.502106] tls_sk_proto_close+0x127/0x1e0 [tls]
    [ 1248.503411] inet_release+0x3c/0x60
    [ 1248.504530] __sock_release+0x3d/0xb0
    [ 1248.505611] sock_close+0x11/0x20
    [ 1248.506612] __fput+0xb4/0x220
    [ 1248.507559] task_work_run+0x88/0xa0
    [ 1248.508617] do_exit+0x2cb/0xbc0
    [ 1248.509597] ? core_sys_select+0x17a/0x280
    [ 1248.510740] do_group_exit+0x39/0xb0
    [ 1248.511789] get_signal+0x1d0/0x630
    [ 1248.512823] do_signal+0x36/0x620
    [ 1248.513822] exit_to_usermode_loop+0x5c/0xc6
    [ 1248.515003] do_syscall_64+0x157/0x180
    [ 1248.516094] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 1248.517456] RIP: 0033:0x7fb398bd3f53
    [ 1248.518537] Code: Bad RIP value.

    Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
    Signed-off-by: Boris Pismenny
    Signed-off-by: Eran Ben Elisha
    Signed-off-by: David S. Miller

    Boris Pismenny
     

20 Feb, 2019

1 commit

  • Each tls context maintains two cipher contexts (one each for tx and rx
    directions). For each tls session, the constants such as protocol
    version, ciphersuite, iv size, associated data size etc are same for
    both the directions and need to be stored only once per tls context.
    Hence these are moved from 'struct cipher_context' to 'struct
    tls_prot_info' and stored only once in 'struct tls_context'.

    Signed-off-by: Vakul Garg
    Signed-off-by: David S. Miller

    Vakul Garg
     

02 Feb, 2019

2 commits

  • TLS 1.3 has minor changes from TLS 1.2 at the record layer.

    * The header now hardcodes the same version and application content
    type.
    * The real content type is appended after the data, before encryption (or
    after decryption).
    * The IV is xored with the sequence number, instead of concatenating four
    bytes of IV with the explicit IV.
    * Zero-padding: No explicit length is given, we search backwards from the
    end of the decrypted data for the first non-zero byte, which is the
    content type. Currently recv supports reading zero-padding, but there
    is no way for send to add zero padding.

    Signed-off-by: Dave Watson
    Signed-off-by: David S. Miller

    Dave Watson
     
  • Wire up support for 256 bit keys from the setsockopt to the crypto
    framework

    Signed-off-by: Dave Watson
    Signed-off-by: David S. Miller

    Dave Watson
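
Two of the TLS 1.3 record-layer points listed in the first commit above lend themselves to a short sketch: deriving the per-record nonce by XOR with the sequence number, and locating the content type behind zero padding. The helper names and the 12-byte IV length are assumptions made for illustration; this is not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_IV_LEN 12

/* Per-record nonce: XOR the 64-bit sequence number, big-endian and
 * left-padded with zeros, into the static IV. */
static void demo_tls13_nonce(uint8_t out[DEMO_IV_LEN],
			     const uint8_t iv[DEMO_IV_LEN], uint64_t seq)
{
	memcpy(out, iv, DEMO_IV_LEN);
	for (int i = 0; i < 8; i++)
		out[DEMO_IV_LEN - 1 - i] ^= (uint8_t)(seq >> (8 * i));
}

/* Scan backwards past zero padding; the first non-zero byte is the
 * content type. Returns its index, or -1 if the buffer is all zeros. */
static long demo_tls13_content_type_pos(const uint8_t *buf, size_t len)
{
	while (len > 0) {
		if (buf[len - 1] != 0)
			return (long)(len - 1);
		len--;
	}
	return -1;
}
```

For a decrypted buffer such as `{'h', 'i', 23, 0, 0, 0}` the scan stops at index 2, so the payload is the two bytes before it and 23 is the content type.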
     

23 Jan, 2019

2 commits

  • Free the tls context in sock destruct. close() may not be the last
    call to free the sock, but force-releasing the ctx in close
    will result in a GPF when the ctx is referred to again in tcp_done.

    [ 515.330477] general protection fault: 0000 [#1] SMP PTI
    [ 515.330539] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc7+ #10
    [ 515.330657] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0b 11/07/2013
    [ 515.330844] RIP: 0010:tls_hw_unhash+0xbf/0xd0
    [
    [ 515.332220] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 515.332340] CR2: 00007fab32c55000 CR3: 000000009261e000 CR4: 00000000000006e0
    [ 515.332519] Call Trace:
    [ 515.332632]
    [ 515.332793] tcp_set_state+0x5a/0x190
    [ 515.332907] ? tcp_update_metrics+0xe3/0x350
    [ 515.333023] tcp_done+0x31/0xd0
    [ 515.333130] tcp_rcv_state_process+0xc27/0x111a
    [ 515.333242] ? __lock_is_held+0x4f/0x90
    [ 515.333350] ? tcp_v4_do_rcv+0xaf/0x1e0
    [ 515.333456] tcp_v4_do_rcv+0xaf/0x1e0

    Signed-off-by: Atul Gupta
    Signed-off-by: David S. Miller

    Atul Gupta
     
  • build protos is required for tls_hw_prot also hence moved to
    'tls_build_proto' and called as required from tls_init
    and tls_hw_proto. This is required since build_protos
    for v4 is moved from tls_register to tls_init in
    commit

    Signed-off-by: Atul Gupta
    Signed-off-by: David S. Miller

    Atul Gupta
     

21 Dec, 2018

2 commits

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2018-12-21

    The following pull-request contains BPF updates for your *net-next* tree.

    There is a merge conflict in test_verifier.c. Result looks as follows:

    [...]
    },
    {
    "calls: cross frame pruning",
    .insns = {
    [...]
    .prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
    .errstr_unpriv = "function calls to other bpf functions are allowed for root only",
    .result_unpriv = REJECT,
    .errstr = "!read_ok",
    .result = REJECT,
    },
    {
    "jset: functional",
    .insns = {
    [...]
    {
    "jset: unknown const compare not taken",
    .insns = {
    BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
    BPF_FUNC_get_prandom_u32),
    BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 1, 1),
    BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0),
    BPF_EXIT_INSN(),
    },
    .prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
    .errstr_unpriv = "!read_ok",
    .result_unpriv = REJECT,
    .errstr = "!read_ok",
    .result = REJECT,
    },
    [...]
    {
    "jset: range",
    .insns = {
    [...]
    },
    .prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
    .result_unpriv = ACCEPT,
    .result = ACCEPT,
    },

    The main changes are:

    1) Various BTF related improvements in order to get line info
    working. Meaning, verifier will now annotate the corresponding
    BPF C code to the error log, from Martin and Yonghong.

    2) Implement support for raw BPF tracepoints in modules, from Matt.

    3) Add several improvements to verifier state logic, namely speeding
    up stacksafe check, optimizations for stack state equivalence
    test and safety checks for liveness analysis, from Alexei.

    4) Teach verifier to make use of BPF_JSET instruction, add several
    test cases to kselftests and remove nfp specific JSET optimization
    now that verifier has awareness, from Jakub.

    5) Improve BPF verifier's slot_type marking logic in order to
    allow more stack slot sharing, from Jiong.

    6) Add sk_msg->size member for context access and add set of fixes
    and improvements to make sock_map with kTLS usable with openssl
    based applications, from John.

    7) Several cleanups and documentation updates in bpftool as well as
    auto-mount of tracefs for "bpftool prog tracelog" command,
    from Quentin.

    8) Include sub-program tags from now on in bpf_prog_info in order to
    have a reliable way for user space to get all tags of the program
    e.g. needed for kallsyms correlation, from Song.

    9) Add BTF annotations for cgroup_local_storage BPF maps and
    implement bpf fs pretty print support, from Roman.

    10) Fix bpftool in order to allow for cross-compilation, from Ivan.

    11) Update of bpftool license to GPLv2-only + BSD-2-Clause in order
    to be compatible with libbfd and allow for Debian packaging,
    from Jakub.

    12) Remove an obsolete prog->aux sanitation in dump and get rid of
    version check for prog load, from Daniel.

    13) Fix a memory leak in libbpf's line info handling, from Prashant.

    14) Fix cpumap's frame alignment for build_skb() so that skb_shared_info
    does not get unaligned, from Jesper.

    15) Fix test_progs kselftest to work with older compilers which are less
    smart in optimizing (and thus throwing build error), from Stanislav.

    16) Cleanup and simplify AF_XDP socket teardown, from Björn.

    17) Fix sk lookup in BPF kselftest's test_sock_addr with regards
    to netns_id argument, from Andrey.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The existing code did not expect users would initialize the TLS ULP
    without subsequently calling the TLS TX enabling socket option.
    If the application tries to send data after the TLS ULP enable op
    but before the TLS TX enable op the BPF sk_msg verdict program is
    skipped. This patch resolves this by converting the ipv4 sock ops
    to be calculated at init time the same way ipv6 ops are done. This
    pulls in any changes to the sock ops structure that have been made
    after the socket was created including the changes from adding the
    socket to a sock{map|hash}.

    This was discovered by running OpenSSL master branch which calls
    the TLS ULP setsockopt early in TLS handshake but only enables
    the TLS TX path once the handshake has completed. As a result the
    datapath missed the initial handshake messages.

    Fixes: 02c558b2d5d6 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann

    John Fastabend
     

20 Dec, 2018

1 commit

  • create_ctx can be called from atomic context, hence use
    GFP_ATOMIC instead of GFP_KERNEL.

    [ 395.962599] BUG: sleeping function called from invalid context at mm/slab.h:421
    [ 395.979896] in_atomic(): 1, irqs_disabled(): 0, pid: 16254, name: openssl
    [ 395.996564] 2 locks held by openssl/16254:
    [ 396.010492] #0: 00000000347acb52 (sk_lock-AF_INET){+.+.}, at: do_tcp_setsockopt.isra.44+0x13b/0x9a0
    [ 396.029838] #1: 000000006c9552b5 (device_spinlock){+...}, at: tls_init+0x1d/0x280
    [ 396.047675] CPU: 5 PID: 16254 Comm: openssl Tainted: G O 4.20.0-rc6+ #25
    [ 396.066019] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0c 09/25/2017
    [ 396.083537] Call Trace:
    [ 396.096265] dump_stack+0x5e/0x8b
    [ 396.109876] ___might_sleep+0x216/0x250
    [ 396.123940] kmem_cache_alloc_trace+0x1b0/0x240
    [ 396.138800] create_ctx+0x1f/0x60
    [ 396.152504] tls_init+0xbd/0x280
    [ 396.166135] tcp_set_ulp+0x191/0x2d0
    [ 396.180035] ? tcp_set_ulp+0x2c/0x2d0
    [ 396.193960] do_tcp_setsockopt.isra.44+0x148/0x9a0
    [ 396.209013] __sys_setsockopt+0x7c/0xe0
    [ 396.223054] __x64_sys_setsockopt+0x20/0x30
    [ 396.237378] do_syscall_64+0x4a/0x180
    [ 396.251200] entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fixes: df9d4a178022 ("net/tls: sleeping function from invalid context")
    Signed-off-by: Ganesh Goudar
    Signed-off-by: David S. Miller

    Ganesh Goudar
     

15 Dec, 2018

2 commits

  • HW unhash within a mutex for registered tls devices causes a sleep
    when called from tcp_set_state for TCP_CLOSE. Release the lock and
    re-acquire it after the function call, with ref count incr/dec.
    Define kref and fp release for tls_device to ensure the device
    is not released outside the lock.

    BUG: sleeping function called from invalid context at
    kernel/locking/mutex.c:748
    in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/7
    INFO: lockdep is turned off.
    CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W O
    Call Trace:

    dump_stack+0x5e/0x8b
    ___might_sleep+0x222/0x260
    __mutex_lock+0x5c/0xa50
    ? vprintk_emit+0x1f3/0x440
    ? kmem_cache_free+0x22d/0x2a0
    ? tls_hw_unhash+0x2f/0x80
    ? printk+0x52/0x6e
    ? tls_hw_unhash+0x2f/0x80
    tls_hw_unhash+0x2f/0x80
    tcp_set_state+0x5f/0x180
    tcp_done+0x2e/0xe0
    tcp_rcv_state_process+0x92c/0xdd3
    ? lock_acquire+0xf5/0x1f0
    ? tcp_v4_rcv+0xa7c/0xbe0
    ? tcp_v4_do_rcv+0x70/0x1e0

    Signed-off-by: Atul Gupta
    Signed-off-by: David S. Miller

    Atul Gupta
     
  • create_ctx is called from tls_init and tls_hw_prot
    hence initialize function pointers in common routine.

    Signed-off-by: Atul Gupta
    Signed-off-by: David S. Miller

    Atul Gupta
     

21 Oct, 2018

1 commit