Documentation/bpf/prog_sk_lookup.rst 3.77 KB
07ff4f012   Jakub Sitnicki   bpf: sk_lookup: A...
  .. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
  
  =====================
  BPF sk_lookup program
  =====================
  
  The BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces
  programmability into the socket lookup performed by the transport layer when a
  packet is to be delivered locally.
  
  When invoked, a BPF sk_lookup program can select a socket that will receive the
  incoming packet by calling the ``bpf_sk_assign()`` BPF helper function.
  
  Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP.
  
  Motivation
  ==========
  
  The BPF sk_lookup program type was introduced to address setup scenarios where
  binding sockets to an address with the ``bind()`` socket call is impractical,
  such as:
  
  1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when
     binding to a wildcard address ``INADDR_ANY`` is not possible due to a port
     conflict,
  2. receiving connections on all or a wide range of ports, i.e. an L7 proxy use
     case.
  
  Such setups would require creating one socket for each IP address/port pair in
  the range and ``bind()``'ing it, leading to high resource consumption and
  potential latency spikes during socket lookup.
  
  Attachment
  ==========
  
  A BPF sk_lookup program can be attached to a network namespace with the
  ``bpf(BPF_LINK_CREATE, ...)`` syscall using the ``BPF_SK_LOOKUP`` attach type and a
  netns FD as the attachment ``target_fd``.
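
  The attachment step can be sketched with the raw ``bpf()`` syscall as follows.
  This is a minimal illustration, not a reference implementation: the function
  names ``sk_lookup_attach()`` and ``sk_lookup_attach_self()`` are hypothetical,
  ``prog_fd`` is assumed to refer to an already loaded sk_lookup program, and
  the UAPI headers are assumed to be recent enough to define ``BPF_LINK_CREATE``
  and ``BPF_SK_LOOKUP``.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/bpf.h>

/*
 * Attach an already loaded BPF sk_lookup program (prog_fd) to the network
 * namespace referred to by netns_fd. Returns a link FD on success, or -1
 * with errno set on failure. (Hypothetical helper name.)
 */
static int sk_lookup_attach(int prog_fd, int netns_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.link_create.prog_fd = prog_fd;
	attr.link_create.target_fd = netns_fd;
	attr.link_create.attach_type = BPF_SK_LOOKUP;

	return (int)syscall(SYS_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}

/* Hypothetical convenience wrapper: attach to the caller's own netns. */
static int sk_lookup_attach_self(int prog_fd)
{
	int netns_fd, link_fd, saved_errno;

	netns_fd = open("/proc/self/ns/net", O_RDONLY);
	if (netns_fd < 0)
		return -1;

	link_fd = sk_lookup_attach(prog_fd, netns_fd);

	/* Preserve the errno from bpf() across close(). */
	saved_errno = errno;
	close(netns_fd);
	errno = saved_errno;

	return link_fd;
}
```

  The returned link FD represents the attachment. As with other BPF links, the
  program is expected to stay attached only while the link FD is held open or
  the link is pinned.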
  
  Multiple programs can be attached to one network namespace. Programs will be
  invoked in the same order as they were attached.
  
  Hooks
  =====
  
  The attached BPF sk_lookup programs run whenever the transport layer needs to
  find a listening (TCP) or an unconnected (UDP) socket for an incoming packet.
  
  Incoming traffic to established (TCP) and connected (UDP) sockets is delivered
  as usual without triggering the BPF sk_lookup hook.
  
  The attached BPF programs must return either the ``SK_PASS`` or the ``SK_DROP``
  verdict code. As with other BPF program types that are network filters,
  ``SK_PASS`` signifies that the socket lookup should continue on to the regular
  hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the
  packet.
  
  A BPF sk_lookup program can also select a socket to receive the packet by
  calling the ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a
  socket in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes
  a ``struct bpf_sock *`` to the ``bpf_sk_assign()`` helper to record the
  selection. The selection takes effect only if the program terminates with the
  ``SK_PASS`` code.
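
  The lookup-then-assign pattern described above might look like the sketch
  below. Everything specific in it is an assumption made for illustration: the
  map name ``echo_socket``, the program name ``select_echo_socket``, the helper
  ``is_echo_request()``, and the choice of TCP port 7. The ``SEC()``,
  ``__uint()`` and ``__type()`` macros and the helper declarations are
  hand-written stand-ins for what libbpf's ``bpf/bpf_helpers.h`` normally
  provides, so that the sketch depends only on the UAPI headers.

```c
#include <linux/bpf.h>
#include <linux/in.h>

/* Stand-ins for definitions normally taken from <bpf/bpf_helpers.h>. */
#define SEC(name) __attribute__((section(name), used))
#define __uint(name, val) int (*name)[val]
#define __type(name, val) __typeof__(val) *name

static void *(*bpf_map_lookup_elem)(void *map, const void *key) =
	(void *)(unsigned long)BPF_FUNC_map_lookup_elem;
static long (*bpf_sk_assign)(void *ctx, void *sk, __u64 flags) =
	(void *)(unsigned long)BPF_FUNC_sk_assign;
static long (*bpf_sk_release)(void *sk) =
	(void *)(unsigned long)BPF_FUNC_sk_release;

#define ECHO_PORT 7 /* hypothetical port to steer */

/* Hypothetical map: slot 0 holds the socket that should receive the traffic. */
struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} echo_socket SEC(".maps");

/* Pure predicate: does this lookup concern TCP traffic to ECHO_PORT? */
static inline int is_echo_request(__u32 protocol, __u32 local_port)
{
	return protocol == IPPROTO_TCP && local_port == ECHO_PORT;
}

SEC("sk_lookup")
int select_echo_socket(struct bpf_sk_lookup *ctx)
{
	const __u32 key = 0;
	struct bpf_sock *sk;
	long err;

	if (!is_echo_request(ctx->protocol, ctx->local_port))
		return SK_PASS; /* not ours: continue with the regular lookup */

	sk = bpf_map_lookup_elem(&echo_socket, &key);
	if (!sk)
		return SK_PASS; /* map slot empty: continue with the regular lookup */

	err = bpf_sk_assign(ctx, sk, 0); /* record the selection */
	bpf_sk_release(sk);              /* drop the reference taken by the map lookup */

	return err ? SK_DROP : SK_PASS;  /* the selection only counts with SK_PASS */
}

char _license[] SEC("license") = "GPL";
```

  A program like this is meant to be compiled for the BPF target and loaded
  into the kernel rather than run in user space. User space is then expected to
  store a listening socket in slot 0 of the map before traffic arrives.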
  
  When multiple programs are attached, the end result is determined from the
  return codes of all the programs according to the following rules:
  
  1. If any program returned ``SK_PASS`` and selected a valid socket, the socket
     is used as the result of the socket lookup.
  2. If more than one program returned ``SK_PASS`` and selected a socket, the last
     selection takes effect.
  3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and
     selected a socket, socket lookup fails.
  4. If all programs returned ``SK_PASS`` and none of them selected a socket,
     socket lookup continues on.
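
  The four rules can be modelled in plain user-space C, as in the sketch below.
  It is a simulation of the rules as stated here, not kernel code; the types
  ``enum verdict``, ``struct prog_result`` and ``struct lookup_result`` and the
  function ``combine_verdicts()`` are hypothetical names introduced for the
  illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SK_DROP and SK_PASS verdict codes. */
enum verdict { VERDICT_DROP, VERDICT_PASS };

/* Outcome of one program run, in attachment order. */
struct prog_result {
	enum verdict verdict;
	const void *selected_sk; /* NULL if the program selected no socket */
};

/* Combined outcome of running all attached programs. */
struct lookup_result {
	enum verdict verdict;
	const void *sk; /* NULL with VERDICT_PASS: regular lookup continues */
};

static struct lookup_result combine_verdicts(const struct prog_result *runs,
					     size_t n)
{
	struct lookup_result res = { VERDICT_PASS, NULL };
	bool all_pass = true;
	size_t i;

	for (i = 0; i < n; i++) {
		if (runs[i].verdict == VERDICT_PASS && runs[i].selected_sk)
			res.sk = runs[i].selected_sk; /* rules 1 and 2: last selection wins */
		else if (runs[i].verdict == VERDICT_DROP)
			all_pass = false; /* a selection made before SK_DROP is ignored */
	}

	/* Rule 3: a drop with no valid selection fails the lookup. */
	/* Rule 4: all pass with no selection continues the lookup (sk == NULL). */
	res.verdict = (all_pass || res.sk) ? VERDICT_PASS : VERDICT_DROP;
	return res;
}
```

  For instance, if one program passes with a selected socket and another
  drops, the model yields ``VERDICT_PASS`` with that socket, because rule 1
  takes precedence over rule 3.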
  
  API
  ===
  
  In its context, an instance of ``struct bpf_sk_lookup``, a BPF sk_lookup program
  receives information about the packet that triggered the socket lookup, namely:
  
  * IP version (``AF_INET`` or ``AF_INET6``),
  * L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``),
  * source and destination IP address,
  * source and destination L4 port,
  * the socket that has been selected with ``bpf_sk_assign()``.
  
  Refer to the ``struct bpf_sk_lookup`` declaration in the ``linux/bpf.h`` user API
  header, and to the ``bpf_sk_assign()`` section of the `bpf-helpers(7)
  <https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man page, for
  details.
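
  As an illustration of reading the context, the sketch below matches the
  destination address against 192.0.2.0/24, the range used as an example in the
  Motivation section, and compares the destination port. The function names are
  hypothetical. The byte-order handling follows the field comments in
  ``linux/bpf.h``, which mark ``local_ip4`` as network byte order and
  ``local_port`` as host byte order. The sketch uses ``htonl()`` so that it also
  compiles in user space; BPF code would more commonly use ``bpf_htonl()`` from
  libbpf's ``bpf/bpf_endian.h``.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <linux/bpf.h>

/* True if the packet is IPv4 and its destination lies in 192.0.2.0/24. */
static bool dst_in_test_net(const struct bpf_sk_lookup *ctx)
{
	const __u32 net = htonl(0xC0000200);  /* 192.0.2.0 */
	const __u32 mask = htonl(0xFFFFFF00); /* /24 prefix */

	/* local_ip4 is in network byte order, so compare against htonl() values. */
	return ctx->family == AF_INET && (ctx->local_ip4 & mask) == net;
}

/* True if the destination port equals port (given in host byte order). */
static bool dst_port_is(const struct bpf_sk_lookup *ctx, __u16 port)
{
	/* local_port is already in host byte order; no conversion is needed. */
	return ctx->local_port == port;
}
```

  Predicates like these would typically guard the call to ``bpf_sk_assign()``,
  so that only the traffic of interest is steered to the selected socket.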
  
  Example
  =======
  
  See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference
  implementation.