Documentation/bpf/prog_cgroup_sockopt.rst 3.39 KB
0c51b3697   Stanislav Fomichev   bpf: add sockopt ...
  .. SPDX-License-Identifier: GPL-2.0
  
  ============================
  BPF_PROG_TYPE_CGROUP_SOCKOPT
  ============================
  
  The ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
  cgroup hooks:
  
  * ``BPF_CGROUP_GETSOCKOPT`` - called every time a process executes the
    ``getsockopt`` system call.
  * ``BPF_CGROUP_SETSOCKOPT`` - called every time a process executes the
    ``setsockopt`` system call.
  
  The context (``struct bpf_sockopt``) exposes the associated socket (``sk``)
  and all input arguments: ``level``, ``optname``, ``optval`` and ``optlen``.
  
  BPF_CGROUP_SETSOCKOPT
  =====================
  
  ``BPF_CGROUP_SETSOCKOPT`` is triggered *before* the kernel handles the
  socket option, and its context is writable: the program can modify the
  supplied arguments before they are passed down to the kernel. This hook
  has access to the cgroup and socket local storage.
  
  If a BPF program sets ``optlen`` to -1, control is returned to userspace
  after all other BPF programs in the cgroup chain finish (i.e. kernel
  ``setsockopt`` handling will *not* be executed).
  
  Note that ``optlen`` cannot be increased beyond the user-supplied
  value. It can only be decreased or set to -1. Any other value will
  trigger ``EFAULT``.
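
  The ``optlen`` rules above can be summarised with a small userspace model.
  This is only a sketch of the behaviour described in this document, not the
  kernel implementation; the function name and the ``SOCKOPT_*`` constants
  are hypothetical and exist purely for illustration.

  .. code-block:: c

    #include <errno.h>

    /* Hypothetical verdicts for the model below (not kernel constants). */
    #define SOCKOPT_RUN_KERNEL      0   /* hand the arguments to the kernel */
    #define SOCKOPT_BYPASS_KERNEL   1   /* skip kernel setsockopt handling */

    /*
     * Model of the optlen check applied once the BPF chain has finished.
     * user_optlen is the length supplied by userspace, bpf_optlen is the
     * value left in the context by the BPF programs.
     */
    static int setsockopt_optlen_verdict(int user_optlen, int bpf_optlen)
    {
            if (bpf_optlen == -1)
                    return SOCKOPT_BYPASS_KERNEL;
            if (bpf_optlen > user_optlen || bpf_optlen < -1)
                    return -EFAULT;
            return SOCKOPT_RUN_KERNEL;
    }

  With a user-supplied length of 8, leaving ``optlen`` at 8 or shrinking it
  to 4 lets the kernel handler run, -1 bypasses it, and 9 or -2 yields
  ``-EFAULT``.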
  
  Return Type
  -----------
  
  * ``0`` - reject the syscall; ``EPERM`` is returned to userspace.
  * ``1`` - success, continue with next BPF program in the cgroup chain.
  
  BPF_CGROUP_GETSOCKOPT
  =====================
  
  ``BPF_CGROUP_GETSOCKOPT`` is triggered *after* the kernel handling of
  sockopt. The BPF hook can observe ``optval``, ``optlen`` and ``retval``
  if it is interested in what the kernel has returned. The hook can also
  override ``optval``, adjust ``optlen`` and reset ``retval`` to 0. If
  ``optlen`` has been increased above the initial ``getsockopt`` value
  (i.e. the userspace buffer is too small), ``EFAULT`` is returned.
  
  This hook has access to the cgroup and socket local storage.
  
  Note that the only acceptable values to set ``retval`` to are 0 and the
  original value that the kernel returned. Any other value will trigger
  ``EFAULT``.
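
  The ``optlen`` and ``retval`` constraints above can be expressed as a short
  userspace model. It is a sketch of the rules stated in this document rather
  than the kernel code; ``getsockopt_result()`` and its parameter names are
  hypothetical, and it assumes the BPF program itself returned ``1``.

  .. code-block:: c

    #include <errno.h>

    /*
     * Model of the checks applied after the BPF_CGROUP_GETSOCKOPT chain ran.
     * kernel_retval and max_optlen describe what the kernel returned and the
     * initial getsockopt optlen; bpf_retval and bpf_optlen are the values
     * left in the context by the BPF programs.
     * Returns the value reported by the syscall, or -EFAULT.
     */
    static int getsockopt_result(int kernel_retval, int max_optlen,
                                 int bpf_retval, int bpf_optlen)
    {
            /* optlen grew past the userspace buffer */
            if (bpf_optlen > max_optlen)
                    return -EFAULT;
            /* retval may only be 0 or the kernel's original value */
            if (bpf_retval != 0 && bpf_retval != kernel_retval)
                    return -EFAULT;
            return bpf_retval;
    }

  For instance, if the kernel returned an error for an option it does not
  know, a program may reset ``retval`` to 0 and fill in ``optval`` itself,
  but it may not replace that error with a different one. The negative-errno
  convention used for ``retval`` in this model is an assumption.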
  
  Return Type
  -----------
  
  * ``0`` - reject the syscall; ``EPERM`` is returned to userspace.
  * ``1`` - success: copy ``optval`` and ``optlen`` to userspace, return
    ``retval`` from the syscall (note that this can be overwritten by
    the BPF program from the parent cgroup).
  
  Cgroup Inheritance
  ==================
  
  Suppose there is the following cgroup hierarchy, where
  ``BPF_CGROUP_GETSOCKOPT`` is attached at each level with the
  ``BPF_F_ALLOW_MULTI`` flag::
  
    A (root, parent)
     \
      B (child)
  
  When an application in cgroup B calls the ``getsockopt`` syscall,
  the programs are executed from the bottom up: B, then A. The first
  program (B) sees the result of the kernel's ``getsockopt``. It can
  optionally adjust ``optval`` and ``optlen`` and reset ``retval`` to 0.
  After that, control is passed to the second program (A), which sees the
  same context as B, including any modifications B made.
  
  The same applies to ``BPF_CGROUP_SETSOCKOPT``: if programs are attached
  to both A and B, the trigger order is B, then A. If B changes any of
  the input arguments (``level``, ``optname``, ``optval``, ``optlen``),
  the next program in the chain (A) sees those changes, *not* the
  original ``setsockopt`` arguments. The potentially modified values
  are then passed down to the kernel.
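
  The propagation of changes through the chain can be illustrated with a
  userspace model. Everything in it is hypothetical: ``struct
  sockopt_ctx_model`` is a simplified stand-in for ``struct bpf_sockopt``,
  the option numbers are arbitrary, and program verdicts (``0``/``1``) are
  left out so that only the shared-context behaviour is shown.

  .. code-block:: c

    #include <stddef.h>

    /* Hypothetical stand-in for some fields of struct bpf_sockopt. */
    struct sockopt_ctx_model {
            int level;
            int optname;
            int optlen;
    };

    typedef void (*sockopt_prog_model)(struct sockopt_ctx_model *ctx);

    /* Program attached to B (child): rewrites optname 1 to optname 2. */
    static void prog_b(struct sockopt_ctx_model *ctx)
    {
            if (ctx->optname == 1)
                    ctx->optname = 2;
    }

    /* Program attached to A (parent): truncates optlen for optname 2. */
    static void prog_a(struct sockopt_ctx_model *ctx)
    {
            if (ctx->optname == 2 && ctx->optlen > 4)
                    ctx->optlen = 4;
    }

    /*
     * Model of a setsockopt call issued from cgroup B: the programs run
     * bottom up (B, then A) on one shared context, and the returned
     * context is what would be passed down to the kernel.
     */
    static struct sockopt_ctx_model setsockopt_from_b(int level, int optname,
                                                      int optlen)
    {
            struct sockopt_ctx_model ctx = { level, optname, optlen };
            const sockopt_prog_model chain[] = { prog_b, prog_a };
            size_t i;

            for (i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
                    chain[i](&ctx);
            return ctx;
    }

  Calling ``setsockopt_from_b(0, 1, 8)`` yields ``optname`` 2 and ``optlen``
  4: A truncates the length only because it observes the ``optname`` that B
  rewrote, not the original argument.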
  
  Example
  =======
  
  See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
  of a BPF program that handles socket options.