06 Dec, 2014

1 commit

  • introduce program type BPF_PROG_TYPE_SOCKET_FILTER that is used
    for attaching programs to sockets where ctx == skb.

    add verifier checks for ABS/IND instructions which can only be seen
    in socket filters, therefore the check:
    if (env->prog->aux->prog_type != BPF_PROG_TYPE_SOCKET_FILTER)
        verbose("BPF_LD_ABS|IND instructions are only allowed in socket filters\n");

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

20 Nov, 2014

1 commit

  • - fix NULL pointer dereference:
    kernel/bpf/arraymap.c:41 array_map_alloc() error: potential null dereference 'array'. (kzalloc returns null)
    kernel/bpf/arraymap.c:41 array_map_alloc() error: we previously assumed 'array' could be null (see line 40)

    - integer overflow check was missing in arraymap
    (hashmap checks for overflow via kmalloc_array())

    - arraymap can round_up(value_size, 8) to zero. check was missing.

    - hashmap was missing zero size check as well, since roundup_pow_of_two() can
    truncate into zero

    - found a typo in the arraymap comment and unnecessary empty line

    Fix all of these issues and make both overflow checks explicit U32 in size.

    Reported-by: kbuild test robot
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
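
The size checks described in the commit above can be sketched in portable C. This is a user-space illustration only, not the kernel code: `round_up8()` and `array_size_checked()` are hypothetical names, and `header` stands in for the size of the map's bookkeeping structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of 8 in 32-bit arithmetic.
 * For x close to UINT32_MAX the addition wraps and the result is 0. */
static uint32_t round_up8(uint32_t x)
{
    return (x + 7u) & ~7u;
}

/* Hypothetical helper: compute header + max_entries * elem_size,
 * refusing zero sizes and anything that does not fit in 32 bits. */
static bool array_size_checked(uint32_t value_size, uint32_t max_entries,
                               uint32_t header, uint32_t *out)
{
    uint32_t elem_size = round_up8(value_size);

    /* zero-size check, including the "rounded up to zero" case */
    if (value_size == 0 || max_entries == 0 || elem_size == 0)
        return false;

    /* explicit overflow check before multiplying */
    if (max_entries > (UINT32_MAX - header) / elem_size)
        return false;

    *out = header + max_entries * elem_size;
    return true;
}
```

Dividing before multiplying keeps the whole check inside unsigned 32-bit arithmetic, so the test itself cannot overflow.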
     

19 Nov, 2014

6 commits

  • proper types and function helpers are ready. Use them in verifier testsuite.
    Remove temporary stubs

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • expose bpf_map_lookup_elem(), bpf_map_update_elem(), bpf_map_delete_elem()
    map accessors to eBPF programs

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • fix errno of BPF_MAP_LOOKUP_ELEM command as bpf manpage
    described it in commit b4fc1a460f30("Merge branch 'bpf-next'"):
    -----
    BPF_MAP_LOOKUP_ELEM
    int bpf_lookup_elem(int fd, void *key, void *value)
    {
        union bpf_attr attr = {
            .map_fd = fd,
            .key = ptr_to_u64(key),
            .value = ptr_to_u64(value),
        };

        return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
    }
    bpf() syscall looks up an element with given key in a map fd.
    If element is found it returns zero and stores element's value
    into value. If element is not found it returns -1 and sets
    errno to ENOENT.

    and further down in manpage:

    ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM, indicates that
    element with given key was not found.
    -----

    In general all BPF commands return ENOENT when map element is not found
    (including BPF_MAP_GET_NEXT_KEY and BPF_MAP_UPDATE_ELEM with
    flags == BPF_MAP_UPDATE_ONLY)

    Subsequent patch adds a testsuite to check return values for all of
    these combinations.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • add new map type BPF_MAP_TYPE_ARRAY and its implementation

    - optimized for fastest possible lookup()
    . in the future verifier/JIT may recognize lookup() with constant key
    and optimize it into constant pointer. Can optimize non-constant
    key into direct pointer arithmetic as well, since pointers and
    value_size are constant for the life of the eBPF program.
    In other words array_map_lookup_elem() may be 'inlined' by verifier/JIT
    while preserving concurrent access to this map from user space

    - two main use cases for array type:
    . 'global' eBPF variables: array of 1 element with key=0 and value is a
    collection of 'global' variables which programs can use to keep the state
    between events
    . aggregation of tracing events into fixed set of buckets

    - all array elements pre-allocated and zero initialized at init time

    - key is an index into the array and can only be 4 bytes

    - map_delete_elem() returns EINVAL, since elements cannot be deleted

    - map_update_elem() replaces elements in a non-atomic way
    (for atomic updates hashtable type should be used instead)

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
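
The array map semantics listed above can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel implementation: the `array_map_*` names and the struct layout are hypothetical, and returning `-E2BIG` for an out-of-range index on update is a choice made for this sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space model of an array map. */
struct array_map {
    uint32_t value_size;   /* bytes per element as requested by the user */
    uint32_t elem_size;    /* value_size rounded up to 8 bytes */
    uint32_t max_entries;
    unsigned char *values; /* max_entries * elem_size bytes, zero initialized */
};

static struct array_map *array_map_new(uint32_t value_size, uint32_t max_entries)
{
    struct array_map *m;

    if (value_size == 0 || max_entries == 0 || value_size > UINT32_MAX - 7)
        return NULL;
    m = malloc(sizeof(*m));
    if (!m)
        return NULL;
    m->value_size = value_size;
    m->elem_size = (value_size + 7u) & ~7u;
    m->max_entries = max_entries;
    /* all elements are pre-allocated and zeroed; calloc rejects overflow */
    m->values = calloc(max_entries, m->elem_size);
    if (!m->values) {
        free(m);
        return NULL;
    }
    return m;
}

/* The 4-byte key is the index, so lookup is pointer arithmetic. */
static void *array_map_lookup(struct array_map *m, uint32_t key)
{
    if (key >= m->max_entries)
        return NULL;
    return m->values + (size_t)key * m->elem_size;
}

/* Non-atomic replace: a concurrent reader could see a partial value. */
static int array_map_update(struct array_map *m, uint32_t key, const void *value)
{
    if (key >= m->max_entries)
        return -E2BIG;
    memcpy(array_map_lookup(m, key), value, m->value_size);
    return 0;
}

/* Elements cannot be deleted from an array. */
static int array_map_delete(struct array_map *m, uint32_t key)
{
    (void)m;
    (void)key;
    return -EINVAL;
}
```

Because the element address depends only on the base pointer, the key and a constant element size, a lookup with a constant key reduces to a constant pointer, which is the optimization the commit message anticipates.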
     
  • add new map type BPF_MAP_TYPE_HASH and its implementation

    - maps are created/destroyed by userspace. Both userspace and eBPF programs
    can lookup/update/delete elements from the map

    - eBPF programs can be called in_irq(), so use spin_lock_irqsave() mechanism
    for concurrent updates

    - key/value are opaque range of bytes (aligned to 8 bytes)

    - user space provides 3 configuration attributes via BPF syscall:
    key_size, value_size, max_entries

    - map takes care of allocating/freeing key/value pairs

    - map_update_elem() must fail to insert new element when max_entries
    limit is reached to make sure that eBPF programs cannot exhaust memory

    - map_update_elem() replaces elements in an atomic way

    - optimized for speed of lookup() which can be called multiple times from
    eBPF program which itself is triggered by high volume of events
    . in the future JIT compiler may recognize lookup() call and optimize it
    further, since key_size is constant for life of eBPF program

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
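
The max_entries rule above (inserting a new element must fail once the limit is reached, while replacing an existing one still works) can be illustrated with a deliberately simplified model. Assumptions: a linear scan replaces real hashing, keys and values are fixed-width integers, all names are hypothetical, and `-E2BIG` is the error chosen for a full map in this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_ENTRIES = 2 };

struct entry {
    uint32_t key;
    uint64_t value;
    int used;
};

/* Hypothetical bounded key/value store; zero-initialize before use. */
struct bounded_map {
    struct entry slots[MAX_ENTRIES];
    int count;
};

static struct entry *find_entry(struct bounded_map *m, uint32_t key)
{
    int i;

    for (i = 0; i < MAX_ENTRIES; i++)
        if (m->slots[i].used && m->slots[i].key == key)
            return &m->slots[i];
    return NULL;
}

static int bounded_update(struct bounded_map *m, uint32_t key, uint64_t value)
{
    struct entry *e = find_entry(m, key);
    int i;

    if (e) {                        /* replacing never grows the map */
        e->value = value;
        return 0;
    }
    if (m->count >= MAX_ENTRIES)    /* the limit bounds memory use */
        return -E2BIG;
    for (i = 0; i < MAX_ENTRIES; i++) {
        if (!m->slots[i].used) {
            m->slots[i].key = key;
            m->slots[i].value = value;
            m->slots[i].used = 1;
            m->count++;
            return 0;
        }
    }
    return -E2BIG;                  /* not reached: count < MAX_ENTRIES */
}

static int bounded_delete(struct bounded_map *m, uint32_t key)
{
    struct entry *e = find_entry(m, key);

    if (!e)
        return -ENOENT;
    e->used = 0;
    m->count--;
    return 0;
}
```

With a capacity of two, a third distinct key is rejected until one of the existing keys is deleted.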
     
  • the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
    either update existing map element or create a new one.
    Initially the plan was to add a new command to handle the case of
    'create new element if it didn't exist', but 'flags' style looks
    cleaner and overall diff is much smaller (more code reused), so add 'flags'
    attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
    #define BPF_ANY 0 /* create new element or update existing */
    #define BPF_NOEXIST 1 /* create new element if it didn't exist */
    #define BPF_EXIST 2 /* update existing element */

    bpf_update_elem(fd, key, value, BPF_NOEXIST) call can fail with EEXIST
    if element already exists.

    bpf_update_elem(fd, key, value, BPF_EXIST) can fail with ENOENT
    if element doesn't exist.

    Userspace will call it as:
    int bpf_update_elem(int fd, void *key, void *value, __u64 flags)
    {
        union bpf_attr attr = {
            .map_fd = fd,
            .key = ptr_to_u64(key),
            .value = ptr_to_u64(value),
            .flags = flags,
        };

        return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
    }

    First two bits of 'flags' are used to encode style of bpf_update_elem() command.
    Bits 2-63 are reserved for future use.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
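
The flag semantics above reduce to a small decision function. A minimal sketch: `check_update_flags()` is a hypothetical name, and rejecting any value above BPF_EXIST with `-EINVAL` is an assumption about how the reserved bits would be handled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define BPF_ANY     0 /* create new element or update existing */
#define BPF_NOEXIST 1 /* create new element if it didn't exist */
#define BPF_EXIST   2 /* update existing element */

/* Decide whether an update may proceed, given whether the key is
 * already present in the map. Returns 0 or a negative errno value. */
static int check_update_flags(bool exists, uint64_t flags)
{
    if (flags > BPF_EXIST)
        return -EINVAL;     /* value 3 and reserved bits 2-63 (assumption) */
    if (flags == BPF_NOEXIST && exists)
        return -EEXIST;
    if (flags == BPF_EXIST && !exists)
        return -ENOENT;
    return 0;
}
```

For example, `check_update_flags(true, BPF_NOEXIST)` yields `-EEXIST`, and `check_update_flags(false, BPF_EXIST)` yields `-ENOENT`, matching the two failure cases named in the commit message.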
     

02 Nov, 2014

1 commit


31 Oct, 2014

1 commit

  • verifier keeps track of register state spilled to stack.
    registers are 8-byte wide and always aligned, so instead of tracking them
    in every byte-sized stack slot, use MAX_BPF_STACK / 8 array to track
    spilled register state.
    Though verifier runs in user context and its state freed immediately
    after verification, it makes sense to reduce its memory usage.
    This optimization reduces sizeof(struct verifier_state)
    from 12464 to 1712 on 64-bit and from 6232 to 1112 on 32-bit.

    Note, this patch doesn't change existing limits, which are there to bound
    time and memory during verification: 4k total number of insns in a program,
    1k number of jumps (states to visit) and 32k number of processed insn
    (since an insn may be visited multiple times). Theoretical worst case memory
    during verification is 1712 * 1k = about 1.7 Mbyte. Out-of-memory situation triggers
    cleanup and rejects the program.

    Suggested-by: Andy Lutomirski
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
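
The per-register tracking described above amounts to mapping a frame-pointer-relative stack offset to one of MAX_BPF_STACK / 8 slots. A minimal sketch, assuming MAX_BPF_STACK is 512 and that spills are only tracked for 8-byte aligned offsets; `spill_slot()` and `worst_case_bytes()` are hypothetical helpers.

```c
#include <stddef.h>

#define MAX_BPF_STACK 512
#define BPF_REG_SIZE  8

/* Map a stack offset (negative, relative to the frame pointer) to a
 * spilled-register slot index, or -1 if the offset cannot hold a spill. */
static int spill_slot(int off)
{
    if (off >= 0 || off < -MAX_BPF_STACK || off % BPF_REG_SIZE != 0)
        return -1;
    return (MAX_BPF_STACK + off) / BPF_REG_SIZE;
}

/* Worst-case memory if every remembered state is kept at once. */
static size_t worst_case_bytes(size_t state_size, size_t max_states)
{
    return state_size * max_states;
}
```

With a state size of 1712 bytes and 1024 remembered states, `worst_case_bytes(1712, 1024)` is 1753088 bytes.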
     

28 Oct, 2014

1 commit

  • introduce two configs:
    - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
    depend on
    - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use

    that solves several problems:
    - tracing and others that wish to use eBPF don't need to depend on NET.
    They can use BPF_SYSCALL to allow loading from userspace or select BPF
    to use it directly from kernel in NET-less configs.
    - in 3.18 programs cannot be attached to events yet, so don't force it on
    - when the rest of eBPF infra is there in 3.19+, it's still useful to
    switch it off to minimize kernel size

    bloat-o-meter on x64 shows:
    add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)

    tested with many different config combinations. Hopefully didn't miss anything.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

22 Oct, 2014

1 commit

  • while comparing for verifier state equivalency the comparison
    was missing a check for uninitialized register.
    Make sure it does so and add a testcase.

    Fixes: f1bca824dabb ("bpf: add search pruning optimization to verifier")
    Cc: Hannes Frederic Sowa
    Signed-off-by: Alexei Starovoitov
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Alexei Starovoitov
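
The missing check can be illustrated with a simplified register comparison. This is a sketch, not the kernel's code: registers are reduced to a bare type tag, and the type names and `regs_equivalent()` are chosen for illustration.

```c
#include <stdbool.h>

enum reg_type { NOT_INIT, UNKNOWN_VALUE, PTR_TO_CTX, PTR_TO_STACK };

enum { NREGS = 11 };   /* R0 through R10 */

/* Is the current state safe to prune against an already verified one? */
static bool regs_equivalent(const enum reg_type *old, const enum reg_type *cur)
{
    int i;

    for (i = 0; i < NREGS; i++) {
        if (old[i] == cur[i])
            continue;
        /* The verified path never relied on this register. */
        if (old[i] == NOT_INIT)
            continue;
        /* An arbitrary value was fine before, but only if the current
         * register holds some initialized value. */
        if (old[i] == UNKNOWN_VALUE && cur[i] != NOT_INIT)
            continue;
        return false;
    }
    return true;
}
```

Without the `cur[i] != NOT_INIT` condition, a path that leaves a register uninitialized would be pruned against a state in which that register was readable.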
     

02 Oct, 2014

1 commit

  • consider C program represented in eBPF:
    int filter(int arg)
    {
        int a, b, c, *ptr;

        if (arg == 1)
            ptr = &a;
        else if (arg == 2)
            ptr = &b;
        else
            ptr = &c;

        *ptr = 0;
        return 0;
    }
    eBPF verifier has to follow all possible paths through the program
    to recognize that '*ptr = 0' instruction would be safe to execute
    in all situations.
    It does so by picking a path towards the end and observing changes
    to registers and stack at every insn until it reaches bpf_exit.
    Then it comes back to one of the previous branches and goes towards
    the end again with potentially different values in registers.
    When program has a lot of branches, the number of possible combinations
    of branches is huge, so verifier has a hard limit of walking no more
    than 32k instructions. This limit can be reached and complex (but valid)
    programs could be rejected. Therefore it's important to recognize equivalent
    verifier states to prune this depth first search.

    Basic idea can be illustrated by the program (where .. are some eBPF insns):
    1: ..
    2: if (rX == rY) goto 4
    3: ..
    4: ..
    5: ..
    6: bpf_exit
    In the first pass towards bpf_exit the verifier will walk insns: 1, 2, 3, 4, 5, 6
    Since insn#2 is a branch the verifier will remember its state in verifier stack
    to come back to it later.
    Since insn#4 is marked as 'branch target', the verifier will remember its state
    in explored_states[4] linked list.
    Once it reaches insn#6 successfully it will pop the state recorded at insn#2 and
    will continue.
    Without search pruning optimization verifier would have to walk 4, 5, 6 again,
    effectively simulating execution of insns 1, 2, 4, 5, 6
    With search pruning it will check whether state at #4 after jumping from #2
    is equivalent to one recorded in explored_states[4] during first pass.
    If there is an equivalent state, verifier can prune the search at #4 and declare
    this path to be safe as well.
    In other words two states at #4 are equivalent if execution of 1, 2, 3, 4 insns
    and 1, 2, 4 insns produces equivalent registers and stack.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
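
The effect of pruning on the six-instruction example can be counted with a toy walk. Assumptions: instructions are numbered from 0 here, every state is treated as equivalent (so reaching the branch target a second time always prunes), and `count_processed()` is a hypothetical name.

```c
#include <stdbool.h>

/* Insn #2 (index 1) is the branch; insn #4 (index 3) is its target. */
enum { NINSN = 6, BRANCH = 1, TARGET = 3 };

/* Return how many instructions the walk processes. */
static int count_processed(bool prune)
{
    bool explored[NINSN] = { false };
    int pending[NINSN];     /* branch targets still to be walked */
    int npending = 0;
    int processed = 0;
    int pc = 0;
    bool running = true;

    while (running) {
        bool path_done = false;

        if (pc == TARGET) {
            if (prune && explored[pc])
                path_done = true;   /* equivalent state already verified */
            else
                explored[pc] = true;
        }
        if (!path_done) {
            processed++;
            if (pc == BRANCH)
                pending[npending++] = TARGET;
            if (pc == NINSN - 1)
                path_done = true;   /* bpf_exit */
            else
                pc++;
        }
        if (path_done) {
            if (npending == 0)
                running = false;
            else
                pc = pending[--npending];
        }
    }
    return processed;
}
```

Without pruning the walk processes insns 1 to 6 and then 4, 5, 6 again; with pruning the second visit to insn #4 ends the path immediately.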
     

27 Sep, 2014

10 commits

  • 1.
    the library includes a trivial set of BPF syscall wrappers:
    int bpf_create_map(int key_size, int value_size, int max_entries);
    int bpf_update_elem(int fd, void *key, void *value);
    int bpf_lookup_elem(int fd, void *key, void *value);
    int bpf_delete_elem(int fd, void *key);
    int bpf_get_next_key(int fd, void *key, void *next_key);
    int bpf_prog_load(enum bpf_prog_type prog_type,
    const struct sock_filter_int *insns, int insn_len,
    const char *license);
    bpf_prog_load() stores verifier log into global bpf_log_buf[] array

    and BPF_*() macros to build instructions

    2.
    test stubs configure eBPF infra with 'unspec' map and program types.
    These are fake types used by user space testsuite only.

    3.
    verifier tests valid and invalid programs and expects predefined
    error log messages from kernel.
    40 tests so far.

    $ sudo ./test_verifier
    #0 add+sub+mul OK
    #1 unreachable OK
    #2 unreachable2 OK
    #3 out of range jump OK
    #4 out of range jump2 OK
    #5 test1 ld_imm64 OK
    ...

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • This patch adds verifier core which simulates execution of every insn and
    records the state of registers and program stack. Every branch instruction seen
    during simulation is pushed into state stack. When verifier reaches BPF_EXIT,
    it pops the state from the stack and continues until it reaches BPF_EXIT again.
    For program:
    1: bpf_mov r1, xxx
    2: if (r1 == 0) goto 5
    3: bpf_mov r0, 1
    4: goto 6
    5: bpf_mov r0, 2
    6: bpf_exit
    The verifier will walk insns: 1, 2, 3, 4, 6
    then it will pop the state recorded at insn#2 and will continue: 5, 6

    This way it walks all possible paths through the program and checks all
    possible values of registers. While doing so, it checks for:
    - invalid instructions
    - uninitialized register access
    - uninitialized stack access
    - misaligned stack access
    - out of range stack access
    - invalid calling convention
    - instruction encoding is not using reserved fields

    Kernel subsystem configures the verifier with two callbacks:

    - bool (*is_valid_access)(int off, int size, enum bpf_access_type type);
    that provides information to the verifier which fields of 'ctx'
    are accessible (remember 'ctx' is the first argument to eBPF program)

    - const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
    returns argument constraints of kernel helper functions that eBPF program
    may call, so that verifier can check that R1-R5 types match the prototype

    More details in Documentation/networking/filter.txt and in kernel/bpf/verifier.c

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
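
The walk order quoted above (1, 2, 3, 4, 6, then 5, 6) can be reproduced with a small model of the state stack. This sketch tracks only the program counter, not register or stack contents; the `insn` encoding and `walk()` are hypothetical.

```c
#include <stddef.h>

enum op { OP_MOV, OP_JEQ, OP_JA, OP_EXIT };

struct insn {
    enum op op;
    int target;     /* 0-based jump target, unused for MOV and EXIT */
};

/* The six-instruction program from the commit message. */
static const struct insn prog[] = {
    { OP_MOV, 0 },  /* 1: bpf_mov r1, xxx */
    { OP_JEQ, 4 },  /* 2: if (r1 == 0) goto 5 */
    { OP_MOV, 0 },  /* 3: bpf_mov r0, 1 */
    { OP_JA, 5 },   /* 4: goto 6 */
    { OP_MOV, 0 },  /* 5: bpf_mov r0, 2 */
    { OP_EXIT, 0 }, /* 6: bpf_exit */
};

/* Record the 1-based numbers of visited instructions in order[] and
 * return how many instructions were visited. */
static size_t walk(int *order, size_t cap)
{
    int stack[8];   /* ample: the program has a single branch */
    int sp = 0;
    int pc = 0;
    size_t n = 0;

    for (;;) {
        if (n < cap)
            order[n] = pc + 1;
        n++;
        switch (prog[pc].op) {
        case OP_JEQ:
            stack[sp++] = prog[pc].target;  /* come back to it later */
            pc++;                           /* follow fall-through first */
            break;
        case OP_JA:
            pc = prog[pc].target;
            break;
        case OP_EXIT:
            if (sp == 0)
                return n;
            pc = stack[--sp];
            break;
        default:
            pc++;
            break;
        }
    }
}
```

Calling `walk()` with a buffer of at least seven entries fills it with 1, 2, 3, 4, 6, 5, 6.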
     
  • check that control flow graph of eBPF program is a directed acyclic graph

    check_cfg() does:
    - detect loops
    - detect unreachable instructions
    - check that program terminates with BPF_EXIT insn
    - check that all branches are within program boundary

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
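
The checks listed above fit a depth-first search with three colours. A minimal sketch under stated assumptions: each instruction is reduced to at most two successor indices, recursion is used for brevity, and `struct node`, `dfs()` and `cfg_ok()` are hypothetical names.

```c
#include <stdbool.h>

enum { MAX_INSNS = 16 };
enum color { WHITE, GREY, BLACK };  /* unvisited, on current path, done */

/* Each instruction has up to two successors; -1 means "none",
 * which only the exit instruction should use for both. */
struct node {
    int succ[2];
};

static bool dfs(const struct node *cfg, int n, int i, enum color *c)
{
    int k;

    c[i] = GREY;
    for (k = 0; k < 2; k++) {
        int s = cfg[i].succ[k];

        if (s == -1)
            continue;
        if (s < 0 || s >= n)
            return false;           /* edge leaves the program */
        if (c[s] == GREY)
            return false;           /* back-edge: loop detected */
        if (c[s] == WHITE && !dfs(cfg, n, s, c))
            return false;
    }
    c[i] = BLACK;
    return true;
}

static bool cfg_ok(const struct node *cfg, int n)
{
    enum color c[MAX_INSNS] = { WHITE };
    int i;

    if (n <= 0 || n > MAX_INSNS)
        return false;
    if (!dfs(cfg, n, 0, c))
        return false;
    for (i = 0; i < n; i++)
        if (c[i] == WHITE)
            return false;           /* unreachable instruction */
    return true;
}
```

A program whose last instruction falls through would have a successor equal to `n`, which the range check rejects, so only programs ending in an exit pass.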
     
  • eBPF programs passed from userspace are using pseudo BPF_LD_IMM64 instructions
    to refer to process-local map_fd. Scan the program for such instructions and
    if FDs are valid, convert them to 'struct bpf_map' pointers which will be used
    by verifier to check access to maps in bpf_map_lookup/update() calls.
    If program passes verifier, convert pseudo BPF_LD_IMM64 into generic by dropping
    BPF_PSEUDO_MAP_FD flag.

    Note that eBPF interpreter is generic and knows nothing about pseudo insns.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
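
The scan-and-convert step above can be sketched in user space. Assumptions, all hypothetical: the fd table is a plain array of pointers standing in for the kernel's fd lookup, the struct mirrors the 8-byte instruction layout, and the pseudo marking is dropped in the same pass, whereas the commit message describes dropping it only after the verifier has accepted the program.

```c
#include <stddef.h>
#include <stdint.h>

#define BPF_LD  0x00
#define BPF_IMM 0x00
#define BPF_DW  0x18
#define BPF_PSEUDO_MAP_FD 1

struct bpf_insn {
    uint8_t code;
    uint8_t dst_reg:4;
    uint8_t src_reg:4;
    int16_t off;
    int32_t imm;
};

/* Hypothetical stand-in for the kernel's fd-to-map lookup. */
static const void *lookup_map(const void *const *fd_table, size_t nfds, int32_t fd)
{
    if (fd < 0 || (size_t)fd >= nfds)
        return NULL;
    return fd_table[fd];
}

/* Replace map fds in pseudo BPF_LD_IMM64 insns with map addresses.
 * Returns 0 on success, -1 on a bad fd or a truncated instruction. */
static int resolve_map_fds(struct bpf_insn *insns, size_t cnt,
                           const void *const *fd_table, size_t nfds)
{
    size_t i;

    for (i = 0; i < cnt; i++) {
        if (insns[i].code != (BPF_LD | BPF_IMM | BPF_DW))
            continue;
        if (i + 1 >= cnt)
            return -1;              /* second half is missing */
        if (insns[i].src_reg == BPF_PSEUDO_MAP_FD) {
            const void *map = lookup_map(fd_table, nfds, insns[i].imm);
            uint64_t addr;

            if (!map)
                return -1;
            addr = (uint64_t)(uintptr_t)map;
            insns[i].imm = (int32_t)(uint32_t)addr;
            insns[i + 1].imm = (int32_t)(uint32_t)(addr >> 32);
            insns[i].src_reg = 0;   /* now a generic BPF_LD_IMM64 */
        }
        i++;                        /* skip the second half */
    }
    return 0;
}
```

After the pass the instruction pair carries a 64-bit address split across the two `imm` fields, so an interpreter that only knows generic BPF_LD_IMM64 needs no special handling.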
     
  • add optional attributes for BPF_PROG_LOAD syscall:
    union bpf_attr {
        struct {
            ...
            __u32 log_level; /* verbosity level of eBPF verifier */
            __u32 log_size; /* size of user buffer */
            __aligned_u64 log_buf; /* user supplied 'char *buffer' */
        };
    };

    when log_level > 0 the verifier will return its verification log in the user
    supplied buffer 'log_buf' which can be used by program author to analyze why
    verifier rejected given program.

    'Understanding eBPF verifier messages' section of Documentation/networking/filter.txt
    provides several examples of these messages, like the program:

    BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
    BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
    BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
    BPF_LD_MAP_FD(BPF_REG_1, 0),
    BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
    BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
    BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
    BPF_EXIT_INSN(),

    will be rejected with the following multi-line message in log_buf:

    0: (7a) *(u64 *)(r10 -8) = 0
    1: (bf) r2 = r10
    2: (07) r2 += -8
    3: (b7) r1 = 0
    4: (85) call 1
    5: (15) if r0 == 0x0 goto pc+1
    R0=map_ptr R10=fp
    6: (7a) *(u64 *)(r0 +4) = 0
    misaligned access off 4 size 8

    The format of the output can change at any time as verifier evolves.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
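
The final log line in the example above comes from an alignment rule: an access of `size` bytes must sit at an offset that is a multiple of `size`. A minimal sketch of such a check, assuming that rule; `check_alignment()` is a hypothetical name and the log is a caller-supplied buffer.

```c
#include <stddef.h>
#include <stdio.h>

/* Return 0 if the access is aligned, otherwise write a message in the
 * style of the verifier log into 'log' and return -1. */
static int check_alignment(int off, int size, char *log, size_t log_size)
{
    if (size <= 0 || off % size != 0) {
        snprintf(log, log_size, "misaligned access off %d size %d\n",
                 off, size);
        return -1;
    }
    return 0;
}
```

In the rejected program the store is 8 bytes wide (BPF_DW) at offset 4, and 4 is not a multiple of 8; changing the offset to 0 or 8 would pass this particular check.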
     
  • this patch adds all of eBPF verifier documentation and empty bpf_check()

    The end goal for the verifier is to statically check safety of the program.

    Verifier will catch:
    - loops
    - out of range jumps
    - unreachable instructions
    - invalid instructions
    - uninitialized register access
    - uninitialized stack access
    - misaligned stack access
    - out of range stack access
    - invalid calling convention

    More details in Documentation/networking/filter.txt

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • in native eBPF programs userspace is using pseudo BPF_CALL instructions
    which encode one of 'enum bpf_func_id' inside insn->imm field.
    Verifier checks that the program is using correct function arguments for the given func_id.
    If all checks passed, kernel needs to fixup BPF_CALL->imm fields by
    replacing func_id with in-kernel function pointer.
    eBPF interpreter just calls the function.

    In-kernel eBPF users continue to use generic BPF_CALL.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • eBPF programs are similar to kernel modules. They are loaded by the user
    process and automatically unloaded when process exits. Each eBPF program is
    a safe run-to-completion set of instructions. eBPF verifier statically
    determines that the program terminates and is safe to execute.

    The following syscall wrapper can be used to load the program:
    int bpf_prog_load(enum bpf_prog_type prog_type,
                      const struct bpf_insn *insns, int insn_cnt,
                      const char *license)
    {
        union bpf_attr attr = {
            .prog_type = prog_type,
            .insns = ptr_to_u64(insns),
            .insn_cnt = insn_cnt,
            .license = ptr_to_u64(license),
        };

        return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
    }
    where 'insns' is an array of eBPF instructions and 'license' is a string
    that must be GPL compatible to call helper functions marked gpl_only

    Upon successful load the syscall returns prog_fd.
    Use close(prog_fd) to unload the program.

    User space tests and examples follow in the later patches

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • 'maps' is a generic storage of different types for sharing data between kernel
    and userspace.

    The maps are accessed from user space via BPF syscall, which has commands:

    - create a map with given type and attributes
    fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)
    returns fd or negative error

    - lookup key in a given map referenced by fd
    err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
    using attr->map_fd, attr->key, attr->value
    returns zero and stores found elem into value or negative error

    - create or update key/value pair in a given map
    err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
    using attr->map_fd, attr->key, attr->value
    returns zero or negative error

    - find and delete element by key in a given map
    err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
    using attr->map_fd, attr->key

    - iterate map elements (based on input key return next_key)
    err = bpf(BPF_MAP_GET_NEXT_KEY, union bpf_attr *attr, u32 size)
    using attr->map_fd, attr->key, attr->next_key

    - close(fd) deletes the map

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • BPF syscall is a multiplexor for a range of different operations on eBPF.
    This patch introduces syscall with single command to create a map.
    Next patch adds commands to access maps.

    'maps' is a generic storage of different types for sharing data between kernel
    and userspace.

    Userspace example:
    /* this syscall wrapper creates a map with given type and attributes
     * and returns map_fd on success.
     * use close(map_fd) to delete the map
     */
    int bpf_create_map(enum bpf_map_type map_type, int key_size,
                       int value_size, int max_entries)
    {
        union bpf_attr attr = {
            .map_type = map_type,
            .key_size = key_size,
            .value_size = value_size,
            .max_entries = max_entries
        };

        return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
    }

    'union bpf_attr' is backwards compatible with future extensions.

    More details in Documentation/networking/filter.txt and in manpage

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

11 Sep, 2014

1 commit

  • Since BPF JIT depends on the availability of module_alloc() and
    module_free() helpers (HAVE_BPF_JIT and MODULES), we better build
    that code only in case we have BPF_JIT in our config enabled, just
    like with other JIT code. Fixes builds for arm/marzen_defconfig
    and sh/rsk7269_defconfig.

    ====================
    kernel/built-in.o: In function `bpf_jit_binary_alloc':
    /home/cwang/linux/kernel/bpf/core.c:144: undefined reference to `module_alloc'
    kernel/built-in.o: In function `bpf_jit_binary_free':
    /home/cwang/linux/kernel/bpf/core.c:164: undefined reference to `module_free'
    make: *** [vmlinux] Error 1
    ====================

    Reported-by: Fengguang Wu
    Fixes: 738cbe72adc5 ("net: bpf: consolidate JIT binary allocator")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 Sep, 2014

2 commits

  • Introduced in commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit
    against spraying attacks") and later on replicated in aa2d2c73c21f
    ("s390/bpf,jit: address randomize and write protect jit code") for
    s390 architecture, write protection for BPF JIT images got added and
    a random start address of the JIT code, so that it's not on a page
    boundary anymore.

    Since both use a very similar allocator for the BPF binary header,
    we can consolidate this code into the BPF core as it's mostly JIT
    independent anyway.

    This will also allow for future archs that support DEBUG_SET_MODULE_RONX
    to just reuse instead of reimplementing it.

    JIT tested on x86_64 and s390x with BPF test suite.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Cc: Eric Dumazet
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • add BPF_LD_IMM64 instruction to load 64-bit immediate value into a register.
    All previous instructions were 8-byte. This is first 16-byte instruction.
    Two consecutive 'struct bpf_insn' blocks are interpreted as single instruction:
    insn[0].code = BPF_LD | BPF_DW | BPF_IMM
    insn[0].dst_reg = destination register
    insn[0].imm = lower 32-bit
    insn[1].code = 0
    insn[1].imm = upper 32-bit
    All unused fields must be zero.

    Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM
    which loads 32-bit immediate value into a register.

    x64 JITs it as single 'movabsq $imm64, %rax'
    arm64 may JIT as sequence of four 'movk x0, #imm16, lsl #shift' insn

    Note that old eBPF programs are binary compatible with new interpreter.

    It helps eBPF programs load 64-bit constant into a register with one
    instruction instead of using two registers and 4 instructions:
    BPF_MOV32_IMM(R1, imm32)
    BPF_ALU64_IMM(BPF_LSH, R1, 32)
    BPF_MOV32_IMM(R2, imm32)
    BPF_ALU64_REG(BPF_OR, R1, R2)

    User space generated programs will use this instruction to load constants only.

    To tell kernel that user space needs a pointer the _pseudo_ variant of
    this instruction may be added later, which will use extra bits of encoding
    to indicate what type of pointer user space is asking kernel to provide.
    For example 'off' or 'src_reg' fields can be used for such purpose.
    src_reg = 1 could mean that user space is asking kernel to validate and
    load in-kernel map pointer.
    src_reg = 2 could mean that user space needs readonly data section pointer
    src_reg = 3 could mean that user space needs a pointer to per-cpu local data
    All such future pseudo instructions will not be carrying the actual pointer
    as part of the instruction, but rather will be treated as a request to kernel
    to provide one. The kernel will verify the request_for_a_pointer, then
    will drop _pseudo_ marking and will store actual internal pointer inside
    the instruction, so the end result is the interpreter and JITs never
    see pseudo BPF_LD_IMM64 insns and only operate on generic BPF_LD_IMM64 that
    loads 64-bit immediate into a register. User space never operates on direct
    pointers and verifier can easily recognize request_for_pointer vs other
    instructions.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
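
The two-slot encoding described above can be sketched as an encode/decode pair. The struct mirrors the 8-byte instruction layout; `emit_ld_imm64()` and `read_ld_imm64()` are hypothetical helper names, not kernel or library functions.

```c
#include <stdint.h>

#define BPF_LD  0x00
#define BPF_IMM 0x00
#define BPF_DW  0x18

struct bpf_insn {
    uint8_t code;
    uint8_t dst_reg:4;
    uint8_t src_reg:4;
    int16_t off;
    int32_t imm;
};

/* Encode "dst_reg = value" as one 16-byte instruction (two slots).
 * All unused fields are left zero, as the encoding requires. */
static void emit_ld_imm64(struct bpf_insn insn[2], uint8_t dst_reg, uint64_t value)
{
    insn[0] = (struct bpf_insn){
        .code = BPF_LD | BPF_DW | BPF_IMM,
        .dst_reg = dst_reg,
        .imm = (int32_t)(uint32_t)value,            /* lower 32 bits */
    };
    insn[1] = (struct bpf_insn){
        .code = 0,
        .imm = (int32_t)(uint32_t)(value >> 32),    /* upper 32 bits */
    };
}

/* Reassemble the 64-bit immediate from the two slots. */
static uint64_t read_ld_imm64(const struct bpf_insn insn[2])
{
    return (uint64_t)(uint32_t)insn[0].imm |
           ((uint64_t)(uint32_t)insn[1].imm << 32);
}
```

Going through `uint32_t` before widening matters on the decode side: `imm` is signed, so widening it directly would sign-extend a lower half with its top bit set and corrupt the upper half.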
     

06 Sep, 2014

1 commit

  • With eBPF getting more extended and exposure to user space on its way,
    hardening the memory range the interpreter uses to steer its command flow
    seems appropriate. This patch moves the to-be-interpreted bytecode to
    read-only pages.

    In case we execute a corrupted BPF interpreter image for some reason e.g.
    caused by an attacker which got past a verifier stage, it would not only
    provide arbitrary read/write memory access but arbitrary function calls
    as well. After setting up the BPF interpreter image, its contents do not
    change until destruction time, thus we can set up the image on pages
    made immutable in order to mitigate modifications to that code. The idea
    is derived from commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit
    against spraying attacks").

    This is possible because bpf_prog is not part of sk_filter anymore.
    After setup bpf_prog cannot be altered during its life-time. This prevents
    any modifications to the entire bpf_prog structure (incl. function/JIT
    image pointer).

    Every eBPF program (including classic BPF that is migrated) has to call
    bpf_prog_select_runtime() to select either interpreter or a JIT image
    as a last setup step, and they all are being freed via bpf_prog_free(),
    including non-JIT. Therefore, we can easily integrate this into the
    eBPF life-time, plus since we directly allocate a bpf_prog, we have no
    performance penalty.

    Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual
    inspection of kernel_page_tables. Brad Spengler proposed the same idea
    via Twitter during development of this patch.

    Joint work with Hannes Frederic Sowa.

    Suggested-by: Brad Spengler
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Hannes Frederic Sowa
    Cc: Alexei Starovoitov
    Cc: Kees Cook
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

03 Aug, 2014

3 commits

  • clean up names related to socket filtering and bpf in the following way:
    - everything that deals with sockets keeps 'sk_*' prefix
    - everything that is pure BPF is changed to 'bpf_*' prefix

    split 'struct sk_filter' into
    struct sk_filter {
        atomic_t refcnt;
        struct rcu_head rcu;
        struct bpf_prog *prog;
    };
    and
    struct bpf_prog {
        u32 jited:1,
            len:31;
        struct sock_fprog_kern *orig_prog;
        unsigned int (*bpf_func)(const struct sk_buff *skb,
                                 const struct bpf_insn *filter);
        union {
            struct sock_filter insns[0];
            struct bpf_insn insnsi[0];
            struct work_struct work;
        };
    };
    so that 'struct bpf_prog' can be used independent of sockets and cleans up
    'unattached' bpf use cases

    split SK_RUN_FILTER macro into:
    SK_RUN_FILTER to be used with 'struct sk_filter *' and
    BPF_PROG_RUN to be used with 'struct bpf_prog *'

    __sk_filter_release(struct sk_filter *) gains
    __bpf_prog_release(struct bpf_prog *) helper function

    also perform related renames for the functions that work
    with 'struct bpf_prog *', since they're on the same lines:

    sk_filter_size -> bpf_prog_size
    sk_filter_select_runtime -> bpf_prog_select_runtime
    sk_filter_free -> bpf_prog_free
    sk_unattached_filter_create -> bpf_prog_create
    sk_unattached_filter_destroy -> bpf_prog_destroy
    sk_store_orig_filter -> bpf_prog_store_orig_filter
    sk_release_orig_filter -> bpf_release_orig_filter
    __sk_migrate_filter -> bpf_migrate_filter
    __sk_prepare_filter -> bpf_prepare_filter

    API for attaching classic BPF to a socket stays the same:
    sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
    and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
    which is used by sockets, tun, af_packet

    API for 'unattached' BPF programs becomes:
    bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
    and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
    which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
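
The relationship between the two run macros can be modelled outside the kernel. This is a sketch with simplified stand-in types: the structs keep only the fields needed here, `refcnt` is a plain int rather than atomic_t, the macro bodies are one plausible way to write them, and `accept_all()` is a hypothetical program body.

```c
#include <stddef.h>

struct sk_buff;                     /* opaque in this sketch */
struct bpf_insn { int unused; };    /* placeholder instruction type */

struct bpf_prog {
    unsigned int (*bpf_func)(const struct sk_buff *skb,
                             const struct bpf_insn *filter);
    struct bpf_insn insnsi[1];
};

struct sk_filter {
    int refcnt;                     /* atomic_t in the kernel */
    struct bpf_prog *prog;
};

/* Run a program directly, independent of sockets. */
#define BPF_PROG_RUN(prog, ctx) \
    ((*(prog)->bpf_func)((ctx), (prog)->insnsi))

/* Run the program that a socket filter wraps. */
#define SK_RUN_FILTER(filter, ctx) \
    BPF_PROG_RUN((filter)->prog, (ctx))

/* Hypothetical program body that accepts every packet. */
static unsigned int accept_all(const struct sk_buff *skb,
                               const struct bpf_insn *filter)
{
    (void)skb;
    (void)filter;
    return 0xffff;
}
```

A caller that owns only a `struct bpf_prog` uses `BPF_PROG_RUN(&prog, ctx)`, while socket code holding a `struct sk_filter` uses `SK_RUN_FILTER(&filter, ctx)`; both end up calling the same `bpf_func`.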
     
  • to indicate that this function is converting classic BPF into eBPF
    and not related to sockets

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • trivial rename to indicate that this function performs classic BPF checking

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

25 Jul, 2014

1 commit


24 Jul, 2014

1 commit

  • BPF is used in several kernel components. This split creates logical boundary
    between generic eBPF core and the rest

    kernel/bpf/core.c: eBPF interpreter

    net/core/filter.c: classic->eBPF converter, classic verifiers, socket filters

    This patch only moves functions.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov