22 Sep, 2017

1 commit

  • When cross-compiling the bpf sample map_perf_test for aarch64, I find that
    __NR_getpgrp is undefined. This causes build errors. This syscall is deprecated
    and requires defining __ARCH_WANT_SYSCALL_DEPRECATED. To avoid having to define
    that, just use a different syscall (getppid) for the array map stress test.

    Acked-by: Alexei Starovoitov
    Signed-off-by: Joel Fernandes
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Joel Fernandes
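The user-space side of these stress tests only needs a cheap syscall to call in a tight loop, so that a BPF program attached to that syscall's kprobe runs on every call. That is why swapping getpgrp for getppid does not change what the test measures. Below is a minimal Python sketch of such a loop. It assumes `os.getppid()` as the stand-in syscall, and `stress_syscall` is a hypothetical name. The real sample is written in C and also loads the BPF program, which this sketch does not do.

```python
import os
import time


def stress_syscall(iterations: int) -> float:
    """Call getppid() `iterations` times and return the calls per second.

    Hypothetical helper: it only mimics the user-space half of a
    syscall stress test; no BPF program is loaded here.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start = time.perf_counter()
    for _ in range(iterations):
        os.getppid()  # cheap syscall available on every Linux architecture
    elapsed = time.perf_counter() - start
    # Avoid dividing by zero if the clock did not advance.
    return iterations / max(elapsed, 1e-9)


if __name__ == "__main__":
    print(f"{stress_syscall(100_000):.0f} getppid() calls/s")
```

With a kprobe-attached BPF program loaded, the drop in calls per second relative to this baseline approximates the per-call overhead of the program.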
     

02 Sep, 2017

1 commit

  • Create a new case to test the LRU lookup performance.

    At the beginning, the LRU map is fully loaded (i.e. the number of keys
    is equal to map->max_entries). The lookups walk through keys 0
    to num_map_entries and then start again from 0.

    This patch also creates an anonymous struct to properly
    name the test params in stress_lru_hmap_alloc() in map_perf_test_kern.c.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
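The lookup pattern described in the LRU lookup commit above can be modelled with an idealized LRU map. Because the map is full and every looked-up key is already present, each lookup is a hit and nothing is evicted, so the test measures pure lookup cost. This is a sketch under stated assumptions: the kernel's BPF LRU only approximates LRU order, and `LRUMap` and `lookup_test` are hypothetical names, not part of the sample.

```python
from collections import OrderedDict


class LRUMap:
    """Idealized LRU map: evicts the least recently used key when full."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def update(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.max_entries:
            self._data.popitem(last=False)  # drop the least recently used entry
        self._data[key] = value

    def lookup(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # a hit refreshes the key's recency
        return self._data[key]


def lookup_test(num_map_entries: int, nr_lookups: int) -> int:
    """Fill the map completely, then look up keys cyclically; return the hits."""
    lru = LRUMap(num_map_entries)
    for key in range(num_map_entries):
        lru.update(key, key)
    hits = 0
    for i in range(nr_lookups):
        # Walk keys 0..num_map_entries-1, then wrap around to 0.
        if lru.lookup(i % num_map_entries) is not None:
            hits += 1
    return hits
```

For example, `lookup_test(100, 1000)` returns 1000: every lookup hits because the working set never exceeds max_entries.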
     

20 Aug, 2017

1 commit


18 Apr, 2017

2 commits

  • This patch adds a map-in-map LRU example.
    If we know that only a subset of cores will use the
    LRU, we can allocate a common LRU list per target core
    and store it in an array-of-hashes.

    This allows using the common LRU map with map-update performance
    comparable to the BPF_F_NO_COMMON_LRU map, but without wasting memory
    on cores that we know will never access the LRU map.

    BPF_F_NO_COMMON_LRU:
    > map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
    9234314 (9.23M/s)

    map-in-map LRU:
    > map_perf_test 512 8 1260000 80000000 | awk '{sum += $3}END{print sum}'
    9962743 (9.96M/s)

    Note that max_entries for the map-in-map LRU test is 1260000, which
    is the max_entries of each inner LRU map. 8 processes have been
    started, so 8 * 1260000 = 10080000 (~10M), which is close to the
    max_entries used in the BPF_F_NO_COMMON_LRU test.

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • One more LRU test will be added later in this patch series.
    In this patch, we first move all existing LRU map tests into
    a single syscall (connect) so that the new LRU test can be
    added later without hunting for another syscall.

    One of the map names is also changed from percpu_lru_hash_map
    to nocommon_lru_hash_map to avoid confusion with percpu_hash_map.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Martin KaFai Lau
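The map-in-map layout from the LRU example above can be sketched as an outer array whose slot i holds the inner LRU map used by target CPU i. Each CPU only updates its own inner map, so that map's common LRU list is not contended by other CPUs, and total capacity is the sum of the inner max_entries (8 * 1260000 = 10080000). This is a Python model under stated assumptions, not the BPF map API; `InnerLRU` and `ArrayOfLRU` are hypothetical names, and indexing the outer array by CPU number is an assumption about how the sample selects the inner map.

```python
from collections import OrderedDict


class InnerLRU:
    """Idealized LRU hash map with a fixed capacity (one per target CPU)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def update(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used key
        self.entries[key] = value


class ArrayOfLRU:
    """Outer 'array of maps': slot i holds the inner LRU used by CPU i."""

    def __init__(self, nr_target_cpus, inner_max_entries):
        self.inner = [InnerLRU(inner_max_entries) for _ in range(nr_target_cpus)]

    def update(self, cpu, key, value):
        # Only this CPU's inner map (and its LRU list) is touched.
        self.inner[cpu].update(key, value)

    def total_capacity(self):
        return sum(m.max_entries for m in self.inner)
```

`ArrayOfLRU(8, 1260000).total_capacity()` gives 10080000, matching the arithmetic in the commit message. No memory is spent on CPUs that are not given a slot.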
     

17 Mar, 2017

1 commit

  • $ map_perf_test 128
    speed of HASH bpf_map_lookup_elem() in lookups per second
              w/o JIT   w/JIT
    before    46M       58M
    after     42M       74M

    perf report
    before:
    54.23% map_perf_test [kernel.kallsyms] [k] __htab_map_lookup_elem
    14.24% map_perf_test [kernel.kallsyms] [k] lookup_elem_raw
    8.84% map_perf_test [kernel.kallsyms] [k] htab_map_lookup_elem
    5.93% map_perf_test [kernel.kallsyms] [k] bpf_map_lookup_elem
    2.30% map_perf_test [kernel.kallsyms] [k] bpf_prog_da4fc6a3f41761a2
    1.49% map_perf_test [kernel.kallsyms] [k] kprobe_ftrace_handler

    after:
    60.03% map_perf_test [kernel.kallsyms] [k] __htab_map_lookup_elem
    18.07% map_perf_test [kernel.kallsyms] [k] lookup_elem_raw
    2.91% map_perf_test [kernel.kallsyms] [k] bpf_prog_da4fc6a3f41761a2
    1.94% map_perf_test [kernel.kallsyms] [k] _einittext
    1.90% map_perf_test [kernel.kallsyms] [k] __audit_syscall_exit
    1.72% map_perf_test [kernel.kallsyms] [k] kprobe_ftrace_handler

    Notice that bpf_map_lookup_elem() and htab_map_lookup_elem() are trivial
    functions, yet they take a sizeable amount of CPU time.
    htab_map_gen_lookup() removes bpf_map_lookup_elem() and converts
    htab_map_lookup_elem() into three BPF insns, which causes the CPU time
    of bpf_prog_da4fc6a3f41761a2() to increase slightly.

    $ map_perf_test 256
    speed of ARRAY bpf_map_lookup_elem() in lookups per second
              w/o JIT   w/JIT
    before    97M       174M
    after     64M       280M

    before:
    37.33% map_perf_test [kernel.kallsyms] [k] array_map_lookup_elem
    13.95% map_perf_test [kernel.kallsyms] [k] bpf_map_lookup_elem
    6.54% map_perf_test [kernel.kallsyms] [k] bpf_prog_da4fc6a3f41761a2
    4.57% map_perf_test [kernel.kallsyms] [k] kprobe_ftrace_handler

    after:
    32.86% map_perf_test [kernel.kallsyms] [k] bpf_prog_da4fc6a3f41761a2
    6.54% map_perf_test [kernel.kallsyms] [k] kprobe_ftrace_handler

    array_map_gen_lookup() removes calls to array_map_lookup_elem()
    and bpf_map_lookup_elem() and replaces them with 7 bpf insns.

    Without JIT, performance drops, since executing the extra insns
    in the interpreter is slower than running native C code.
    With JIT, the gains are clear, since native C->x86 code is
    replaced with fewer bpf->x86 instructions.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
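The inlined array lookup in the commit above boils down to a bounds check followed by base-plus-offset arithmetic. The sketch below is a Python model of that computation together with the relative changes implied by the two speed tables. It is not the BPF instruction sequence. The names are hypothetical, and rounding each element up to a multiple of 8 bytes is an assumption about how the array map lays out its values.

```python
def array_elem_size(value_size: int) -> int:
    """Element stride: value_size rounded up to a multiple of 8 bytes (assumed)."""
    return (value_size + 7) & ~7


def array_lookup_addr(values_base: int, max_entries: int, value_size: int, index: int):
    """Model of the inlined array lookup: address of element `index`, or None (NULL)."""
    index &= 0xFFFFFFFF  # the key is treated as a 32-bit unsigned index
    if index >= max_entries:  # out-of-bounds index: lookup returns NULL
        return None
    return values_base + index * array_elem_size(value_size)


def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0
```

With the table values, `relative_change(58, 74)` is about +27.6% (HASH, JIT) and `relative_change(174, 280)` about +60.9% (ARRAY, JIT). Without JIT, the numbers drop by about 8.7% (46M to 42M) and 34.0% (97M to 64M).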
     

24 Jan, 2017

1 commit

  • Extend the map_perf_test_{user,kern}.c infrastructure to stress test
    lpm-trie lookups. We hook into the kprobe on sys_gettid() and measure
    the latency depending on trie size and lookup count.

    On my Intel Haswell i7-6400U, a single gettid() syscall with an empty
    bpf program takes roughly 6.5us. Lookups in empty tries take ~1.8us
    on the first try and ~0.9us on retries. Lookups in tries with 8192
    entries take ~7.1us (on the first _and_ any subsequent try).

    Signed-off-by: David Herrmann
    Reviewed-by: Daniel Mack
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    David Herrmann
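The lpm-trie lookups stressed in the commit above return the value of the longest stored prefix that matches the key. The following Python sketch shows that semantics with a plain one-bit-per-level trie over the key bytes. It is only an illustration. The kernel's LPM trie map uses its own, more compact node layout, and `LPMTrie` is a hypothetical name.

```python
import ipaddress


class LPMTrie:
    """Binary trie over key bits; lookup returns the longest matching prefix's value."""

    def __init__(self):
        self.root = {}  # children under keys 0/1, stored value under "value"

    @staticmethod
    def _bits(data: bytes, prefixlen: int):
        if prefixlen > len(data) * 8:
            raise ValueError("prefixlen longer than key data")
        for i in range(prefixlen):
            yield (data[i // 8] >> (7 - i % 8)) & 1  # most significant bit first

    def insert(self, data: bytes, prefixlen: int, value) -> None:
        node = self.root
        for bit in self._bits(data, prefixlen):
            node = node.setdefault(bit, {})
        node["value"] = value

    def lookup(self, data: bytes):
        node, best = self.root, None
        for bit in self._bits(data, len(data) * 8):
            if "value" in node:
                best = node["value"]  # longest match seen so far
            node = node.get(bit)
            if node is None:
                return best
        return node.get("value", best)


if __name__ == "__main__":
    trie = LPMTrie()
    trie.insert(ipaddress.IPv4Address("10.0.0.0").packed, 8, "10/8")
    trie.insert(ipaddress.IPv4Address("10.1.0.0").packed, 16, "10.1/16")
    print(trie.lookup(ipaddress.IPv4Address("10.1.2.3").packed))  # 10.1/16
```

Lookup cost grows with the depth walked, which is consistent with lookups in a populated trie taking longer than in an empty one.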
     

16 Nov, 2016

1 commit

  • This patch adds some unit tests and test_lru_dist.

    test_lru_dist reads numeric keys from a file.
    The files used here are generated by a modified fio-genzipf tool
    that originated from the fio test suite. The sample data files can be
    found here: https://github.com/iamkafai/bpf-lru

    The zipf.* data files have 100k numeric keys, and the keys
    range from 1 to 100k.

    test_lru_dist outputs the number of unique keys (nr_unique).
    For example, the second run below shows that 61239 of the 100k keys
    are unique. nr_misses counts the keys that cannot be found in the
    LRU map; since every unique key misses at least once, nr_misses
    must be >= nr_unique. test_lru_dist also simulates a perfect LRU
    map for comparison:

    [root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
    /root/zipf.100k.a1_01.out 4000 1
    ...
    test_parallel_lru_dist (map_type:9 map_flags:0x0):
    task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31603(/100000)
    task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
    ....
    test_parallel_lru_dist (map_type:9 map_flags:0x2):
    task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31710(/100000)
    task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)

    [root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
    /root/zipf.100k.a0_01.out 40000 1
    ...
    test_parallel_lru_dist (map_type:9 map_flags:0x0):
    task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67054(/100000)
    task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
    ...
    test_parallel_lru_dist (map_type:9 map_flags:0x2):
    task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67068(/100000)
    task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)

    LRU map tests have also been added to map_perf_test:
    /* Global LRU */
    [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
    ./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
    1 cpus: 2934082 updates
    4 cpus: 7391434 updates
    8 cpus: 6500576 updates

    /* Percpu LRU */
    [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
    ./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
    1 cpus: 2896553 updates
    4 cpus: 9766395 updates
    8 cpus: 17460553 updates

    Signed-off-by: Martin KaFai Lau
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
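The "Perfect LRU" lines in the test_lru_dist output above compare the BPF LRU against an ideal LRU. That comparison can be modelled by replaying the key stream through an exact LRU cache, counting distinct keys (nr_unique) and misses (nr_misses). This is a sketch, not the sample's code. `perfect_lru_dist` is a hypothetical name, and reading the zipf.* files is left out because their format is not described here.

```python
from collections import OrderedDict


def perfect_lru_dist(keys, lru_size):
    """Replay `keys` through an ideal LRU of `lru_size` entries.

    Returns (nr_unique, nr_misses).
    """
    cache = OrderedDict()
    seen = set()
    nr_misses = 0
    for key in keys:
        seen.add(key)
        if key in cache:
            cache.move_to_end(key)  # hit: mark as most recently used
            continue
        nr_misses += 1  # first access or previously evicted key
        if len(cache) >= lru_size:
            cache.popitem(last=False)  # evict the least recently used key
        cache[key] = True
    return len(seen), nr_misses
```

For the key stream `[1, 2, 3, 1, 4, 1, 2]`, an LRU of size 2 gives `(4, 6)`, while a size of 10 gives `(4, 4)`: once the cache holds every key, only the compulsory first-access misses remain, which is the lower bound `nr_misses >= nr_unique`.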
     

09 Mar, 2016

1 commit