14 Jul, 2017

1 commit

  • Merge yet more updates from Andrew Morton:

    - various misc things

    - kexec updates

    - sysctl core updates

    - scripts/gdb updates

    - checkpoint-restart updates

    - ipc updates

    - kernel/watchdog updates

    - Kees's "rough equivalent to the glibc _FORTIFY_SOURCE=1 feature"

    - "stackprotector: ascii armor the stack canary"

    - more MM bits

    - checkpatch updates

    * emailed patches from Andrew Morton: (96 commits)
    writeback: rework wb_[dec|inc]_stat family of functions
    ARM: samsung: usb-ohci: move inline before return type
    video: fbdev: omap: move inline before return type
    video: fbdev: intelfb: move inline before return type
    USB: serial: safe_serial: move __inline__ before return type
    drivers: tty: serial: move inline before return type
    drivers: s390: move static and inline before return type
    x86/efi: move asmlinkage before return type
    sh: move inline before return type
    MIPS: SMP: move asmlinkage before return type
    m68k: coldfire: move inline before return type
    ia64: sn: pci: move inline before type
    ia64: move inline before return type
    FRV: tlbflush: move asmlinkage before return type
    CRIS: gpio: move inline before return type
    ARM: HP Jornada 7XX: move inline before return type
    ARM: KVM: move asmlinkage before type
    checkpatch: improve the STORAGE_CLASS test
    mm, migration: do not trigger OOM killer when migrating memory
    drm/i915: use __GFP_RETRY_MAYFAIL
    ...

    Linus Torvalds
     

13 Jul, 2017

1 commit

  • This is a layering violation so we replace the uses with calls to
    sg_page(). This is a prep patch for replacing page_link and this is one
    of the very few uses outside of scatterlist.h.

    Link: http://lkml.kernel.org/r/1495663199-22234-1-git-send-email-logang@deltatee.com
    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Stephen Bates
    Acked-by: Stefani Seibold
    Cc: Stefani Seibold
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     

12 Jul, 2017

1 commit

  • With latest net-next:

    ====
    clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -Isamples/bpf \
    -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
    -Wno-compare-distinct-pointer-types \
    -Wno-gnu-variable-sized-type-not-at-end \
    -Wno-address-of-packed-member -Wno-tautological-compare \
    -Wno-unknown-warning-option \
    -O2 -emit-llvm -c samples/bpf/tcp_synrto_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/tcp_synrto_kern.o
    samples/bpf/tcp_synrto_kern.c:20:10: fatal error: 'bpf_endian.h' file not found
    ^~~~~~~~~~~~~~
    1 error generated.
    ====

    net has the same issue.

    Add support for ntohl and htonl in tools/testing/selftests/bpf/bpf_endian.h.
    Also move bpf_helpers.h from samples/bpf to selftests/bpf and change
    compiler include logic so that programs in samples/bpf can access the headers
    in selftests/bpf, but not the other way around.
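
    As a rough illustration of what ntohl/htonl support in a header can look
    like, the sketch below picks a byte swap or a no-op depending on the
    compiler-reported byte order. The names sketch_ntohl, sketch_htonl and
    wire_to_host are made up for this example and are not the macros from
    bpf_endian.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; the real macros live in bpf_endian.h. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
# define sketch_ntohl(x) __builtin_bswap32(x)
# define sketch_htonl(x) __builtin_bswap32(x)
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# define sketch_ntohl(x) (x)
# define sketch_htonl(x) (x)
#else
# error "unsupported byte order"
#endif

/* Interpret four wire-order bytes as a host-order integer via the macro. */
static uint32_t wire_to_host(const unsigned char *p)
{
	uint32_t net;

	assert(p != NULL);
	memcpy(&net, p, sizeof(net));
	return sketch_ntohl(net);
}
```

    Because the choice is made by the preprocessor, the conversion costs
    nothing at run time on big-endian targets.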

    Signed-off-by: Yonghong Song
    Acked-by: Daniel Borkmann
    Acked-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Yonghong Song
     

05 Jul, 2017

1 commit

  • The function load_bpf_file ignores the return value of
    load_and_attach(), so even if load_and_attach() returns an error,
    load_bpf_file() will return 0.

    Now, load_bpf_file() can call load_and_attach() multiple times and some
    can succeed and some could fail. I think the correct behavior is to
    return error on the first failed load_and_attach().
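
    The intended behavior can be sketched as a loop that stops at the first
    failing call. This is a stand-alone illustration, not the code from
    bpf_load.c: load_all_sections, load_fn and stub_load are hypothetical
    names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loader callback: returns 0 on success, negative on error. */
typedef int (*load_fn)(const char *section);

/* Load every section in order and stop at the first failure. */
static int load_all_sections(const char *const *sections, size_t n, load_fn load)
{
	size_t i;

	assert(sections != NULL && load != NULL);
	for (i = 0; i < n; i++) {
		int err = load(sections[i]);

		if (err)
			return err;	/* propagate instead of ignoring */
	}
	return 0;
}

/* Stub for illustration: fails for section names that start with 'x'. */
static int stub_load(const char *section)
{
	return section[0] == 'x' ? -1 : 0;
}
```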

    v2: Added missing SOB

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     

02 Jul, 2017

9 commits

  • Sample BPF program, tcp_clamp_kern.c, to demonstrate the use
    of setting the sndcwnd clamp. This program assumes that if the
    first 5.5 bytes of the host's IPv6 addresses are the same, then
    the hosts are in the same datacenter and sets sndcwnd clamp to
    100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer
    sizes to 150KB.

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • Sample BPF program that assumes hosts are far away (i.e. large RTTs)
    and sets initial cwnd and initial receive window to 40 packets,
    send and receive buffers to 1.5MB.

    In practice there would be a test to ensure the hosts are actually
    far enough away.

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • Sample BPF program that sets congestion control to dctcp when both hosts
    are within the same datacenter. In this example that is assumed to be
    the case when the first 5.5 bytes of their IPv6 addresses are the same.

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • This patch contains a BPF program to set initial receive window to
    40 packets and send and receive buffers to 1.5MB. This would usually
    be done after doing appropriate checks that indicate the hosts are
    far enough away (i.e. large RTT).

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • Added support for calling a subset of socket setsockopts from
    BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather
    than making the changes to call the socket setsockopt function because
    the changes required would have been larger.

    The ops supported are:
    SO_RCVBUF
    SO_SNDBUF
    SO_MAX_PACING_RATE
    SO_PRIORITY
    SO_RCVLOWAT
    SO_MARK

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • The sample bpf program, tcp_rwnd_kern.c, sets the initial
    advertised window to 40 packets in an environment where
    distinct IPv6 prefixes indicate that both hosts are not
    in the same data center.

    Signed-off-by: Lawrence Brakmo
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • The sample BPF program, tcp_synrto_kern.c, sets the SYN and SYN-ACK
    RTOs to 10ms when both hosts are within the same datacenter (i.e.
    small RTTs) in an environment where common IPv6 prefixes indicate
    both hosts are in the same data center.

    Signed-off-by: Lawrence Brakmo
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • The program load_sock_ops can be used to load sock_ops bpf programs and
    to attach them to an existing (v2) cgroup. It can also be used to detach
    sock_ops programs.

    Examples:
    load_sock_ops [-l]
    Loads and attaches a sock_ops program at the specified cgroup.
    If "-l" is used, the program will continue to run to output the
    BPF log buffer.
    If the specified filename does not end in ".o", it appends
    "_kern.o" to the name.

    load_sock_ops -r
    Detaches the currently attached sock_ops program from the
    specified cgroup.
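
    The filename rule described above (append "_kern.o" unless the name
    already ends in ".o") can be sketched as follows. The helper name
    sock_ops_object_name and its signature are assumptions made for this
    example, not taken from load_sock_ops.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Keep names ending in ".o"; otherwise append "_kern.o". */
static int sock_ops_object_name(const char *name, char *out, size_t len)
{
	size_t n;
	int written;

	assert(name != NULL && out != NULL);
	n = strlen(name);
	if (n >= 2 && strcmp(name + n - 2, ".o") == 0)
		written = snprintf(out, len, "%s", name);
	else
		written = snprintf(out, len, "%s_kern.o", name);
	/* Report truncation as an error. */
	return (written < 0 || (size_t)written >= len) ? -1 : 0;
}
```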

    Signed-off-by: Lawrence Brakmo
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
    struct that allows BPF programs of this type to access some of the
    socket's fields (such as IP addresses, ports, etc.). It uses the
    existing bpf cgroups infrastructure so the programs can be attached per
    cgroup with full inheritance support. The program will be called at
    appropriate times to set relevant connections parameters such as buffer
    sizes, SYN and SYN-ACK RTOs, etc., based on connection information such
    as IP addresses, port numbers, etc.

    Although there are already 3 mechanisms to set parameters (sysctls,
    route metrics and setsockopts), this new mechanism provides some
    distinct advantages. Unlike sysctls, it can set parameters per
    connection. In contrast to route metrics, it can also use port numbers
    and information provided by a user level program. In addition, it could
    set parameters probabilistically for evaluation purposes (i.e. do
    something different on 10% of the flows and compare results with the
    other 90% of the flows). Also, in cases where IPv6 addresses contain
    geographic information, the rules to make changes based on the distance
    (or RTT) between the hosts are much easier than route metric rules and
    can be global. Finally, unlike setsockopt, it does not require
    application changes and it can be updated easily at any time.

    Although the bpf cgroup framework already contains a sock related
    program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type
    (BPF_PROG_TYPE_SOCK_OPS) because the existing type expects to be called
    only once during the connection's lifetime. In contrast, the new
    program type will be called multiple times from different places in the
    network stack code. For example, before sending SYN and SYN-ACKs to set
    an appropriate timeout, when the connection is established to set
    congestion control, etc. As a result it has an "op" field to specify the
    type of operation requested.

    The purpose of this new program type is to simplify setting connection
    parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
    easy to use facebook's internal IPv6 addresses to determine if both hosts
    of a connection are in the same datacenter. Therefore, it is easy to
    write a BPF program to choose a small SYN RTO value when both hosts are
    in the same datacenter.

    This patch only contains the framework to support the new BPF program
    type; the following patches add the functionality to set various
    connection parameters.

    This patch defines a new BPF program type: BPF_PROG_TYPE_SOCK_OPS
    and a new bpf syscall command to load a new program of this type:
    BPF_PROG_LOAD_SOCKET_OPS.

    Two new corresponding structs (one for the kernel, one for the user/BPF
    program):

    /* kernel version */
    struct bpf_sock_ops_kern {
            struct sock *sk;
            __u32 op;
            union {
                    __u32 reply;
                    __u32 replylong[4];
            };
    };

    /* user version
     * Some fields are in network byte order reflecting the sock struct
     * Use the bpf_ntohl helper macro in samples/bpf/bpf_endian.h to
     * convert them to host byte order.
     */
    struct bpf_sock_ops {
            __u32 op;
            union {
                    __u32 reply;
                    __u32 replylong[4];
            };
            __u32 family;
            __u32 remote_ip4;    /* In network byte order */
            __u32 local_ip4;     /* In network byte order */
            __u32 remote_ip6[4]; /* In network byte order */
            __u32 local_ip6[4];  /* In network byte order */
            __u32 remote_port;   /* In network byte order */
            __u32 local_port;    /* In host byte order */
    };

    Currently there are two types of ops. The first type expects the BPF
    program to return a value which is then used by the caller (or a
    negative value to indicate the operation is not supported). The second
    type expects state changes to be done by the BPF program, for example
    through a setsockopt BPF helper function, and they ignore the return
    value.

    The reply fields of the bpf_sock_ops struct are there in case a bpf
    program needs to return a value larger than an integer.

    Signed-off-by: Lawrence Brakmo
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     

30 Jun, 2017

1 commit


22 Jun, 2017

1 commit

  • tracex5_kern.c build failed with the following error message:
    ../samples/bpf/tracex5_kern.c:12:10: fatal error: 'syscall_nrs.h' file not found
    #include "syscall_nrs.h"
    The generated file syscall_nrs.h is put in build/samples/bpf directory,
    but this directory is not in include path, hence build failed.

    The fix is to add $(obj) into the clang compilation path.

    Signed-off-by: Yonghong Song
    Signed-off-by: David S. Miller

    Yonghong Song
     

17 Jun, 2017

2 commits


15 Jun, 2017

2 commits

  • There are two problems:

    1) On MIPS the __NR_* macros expand to an expression, which causes the
    sections of the object file to be named like:

    .
    .
    .
    [ 5] kprobe/(5000 + 1) PROGBITS 0000000000000000 000160 ...
    [ 6] kprobe/(5000 + 0) PROGBITS 0000000000000000 000258 ...
    [ 7] kprobe/(5000 + 9) PROGBITS 0000000000000000 000348 ...
    .
    .
    .

    The fix here is to use the "asm_offsets" trick to evaluate the macros
    in the C compiler and generate a header file with a usable form of the
    macros.

    2) MIPS syscall numbers start at 5000, so we need a bigger map to hold
    the sub-programs.
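
    The core idea of the first fix is to let the C compiler fold the
    expression into a plain number before it ends up in a name. The sketch
    below uses a simpler substitute for the "asm_offsets" mechanism: it
    formats a "#define" line from an already evaluated constant. The macros
    SKETCH_NR_BASE and SKETCH_NR_write and the function
    format_syscall_define are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a MIPS-style definition that expands to an expression. */
#define SKETCH_NR_BASE	5000
#define SKETCH_NR_write	(SKETCH_NR_BASE + 1)

/*
 * Format one "#define" line. The compiler has already folded the
 * expression into a plain number by the time it is passed as nr.
 * Returns 0 on success, -1 if the line did not fit into buf.
 */
static int format_syscall_define(char *buf, size_t len, const char *name, long nr)
{
	int written;

	assert(buf != NULL && name != NULL && strlen(name) > 0);
	written = snprintf(buf, len, "#define SYS%s %ld\n", name, nr);
	return (written < 0 || (size_t)written >= len) ? -1 : 0;
}
```

    A generated header built from such lines contains 5001 rather than
    (5000 + 1), so it can be pasted into a section name safely.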

    Signed-off-by: David Daney
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    David Daney
     
  • Signed-off-by: David Daney
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    David Daney
     

05 Jun, 2017

1 commit

  • $ trace_event

    tests attaching BPF program to HW_CPU_CYCLES, SW_CPU_CLOCK, HW_CACHE_L1D and other events.
    It runs 'dd' in the background while bpf program collects user and kernel
    stack trace on counter overflow.
    User space expects to see sys_read and sys_write in the kernel stack.

    $ tracex6

    tests reading of various perf counters from BPF program.

    Both tests were refactored to increase coverage and be more accurate.

    Signed-off-by: Teng Qin
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Teng Qin
     

01 Jun, 2017

1 commit

  • An eBPF ELF file generated with LLVM can contain several program
    sections, which can be used for bpf tail calls. The bpf prog file
    descriptors are accessible via array prog_fd[].

    At least the XDP samples assume ordering, and use prog_fd[0] as the main
    XDP program to attach. The actual order of array prog_fd[] depends on
    whether or not a bpf program section references any maps.
    Not using a map results in the section being loaded/processed after all
    other prog sections. Thus, this can lead to some very strange and
    hard-to-debug situations, as the user can only see an FD and cannot
    correlate that with the ELF section name.

    The fix is rather simple, and even removes duplicate memcmp code.
    Simply load program sections as the last step, instead of
    load_and_attach while processing the relocation section.

    When working with tail calls, it becomes even more essential that the
    order of prog_fd[] is consistent, like the current dependency on the
    map_fd[] order.
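
    The two-pass idea can be sketched with a toy section table: relocation
    sections are handled first without loading anything, and program sections
    are then loaded in ELF order. The types and the function sketch_load are
    hypothetical and far simpler than the real loader.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_kind { SKETCH_RELOC, SKETCH_PROG };

struct sketch_section {
	enum sketch_kind kind;
	const char *name;
};

/* Returns the number of programs recorded in order[]. */
static size_t sketch_load(const struct sketch_section *s, size_t n,
			  const char **order, size_t max, size_t *relocs)
{
	size_t i, count = 0;

	assert(s != NULL && order != NULL && relocs != NULL);
	*relocs = 0;
	/* Pass 1: handle relocation sections only; load no programs yet. */
	for (i = 0; i < n; i++)
		if (s[i].kind == SKETCH_RELOC)
			(*relocs)++;
	/* Pass 2: load program sections in ELF order. */
	for (i = 0; i < n && count < max; i++)
		if (s[i].kind == SKETCH_PROG)
			order[count++] = s[i].name;
	return count;
}
```

    With this structure the index of a program no longer depends on whether
    its section references a map.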

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

12 May, 2017

2 commits

  • Shahid Habib noticed that when xdp1 was killed from a different console the xdp
    program was not cleaned up properly in the kernel and it continued to forward
    traffic.

    Most of the applications in samples/bpf clean up properly, but only when getting
    SIGINT. Since kill defaults to using SIGTERM, add support to clean up when the
    application receives either SIGINT or SIGTERM.
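
    A minimal sketch of registering one cleanup handler for both signals is
    shown below. The handler only sets a flag here; a real sample would detach
    its program at that point. The names on_terminate,
    install_cleanup_handlers and cleaned_up are assumptions for this example.

```c
#include <assert.h>
#include <signal.h>

/* Set by the handler; a real tool would detach its BPF program instead. */
static volatile sig_atomic_t cleaned_up;

static void on_terminate(int sig)
{
	(void)sig;
	cleaned_up = 1;
}

/* Install the same cleanup handler for SIGINT and SIGTERM. */
static int install_cleanup_handlers(void)
{
	if (signal(SIGINT, on_terminate) == SIG_ERR)
		return -1;
	if (signal(SIGTERM, on_terminate) == SIG_ERR)
		return -1;
	return 0;
}
```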

    Signed-off-by: Andy Gospodarek
    Reported-by: Shahid Habib
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Andy Gospodarek
     
  • After commit b5cdae3291f7 ("net: Generic XDP") we automatically fall
    back to a generic XDP variant if the driver does not support native
    XDP. Allow for an option where the user can specify that the native
    XDP variant should always be selected, and in case it's not supported
    by a driver, just bail out.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

05 May, 2017

1 commit

  • Pull char/misc driver updates from Greg KH:
    "Here is the big set of new char/misc drivers and features for
    4.12-rc1.

    There's lots of new drivers added this time around, new firmware
    drivers from Google, more auxdisplay drivers, extcon drivers, fpga
    drivers, and a bunch of other driver updates. Nothing major, except if
    you happen to have the hardware for these drivers, and then you will
    be happy :)

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits)
    firmware: google memconsole: Fix return value check in platform_memconsole_init()
    firmware: Google VPD: Fix return value check in vpd_platform_init()
    goldfish_pipe: fix build warning about using too much stack.
    goldfish_pipe: An implementation of more parallel pipe
    fpga fr br: update supported version numbers
    fpga: region: release FPGA region reference in error path
    fpga altera-hps2fpga: disable/unprepare clock on error in alt_fpga_bridge_probe()
    mei: drop the TODO from samples
    firmware: Google VPD sysfs driver
    firmware: Google VPD: import lib_vpd source files
    misc: lkdtm: Add volatile to intentional NULL pointer reference
    eeprom: idt_89hpesx: Add OF device ID table
    misc: ds1682: Add OF device ID table
    misc: tsl2550: Add OF device ID table
    w1: Remove unneeded use of assert() and remove w1_log.h
    w1: Use kernel common min() implementation
    uio_mf624: Align memory regions to page size and set correct offsets
    uio_mf624: Refactor memory info initialization
    uio: Allow handling of non page-aligned memory regions
    hangcheck-timer: Fix typo in comment
    ...

    Linus Torvalds
     

03 May, 2017

5 commits

  • Giving *_user.c side tools access to map_data[] provides easier
    access to information on the maps being loaded. Still provide
    the guarantee that the order in which maps are defined inside the
    _kern.c file corresponds with the order in the array. Now user
    tools are not blind, but can inspect and verify the maps that got
    loaded from the ELF binary.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Do this change before others start to use this callback.
    Change map_perf_test_user.c which seems to be the only user.

    This patch extends capabilities of commit 9fd63d05f3e8 ("bpf:
    Allow bpf sample programs (*_user.c) to change bpf_map_def").

    Give the fixup callback access to struct bpf_map_data, instead of
    only struct bpf_map_def. This adds flexibility to allow userspace
    to reassign the map file descriptor. This is very useful when
    wanting to share maps between several bpf programs.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • This patch does proper parsing of the ELF "maps" section, in order to
    be both backward and forward compatible with changes to the map
    definition struct bpf_map_def, which gets compiled into the ELF file.

    The assumption is that new features with value zero are not in use.
    For backward compatibility, when loading an ELF file with a smaller
    struct bpf_map_def, only copy the object's ELF size, leaving the rest
    of the loader's struct zero. For forward compatibility, when the ELF
    file has a larger struct bpf_map_def, only copy the loader's own struct
    size and verify that the rest of the larger struct is zero, assuming
    this means the newer feature was not activated, so it is safe for this
    older loader to load this newer ELF file.
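
    The copy rule described above can be sketched as follows: copy the
    smaller of the two sizes, zero-fill what an older ELF lacks, and reject a
    newer ELF whose extra bytes are non-zero. The struct layout and the name
    sketch_copy_map_def are illustrative assumptions, not the code in
    bpf_load.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical loader-side definition; field names are illustrative. */
struct sketch_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/*
 * Copy one map definition of elf_sz bytes into the loader's struct.
 * Returns 0 on success, -1 if a larger ELF struct has non-zero bytes
 * beyond what this loader understands.
 */
static int sketch_copy_map_def(struct sketch_map_def *dst,
			       const unsigned char *elf, size_t elf_sz)
{
	size_t i, n = elf_sz < sizeof(*dst) ? elf_sz : sizeof(*dst);

	assert(dst != NULL && elf != NULL);
	memset(dst, 0, sizeof(*dst));	/* fields missing from an older ELF stay zero */
	memcpy(dst, elf, n);
	for (i = sizeof(*dst); i < elf_sz; i++)
		if (elf[i] != 0)
			return -1;	/* newer feature in use: refuse to load */
	return 0;
}
```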

    Fixes: fb30d4b71214 ("bpf: Add tests for map-in-map")
    Fixes: 409526bea3c3 ("samples/bpf: bpf_load.c detect and abort if ELF maps section size is wrong")
    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Needed to adjust max locked memory RLIMIT_MEMLOCK for testing these bpf samples
    as these are using more and larger maps than can fit in distro default 64Kbytes limit.
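
    Adjusting the limit from within a program goes through setrlimit(). The
    wrapper below is a minimal sketch; the name set_memlock_limit is an
    assumption for this example.

```c
#include <assert.h>
#include <sys/resource.h>

/* Set the locked-memory limit; returns 0 on success, -1 on failure (see errno). */
static int set_memlock_limit(rlim_t soft, rlim_t hard)
{
	struct rlimit r = { soft, hard };

	assert(soft <= hard);
	return setrlimit(RLIMIT_MEMLOCK, &r);
}
```

    Passing RLIM_INFINITY for both values removes the limit entirely, but
    raising the hard limit normally requires privileges, so the call can fail
    for an unprivileged user.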

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Pull livepatch updates from Jiri Kosina:

    - a per-task consistency model is being added for architectures that
    support reliable stack dumping (extending this, currently rather
    trivial set, is currently in the works).

    This extends the nature of the types of patches that can be applied
    by live patching infrastructure. The code stems from the design
    proposal made [1] back in November 2014. It's a hybrid of SUSE's
    kGraft and RH's kpatch, combining advantages of both: it uses
    kGraft's per-task consistency and syscall barrier switching combined
    with kpatch's stack trace switching. There are also a number of
    fallback options which make it quite flexible.

    Most of the heavy lifting done by Josh Poimboeuf with help from
    Miroslav Benes and Petr Mladek

    [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz

    - module load time patch optimization from Zhou Chengming

    - a few assorted small fixes

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    livepatch: add missing printk newlines
    livepatch: Cancel transition a safe way for immediate patches
    livepatch: Reduce the time of finding module symbols
    livepatch: make klp_mutex proper part of API
    livepatch: allow removal of a disabled patch
    livepatch: add /proc//patch_state
    livepatch: change to a per-task consistency model
    livepatch: store function sizes
    livepatch: use kstrtobool() in enabled_store()
    livepatch: move patching functions into patch.c
    livepatch: remove unnecessary object loaded check
    livepatch: separate enabled and patched states
    livepatch/s390: add TIF_PATCH_PENDING thread flag
    livepatch/s390: reorganize TIF thread flag bits
    livepatch/powerpc: add TIF_PATCH_PENDING thread flag
    livepatch/x86: add TIF_PATCH_PENDING thread flag
    livepatch: create temporary klp_update_patch_state() stub
    x86/entry: define _TIF_ALLWORK_MASK flags explicitly
    stacktrace/x86: add function for detecting reliable stack traces

    Linus Torvalds
     

02 May, 2017

1 commit

  • Fix the following warnings triggered by 51570a5ab2b7 ("A Sample of
    using socket cookie and uid for traffic monitoring"):

    In file included from /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:54:0:
    /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c: In function 'prog_load':
    /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:119:27: warning: overflow in implicit constant conversion [-Woverflow]
    -32 + offsetof(struct stats, uid)),
    ^
    /home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
    .off = OFF, \
    ^
    /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:121:27: warning: overflow in implicit constant conversion [-Woverflow]
    -32 + offsetof(struct stats, packets), 1),
    ^
    /home/foo/net-next/samples/bpf/libbpf.h:155:12: note: in definition of macro 'BPF_ST_MEM'
    .off = OFF, \
    ^
    /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:129:27: warning: overflow in implicit constant conversion [-Woverflow]
    -32 + offsetof(struct stats, bytes)),
    ^
    /home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
    .off = OFF, \
    ^
    HOSTLD /home/foo/net-next/samples/bpf/per_socket_stats_example

    Fixes: 51570a5ab2b7 ("A Sample of using socket cookie and uid for traffic monitoring")
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

01 May, 2017

3 commits

  • The xdp_tx_iptunnel program can be terminated in two ways, after
    N seconds or via Ctrl-C SIGINT. The SIGINT code path does not
    handle detaching the correct XDP program in case the program
    was attached with XDP_FLAGS_SKB_MODE.

    Fix this by storing the XDP flags as a global variable, which is
    available for the SIGINT handler function.

    Fixes: 3993f2cb983b ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Daniel Borkmann
    Reviewed-by: Andy Gospodarek
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • The kernel side of XDP_FLAGS_SKB_MODE is unsigned, and the rtnetlink
    IFLA_XDP_FLAGS is defined as NLA_U32. Thus, userspace programs under
    samples/bpf/ should use the correct type.

    Fixes: 3993f2cb983b ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Daniel Borkmann
    Reviewed-by: Andy Gospodarek
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • The struct bpf_map_def was extended in commit fb30d4b71214 ("bpf: Add tests
    for map-in-map") with member unsigned int inner_map_idx. This changed the size
    of the maps section in the generated ELF _kern.o files.

    Unfortunately the loader in bpf_load.c does not detect or handle this. Thus,
    older _kern.o files became incompatible, and caused hard-to-debug errors
    where the syscall validation rejected BPF_MAP_CREATE request.

    This patch only detects the situation and aborts load_bpf_file(). It also
    adds code comments warning people who read this loader for inspiration
    about these pitfalls.

    Fixes: fb30d4b71214 ("bpf: Add tests for map-in-map")
    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

28 Apr, 2017

1 commit

  • Add option to xdp1 and xdp_tx_iptunnel to insert xdp program in
    SKB_MODE:
    - update set_link_xdp_fd to take a flags argument that is added to the
    RTM_SETLINK message

    - Add -S option to xdp1 and xdp_tx_iptunnel user code. When passed in
    XDP_FLAGS_SKB_MODE is set in the flags arg passed to set_link_xdp_fd

    Signed-off-by: David Ahern
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    David Ahern
     

26 Apr, 2017

1 commit


25 Apr, 2017

3 commits

  • Fixes the following warning

    samples/bpf/test_lru_dist.c:28:0: warning: "offsetof" redefined
    #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)

    In file included from ./tools/lib/bpf/bpf.h:25:0,
    from samples/bpf/libbpf.h:5,
    from samples/bpf/test_lru_dist.c:24:
    /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/stddef.h:417:0: note: this is the location of the previous definition
    #define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)

    Signed-off-by: Alexander Alemayhu
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexander Alemayhu
     
  • Fixes the following warning

    samples/bpf/cookie_uid_helper_example.c: At top level:
    samples/bpf/cookie_uid_helper_example.c:276:6: warning: no previous prototype for ‘finish’ [-Wmissing-prototypes]
    void finish(int ret)
    ^~~~~~
    HOSTLD samples/bpf/per_socket_stats_example

    Signed-off-by: Alexander Alemayhu
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexander Alemayhu
     
  • I was initially going to remove '-Wno-address-of-packed-member' because I
    thought it was not supposed to be there but Daniel suggested using
    '-Wno-unknown-warning-option'.

    This silences several warnings similar to the one below

    warning: unknown warning option '-Wno-address-of-packed-member' [-Wunknown-warning-option]
    1 warning generated.
    clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I./include
    -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h \
    -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
    -Wno-compare-distinct-pointer-types \
    -Wno-gnu-variable-sized-type-not-at-end \
    -Wno-address-of-packed-member -Wno-tautological-compare \
    -O2 -emit-llvm -c samples/bpf/xdp_tx_iptunnel_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/xdp_tx_iptunnel_kern.o

    $ clang --version

    clang version 3.9.1 (tags/RELEASE_391/final)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin

    Signed-off-by: Alexander Alemayhu
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexander Alemayhu
     

23 Apr, 2017

1 commit


18 Apr, 2017

1 commit

  • This patch adds a map-in-map LRU example.
    If we know only a subset of cores will use the
    LRU, we can allocate a common LRU list per targeted core
    and store it into an array-of-hashes.

    It allows using the common LRU map with map-update performance
    comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory
    on the unused cores that we know will never access the LRU map.

    BPF_F_NO_COMMON_LRU:
    > map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
    9234314 (9.23M/s)

    map-in-map LRU:
    > map_perf_test 512 8 1260000 80000000 | awk '{sum += $3}END{print sum}'
    9962743 (9.96M/s)

    Note that the max_entries for the map-in-map LRU test is 1260000 which
    is the max_entries for each inner LRU map. 8 processes have been
    started, so 8 * 1260000 = 10080000 (~10M) which is close to what is
    used in the BPF_F_NO_COMMON_LRU test.

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Martin KaFai Lau