18 Oct, 2017

2 commits

  • Pull tracing fix from Steven Rostedt:
    "Testing a new trace event format, I triggered a bug by doing:

    # modprobe trace-events-sample
    # echo 1 > /sys/kernel/debug/tracing/events/sample-trace/enable
    # rmmod trace-events-sample

    This would cause an oops. The issue is that I added another trace
    event sample that reused a reg function of another trace event to
    create a thread to call the tracepoints. The problem was that the reg
    function couldn't handle nested calls (reg; reg; unreg; unreg;) and
    created two threads (instead of one) and only removed one on exit.

    This isn't a critical bug as the bug is only in sample code. But
    sample code should be free of known bugs to prevent others from
    copying it. This is why this is also marked for stable"

    * tag 'trace-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing/samples: Fix creation and deletion of simple_thread_fn creation

    Linus Torvalds
     
  • Commit 7496946a8 ("tracing: Add samples of DECLARE_EVENT_CLASS() and
    DEFINE_EVENT()") added template examples for all the events. It created a
    DEFINE_EVENT_FN() example which reused the foo_bar_reg and foo_bar_unreg
    functions.

    Enabling both the TRACE_EVENT_FN() and DEFINE_EVENT_FN() example trace
    events caused the foo_bar_reg to be called twice, creating the test thread
    twice. The foo_bar_unreg would remove it only once, even if it was called
    multiple times, leaving a thread existing when the module is unloaded,
    causing an oops.

    Add a ref count and allow foo_bar_reg() and foo_bar_unreg() to be called by
    multiple trace events.

    Cc: stable@vger.kernel.org
    Fixes: 7496946a8 ("tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT()")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
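
The reference-counting idea in the commit above can be sketched in user space as follows. This is a minimal model, not the code from the patch: the counters stand in for creating and stopping the kernel thread, and every name except foo_bar_reg() and foo_bar_unreg() is an assumption made for illustration. A real module would presumably also hold a lock around the count.

```c
#include <stdbool.h>

/* Model state: these variables stand in for the sample's helper thread. */
static int reg_count;        /* how many trace events are currently registered */
static bool thread_running;  /* true while the modelled thread exists */
static int threads_started;  /* illustration only: counts thread creations */
static int threads_stopped;  /* illustration only: counts thread removals */

/* Called whenever an event that shares this callback is enabled. */
int foo_bar_reg(void)
{
	/* Only the first registration creates the thread. */
	if (reg_count++ == 0) {
		thread_running = true;
		threads_started++;
	}
	return 0;
}

/* Called whenever an event that shares this callback is disabled. */
void foo_bar_unreg(void)
{
	/* Only the last unregistration stops the thread. */
	if (reg_count > 0 && --reg_count == 0) {
		thread_running = false;
		threads_stopped++;
	}
}
```

With this shape, the nested sequence reg; reg; unreg; unreg; starts one thread and stops one thread, so nothing is left behind when the module goes away.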
     

08 Sep, 2017

1 commit

  • Pull media updates from Mauro Carvalho Chehab:
    "Brazil's Independence Day pull request :-)

    This is one of the biggest media pull requests, with 625 patches
    affecting almost all parts of media (RC, DVB, V4L2, CEC, docs).

    This contains:

    - A lot of new drivers:
    * DVB frontends: mxl5xx, stv0910, stv6111;
    * camera flash: as3645a led driver;
    * HDMI receiver: adv748X;
    * camera sensor: Omnivision 6650 5M driver (ov6650);
    * HDMI CEC: ao-cec meson driver;
    * V4L2: Qualcomm camss driver;
    * Remote controller: gpio-ir-tx, pwm-ir-tx and zx-irdec drivers.

    - The DDbridge DVB driver got a massive update, which brings it in sync
    with modern hardware from that vendor;

    - There's an important milestone on this series: the DVB
    documentation was written in 2003, but only started to be updated
    in 2007. It also used to contain several gaps from the time it was
    kept out of tree, mentioning error codes and device nodes that
    never existed upstream. On this series, it received a massive
    update: all non-deprecated digital TV APIs are now in sync with the
    current implementation;

    - Some DVB APIs that aren't used by any upstream driver got removed;

    - Other parts of the media documentation also got updated, fixing
    some bugs in its PDF output and making it compatible with Sphinx
    version 1.6.

    As the number of hacks required to build PDF output has been reduced, I hope
    we'll have fewer troubles as newer versions of our documentation
    toolchain are released (famous last words);

    - As usual, lots of driver cleanups and improvements"

    * tag 'media/v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (624 commits)
    media: leds: as3645a: add V4L2_FLASH_LED_CLASS dependency
    media: get rid of removed DMX_GET_CAPS and DMX_SET_SOURCE leftovers
    media: Revert "[media] v4l: async: make v4l2 coexist with devicetree nodes in a dt overlay"
    media: staging: atomisp: sh_css_calloc shall return a pointer to the allocated space
    media: Revert "[media] lirc_dev: remove superfluous get/put_device() calls"
    media: add qcom_camss.rst to v4l-drivers rst file
    media: dvb headers: make checkpatch happier
    media: dvb uapi: move frontend legacy API to another part of the book
    media: pixfmt-srggb12p.rst: better format the table for PDF output
    media: docs-rst: media: Don't use \small for V4L2_PIX_FMT_SRGGB10 documentation
    media: index.rst: don't write "Contents:" on PDF output
    media: pixfmt*.rst: replace a two dots by a comma
    media: vidioc-g-fmt.rst: adjust table format
    media: vivid.rst: add a blank line to correct ReST format
    media: v4l2 uapi book: get rid of driver programming's chapter
    media: format.rst: use the right markup for important notes
    media: docs-rst: cardlists: change their format to flat-tables
    media: em28xx-cardlist.rst: update to reflect last changes
    media: v4l2-event.rst: adjust table to fit on PDF output
    media: docs: don't show ToC for each part on PDF output
    ...

    Linus Torvalds
     

02 Sep, 2017

1 commit

  • Create a new case to test the LRU lookup performance.

    At the beginning, the LRU map is fully loaded (i.e. the number of keys
    is equal to map->max_entries). The lookup is done through key 0
    to num_map_entries and then repeats from 0 again.

    This patch also creates an anonymous struct to properly
    name the test params in stress_lru_hmap_alloc() in map_perf_test_kern.c.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
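
The lookup pattern described above (walk the keys in order, then start over from 0) can be sketched as a small helper. This is only an illustration: lru_lookup_key() is a hypothetical name, and the sketch assumes the key range is the half-open interval from 0 up to, but not including, num_map_entries.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns the key used for the i-th lookup.
 * Keys run 0, 1, ..., num_map_entries - 1 and then wrap to 0 again.
 * The caller must pass a non-zero num_map_entries.
 */
uint32_t lru_lookup_key(uint64_t i, uint32_t num_map_entries)
{
	return (uint32_t)(i % num_map_entries);
}
```

Because the map is fully loaded before the test starts, every key produced this way is expected to be present, so the loop measures lookup cost rather than allocation.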
     

01 Sep, 2017

6 commits


30 Aug, 2017

2 commits

  • The xdp_monitor tool demonstrates how to use the different xdp_redirect
    tracepoints xdp_redirect{,_map}{,_err} from a BPF program.

    The default mode is to only monitor the error counters, to avoid
    affecting the per-packet performance. Tracepoints come with a base
    overhead of 25 nanosec for an attached bpf_prog, and 48 nanosec for
    using a full perf record (with non-matching filter). Thus, loading
    the --stats mode by default could affect the maximum performance.

    This version of the tool is very simple and counts all types of errors
    as one. It will be natural to extend this later with the different
    types of errors that can occur, which should help users quickly
    identify common mistakes.

    Because the TP_STRUCT was kept in sync, all the tracepoints load the
    same BPF code. It would also be natural to extend the map version to
    demonstrate how the map information could be used.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • For supporting XDP_REDIRECT, a device driver must (obviously)
    implement the "TX" function ndo_xdp_xmit(). An additional requirement
    is that you cannot TX out of a device unless it also has an XDP bpf program
    attached. This dependency is caused by the driver code needing to set up
    XDP resources before it can ndo_xdp_xmit.

    Update bpf samples xdp_redirect and xdp_redirect_map to automatically
    attach a dummy XDP program to the configured ifindex_out device. Use
    the XDP flag XDP_FLAGS_UPDATE_IF_NOEXIST on the dummy load, to avoid
    overriding an existing XDP prog on the device.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

29 Aug, 2017

2 commits

  • Extend existing tests for vxlan, gre, geneve, ipip to
    include ERSPAN tunnel.

    Signed-off-by: William Tu
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    William Tu
     
  • In the initial sockmap API we provided strparser and verdict programs
    using a single attach command by extending the attach API with the
    attach_bpf_fd2 field.

    However, if we add other programs in the future we will be adding a
    field for every new possible type, attach_bpf_fd(3,4,..). This
    seems a bit clumsy for an API. So let's push the programs using two
    new type fields.

    BPF_SK_SKB_STREAM_PARSER
    BPF_SK_SKB_STREAM_VERDICT

    This has the advantage of having a readable name and can easily be
    extended in the future.

    Updates to samples and sockmap included here also generalize tests
    slightly to support upcoming patch for multiple map support.

    Signed-off-by: John Fastabend
    Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
    Suggested-by: Alexei Starovoitov
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    John Fastabend
     

20 Aug, 2017

2 commits

  • These vb2_ops structures are only stored in the ops field of a
    vb2_queue structure, which is declared as const. Thus the vb2_ops
    structures themselves can be const.

    Done with the help of Coccinelle.

    // <smpl>
    @r disable optional_qualifier@
    identifier i;
    position p;
    @@
    static struct vb2_ops i@p = { ... };

    @ok@
    identifier r.i;
    struct vb2_queue e;
    position p;
    @@
    e.ops = &i@p;

    @bad@
    position p != {r.p,ok.p};
    identifier r.i;
    struct vb2_ops e;
    @@
    e@i@p

    @depends on !bad disable optional_qualifier@
    identifier r.i;
    @@
    static
    +const
    struct vb2_ops i = { ... };
    // </smpl>

    Signed-off-by: Julia Lawall
    Signed-off-by: Hans Verkuil
    Signed-off-by: Mauro Carvalho Chehab

    Julia Lawall
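
The reasoning behind the constification above can be shown with a small stand-alone example. The types here (demo_ops, demo_queue) are hypothetical stand-ins for vb2_ops and vb2_queue, not the real definitions; the point is only that when the sole user of an ops table is a pointer-to-const field, the table itself can be declared const.

```c
/* Hypothetical stand-in for an ops table such as vb2_ops. */
struct demo_ops {
	int (*start)(unsigned int count);
};

/* Hypothetical stand-in for the structure that stores the ops pointer. */
struct demo_queue {
	const struct demo_ops *ops; /* pointer-to-const: never written through */
};

static int demo_start(unsigned int count)
{
	return (int)count;
}

/* Every user is a pointer-to-const, so the table can be const as well. */
static const struct demo_ops demo_queue_ops = {
	.start = demo_start,
};

/* Calls through the const ops table; no cast is needed anywhere. */
int demo_queue_start(struct demo_queue *q, unsigned int count)
{
	return q->ops->start(count);
}
```

A const table can be placed in read-only data, which is the usual motivation for this kind of cleanup.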
     
  • This patch makes the needed changes to allow each process of
    the INNER_LRU_HASH_PREALLOC test to provide its numa node id
    when creating the lru map.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

17 Aug, 2017

1 commit

  • This program binds a program to a cgroup and then matches hard
    coded IP addresses and adds these to a sockmap.

    This will receive messages from the backend and send them to
    the client.

    client:X frontend:10000 client:X backend:10001

    To keep things simple this is only designed for 1:1 connections
    using hard coded values. A more complete example would allow many
    backends and clients.

    To run,

    # sockmap

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     

08 Aug, 2017

1 commit


02 Aug, 2017

1 commit


01 Aug, 2017

1 commit

  • test_tunnel_bpf.sh fails to remove the vxlan11 tunnel device, causing the
    next geneve tunnelling test case to fail. In addition, the geneve reserved bit
    in tcbpf2_kern.c should be zero, according to the RFC.

    Signed-off-by: William Tu
    Signed-off-by: David S. Miller

    William Tu
     

21 Jul, 2017

1 commit


18 Jul, 2017

3 commits

  • When testing with a driver that has both native and generic redirect support:

    $ sudo ./samples/bpf/xdp_redirect -N 5 6
    input: 5 output: 6
    ifindex 6: 4961879 pkt/s
    ifindex 6: 6391319 pkt/s
    ifindex 6: 6419468 pkt/s

    $ sudo ./samples/bpf/xdp_redirect -S 5 6
    input: 5 output: 6
    ifindex 6: 1845435 pkt/s
    ifindex 6: 3882850 pkt/s
    ifindex 6: 3893974 pkt/s

    $ sudo ./samples/bpf/xdp_redirect_map -N 5 6
    input: 5 output: 6
    map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
    ifindex 6: 2207374 pkt/s
    ifindex 6: 6212869 pkt/s
    ifindex 6: 6286515 pkt/s

    $ sudo ./samples/bpf/xdp_redirect_map -S 5 6
    input: 5 output: 6
    map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
    ifindex 6: 5052528 pkt/s
    ifindex 6: 5736631 pkt/s
    ifindex 6: 5739962 pkt/s

    Signed-off-by: Andy Gospodarek
    Acked-by: John Fastabend
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Andy Gospodarek
     
  • Signed-off-by: John Fastabend
    Tested-by: Andy Gospodarek
    Acked-by: Daniel Borkmann
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    John Fastabend
     
  • This implements a sample program for testing bpf_redirect. It reports
    the number of packets redirected per second and as input takes the
    ifindex of the device to run the xdp program on and the ifindex of the
    interface to redirect packets to.

    Signed-off-by: John Fastabend
    Tested-by: Andy Gospodarek
    Acked-by: Daniel Borkmann
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    John Fastabend
     

14 Jul, 2017

1 commit

  • Merge yet more updates from Andrew Morton:

    - various misc things

    - kexec updates

    - sysctl core updates

    - scripts/gdb updates

    - checkpoint-restart updates

    - ipc updates

    - kernel/watchdog updates

    - Kees's "rough equivalent to the glibc _FORTIFY_SOURCE=1 feature"

    - "stackprotector: ascii armor the stack canary"

    - more MM bits

    - checkpatch updates

    * emailed patches from Andrew Morton: (96 commits)
    writeback: rework wb_[dec|inc]_stat family of functions
    ARM: samsung: usb-ohci: move inline before return type
    video: fbdev: omap: move inline before return type
    video: fbdev: intelfb: move inline before return type
    USB: serial: safe_serial: move __inline__ before return type
    drivers: tty: serial: move inline before return type
    drivers: s390: move static and inline before return type
    x86/efi: move asmlinkage before return type
    sh: move inline before return type
    MIPS: SMP: move asmlinkage before return type
    m68k: coldfire: move inline before return type
    ia64: sn: pci: move inline before type
    ia64: move inline before return type
    FRV: tlbflush: move asmlinkage before return type
    CRIS: gpio: move inline before return type
    ARM: HP Jornada 7XX: move inline before return type
    ARM: KVM: move asmlinkage before type
    checkpatch: improve the STORAGE_CLASS test
    mm, migration: do not trigger OOM killer when migrating memory
    drm/i915: use __GFP_RETRY_MAYFAIL
    ...

    Linus Torvalds
     

13 Jul, 2017

1 commit

  • This is a layering violation so we replace the uses with calls to
    sg_page(). This is a prep patch for replacing page_link and this is one
    of the very few uses outside of scatterlist.h.

    Link: http://lkml.kernel.org/r/1495663199-22234-1-git-send-email-logang@deltatee.com
    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Stephen Bates
    Acked-by: Stefani Seibold
    Cc: Stefani Seibold
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     

12 Jul, 2017

1 commit

  • With latest net-next:

    ====
    clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -Isamples/bpf \
    -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
    -Wno-compare-distinct-pointer-types \
    -Wno-gnu-variable-sized-type-not-at-end \
    -Wno-address-of-packed-member -Wno-tautological-compare \
    -Wno-unknown-warning-option \
    -O2 -emit-llvm -c samples/bpf/tcp_synrto_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/tcp_synrto_kern.o
    samples/bpf/tcp_synrto_kern.c:20:10: fatal error: 'bpf_endian.h' file not found
    ^~~~~~~~~~~~~~
    1 error generated.
    ====

    net has the same issue.

    Add support for ntohl and htonl in tools/testing/selftests/bpf/bpf_endian.h.
    Also move bpf_helpers.h from samples/bpf to selftests/bpf and change
    compiler include logic so that programs in samples/bpf can access the headers
    in selftests/bpf, but not the other way around.

    Signed-off-by: Yonghong Song
    Acked-by: Daniel Borkmann
    Acked-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Yonghong Song
     

05 Jul, 2017

1 commit

  • The function load_bpf_file ignores the return value of
    load_and_attach(), so even if load_and_attach() returns an error,
    load_bpf_file() will return 0.

    Now, load_bpf_file() can call load_and_attach() multiple times and some
    can succeed and some could fail. I think the correct behavior is to
    return an error on the first failed load_and_attach().

    v2: Added missing SOB

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
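
The behaviour proposed above (stop and report the first failure instead of always returning 0) can be sketched as follows. This is a user-space illustration only: load_all_sections(), load_fn and demo_loader() are hypothetical names, and the callback merely stands in for load_and_attach().

```c
/* Hypothetical stand-in for load_and_attach(): non-zero means failure. */
typedef int (*load_fn)(int section);

/* Loads each section in order and returns the first error encountered. */
int load_all_sections(load_fn load, int nr_sections)
{
	for (int i = 0; i < nr_sections; i++) {
		int ret = load(i);

		if (ret != 0)
			return ret; /* propagate the first failure to the caller */
	}
	return 0; /* every section loaded */
}

/* Demo loader for illustration: counts calls and fails on section 2. */
static int calls;

static int demo_loader(int section)
{
	calls++;
	return section == 2 ? -1 : 0;
}
```

With the demo loader, a request for five sections stops after the third call and the caller sees the error, rather than a misleading success.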
     

02 Jul, 2017

9 commits

  • Sample BPF program, tcp_clamp_kern.c, to demonstrate the use
    of setting the sndcwnd clamp. This program assumes that if the
    first 5.5 bytes of the host's IPv6 addresses are the same, then
    the hosts are in the same datacenter and sets sndcwnd clamp to
    100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer
    sizes to 150KB.

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
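
The "first 5.5 bytes" test mentioned above can be sketched in plain C. This is an illustration under stated assumptions, not the sample's code: it reads "5.5 bytes" as the leading 44 bits, takes both addresses as 16-byte arrays in network byte order, and uses the hypothetical name same_datacenter().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Returns true when the leading 44 bits of two IPv6 addresses match.
 * Both arrays hold the address bytes in network (big-endian) order.
 */
bool same_datacenter(const uint8_t a[16], const uint8_t b[16])
{
	/* The first five whole bytes must be identical... */
	if (memcmp(a, b, 5) != 0)
		return false;

	/* ...and so must the high nibble of the sixth byte (the "half" byte). */
	return (a[5] & 0xf0) == (b[5] & 0xf0);
}
```

A BPF program would typically apply the same comparison to 32-bit words of the address after converting them to host byte order; the byte-array form is used here only because it is easier to read.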
     
  • Sample BPF program that assumes hosts are far away (i.e. large RTTs)
    and sets initial cwnd and initial receive window to 40 packets,
    send and receive buffers to 1.5MB.

    In practice there would be a test to ensure the hosts are actually
    far enough away.

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • Sample BPF program that sets congestion control to dctcp when both hosts
    are within the same datacenter. In this example that is assumed to be
    when the first 5.5 bytes of their IPv6 addresses are the same.

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • This patch contains a BPF program to set initial receive window to
    40 packets and send and receive buffers to 1.5MB. This would usually
    be done after doing appropriate checks that indicate the hosts are
    far enough away (i.e. large RTT).

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • Added support for calling a subset of socket setsockopts from
    BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather
    than making the changes to call the socket setsockopt function because
    the changes required would have been larger.

    The ops supported are:
    SO_RCVBUF
    SO_SNDBUF
    SO_MAX_PACING_RATE
    SO_PRIORITY
    SO_RCVLOWAT
    SO_MARK

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • The sample bpf program, tcp_rwnd_kern.c, sets the initial
    advertised window to 40 packets in an environment where
    distinct IPv6 prefixes indicate that both hosts are not
    in the same data center.

    Signed-off-by: Lawrence Brakmo
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • The sample BPF program, tcp_synrto_kern.c, sets the SYN and SYN-ACK
    RTOs to 10ms when both hosts are within the same datacenter (i.e.
    small RTTs) in an environment where common IPv6 prefixes indicate
    both hosts are in the same data center.

    Signed-off-by: Lawrence Brakmo
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • The program load_sock_ops can be used to load sock_ops bpf programs and
    to attach them to an existing (v2) cgroup. It can also be used to detach
    sock_ops programs.

    Examples:
    load_sock_ops [-l] <cg-path> <prog filename>
        Loads and attaches a sock_ops program at the specified cgroup.
        If "-l" is used, the program will continue to run to output the
        BPF log buffer.
        If the specified filename does not end in ".o", it appends
        "_kern.o" to the name.

    load_sock_ops -r <cg-path>
        Detaches the currently attached sock_ops program from the
        specified cgroup.

    Signed-off-by: Lawrence Brakmo
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     
  • Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
    struct that allows BPF programs of this type to access some of the
    socket's fields (such as IP addresses, ports, etc.). It uses the
    existing bpf cgroups infrastructure so the programs can be attached per
    cgroup with full inheritance support. The program will be called at
    appropriate times to set relevant connection parameters such as buffer
    sizes, SYN and SYN-ACK RTOs, etc., based on connection information such
    as IP addresses, port numbers, etc.

    Although there are already 3 mechanisms to set parameters (sysctls,
    route metrics and setsockopts), this new mechanism provides some
    distinct advantages. Unlike sysctls, it can set parameters per
    connection. In contrast to route metrics, it can also use port numbers
    and information provided by a user level program. In addition, it could
    set parameters probabilistically for evaluation purposes (i.e. do
    something different on 10% of the flows and compare results with the
    other 90% of the flows). Also, in cases where IPv6 addresses contain
    geographic information, the rules to make changes based on the distance
    (or RTT) between the hosts are much easier than route metric rules and
    can be global. Finally, unlike setsockopt, it does not require
    application changes and it can be updated easily at any time.

    Although the bpf cgroup framework already contains a sock related
    program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type
    (BPF_PROG_TYPE_SOCK_OPS) because the existing type expects to be called
    only once during the connection's lifetime. In contrast, the new
    program type will be called multiple times from different places in the
    network stack code. For example, before sending SYN and SYN-ACKs to set
    an appropriate timeout, when the connection is established to set
    congestion control, etc. As a result it has an "op" field to specify the
    type of operation requested.

    The purpose of this new program type is to simplify setting connection
    parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
    easy to use facebook's internal IPv6 addresses to determine if both hosts
    of a connection are in the same datacenter. Therefore, it is easy to
    write a BPF program to choose a small SYN RTO value when both hosts are
    in the same datacenter.

    This patch only contains the framework to support the new BPF program
    type; the following patches add the functionality to set various connection
    parameters.

    This patch defines a new BPF program type: BPF_PROG_TYPE_SOCK_OPS
    and a new bpf syscall command to load a new program of this type:
    BPF_PROG_LOAD_SOCKET_OPS.

    Two new corresponding structs (one for the kernel, one for the user/BPF
    program):

    /* kernel version */
    struct bpf_sock_ops_kern {
            struct sock *sk;
            __u32 op;
            union {
                    __u32 reply;
                    __u32 replylong[4];
            };
    };

    /* user version
     * Some fields are in network byte order reflecting the sock struct
     * Use the bpf_ntohl helper macro in samples/bpf/bpf_endian.h to
     * convert them to host byte order.
     */
    struct bpf_sock_ops {
            __u32 op;
            union {
                    __u32 reply;
                    __u32 replylong[4];
            };
            __u32 family;
            __u32 remote_ip4;     /* In network byte order */
            __u32 local_ip4;      /* In network byte order */
            __u32 remote_ip6[4];  /* In network byte order */
            __u32 local_ip6[4];   /* In network byte order */
            __u32 remote_port;    /* In network byte order */
            __u32 local_port;     /* In host byte order */
    };

    Currently there are two types of ops. The first type expects the BPF
    program to return a value which is then used by the caller (or a
    negative value to indicate the operation is not supported). The second
    type expects state changes to be done by the BPF program, for example
    through a setsockopt BPF helper function, and they ignore the return
    value.

    The reply fields of the bpf_sock_ops struct are there in case a bpf
    program needs to return a value larger than an integer.

    Signed-off-by: Lawrence Brakmo
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Lawrence Brakmo
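
The two kinds of ops described above, and the byte-order note on the port fields, can be illustrated with a user-space sketch. None of this is the kernel's code: the struct is a trimmed, hypothetical copy of the user-visible layout, the op codes, the port number 8080 and the returned value 10 are made up for the example, and ntohl() from <arpa/inet.h> stands in for the bpf_ntohl helper.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical op codes; the commit text above does not list the real ones. */
enum demo_op {
	DEMO_OP_TIMEOUT_INIT = 1, /* first type: the caller uses the return value */
	DEMO_OP_ESTABLISHED  = 2, /* second type: the return value is ignored */
};

/* Trimmed, hypothetical copy of the user-visible struct. */
struct demo_sock_ops {
	uint32_t op;
	uint32_t remote_port; /* network byte order */
	uint32_t local_port;  /* host byte order */
};

/* Illustration only: counts how often the second-type op made a change. */
static int state_changes;

int demo_sock_ops_handler(const struct demo_sock_ops *skops)
{
	/* remote_port must be converted before it is compared. */
	uint32_t rport = ntohl(skops->remote_port);

	switch (skops->op) {
	case DEMO_OP_TIMEOUT_INIT:
		/* Return a value for the caller, or negative for "no opinion". */
		return rport == 8080 ? 10 : -1;
	case DEMO_OP_ESTABLISHED:
		/* A real program would change socket state here via a helper. */
		state_changes++;
		return 0;
	default:
		return -1; /* operation not supported by this program */
	}
}
```

The single entry point with a switch on op mirrors the reason given in the commit for having an "op" field: one program is called from several places in the stack and has to tell the call sites apart.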
     

30 Jun, 2017

1 commit


22 Jun, 2017

1 commit

  • tracex5_kern.c build failed with the following error message:
    ../samples/bpf/tracex5_kern.c:12:10: fatal error: 'syscall_nrs.h' file not found
    #include "syscall_nrs.h"
    The generated file syscall_nrs.h is put in build/samples/bpf directory,
    but this directory is not in include path, hence build failed.

    The fix is to add $(obj) into the clang compilation path.

    Signed-off-by: Yonghong Song
    Signed-off-by: David S. Miller

    Yonghong Song
     

17 Jun, 2017

1 commit