29 Sep, 2018

1 commit


26 Sep, 2018

2 commits

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2018-09-25

    The following pull-request contains BPF updates for your *net-next* tree.

    The main changes are:

    1) Allow for RX stack hardening by implementing the kernel's flow
    dissector in BPF. The idea was originally presented at netconf 2017 [0].
    Quote from merge commit:

    [...] Because of the rigorous checks of the BPF verifier, this
    provides significant security guarantees. In particular, the BPF
    flow dissector cannot get inside of an infinite loop, as with
    CVE-2013-4348, because BPF programs are guaranteed to terminate.
    It cannot read outside of packet bounds, because all memory accesses
    are checked. Also, with BPF the administrator can decide which
    protocols to support, reducing potential attack surface. Rarely
    encountered protocols can be excluded from dissection and the
    program can be updated without kernel recompile or reboot if a
    bug is discovered. [...]

    Also, a sample flow dissector has been implemented in BPF as part
    of this work, from Petar and Willem.

    [0] http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf

    2) Add support for bpftool to list currently active attachment
    points of BPF networking programs, providing a quick overview
    similar to bpftool's perf subcommand, from Yonghong.

    3) Fix a verifier pruning instability bug where a union member
    from the register state was not cleared properly, leading to
    branches not being pruned despite them being valid candidates,
    from Alexei.

    4) Various smaller fast-path optimizations in XDP's map redirect
    code, from Jesper.

    5) Enable bpftool to recognize BPF_MAP_TYPE_REUSEPORT_SOCKARRAY
    maps, from Roman.

    6) Remove a duplicate check in libbpf that probes for function
    storage, from Taeung.

    7) Fix an issue in test_progs by no longer checking errno after
    successful calls, since its value is only meaningful on failure,
    from Mauricio.

    8) Fix unused variable warning in bpf_getsockopt() helper when
    CONFIG_INET is not configured, from Anders.

    9) Fix a compilation failure in the BPF sample code's use of
    bpf_flow_keys, from Prashant.

    10) Minor cleanups in BPF code, from Yue and Zhong.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
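Item 7 rests on a general C rule: a successful library or system call is not required to clear errno, so errno is only meaningful right after a call has reported failure. The sketch below is plain C, not the actual test_progs change; the helper name open_checked() is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 and stores the fd on success, or returns
 * the errno value on failure. errno is read only after open() reported
 * failure (fd < 0); on success it may still hold a stale value left by an
 * earlier failed call, so it is deliberately not inspected.
 */
static int open_checked(const char *path, int *fd_out)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return errno;	/* meaningful: open() just failed */
	*fd_out = fd;
	return 0;		/* success: do not look at errno */
}
```

A caller tests the return value first and only then interprets the error code, which is the ordering the test_progs fix restores.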
     
  • Version bump conflict in batman-adv, take what's in net-next.

    iavf conflict, adjustment of netdev_ops in net-next conflicting
    with poll controller method removal in net.

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Sep, 2018

2 commits

    Add the BPF_MAP_TYPE_REUSEPORT_SOCKARRAY map type to the list
    of map types which bpftool recognizes.

    Signed-off-by: Roman Gushchin
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Jakub Kicinski
    Cc: Yonghong Song
    Acked-by: Jakub Kicinski
    Acked-by: Song Liu
    Signed-off-by: Daniel Borkmann

    Roman Gushchin
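Recognizing a new map type in a tool like bpftool typically means adding one entry to a table of names indexed by the map type enum. The sketch below assumes that layout; the SKETCH_* enum is a local stand-in (its values are illustrative, not copied from the uapi header), and the "reuseport_sockarray" string is chosen for illustration.

```c
#include <string.h>

/* Local stand-in for a few members of enum bpf_map_type. */
enum sketch_map_type {
	SKETCH_MAP_TYPE_UNSPEC,
	SKETCH_MAP_TYPE_HASH,
	SKETCH_MAP_TYPE_ARRAY,
	SKETCH_MAP_TYPE_SOCKHASH,
	SKETCH_MAP_TYPE_REUSEPORT_SOCKARRAY,
};

/* Designated initializers keep each name next to its enum constant;
 * any index left out stays NULL. */
static const char * const map_type_name[] = {
	[SKETCH_MAP_TYPE_UNSPEC]              = "unspec",
	[SKETCH_MAP_TYPE_HASH]                = "hash",
	[SKETCH_MAP_TYPE_ARRAY]               = "array",
	[SKETCH_MAP_TYPE_SOCKHASH]            = "sockhash",
	[SKETCH_MAP_TYPE_REUSEPORT_SOCKARRAY] = "reuseport_sockarray",
};

/* Returns the printable name, or "unknown" for types missing from the table. */
static const char *map_type_str(unsigned int type)
{
	if (type < sizeof(map_type_name) / sizeof(map_type_name[0]) &&
	    map_type_name[type])
		return map_type_name[type];
	return "unknown";
}
```

Without the new entry, a tool built this way falls back to "unknown" for the type, which is the symptom the patch addresses.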
     
  • Dave writes:
    "Networking fixes:

    1) Fix multiqueue handling of coalesce timer in stmmac, from Jose
    Abreu.

    2) Fix memory corruption in NFC, from Suren Baghdasaryan.

    3) Don't write reserved bits in ravb driver, from Kazuya Mizuguchi.

    4) SMC bug fixes from Karsten Graul, YueHaibing, and Ursula Braun.

    5) Fix TX done race in mvpp2, from Antoine Tenart.

    6) ipv6 metrics leak, from Wei Wang.

    7) Adjust firmware version requirements in mlxsw, from Petr Machata.

    8) Fix autonegotiation on resume in r8169, from Heiner Kallweit.

    9) Fix missing entries when dumping /proc/net/if_inet6, from Jeff
    Barnhill.

    10) Fix double free in devlink, from Dan Carpenter.

    11) Fix ethtool regression from UFO feature removal, from Maciej
    Żenczykowski.

    12) Fix drivers that have a ndo_poll_controller() that captures the
    cpu entirely on loaded hosts by trying to drain all rx and tx
    queues, from Eric Dumazet.

    13) Fix memory corruption with jumbo frames in aquantia driver, from
    Friedemann Gerold."

    * gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (79 commits)
    net: mvneta: fix the remaining Rx descriptor unmapping issues
    ip_tunnel: be careful when accessing the inner header
    mpls: allow routes on ip6gre devices
    net: aquantia: memory corruption on jumbo frames
    tun: remove ndo_poll_controller
    nfp: remove ndo_poll_controller
    bnxt: remove ndo_poll_controller
    bnx2x: remove ndo_poll_controller
    mlx5: remove ndo_poll_controller
    mlx4: remove ndo_poll_controller
    i40evf: remove ndo_poll_controller
    ice: remove ndo_poll_controller
    igb: remove ndo_poll_controller
    ixgb: remove ndo_poll_controller
    fm10k: remove ndo_poll_controller
    ixgbevf: remove ndo_poll_controller
    ixgbe: remove ndo_poll_controller
    bonding: use netpoll_poll_dev() helper
    netpoll: make ndo_poll_controller() optional
    rds: Fix build regression.
    ...

    Greg Kroah-Hartman
     

23 Sep, 2018

1 commit


22 Sep, 2018

2 commits


21 Sep, 2018

1 commit

  • Paolo writes:
    "It's mostly small bugfixes and cleanups, mostly around x86 nested
    virtualization. One important change, not related to nested
    virtualization, is that the ability for the guest kernel to trap
    CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is
    now masked by default. This is because the feature is detected
    through an MSR; a very bad idea that Intel seems to like more and
    more. Some applications choke if the other fields of that MSR are
    not initialized as on real hardware, hence we have to disable the
    whole MSR by default, as was the case before Linux 4.12."

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
    KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
    kvm: selftests: Add platform_info_test
    KVM: x86: Control guest reads of MSR_PLATFORM_INFO
    KVM: x86: Turbo bits in MSR_PLATFORM_INFO
    nVMX x86: Check VPID value on vmentry of L2 guests
    nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
    KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
    KVM: VMX: check nested state and CR4.VMXE against SMM
    kvm: x86: make kvm_{load|put}_guest_fpu() static
    x86/hyper-v: rename ipi_arg_{ex,non_ex} structures
    KVM: VMX: use preemption timer to force immediate VMExit
    KVM: VMX: modify preemption timer bit only when arming timer
    KVM: VMX: immediately mark preemption timer expired only for zero value
    KVM: SVM: Switch to bitmap_zalloc()
    KVM/MMU: Fix comment in walk_shadow_page_lockless_end()
    kvm: selftests: use -pthread instead of -lpthread
    KVM: x86: don't reset root in kvm_mmu_setup()
    kvm: mmu: Don't read PDPTEs when paging is not enabled
    x86/kvm/lapic: always disable MMIO interface in x2APIC mode
    KVM: s390: Make huge pages unavailable in ucontrol VMs
    ...

    Greg Kroah-Hartman
     

20 Sep, 2018

6 commits

    A so-called "MC-aware" mode has recently been enabled in mlxsw. In
    MC-aware mode, BUM (broadcast, unknown unicast and multicast) traffic
    is handled in a special way so that when a switch is flooded with BUM,
    UC (unicast) performance isn't unduly impacted. Without this mode
    enabled, a stream of BUM traffic can cause a sustained UC throughput
    drop in excess of 99%.

    Add a test for this behavior. Compare how much UC throughput degrades as
    a stream of broadcast frames floods the switch. A minimal degradation is
    tolerated to cover for glitches in traffic injection performance.

    Signed-off-by: Petr Machata
    Reviewed-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
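The pass criterion described above compares UC throughput before and during the broadcast flood and tolerates a small drop. A minimal C sketch of that arithmetic follows; the function names and the tolerance passed in are hypothetical, and the real selftest chooses its own threshold and measurement method.

```c
#include <stdbool.h>

/*
 * Percentage by which unicast throughput dropped once the broadcast flood
 * started. Both rates must use the same unit (for example packets/s).
 */
static double uc_degradation_pct(double uc_rate_before, double uc_rate_during)
{
	if (uc_rate_before <= 0.0)
		return 100.0;	/* no baseline traffic: treat as total loss */
	return (uc_rate_before - uc_rate_during) * 100.0 / uc_rate_before;
}

/* max_pct is a hypothetical tolerance covering traffic-injection glitches. */
static bool mc_aware_ok(double uc_rate_before, double uc_rate_during,
			double max_pct)
{
	return uc_degradation_pct(uc_rate_before, uc_rate_during) <= max_pct;
}
```

For example, a drop from 1000 to 990 packets/s is a 1% degradation and passes a 5% tolerance, while the >99% collapse seen without MC-aware mode fails it.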
     
    Some selftests need to tweak the MTU of an interface and should
    naturally restore the original value at teardown. Add two functions
    to facilitate this: mtu_set() to change the MTU value, and
    mtu_reset() to change it back to what it was before.

    Signed-off-by: Petr Machata
    Reviewed-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Add a new service function to obtain ethtool counters.

    Signed-off-by: Petr Machata
    Reviewed-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Test guest access to MSR_PLATFORM_INFO when the capability is enabled
    or disabled.

    Signed-off-by: Drew Schmitt
    Signed-off-by: Paolo Bonzini

    Drew Schmitt
     
    I ran into the following error:

    testing/selftests/kvm/dirty_log_test.c:285: undefined reference to `pthread_create'
    testing/selftests/kvm/dirty_log_test.c:297: undefined reference to `pthread_join'
    collect2: error: ld returned 1 exit status

    My gcc version is 4.8.4; "-pthread" should work everywhere.

    Signed-off-by: Lei Yang
    Signed-off-by: Paolo Bonzini

    Lei Yang
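The undefined references above come from the link step. A common cause is where -lpthread lands relative to the object files on the command line; -pthread instead lets the compiler driver add the right compile and link flags itself. The small program below (hypothetical function names) uses the same two calls and is meant to be built with something like `gcc -pthread sketch.c`.

```c
#include <pthread.h>

/* Thread body: doubles the integer it is handed. */
static void *double_it(void *arg)
{
	int *v = arg;

	*v *= 2;
	return NULL;
}

/*
 * Starts one worker and waits for it. Returns 0 on success or the error
 * number from pthread_create()/pthread_join(). These are the symbols that
 * fail to resolve when the threads library is not linked correctly.
 */
static int run_worker(int *value)
{
	pthread_t tid;
	int err = pthread_create(&tid, NULL, double_it, value);

	if (err)
		return err;
	return pthread_join(tid, NULL);
}
```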
     
  • libc_compat.h is used by libbpf so make sure it's licensed under
    LGPL or BSD license. The license change should be OK, I'm the only
    author of the file.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     

19 Sep, 2018

3 commits


18 Sep, 2018

4 commits

    This is a follow-up patch to commit f6f3bac08ff9
    ("tools/bpf: bpftool: add net support").
    It makes some improvements to the bpftool net output.
    Specifically, the plain output is more concise, so that
    each attachment fits nicely on one line.
    Compared to the previous output, the prog tag is removed
    since it can easily be obtained from the program id.
    As with xdp attachments, the device name is now added
    to tc attachments.

    BPF programs attached through the shared block
    mechanism are supported as well:
    $ ip link add dev v1 type veth peer name v2
    $ tc qdisc add dev v1 ingress_block 10 egress_block 20 clsact
    $ tc qdisc add dev v2 ingress_block 10 egress_block 20 clsact
    $ tc filter add block 10 protocol ip prio 25 bpf obj bpf_shared.o sec ingress flowid 1:1
    $ tc filter add block 20 protocol ip prio 30 bpf obj bpf_cyclic.o sec classifier flowid 1:1
    $ bpftool net
    xdp:

    tc:
    v2(7) clsact/ingress bpf_shared.o:[ingress] id 23
    v2(7) clsact/egress bpf_cyclic.o:[classifier] id 24
    v1(8) clsact/ingress bpf_shared.o:[ingress] id 23
    v1(8) clsact/egress bpf_cyclic.o:[classifier] id 24

    The documentation and "bpftool net help" are updated
    to make it clear that current implementation only
    supports xdp and tc attachments. For programs
    attached to cgroups, "bpftool cgroup" can be used
    to dump attachments. For other programs e.g.
    sk_{filter,skb,msg,reuseport} and lwt/seg6,
    iproute2 tools should be used.

    The new output:
    $ bpftool net
    xdp:
    eth0(2) driver id 198

    tc:
    eth0(2) clsact/ingress fbflow_icmp id 335 act [{icmp_action id 336}]
    eth0(2) clsact/egress fbflow_egress id 334
    $ bpftool -jp net
    [{
        "xdp": [{
            "devname": "eth0",
            "ifindex": 2,
            "mode": "driver",
            "id": 198
        }
        ],
        "tc": [{
            "devname": "eth0",
            "ifindex": 2,
            "kind": "clsact/ingress",
            "name": "fbflow_icmp",
            "id": 335,
            "act": [{
                "name": "icmp_action",
                "id": 336
            }
            ]
        },{
            "devname": "eth0",
            "ifindex": 2,
            "kind": "clsact/egress",
            "name": "fbflow_egress",
            "id": 334
        }
        ]
    }
    ]

    Signed-off-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    Yonghong Song
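The new plain output puts each attachment on one line in the form "dev(ifindex) kind prog_name id N". The sketch below only illustrates that layout; it is not bpftool's code, the struct tc_attach record is hypothetical, and the optional "act [...]" suffix is left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record holding the fields shown in the plain output. */
struct tc_attach {
	const char *devname;
	int ifindex;
	const char *kind;	/* e.g. "clsact/ingress" */
	const char *prog_name;
	unsigned int prog_id;
};

/*
 * Writes one attachment as a single line, e.g.
 * "eth0(2) clsact/egress fbflow_egress id 334".
 * Returns snprintf()'s result: the length the full line needs.
 */
static int format_tc_attach(char *buf, size_t len, const struct tc_attach *a)
{
	return snprintf(buf, len, "%s(%d) %s %s id %u",
			a->devname, a->ifindex, a->kind,
			a->prog_name, a->prog_id);
}
```

Dropping the prog tag keeps the line short; the tag can still be looked up from the printed id.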
     
  • The dependency for the man page rule using asciidoctor incorrectly
    specifies a source file in $(OUTPUT). When building out-of-tree, the
    source file is not found, resulting in a fall-back to the following rule
    which uses xmlto.

    Signed-off-by: Ben Hutchings
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180916151704.GF4765@decadent.org.uk
    Fixes: ffef80ecf89f ("perf Documentation: Support for asciidoctor")
    Signed-off-by: Arnaldo Carvalho de Melo

    Ben Hutchings
     
    This is the same problem that was fixed in tools/perf/ by
    c8b5f2c96d1b ("tools: Introduce str_error_r()"); fix it the same
    way. Licensing needs to be sorted out before libbpf can use libapi,
    so for this simple case just add the same wrapper to tools/lib/bpf.

    This makes libbpf and its users (bpftool, selftests, perf) build
    again on Alpine Linux 3.[45678] and edge.

    Acked-by: Alexei Starovoitov
    Cc: Adrian Hunter
    Cc: Daniel Borkmann
    Cc: David Ahern
    Cc: Hendrik Brueckner
    Cc: Jakub Kicinski
    Cc: Jiri Olsa
    Cc: Martin KaFai Lau
    Cc: Namhyung Kim
    Cc: Quentin Monnet
    Cc: Thomas Richter
    Cc: Wang Nan
    Cc: Yonghong Song
    Fixes: 1ce6a9fc1549 ("bpf: fix build error in libbpf with EXTRA_CFLAGS="-Wp, -D_FORTIFY_SOURCE=2 -O2"")
    Link: https://lkml.kernel.org/r/20180917151636.GA21790@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Dave writes:
    "Various fixes, all over the place:

    1) OOB data generation fix in bluetooth, from Matias Karhumaa.

    2) BPF BTF boundary calculation fix, from Martin KaFai Lau.

    3) Don't bug on excessive frags, to be compatible in situations mixing
    older and newer kernels on each end. From Juergen Gross.

    4) Scheduling in RCU fix in hv_netvsc, from Stephen Hemminger.

    5) Zero keying information in TLS layer before freeing copies
    of them, from Sabrina Dubroca.

    6) Fix NULL deref in act_sample, from Davide Caratti.

    7) Orphan SKB before GRO in veth to prevent crashes with XDP,
    from Toshiaki Makita.

    8) Fix use after free in ip6_xmit, from Eric Dumazet.

    9) Fix VF mac address regression in bnxt_en, from Michael Chan.

    10) Fix MSG_PEEK behavior in TLS layer, from Daniel Borkmann.

    11) Programming adjustments to r8169 which fix not being able to enter
    deep sleep states on some machines, from Kai-Heng Feng and Hans de
    Goede.

    12) Fix DST_NOCOUNT flag handling for ipv6 routes, from Peter
    Oskolkov."

    * gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (45 commits)
    net/ipv6: do not copy dst flags on rt init
    qmi_wwan: set DTR for modems in forced USB2 mode
    clk: x86: Stop marking clocks as CLK_IS_CRITICAL
    r8169: Get and enable optional ether_clk clock
    clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail
    r8169: enable ASPM on RTL8106E
    r8169: Align ASPM/CLKREQ setting function with vendor driver
    Revert "kcm: remove any offset before parsing messages"
    kcm: remove any offset before parsing messages
    net: ethernet: Fix a unused function warning.
    net: dsa: mv88e6xxx: Fix ATU Miss Violation
    tls: fix currently broken MSG_PEEK behavior
    hv_netvsc: pair VF based on serial number
    PCI: hv: support reporting serial number as slot information
    bnxt_en: Fix VF mac address regression.
    ipv6: fix possible use-after-free in ip6_xmit()
    net: hp100: fix always-true check for link up state
    ARM: dts: at91: add new compatibility string for macb on sama5d3
    net: macb: disable scatter-gather for macb on sama5d3
    net: mvpp2: let phylink manage the carrier state
    ...

    Greg Kroah-Hartman
     

17 Sep, 2018

3 commits

    A number of tls selftests rely upon recv() to return an exact number of
    data bytes. When tls record crypto is done using an async accelerator,
    recv() may return fewer bytes than expected, which makes many test
    cases fail. To fix this, MSG_WAITALL is now passed in the flags of the
    recv() syscall.

    Signed-off-by: Vakul Garg
    Signed-off-by: David S. Miller

    Vakul Garg
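MSG_WAITALL asks the kernel to block until the full requested length is available, but recv() can still come back short on a signal, end of stream, or error. The helper below (hypothetical name recv_full) combines the flag with a retry loop; it is demonstrated on a plain AF_UNIX socketpair, not on a kTLS socket.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Reads exactly len bytes unless the peer closes the stream or an error
 * occurs. Returns the number of bytes read (short only on EOF), or -1
 * with errno set.
 */
static ssize_t recv_full(int fd, void *buf, size_t len)
{
	size_t got = 0;

	while (got < len) {
		ssize_t n = recv(fd, (char *)buf + got, len - got, MSG_WAITALL);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: try again */
			return -1;
		}
		if (n == 0)	/* peer closed: report the short count */
			break;
		got += (size_t)n;
	}
	return (ssize_t)got;
}
```

For example, two 5-byte sends followed by recv_full() with a 10-byte buffer yield all 10 bytes in one call.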
     
    In kTLS, MSG_PEEK behavior is currently broken; strace example:

    [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
    [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
    [pid 2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    [pid 2430] listen(4, 10) = 0
    [pid 2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
    [pid 2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    [pid 2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
    [pid 2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
    [pid 2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
    [pid 2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
    [pid 2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
    [pid 2430] close(4) = 0
    [pid 2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
    [pid 2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
    [pid 2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64

    As the strace shows, two TLS records are sent, i) 'test_read_peek'
    and ii) '_mult_recs\0', yet we end up peeking
    'test_read_peektest_read_peektest'. This is clearly wrong. What
    happens is that, since peek cannot call into tls_sw_advance_skb()
    to unpause the strparser and proceed with the next skb, we end up
    looping over the current one, copying 'test_read_peek' over and
    over into the user-provided buffer.

    Here, we can only peek into the currently held skb (the current,
    full TLS record); otherwise we would have to hold all the original
    skb(s) (depending on the peek depth) in a separate queue when
    unpausing the strparser to process the next records. The minimally
    intrusive fix is to return only up to the current record's size
    (which is likely what c46234ebb4d1 ("tls: RX path for ktls")
    originally intended as well). Thus, after the patch we properly
    peek the first record:

    [pid 2046] wait4(2075,
    [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
    [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
    [pid 2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    [pid 2075] listen(4, 10) = 0
    [pid 2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
    [pid 2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    [pid 2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
    [pid 2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
    [pid 2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
    [pid 2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
    [pid 2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
    [pid 2075] close(4) = 0
    [pid 2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
    [pid 2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
    [pid 2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14

    Fixes: c46234ebb4d1 ("tls: RX path for ktls")
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
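The fixed peek semantics can be modeled in a few lines: copy at most the currently held record and leave the queue position unchanged. This is a user-space model with hypothetical struct record and struct record_queue types, not the kernel's tls_sw code.

```c
#include <string.h>

/* Hypothetical model of decrypted records waiting to be read. */
struct record {
	const char *data;
	size_t len;
};

struct record_queue {
	const struct record *recs;
	size_t count;
	size_t head;	/* index of the record currently held */
};

/*
 * Peek after the fix: copy min(len, current record size) bytes and do not
 * advance head, rather than re-copying the same record until buf is full.
 */
static size_t peek_current_record(const struct record_queue *q,
				  char *buf, size_t len)
{
	const struct record *r;
	size_t n;

	if (q->head >= q->count)
		return 0;
	r = &q->recs[q->head];
	n = r->len < len ? r->len : len;
	memcpy(buf, r->data, n);
	return n;
}
```

With the two records from the strace and a 64-byte buffer, the peek returns 14 bytes ("test_read_peek"), matching the fixed recvfrom() output.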
     
  • …l/git/shuah/linux-kselftest

    Pulled kselftest fixes from Shuah:
    "This Kselftest fixes update for 4.19-rc5 consists of:

    -- fixes to build failures
    -- fixes to add missing config files to increase test coverage
    -- fixes to cgroup test and a new cgroup test for memory.oom.group"

    Greg Kroah-Hartman
     

16 Sep, 2018

2 commits

  • Pull perf fixes from Ingo Molnar:
    "Mostly tooling fixes, but also breakpoint and x86 PMU driver fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    perf tools: Fix maps__find_symbol_by_name()
    tools headers uapi: Update tools's copy of linux/if_link.h
    tools headers uapi: Update tools's copy of linux/vhost.h
    tools headers uapi: Update tools's copies of kvm headers
    tools headers uapi: Update tools's copy of drm/drm.h
    tools headers uapi: Update tools's copy of asm-generic/unistd.h
    tools headers uapi: Update tools's copy of linux/perf_event.h
    perf/core: Force USER_DS when recording user stack data
    perf/UAPI: Clearly mark __PERF_SAMPLE_CALLCHAIN_EARLY as internal use
    perf/x86/intel: Add support/quirk for the MISPREDICT bit on Knights Landing CPUs
    perf annotate: Fix parsing aarch64 branch instructions after objdump update
    perf probe powerpc: Ignore SyS symbols irrespective of endianness
    perf event-parse: Use fixed size string for comms
    perf util: Fix bad memory access in trace info.
    perf tools: Streamline bpf examples and headers installation
    perf evsel: Fix potential null pointer dereference in perf_evsel__new_idx()
    perf arm64: Fix include path for asm-generic/unistd.h
    perf/hw_breakpoint: Simplify breakpoint enable in perf_event_modify_breakpoint
    perf/hw_breakpoint: Enable breakpoint in modify_user_hw_breakpoint
    perf/hw_breakpoint: Remove superfluous bp->attr.disabled = 0
    ...

    Linus Torvalds
     
  • Pull locking fixes from Ingo Molnar:
    "Misc fixes: liblockdep fixes and ww_mutex fixes"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/ww_mutex: Fix spelling mistake "cylic" -> "cyclic"
    locking/lockdep: Delete unnecessary #include
    tools/lib/lockdep: Add dummy task_struct state member
    tools/lib/lockdep: Add empty nmi.h
    tools/lib/lockdep: Update Sasha Levin email to MSFT
    jump_label: Fix typo in warning message
    locking/mutex: Fix mutex debug call and ww_mutex documentation

    Linus Torvalds
     

15 Sep, 2018

5 commits

    Fix the following build error:
    clang -I. -I./include/uapi -I../../../include/uapi -idirafter /usr/local/include -idirafter /data/users/ast/llvm/bld/lib/clang/7.0.0/include -idirafter /usr/include -Wno-compare-distinct-pointer-types \
    -O2 -target bpf -emit-llvm -c bpf_flow.c -o - | \
    llc -march=bpf -mcpu=generic -filetype=obj -o /data/users/ast/bpf-next/tools/testing/selftests/bpf/bpf_flow.o
    LLVM ERROR: 'dissect' label emitted multiple times to assembly file
    make: *** [/data/users/ast/bpf-next/tools/testing/selftests/bpf/bpf_flow.o] Error 1

    Fixes: 9c98b13cc3bb ("flow_dissector: implements eBPF parser")
    Signed-off-by: Alexei Starovoitov

    Alexei Starovoitov
     
  • Adds a test that sends different types of packets over multiple
    tunnels and verifies that valid packets are dissected correctly. To do
    so, a tc-flower rule is added to drop packets on UDP src port 9, and
    packets are sent from ports 8, 9, and 10. Only the packets on port 9
    should be dropped. Because tc-flower relies on the flow dissector to
    match flows, correct classification demonstrates correct dissection.

    Also add support logic to load the BPF program and to inject the test
    packets.

    Signed-off-by: Petar Penkov
    Signed-off-by: Willem de Bruijn
    Signed-off-by: Alexei Starovoitov

    Petar Penkov
     
  • This eBPF program extracts basic/control/ip address/ports keys from
    incoming packets. It supports recursive parsing for IP encapsulation,
    and VLAN, along with IPv4/IPv6 and extension headers. This program is
    meant to show how flow dissection and key extraction can be done in
    eBPF.

    Link: http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
    Signed-off-by: Petar Penkov
    Signed-off-by: Willem de Bruijn
    Signed-off-by: Alexei Starovoitov

    Petar Penkov
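A real flow dissector program must prove to the verifier that every packet access stays between data and data_end. The plain C sketch below mimics that discipline while extracting a few keys from Ethernet, one optional 802.1Q tag, IPv4, and TCP/UDP ports. It is not the BPF program from this commit: struct flow_keys_sketch is hypothetical, and IPv6, extension headers, encapsulation and fragments are not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key layout; the real struct bpf_flow_keys has more fields. */
struct flow_keys_sketch {
	uint16_t n_proto;	/* EtherType of the network header */
	uint8_t ip_proto;	/* L4 protocol from the IPv4 header */
	uint16_t sport, dport;	/* left untouched unless TCP or UDP */
};

static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);	/* network byte order */
}

/* Every read is preceded by a length check against end. */
static bool dissect(const uint8_t *data, const uint8_t *end,
		    struct flow_keys_sketch *k)
{
	const uint8_t *p = data;
	uint16_t proto;
	size_t ihl;

	if (end - p < 14)			/* Ethernet header */
		return false;
	proto = rd16(p + 12);
	p += 14;

	if (proto == 0x8100) {			/* one 802.1Q tag */
		if (end - p < 4)
			return false;
		proto = rd16(p + 2);
		p += 4;
	}
	k->n_proto = proto;
	if (proto != 0x0800)			/* IPv4 only in this sketch */
		return false;

	if (end - p < 20 || (p[0] >> 4) != 4)
		return false;
	ihl = (size_t)(p[0] & 0x0f) * 4;
	if (ihl < 20 || end - p < (ptrdiff_t)ihl)
		return false;
	k->ip_proto = p[9];
	p += ihl;

	if (k->ip_proto != 6 && k->ip_proto != 17)	/* not TCP/UDP */
		return true;
	if (end - p < 4)			/* source and dest ports */
		return false;
	k->sport = rd16(p);
	k->dport = rd16(p + 2);
	return true;
}
```

In the spirit of the tc-flower selftest, a UDP packet sent from port 9 yields sport == 9, which is what a "drop UDP src port 9" rule would match on; a truncated packet is rejected instead of being read past its end.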
     
  • This patch extends libbpf and bpftool to work with programs of type
    BPF_PROG_TYPE_FLOW_DISSECTOR.

    Signed-off-by: Petar Penkov
    Signed-off-by: Willem de Bruijn
    Signed-off-by: Alexei Starovoitov

    Petar Penkov
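One way a loader can learn that a program is of the flow dissector type is by matching the ELF section name against a table of prefixes. The sketch below models such a lookup; the SKETCH_* enum and the table contents are hypothetical, and treating "flow_dissector" as the section prefix is an assumption for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for a few enum bpf_prog_type members (illustrative). */
enum sketch_prog_type {
	SKETCH_PROG_TYPE_UNSPEC,
	SKETCH_PROG_TYPE_SOCKET_FILTER,
	SKETCH_PROG_TYPE_XDP,
	SKETCH_PROG_TYPE_FLOW_DISSECTOR,
};

static const struct {
	const char *prefix;
	enum sketch_prog_type type;
} sec_defs[] = {
	{ "socket",         SKETCH_PROG_TYPE_SOCKET_FILTER },
	{ "xdp",            SKETCH_PROG_TYPE_XDP },
	{ "flow_dissector", SKETCH_PROG_TYPE_FLOW_DISSECTOR },
};

/* Returns the type of the first entry whose prefix starts the section
 * name, or SKETCH_PROG_TYPE_UNSPEC when nothing matches. */
static enum sketch_prog_type prog_type_by_section(const char *sec)
{
	for (size_t i = 0; i < sizeof(sec_defs) / sizeof(sec_defs[0]); i++)
		if (strncmp(sec, sec_defs[i].prefix,
			    strlen(sec_defs[i].prefix)) == 0)
			return sec_defs[i].type;
	return SKETCH_PROG_TYPE_UNSPEC;
}
```

Prefix matching lets sections such as "xdp/foo" map to the same type as "xdp".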
     
  • This patch syncs tools/include/uapi/linux/bpf.h with the flow dissector
    definitions from include/uapi/linux/bpf.h

    Signed-off-by: Petar Penkov
    Signed-off-by: Willem de Bruijn
    Signed-off-by: Alexei Starovoitov

    Petar Penkov
     

13 Sep, 2018

2 commits


12 Sep, 2018

6 commits

    Fix a bug in the key delete code: valid record indices range
    from 0 to num_records - 1.

    Signed-off-by: K. Y. Srinivasan
    Reported-by: David Binderman
    Cc:
    Reviewed-by: Michael Kelley
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
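Deleting an entry from a packed array means shifting the tail down by one, and the bound matters: with valid indices 0 .. num_records - 1, the last element copied is records[num_records - 1]. The generic sketch below shows that bound. It is not the daemon's actual code, and struct kv_record is hypothetical.

```c
#include <string.h>

/* Hypothetical fixed-size record standing in for a stored key/value pair. */
struct kv_record {
	char key[16];
	char value[16];
};

/*
 * Removes records[idx] by moving records[idx + 1 .. num_records - 1] down
 * one slot, i.e. num_records - 1 - idx elements; nothing at or beyond
 * records[num_records] is read. Returns the new count, or the old count
 * if idx is out of range.
 */
static int delete_record(struct kv_record *records, int num_records, int idx)
{
	if (idx < 0 || idx >= num_records)
		return num_records;
	memmove(&records[idx], &records[idx + 1],
		(size_t)(num_records - 1 - idx) * sizeof(*records));
	return num_records - 1;
}
```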
     
    Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
    functions into a new file") introduced a while loop for the
    netlink recv path. This while loop is needed since the
    buffer passed to the recv syscall may not be large enough to
    hold all the information, in which case multiple recv calls
    are needed.

    The above commit introduced a bug: the while loop may block
    in the recv syscall when no more messages are expected. The
    netlink message header flag NLM_F_MULTI indicates that more
    messages are expected, so this patch fixes the bug by issuing
    further recv syscalls only if a multipart message is expected.

    The patch also adds a fix for a message length of 0. When
    netlink recv returns a message length of 0, there is no more
    data to return, so the while loop can end.

    Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related functions into a new file")
    Reported-by: Björn Töpel
    Tested-by: Björn Töpel
    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Yonghong Song
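The loop shape described above can be sketched against a real rtnetlink dump: keep calling recv() only while messages carry NLM_F_MULTI, and stop on NLMSG_DONE or a zero-length read. This is not the libbpf code; the RTM_GETLINK request and the count_links() helper are chosen here for illustration. The request normally works without privileges on Linux.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Dumps all links and returns how many RTM_NEWLINK messages arrived,
 * or -1 on error. */
static int count_links(void)
{
	struct {
		struct nlmsghdr nlh;
		struct ifinfomsg ifm;
	} req;
	uint32_t buf[8192];	/* 32 KiB, aligned for nlmsghdr */
	int fd, count = 0, more = 1;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0)
		return -1;

	memset(&req, 0, sizeof(req));
	req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req.ifm));
	req.nlh.nlmsg_type = RTM_GETLINK;
	req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req.nlh.nlmsg_seq = 1;
	req.ifm.ifi_family = AF_UNSPEC;
	if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0)
		goto fail;

	while (more) {
		int len = (int)recv(fd, buf, sizeof(buf), 0);
		struct nlmsghdr *nh;

		if (len < 0)
			goto fail;
		if (len == 0)		/* no more data: stop instead of blocking */
			break;

		more = 0;		/* read again only if told to */
		for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
		     nh = NLMSG_NEXT(nh, len)) {
			if (nh->nlmsg_type == NLMSG_DONE)
				goto out;
			if (nh->nlmsg_type == NLMSG_ERROR)
				goto fail;
			if (nh->nlmsg_flags & NLM_F_MULTI)
				more = 1;
			if (nh->nlmsg_type == RTM_NEWLINK)
				count++;
		}
	}
out:
	close(fd);
	return count;
fail:
	close(fd);
	return -1;
}
```

A reply without NLM_F_MULTI ends the loop after one recv(), which is the case where the unconditional loop would otherwise block.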
     
    Currently, prog array maps and maps of maps are not supported
    in bpftool. This patch adds that support.
    Unlike other map types, for prog array maps and maps of maps
    the key returned by bpf_get_next_key() may not point to a
    valid value. So for these two map types, no error is printed
    when this happens.

    The following are the plain and json dumps when btf is not available:
    $ ./bpftool map dump id 10
    key: 08 00 00 00 value: 5c 01 00 00
    Found 1 element
    $ ./bpftool -jp map dump id 10
    [{
        "key": ["0x08","0x00","0x00","0x00"
        ],
        "value": ["0x5c","0x01","0x00","0x00"
        ]
    }]

    If BTF is available, the dump looks like this:
    $ ./bpftool map dump id 2
    [{
        "key": 0,
        "value": 7
    }
    ]
    $ ./bpftool -jp map dump id 2
    [{
        "key": ["0x00","0x00","0x00","0x00"
        ],
        "value": ["0x07","0x00","0x00","0x00"
        ],
        "formatted": {
            "key": 0,
            "value": 7
        }
    }]

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Yonghong Song
     
  • Commit 1c5aae7710bb ("perf machine: Create maps for x86 PTI entry
    trampolines") revealed a problem with maps__find_symbol_by_name() that
    resulted in probes not being found e.g.

    $ sudo perf probe xsk_mmap
    xsk_mmap is out of .text, skip it.
    Probe point 'xsk_mmap' not found.
    Error: Failed to add events.

    maps__find_symbol_by_name() can optionally return the map of the found
    symbol. It can get the map wrong because, in fact, the symbol is found
    on the map's dso, not allowing for the possibility that the dso has more
    than one map. Fix by always checking the map contains the symbol.

    Reported-by: Björn Töpel
    Signed-off-by: Adrian Hunter
    Tested-by: Björn Töpel
    Cc: Jiri Olsa
    Cc: stable@vger.kernel.org
    Fixes: 1c5aae7710bb ("perf machine: Create maps for x86 PTI entry trampolines")
    Link: http://lkml.kernel.org/r/20180907085116.25782-1-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
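The fix boils down to one extra check: finding a symbol in a dso is not enough when several maps share that dso, so the returned map must also contain the symbol's address. The sketch below uses hypothetical, heavily simplified structs and assumes symbol addresses and map ranges live in the same address space; perf's real maps translate between the two.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for perf's symbol and map. */
struct sym {
	const char *name;
	uint64_t addr;
};

struct map {
	uint64_t start, end;		/* [start, end) covered by this map */
	const struct sym *dso_syms;	/* symbols of the (shared) dso */
	size_t nr_syms;
};

static const struct sym *dso_find(const struct map *m, const char *name)
{
	for (size_t i = 0; i < m->nr_syms; i++)
		if (strcmp(m->dso_syms[i].name, name) == 0)
			return &m->dso_syms[i];
	return NULL;
}

/* Returns the map whose range actually contains the symbol, not merely
 * the first map whose dso knows the name. */
static const struct map *find_map_for_symbol(const struct map *maps, size_t nr,
					     const char *name,
					     const struct sym **out)
{
	for (size_t i = 0; i < nr; i++) {
		const struct sym *s = dso_find(&maps[i], name);

		if (s && s->addr >= maps[i].start && s->addr < maps[i].end) {
			*out = s;
			return &maps[i];
		}
	}
	return NULL;
}
```

With a trampoline-style map listed first over the same dso, the lookup for xsk_mmap skips it and returns the map that covers the symbol.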
     
  • To get the changes in:

    3e7a50ceb11e ("net: report min and max mtu network device settings")
    2756f68c3149 ("net: bridge: add support for backup port")
    a25717d2b604 ("xdp: support simultaneous driver and hw XDP attachment")
    4f91da26c811 ("xdp: add per mode attributes for attached programs")
    f203b76d7809 ("xfrm: Add virtual xfrm interfaces")

    Silencing this libbpf build warning:

    Warning: Kernel ABI header at 'tools/include/uapi/linux/if_link.h' differs from latest version at 'include/uapi/linux/if_link.h'

    Cc: Adrian Hunter
    Cc: Daniel Borkmann
    Cc: David Ahern
    Cc: David S. Miller
    Cc: Jakub Kicinski
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Nikolay Aleksandrov
    Cc: Steffen Klassert
    Cc: Stephen Hemminger
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-xd9ztioa894zemv8ag8kg64u@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • To get the changes in:

    c48300c92ad9 ("vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition")

    This allows 'perf trace', and in the future other tools using its
    beautifiers via a libbeauty.so library, to translate these new
    ioctls to strings:

    $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > /tmp/after
    $ diff -u /tmp/before /tmp/after
    --- /tmp/before 2018-09-11 13:10:57.923038244 -0300
    +++ /tmp/after 2018-09-11 13:11:20.329012685 -0300
    @@ -15,6 +15,7 @@
    [0x22] = "SET_VRING_ERR",
    [0x23] = "SET_VRING_BUSYLOOP_TIMEOUT",
    [0x24] = "GET_VRING_BUSYLOOP_TIMEOUT",
    + [0x25] = "SET_BACKEND_FEATURES",
    [0x30] = "NET_SET_BACKEND",
    [0x40] = "SCSI_SET_ENDPOINT",
    [0x41] = "SCSI_CLEAR_ENDPOINT",
    @@ -27,4 +28,5 @@
    static const char *vhost_virtio_ioctl_read_cmds[] = {
    [0x00] = "GET_FEATURES",
    [0x12] = "GET_VRING_BASE",
    + [0x26] = "GET_BACKEND_FEATURES",
    };
    $

    We'll also use this to be able to express syscall filters using
    these symbolic names, something like:

    # perf trace --all-cpus -e ioctl(cmd=*GET_FEATURES)

    This silences the following warning during perf's build:

    Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
    diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: David S. Miller
    Cc: Gleb Fotengauer-Malinovskiy
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-35x71oei2hdui9u0tarpimbq@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
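The diff above shows the generated beautifier keeping two string tables, one for requests that read from the kernel and one for the rest, each indexed by the ioctl number. The sketch below models that lookup with _IOC_DIR() and _IOC_NR(). The SK_* request macros are local copies built from the numbers in the diff and the vhost ioctl type 0xAF; treat them as assumptions and check them against include/uapi/linux/vhost.h.

```c
#include <linux/ioctl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of three vhost requests (assumed values, see above). */
#define SK_VHOST_VIRTIO               0xAF
#define SK_VHOST_GET_FEATURES         _IOR(SK_VHOST_VIRTIO, 0x00, uint64_t)
#define SK_VHOST_SET_BACKEND_FEATURES _IOW(SK_VHOST_VIRTIO, 0x25, uint64_t)
#define SK_VHOST_GET_BACKEND_FEATURES _IOR(SK_VHOST_VIRTIO, 0x26, uint64_t)

/* Two tables indexed by _IOC_NR(), mirroring the generated output. */
static const char *const vhost_cmds[] = {
	[0x25] = "SET_BACKEND_FEATURES",
};
static const char *const vhost_read_cmds[] = {
	[0x00] = "GET_FEATURES",
	[0x26] = "GET_BACKEND_FEATURES",
};

/* Picks the table from the direction bits, then indexes by number.
 * Returns NULL for foreign or unknown requests. */
static const char *vhost_ioctl_name(unsigned long cmd)
{
	const char *const *tbl = vhost_cmds;
	size_t n = sizeof(vhost_cmds) / sizeof(vhost_cmds[0]);
	unsigned int nr = _IOC_NR(cmd);

	if (_IOC_TYPE(cmd) != SK_VHOST_VIRTIO)
		return NULL;
	if (_IOC_DIR(cmd) & _IOC_READ) {
		tbl = vhost_read_cmds;
		n = sizeof(vhost_read_cmds) / sizeof(vhost_read_cmds[0]);
	}
	return nr < n ? tbl[nr] : NULL;
}
```

Because the table is chosen from the direction bits, a request's direction in the header decides which table its name has to appear in.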