29 Sep, 2016

1 commit

  • Suppose you have a map array value that looks something like this:

    struct foo {
        unsigned iter;
        int array[SOME_CONSTANT];
    };

    You can easily insert this into an array, but you cannot modify the
    contents of foo->array[] after the fact, because at verification time we
    have no way to verify that we won't go off the end of the array. This
    patch provides a start for this work. We accomplish this by keeping track
    of the minimum and maximum value a register could hold while we're
    checking the code. Then, when we try to access a MAP_VALUE, we verify that
    the maximum offset into that region is a valid access into that memory
    region. So in practice, code such as this

    unsigned index = 0;

    if (foo->iter >= SOME_CONSTANT)
        foo->iter = index;
    else
        index = foo->iter++;
    foo->array[index] = bar;

    would be allowed, as we can verify that index will always be between 0 and
    SOME_CONSTANT-1. If you wish to use signed values you'll have to have an extra
    check to make sure the index isn't less than 0, or do something like index %=
    SOME_CONSTANT.
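
    A minimal userspace sketch of that bound reasoning follows. It is not
    the verifier's implementation: struct reg_range, refine_lt(), merge(),
    array_store_ok() and the value 16 for SOME_CONSTANT are hypothetical,
    and only unsigned bounds are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

#define SOME_CONSTANT 16 /* hypothetical array length */

struct foo {
	unsigned iter;
	int array[SOME_CONSTANT];
};

/* Hypothetical model of what is known about a register's value. */
struct reg_range {
	unsigned long long min_value;
	unsigned long long max_value;
};

/* The "else" branch of `if (x >= bound)` implies x < bound,
 * so the known maximum can be tightened to bound - 1. */
static struct reg_range refine_lt(struct reg_range r, unsigned long long bound)
{
	if (bound == 0)
		return r; /* branch unreachable; keep the model simple */
	if (r.max_value > bound - 1)
		r.max_value = bound - 1;
	return r;
}

/* Where two branches join, the value may come from either one. */
static struct reg_range merge(struct reg_range a, struct reg_range b)
{
	struct reg_range m = {
		a.min_value < b.min_value ? a.min_value : b.min_value,
		a.max_value > b.max_value ? a.max_value : b.max_value,
	};
	return m;
}

/* Accept foo->array[index] only if the largest possible index stays
 * inside the map value. Element counts are compared instead of byte
 * offsets so a huge (unknown) max_value cannot overflow a multiply. */
static bool array_store_ok(struct reg_range index, size_t elem_size)
{
	size_t room = sizeof(struct foo) - offsetof(struct foo, array);

	return index.max_value < room / elem_size;
}
```

    Replaying the example: index is [0, 0] on the first branch and
    [0, SOME_CONSTANT-1] on the else branch, their merge is
    [0, SOME_CONSTANT-1], and the store is accepted; an index with unknown
    bounds is rejected.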

    Signed-off-by: Josef Bacik
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Josef Bacik
     

27 Sep, 2016

2 commits


21 Sep, 2016

1 commit

  • Add a couple of test cases for direct write and the negative size issue,
    and also adjust direct packet access test4: it asserts that writes are
    not possible, but since we've just added support for writes, its verdict
    must be inverted to ACCEPT. Summary: 133 PASSED, 0 FAILED.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

17 Sep, 2016

2 commits

  • The test creates three namespaces whose veth devices are connected via a bridge.
    The first two namespaces simulate two different hosts with the same
    IPv4 and IPv6 addresses configured on the tunnel interface, and they
    communicate with the outside world via standard tunnels.
    The third namespace creates a collect_md tunnel driven by a BPF
    program, which selects a different remote host (either the first or
    the second namespace) based on the TCP destination port, while the
    TCP destination IP stays the same.
    This scenario is a rough approximation of a load balancer use case.
    The tests check both the traditional tunnel configuration and collect_md mode.
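
    The selection step can be sketched as a pure function. This is a hedged
    illustration only: the port 5201 and the addresses 172.16.1.100 and
    172.16.1.200 are hypothetical placeholders, and setting the tunnel
    metadata from inside the BPF program is not modelled.

```c
#include <stdint.h>

/* Hypothetical remote endpoints of the first and second namespace,
 * as host-order IPv4 addresses (172.16.1.100 and 172.16.1.200). */
#define REMOTE_NS1 0xAC100164u
#define REMOTE_NS2 0xAC1001C8u
#define NS2_PORT   5201 /* hypothetical port steered to namespace 2 */

/* Pick the tunnel remote from the TCP destination port alone: the
 * inner destination IP is identical for both hosts, so it cannot be
 * used to tell them apart. */
static uint32_t select_remote(uint16_t tcp_dport)
{
	return tcp_dport == NS2_PORT ? REMOTE_NS2 : REMOTE_NS1;
}
```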

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Extend the existing tests for vxlan, geneve and gre to include the IPIP tunnel.
    They test both the traditional tunnel configuration and the
    dynamic one via bpf helpers.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

09 Sep, 2016

1 commit

  • LLVM can generate code that tests for direct packet access via
    skb->data/data_end in a way that currently gets rejected by the
    verifier, example:

    [...]
    7: (61) r3 = *(u32 *)(r6 +80)
    8: (61) r9 = *(u32 *)(r6 +76)
    9: (bf) r2 = r9
    10: (07) r2 += 54
    11: (3d) if r3 >= r2 goto pc+12
    R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
    R9=pkt(id=0,off=0,r=0) R10=fp
    12: (18) r4 = 0xffffff7a
    14: (05) goto pc+430
    [...]

    from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv
    R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp
    24: (7b) *(u64 *)(r10 -40) = r1
    25: (b7) r1 = 0
    26: (63) *(u32 *)(r6 +56) = r1
    27: (b7) r2 = 40
    28: (71) r8 = *(u8 *)(r9 +20)
    invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0)

    The reason why this gets rejected despite a proper test is that we
    currently call find_good_pkt_pointers() only in case where we detect
    tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and
    derived, for example, from a register of type pkt(id=Y,off=0,r=0)
    pointing to skb->data. find_good_pkt_pointers() then fills the range
    in the current branch to pkt(id=Y,off=0,r=Z) on success.

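    For the above case, we need to extend that to recognize the pkt_end >= rX
    pattern and mark the other branch that is taken on success with the
    appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers().
    Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the
    only two practical options to test for from what LLVM could have
    generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=).

    With the fix, the verifier log for the same program becomes:

    [...]
    7: (61) r3 = *(u32 *)(r6 +80)
    8: (61) r9 = *(u32 *)(r6 +76)
    9: (bf) r2 = r9
    10: (07) r2 += 54
    11: (3d) if r3 >= r2 goto pc+12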
    R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
    R9=pkt(id=0,off=0,r=0) R10=fp
    12: (18) r4 = 0xffffff7a
    14: (05) goto pc+430
    [...]

    from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv
    R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp
    24: (7b) *(u64 *)(r10 -40) = r1
    25: (b7) r1 = 0
    26: (63) *(u32 *)(r6 +56) = r1
    27: (b7) r2 = 40
    28: (71) r8 = *(u8 *)(r9 +20)
    29: (bf) r1 = r8
    30: (25) if r8 > 0x3c goto pc+47
    R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56
    R9=pkt(id=0,off=0,r=54) R10=fp
    31: (b7) r1 = 1
    [...]

    Verifier test cases are also added in this work, one that demonstrates
    the mentioned example here and one that tries a bad packet access for
    the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0),
    pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two
    with both test variants (>, >=).
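
    The branch rule above can be made concrete with a toy userspace model.
    It is not the verifier's code: struct reg, find_good_pkt_pointers_toy(),
    cond_jmp_toy() and pkt_access_ok() are hypothetical, and everything
    except packet pointer ranges is ignored. Replaying instruction 11 from
    the log (if r3 >= r2, with r3=pkt_end and r2=pkt(off=54)) marks r2 and
    r9 with r=54 on the taken branch only, which is why the byte load at
    r9+20 is then accepted.

```c
#include <stdbool.h>

/* Simplified register types as printed in the verifier log. */
enum reg_type { INV, PKT, PKT_END };

/* Hypothetical mirror of "pkt(id=..,off=..,r=..)" from the log. */
struct reg {
	enum reg_type type;
	int id;
	int off;   /* constant offset from the packet start */
	int range; /* bytes proven readable from the packet start */
};

#define NREGS 11 /* r0..r10 */

/* Toy find_good_pkt_pointers(): once pkt + checked.off <= pkt_end is
 * proven, every packet pointer sharing checked.id gets that range. */
static void find_good_pkt_pointers_toy(struct reg *regs, struct reg checked)
{
	for (int i = 0; i < NREGS; i++)
		if (regs[i].type == PKT && regs[i].id == checked.id)
			regs[i].range = checked.off;
}

/* Model of `if r[dst] OP r[src] goto ...` for OP in {>, >=}. Both
 * arrays start as copies of the register state before the jump. */
static void cond_jmp_toy(bool is_jge, int dst, int src,
			 struct reg *fallthrough, struct reg *taken)
{
	struct reg d = fallthrough[dst], s = fallthrough[src];

	if (!is_jge && d.type == PKT && s.type == PKT_END)
		/* rX > pkt_end: bounds hold on the fall-through path. */
		find_good_pkt_pointers_toy(fallthrough, d);
	else if (is_jge && d.type == PKT_END && s.type == PKT)
		/* pkt_end >= rX: bounds hold on the taken path (the fix). */
		find_good_pkt_pointers_toy(taken, s);
}

/* Same shape as the "invalid access to packet" check in the log. */
static bool pkt_access_ok(struct reg r, int off, int size)
{
	off += r.off;
	return r.type == PKT && off >= 0 && size > 0 && off + size <= r.range;
}
```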

    Fixes: 969bf05eb3ce ("bpf: direct packet access")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

03 Sep, 2016

2 commits

  • Sample the instruction pointer and frequency count in a BPF map.

    Signed-off-by: Brendan Gregg
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Brendan Gregg
     
  • The bpf program is called 50 times a second and does hashmap[kern&user_stackid]++.
    Its primary purpose is to check that key bpf helpers like map lookup, update,
    get_stackid, trace_printk and ctx access are all working.
    It checks:
    - PERF_COUNT_HW_CPU_CYCLES on all cpus
    - PERF_COUNT_HW_CPU_CYCLES for current process and inherited perf_events to children
    - PERF_COUNT_SW_CPU_CLOCK on all cpus
    - PERF_COUNT_SW_CPU_CLOCK for current process

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

20 Aug, 2016

1 commit

  • The patch creates sample code exercising bpf_skb_{set,get}_tunnel_key
    and bpf_skb_{set,get}_tunnel_opt for GRE, VXLAN, and GENEVE. A native
    tunnel device is created in a namespace to interact with an lwtunnel
    device outside the namespace, with metadata enabled. The bpf_skb_set_*
    program is attached to tc egress and bpf_skb_get_* is attached to the
    ingress qdisc. A ping between the two tunnels is used to verify
    correctness, and the result of bpf_skb_get_* is printed by bpf_trace_printk.

    Signed-off-by: William Tu
    Signed-off-by: David S. Miller

    William Tu
     

18 Aug, 2016

2 commits

  • Minor overlapping changes for both merge conflicts.

    Resolution work done by Stephen Rothwell was used
    as a reference.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull networking fixes from David Miller:

    1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
    Fietkau.

    2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.

    3) Fix some tg3 ethtool logic bugs, and one that would cause no
    interrupts to be generated when rx-coalescing is set to 0. From
    Satish Baddipadige and Siva Reddy Kallam.

    4) QLCNIC mailbox corruption and napi budget handling fix from Manish
    Chopra.

    5) Fix fib_trie logic when walking the trie during /proc/net/route
    output that can access a stale node pointer. From David Forster.

    6) Several sctp_diag fixes from Phil Sutter.

    7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.

    8) Checksum fixup fixes in bpf from Daniel Borkmann.

    9) Memory leaks in nfnetlink, from Liping Zhang.

    10) Use after free in rxrpc, from David Howells.

    11) Use after free in new skb_array code of macvtap driver, from Jason
    Wang.

    12) Calipso resource leak, from Colin Ian King.

    13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.

    14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.

    15) Fix lockdep splats in macsec, from Sabrina Dubroca.

    16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
    handling.

    17) Various tc-action bug fixes, from CONG Wang.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
    net_sched: allow flushing tc police actions
    net_sched: unify the init logic for act_police
    net_sched: convert tcf_exts from list to pointer array
    net_sched: move tc offload macros to pkt_cls.h
    net_sched: fix a typo in tc_for_each_action()
    net_sched: remove an unnecessary list_del()
    net_sched: remove the leftover cleanup_a()
    mlxsw: spectrum: Allow packets to be trapped from any PG
    mlxsw: spectrum: Unmap 802.1Q FID before destroying it
    mlxsw: spectrum: Add missing rollbacks in error path
    mlxsw: reg: Fix missing op field fill-up
    mlxsw: spectrum: Trap loop-backed packets
    mlxsw: spectrum: Add missing packet traps
    mlxsw: spectrum: Mark port as active before registering it
    mlxsw: spectrum: Create PVID vPort before registering netdevice
    mlxsw: spectrum: Remove redundant errors from the code
    mlxsw: spectrum: Don't return upon error in removal path
    i40e: check for and deal with non-contiguous TCs
    ixgbe: Re-enable ability to toggle VLAN filtering
    ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
    ...

    Linus Torvalds
     

13 Aug, 2016

3 commits

  • Test various corner cases of helper function access to the packet
    via crafted XDP programs.

    Signed-off-by: Aaron Yue
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Aaron Yue
     
  • While hashing out BPF's current_task_under_cgroup helper bits, it came
    to discussion that the skb_in_cgroup helper name was suboptimally chosen.

    Tejun says:

    So, I think in_cgroup should mean that the object is in that
    particular cgroup while under_cgroup in the subhierarchy of that
    cgroup. Let's rename the other subhierarchy test to under too. I
    think that'd be a lot less confusing going forward.

    [...]

    It's more intuitive and gives us the room to implement the real
    "in" test if ever necessary in the future.

    Since this touches uapi bits, we need to change this while v4.8
    is not yet officially released. Thus, change the helper enum and rename
    related bits.
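
    The "in" versus "under" distinction can be sketched with a toy
    hierarchy. This is only an illustration of the semantics Tejun
    describes, not kernel code: struct cgroup here is a hypothetical node
    with a parent pointer, and in_cgroup()/under_cgroup() are hypothetical
    helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal cgroup node: only the parent link matters. */
struct cgroup {
	const struct cgroup *parent;
};

/* "in": the object's cgroup is exactly `target`. */
static bool in_cgroup(const struct cgroup *cg, const struct cgroup *target)
{
	return cg == target;
}

/* "under": cg lies somewhere in the subhierarchy rooted at `target`,
 * i.e. target is cg itself or one of its ancestors. */
static bool under_cgroup(const struct cgroup *cg, const struct cgroup *target)
{
	for (; cg != NULL; cg = cg->parent)
		if (cg == target)
			return true;
	return false;
}
```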

    Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
    Reference: http://patchwork.ozlabs.org/patch/658500/
    Suggested-by: Sargun Dhillon
    Suggested-by: Tejun Heo
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov

    Daniel Borkmann
     
  • This test has a BPF program which writes the last known pid to call the
    sync syscall within a given cgroup to a map.

    The user mode program creates its own mount namespace, and mounts the
    cgroupsv2 hierarchy in there, as on all current test systems
    (Ubuntu 16.04, Debian), the cgroupsv2 vfs is unmounted by default.
    Once it does this, it proceeds to test.

    The test checks for positive and negative condition. It ensures that
    when it's part of a given cgroup, its pid is captured in the map,
    and that when it leaves the cgroup, this doesn't happen.

    It populates a cgroups arraymap prior to execution in userspace. This means
    that the program must be run in the same cgroups namespace as the programs
    that are being traced.

    Signed-off-by: Sargun Dhillon
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Tejun Heo
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Sargun Dhillon
     

11 Aug, 2016

1 commit

  • Commit 555c8a8623a3 ("bpf: avoid stack copy and use skb ctx for event output")
    started using 20 of the initially reserved upper 32 bits of the 'flags' argument
    of bpf_perf_event_output(). Adjust the corresponding prototype in samples/bpf/bpf_helpers.h.
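
    The following sketch shows why the flags argument needs all 64 bits.
    The three mask values mirror the uapi <linux/bpf.h> definitions and are
    copied here so the example builds without kernel headers;
    make_output_flags() is a hypothetical helper.

```c
#include <stdint.h>

/* Copied from uapi <linux/bpf.h> so the sketch is self-contained. */
#define BPF_F_INDEX_MASK  0xffffffffULL
#define BPF_F_CURRENT_CPU BPF_F_INDEX_MASK
#define BPF_F_CTXLEN_MASK (0xfffffULL << 32)

/* Low 32 bits: perf event array slot (or BPF_F_CURRENT_CPU).
 * Bits 32..51: number of skb bytes to append to the sample.
 * A prototype whose flags parameter is narrower than 64 bits
 * would silently drop the upper part. */
static uint64_t make_output_flags(uint64_t index, uint64_t ctx_len)
{
	return (index & BPF_F_INDEX_MASK) |
	       ((ctx_len << 32) & BPF_F_CTXLEN_MASK);
}
```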

    Signed-off-by: Adam Barth
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Adam Barth
     

07 Aug, 2016

1 commit


04 Aug, 2016

4 commits

  • regs_return_value() returns an "unsigned long" or "long" value, but
    retval is currently an int, so it may overflow and the log may become:

    [ 2911.078869] do_brk returned -2003877888 and took 4620 ns to execute

    This patch converts retval to "unsigned long", which fixes the
    overflow issue.
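
    A small sketch of where such a negative number comes from. The input
    0xffff888f4000 is a hypothetical value chosen only because its low 32
    bits (0x888f4000) reproduce the logged number; the real return value is
    not given in the message.

```c
#include <stdint.h>

/* Storing a 64-bit return value in an int keeps only the low 32 bits.
 * Converting an out-of-range value to a signed type is
 * implementation-defined in ISO C; GCC and Clang keep the bit pattern. */
static int truncated_retval(unsigned long long retval)
{
	return (int)(uint32_t)retval;
}
```

    Keeping retval as "unsigned long", as the patch does, avoids the
    truncation on 64-bit kernels.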

    Link: http://lkml.kernel.org/r/1464143083-3877-4-git-send-email-shijie.huang@arm.com
    Signed-off-by: Huang Shijie
    Cc: Petr Mladek
    Cc: Steve Capper
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: Masami Hiramatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     
  • We now prefer the pr_* helpers for logging, so this patch converts
    printk to pr_info. In the error path, pr_err replaces printk.

    Link: http://lkml.kernel.org/r/1464143083-3877-3-git-send-email-shijie.huang@arm.com
    Signed-off-by: Huang Shijie
    Cc: Petr Mladek
    Cc: Steve Capper
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: Masami Hiramatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     
  • We now prefer the pr_* helpers for logging, so this patch converts
    printk to pr_info. In the error path, pr_err replaces printk.

    Link: http://lkml.kernel.org/r/1464143083-3877-2-git-send-email-shijie.huang@arm.com
    Signed-off-by: Huang Shijie
    Cc: Petr Mladek
    Cc: Steve Capper
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: Masami Hiramatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     
  • We now prefer the pr_* helpers for logging, so this patch converts
    printk to pr_info. In the error path, pr_err replaces printk.

    Link: http://lkml.kernel.org/r/1464143083-3877-1-git-send-email-shijie.huang@arm.com
    Signed-off-by: Huang Shijie
    Cc: Petr Mladek
    Cc: Steve Capper
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: Masami Hiramatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     

30 Jul, 2016

1 commit

  • Pull security subsystem updates from James Morris:
    "Highlights:

    - TPM core and driver updates/fixes
    - IPv6 security labeling (CALIPSO)
    - Lots of Apparmor fixes
    - Seccomp: remove 2-phase API, close hole where ptrace can change
    syscall #"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
    apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
    tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
    tpm: Factor out common startup code
    tpm: use devm_add_action_or_reset
    tpm2_i2c_nuvoton: add irq validity check
    tpm: read burstcount from TPM_STS in one 32-bit transaction
    tpm: fix byte-order for the value read by tpm2_get_tpm_pt
    tpm_tis_core: convert max timeouts from msec to jiffies
    apparmor: fix arg_size computation for when setprocattr is null terminated
    apparmor: fix oops, validate buffer size in apparmor_setprocattr()
    apparmor: do not expose kernel stack
    apparmor: fix module parameters can be changed after policy is locked
    apparmor: fix oops in profile_unpack() when policy_db is not present
    apparmor: don't check for vmalloc_addr if kvzalloc() failed
    apparmor: add missing id bounds check on dfa verification
    apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
    apparmor: use list_next_entry instead of list_entry_next
    apparmor: fix refcount race when finding a child profile
    apparmor: fix ref count leak when profile sha1 hash is read
    apparmor: check that xindex is in trans_table bounds
    ...

    Linus Torvalds
     

29 Jul, 2016

1 commit

  • Pull tracing updates from Steven Rostedt:
    "This is mostly clean ups and small fixes. Some of the more visible
    changes are:

    - The function pid code uses the event pid filtering logic
    - [ku]probe events have access to current->comm
    - trace_printk now has sample code
    - PCI devices now trace physical addresses
    - stack tracing has fewer unnecessary functions traced"

    * tag 'trace-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    printk, tracing: Avoiding unneeded blank lines
    tracing: Use __get_str() when manipulating strings
    tracing, RAS: Cleanup on __get_str() usage
    tracing: Use outer () on __get_str() definition
    ftrace: Reduce size of function graph entries
    tracing: Have HIST_TRIGGERS select TRACING
    tracing: Using for_each_set_bit() to simplify trace_pid_write()
    ftrace: Move toplevel init out of ftrace_init_tracefs()
    tracing/function_graph: Fix filters for function_graph threshold
    tracing: Skip more functions when doing stack tracing of events
    tracing: Expose CPU physical addresses (resource values) for PCI devices
    tracing: Show the preempt count of when the event was called
    tracing: Add trace_printk sample code
    tracing: Choose static tp_printk buffer by explicit nesting count
    tracing: expose current->comm to [ku]probe events
    ftrace: Have set_ftrace_pid use the bitmap like events do
    tracing: Move pid_list write processing into its own function
    tracing: Move the pid_list seq_file functions to be global
    tracing: Move filtered_pid helper functions into trace.c
    tracing: Make the pid filtering helper functions global

    Linus Torvalds
     

28 Jul, 2016

2 commits

  • Pull networking updates from David Miller:

    1) Unified UDP encapsulation offload methods for drivers, from
    Alexander Duyck.

    2) Make DSA binding more sane, from Andrew Lunn.

    3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

    4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

    5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
    packets as soon as the device sees them, with the option to mirror
    the packet on TX via the same interface. From Brenden Blanco and
    others.

    6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

    7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

    8) Simplify netlink conntrack entry layout, from Florian Westphal.

    9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
    Schimmel, Yotam Gigi, and Jiri Pirko.

    10) Add SKB array infrastructure and convert tun and macvtap over to it.
    From Michael S Tsirkin and Jason Wang.

    11) Support qdisc packet injection in pktgen, from John Fastabend.

    12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

    13) Add NV congestion control support to TCP, from Lawrence Brakmo.

    14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

    15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

    16) Support MPLS over IPV4, from Simon Horman.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
    xgene: Fix build warning with ACPI disabled.
    be2net: perform temperature query in adapter regardless of its interface state
    l2tp: Correctly return -EBADF from pppol2tp_getname.
    net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
    net: ipmr/ip6mr: update lastuse on entry change
    macsec: ensure rx_sa is set when validation is disabled
    tipc: dump monitor attributes
    tipc: add a function to get the bearer name
    tipc: get monitor threshold for the cluster
    tipc: make cluster size threshold for monitoring configurable
    tipc: introduce constants for tipc address validation
    net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
    MAINTAINERS: xgene: Add driver and documentation path
    Documentation: dtb: xgene: Add MDIO node
    dtb: xgene: Add MDIO node
    drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
    drivers: net: xgene: Use exported functions
    drivers: net: xgene: Enable MDIO driver
    drivers: net: xgene: Add backward compatibility
    drivers: net: phy: xgene: Add MDIO driver
    ...

    Linus Torvalds
     
  • Pull arm64 updates from Catalin Marinas:

    - Kexec support for arm64

    - Kprobes support

    - Expose MIDR_EL1 and REVIDR_EL1 CPU identification registers to sysfs

    - Trapping of user space cache maintenance operations and emulation in
    the kernel (CPU errata workaround)

    - Clean-up of the early page tables creation (kernel linear mapping,
    EFI run-time maps) to avoid splitting larger blocks (e.g. pmds) into
    smaller ones (e.g. ptes)

    - VDSO support for CLOCK_MONOTONIC_RAW in clock_gettime()

    - ARCH_HAS_KCOV enabled for arm64

    - Optimise IP checksum helpers

    - SWIOTLB optimisation to only allocate/initialise the buffer if the
    available RAM is beyond the 32-bit mask

    - Properly handle the "nosmp" command line argument

    - Fix for the initialisation of the CPU debug state during early boot

    - vdso-offsets.h build dependency workaround

    - Build fix when RANDOMIZE_BASE is enabled with MODULES off

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
    arm64: arm: Fix-up the removal of the arm64 regs_query_register_name() prototype
    arm64: Only select ARM64_MODULE_PLTS if MODULES=y
    arm64: mm: run pgtable_page_ctor() on non-swapper translation table pages
    arm64: mm: make create_mapping_late() non-allocating
    arm64: Honor nosmp kernel command line option
    arm64: Fix incorrect per-cpu usage for boot CPU
    arm64: kprobes: Add KASAN instrumentation around stack accesses
    arm64: kprobes: Cleanup jprobe_return
    arm64: kprobes: Fix overflow when saving stack
    arm64: kprobes: WARN if attempting to step with PSTATE.D=1
    arm64: debug: remove unused local_dbg_{enable, disable} macros
    arm64: debug: remove redundant spsr manipulation
    arm64: debug: unmask PSTATE.D earlier
    arm64: localise Image objcopy flags
    arm64: ptrace: remove extra define for CPSR's E bit
    kprobes: Add arm64 case in kprobe example module
    arm64: Add kernel return probes support (kretprobes)
    arm64: Add trampoline code for kretprobes
    arm64: kprobes instruction simulation support
    arm64: Treat all entry code as non-kprobe-able
    ...

    Linus Torvalds
     

26 Jul, 2016

2 commits

  • This example shows using a kprobe to act as a dnat mechanism to divert
    traffic for arbitrary endpoints. It rewrites the arguments to a syscall
    while they're still in userspace, before the syscall has a chance
    to copy them into kernel space.

    Although this is an example, it also acts as a test because the mapped
    address is 255.255.255.255:555 -> real address, and that's not a legal
    address to connect to. If the helper is broken, the example will fail
    on the intermediate steps, as well as the final step to verify the
    rewrite of userspace memory succeeded.
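
    The address substitution itself can be sketched in userspace. This only
    illustrates the mapping step; in the sample the write happens from a
    kprobe through the new helper. struct dnat_entry, dnat_rewrite() and the
    real endpoint 127.0.0.1:8080 are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Hypothetical one-entry "map": virtual endpoint -> real endpoint. */
struct dnat_entry {
	struct sockaddr_in virt;
	struct sockaddr_in real;
};

/* Rewrite *sa in place if it matches the virtual endpoint, the way the
 * probe rewrites connect()'s argument before the kernel copies it. */
static bool dnat_rewrite(struct sockaddr_in *sa, const struct dnat_entry *e)
{
	if (sa->sin_family != AF_INET ||
	    sa->sin_addr.s_addr != e->virt.sin_addr.s_addr ||
	    sa->sin_port != e->virt.sin_port)
		return false;
	sa->sin_addr = e->real.sin_addr;
	sa->sin_port = e->real.sin_port;
	return true;
}
```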

    Signed-off-by: Sargun Dhillon
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Sargun Dhillon
     
  • This allows user memory to be written to during the course of a kprobe.
    It shouldn't be used to implement any kind of security mechanism
    because of TOC-TOU attacks, but rather to debug, divert, and
    manipulate execution of semi-cooperative processes.

    Although it uses probe_kernel_write, we limit the address space
    the probe can write into by checking the space with access_ok.
    We do this as opposed to calling copy_to_user directly, in order
    to avoid sleeping. In addition, we ensure the thread's current fs
    segment is USER_DS and that the thread is neither exiting nor a
    kernel thread.

    Given that this feature is meant for experiments and risks
    crashing the system and running programs, we print a warning
    when a proglet that attempts to use this helper is installed,
    along with the pid and process name.
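
    The kind of range check described above can be sketched as follows.
    This mimics the idea behind access_ok, not its implementation:
    USER_ADDR_LIMIT is a hypothetical constant, since the real limit depends
    on the architecture and the thread's current segment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound of the user address space. */
#define USER_ADDR_LIMIT 0x00007ffffffff000ULL

/* Accept [addr, addr + size) only if it lies entirely below the limit;
 * written so that addr + size is never computed and cannot wrap. */
static bool user_range_ok(uint64_t addr, uint64_t size)
{
	return size <= USER_ADDR_LIMIT && addr <= USER_ADDR_LIMIT - size;
}
```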

    Signed-off-by: Sargun Dhillon
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Sargun Dhillon
     

21 Jul, 2016

2 commits

  • Add a '-6' option to the sample pktgen scripts for sending out
    IPv6 packets.

    [root@kerneldev010.prn1 ~/pktgen]# ./pktgen_sample03_burst_single_flow.sh -i eth0 -s 64 -d fe80::f652:14ff:fec2:a14c -m f4:52:14:c2:a1:4c -b 32 -6

    [root@kerneldev011.prn1 ~]# tcpdump -i eth0 -nn -c3 port 9
    tcpdump: WARNING: eth0: no IPv4 address assigned
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
    14:38:51.815297 IP6 fe80::f652:14ff:fec2:2ad2.9 > fe80::f652:14ff:fec2:a14c.9: UDP, length 16
    14:38:51.815311 IP6 fe80::f652:14ff:fec2:2ad2.9 > fe80::f652:14ff:fec2:a14c.9: UDP, length 16
    14:38:51.815313 IP6 fe80::f652:14ff:fec2:2ad2.9 > fe80::f652:14ff:fec2:a14c.9: UDP, length 16

    Signed-off-by: Martin KaFai Lau
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • The naming choice of index is not terribly descriptive, and dropcnt is
    in fact incorrect for xdp2. Pick better names for these: ipproto and
    rxcnt.

    Signed-off-by: Brenden Blanco
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Brenden Blanco
     

20 Jul, 2016

2 commits

  • Add a sample that rewrites and forwards packets out on the same
    interface. Observed single core forwarding performance of ~10Mpps.

    Since the mlx4 driver under test recycles every single packet page, the
    perf output shows almost exclusively just the ring management and bpf
    program work. Slowdowns are likely occurring due to cache misses.
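
    The message does not say which rewrite the sample performs. Assuming it
    swaps the Ethernet source and destination MAC addresses so the frame can
    leave through the port it arrived on, the core of the rewrite looks like
    this userspace sketch over a raw frame buffer; swap_src_dst_mac() and
    MAC_LEN are hypothetical names here.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Swap destination (bytes 0..5) and source (bytes 6..11) MACs in place.
 * Returns -1 if the frame is too short: the same kind of bounds check an
 * XDP program must do against data_end before touching the header. */
static int swap_src_dst_mac(uint8_t *data, const uint8_t *data_end)
{
	uint8_t tmp[MAC_LEN];

	if (data_end - data < 2 * MAC_LEN)
		return -1;
	memcpy(tmp, data, MAC_LEN);
	memcpy(data, data + MAC_LEN, MAC_LEN);
	memcpy(data + MAC_LEN, tmp, MAC_LEN);
	return 0;
}
```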

    Signed-off-by: Brenden Blanco
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Brenden Blanco
     
  • Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX
    hook of a link. With the drop-only program, observed single core rate is
    ~20Mpps.

    Other tests were run, for instance without the dropcnt increment or
    without reading from the packet header; the packet rate was mostly
    unchanged.
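
    A hedged userspace sketch of what a drop-only program with per-protocol
    counting does per packet: parse just enough of Ethernet and IPv4 to read
    the protocol byte, bump a counter, and drop. rxcnt[] stands in for the
    BPF map, and counting non-IPv4 or short frames under protocol 0 is an
    assumption of this sketch.

```c
#include <stdint.h>

#define ETH_HDR_LEN    14
#define IPV4_MIN_LEN   20
#define ETHERTYPE_IPV4 0x0800

/* Hypothetical stand-in for the per-protocol counter map. */
static uint64_t rxcnt[256];

/* Returns 1 ("drop") for every frame after counting it. */
static int count_and_drop(const uint8_t *data, const uint8_t *data_end)
{
	uint8_t ipproto = 0;

	if (data_end - data >= ETH_HDR_LEN + IPV4_MIN_LEN &&
	    ((data[12] << 8) | data[13]) == ETHERTYPE_IPV4)
		ipproto = data[ETH_HDR_LEN + 9]; /* IPv4 protocol field */
	rxcnt[ipproto]++;
	return 1;
}
```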

    $ perf record -a samples/bpf/xdp1 $(
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Brenden Blanco
     

19 Jul, 2016

1 commit


15 Jul, 2016

3 commits

  • Remove the pktgen sample script pktgen.conf-1-1-rdos, because
    it does not contain anything that is not covered by the other,
    newer-style sample scripts.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • This pktgen sample script is useful for scalability testing a
    receiver. The script will simply generate one flow per
    thread (option -t N) using the thread number as part of the
    source IP-address.

    The single flow sample (pktgen_sample03_burst_single_flow.sh)
    has become quite popular, but it is important that developers
    also benchmark the scalability of multiple receive
    queues.
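
    The script itself is shell; the addressing idea behind "one flow per
    thread" can be sketched in C. The 198.18.N.1 scheme and
    src_ip_for_thread() are hypothetical and only show how embedding the
    thread number in the source IP yields one distinct flow per thread.

```c
#include <stdint.h>

/* Hypothetical scheme: thread N sends from 198.18.N.1 (host order),
 * so every pktgen thread produces exactly one distinct flow. */
static uint32_t src_ip_for_thread(unsigned thread)
{
	return (198u << 24) | (18u << 16) | ((thread & 0xffu) << 8) | 1u;
}
```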

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Add a pktgen sample script that demonstrates how to use pktgen
    for simulating flows. The script generates a certain number of
    concurrent flows ($FLOWS), and each flow contains $FLOWLEN
    packets, which are sent back-to-back before switching to a
    new flow, due to the FLOW_SEQ flag.
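
    As a rough model of the ordering described above (not pktgen's internal
    code), packet number i belongs to flow (i / FLOWLEN) % FLOWS. The
    wrap-around back to the first flow after $FLOWS flows is an assumption,
    and flow_of_packet() is a hypothetical helper.

```c
/* FLOWLEN back-to-back packets per flow, then the next flow,
 * wrapping around after `flows` flows (assumed). */
static unsigned flow_of_packet(unsigned long i, unsigned flows,
			       unsigned flowlen)
{
	return (unsigned)((i / flowlen) % flows);
}
```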

    This script obsoletes the old sample script 'pktgen.conf-1-1-flows',
    which is removed.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

09 Jul, 2016

2 commits


08 Jul, 2016

1 commit

  • Add a separate Kconfig option for SAMPLES_SECCOMP.

    The main reason for this is that, just like other samples, it's forced to
    be a module.

    Without this, since the sample is a target only controlled by
    CONFIG_SECCOMP_FILTER, the samples will be built before include files are
    put in place properly. For example, from an arm64 allmodconfig built with
    "make -sk -j 32" (without specific target), the following happens:

    samples/seccomp/bpf-fancy.c:13:27: fatal error: linux/seccomp.h: No such file or directory
    samples/seccomp/bpf-helper.h:20:50: fatal error: linux/seccomp.h: No such file or directory
    samples/seccomp/dropper.c:20:27: fatal error: linux/seccomp.h: No such file or directory
    samples/seccomp/bpf-direct.c:21:27: fatal error: linux/seccomp.h: No such file or directory

    So, just stick to the same format as other samples.

    Signed-off-by: Olof Johansson
    Signed-off-by: Kees Cook

    Olof Johansson
     

05 Jul, 2016

1 commit


02 Jul, 2016

1 commit

  • test_cgrp2_array_pin.c:
    A userland program that creates a bpf_map (BPF_MAP_TYPE_CGROUP_ARRAY),
    populates/updates it with a cgroup2-backed fd and pins it to a
    bpf-fs file. The pinned file can be loaded by tc and then used
    by the bpf prog later. This program can also update an existing pinned
    array, which could be useful for debugging/testing purposes.

    test_cgrp2_tc_kern.c:
    A bpf prog which should be loaded by tc. It demonstrates
    the usage of bpf_skb_in_cgroup.

    test_cgrp2_tc.sh:
    A script that glues test_cgrp2_array_pin.c and
    test_cgrp2_tc_kern.c together. The idea is:
    1. Load the test_cgrp2_tc_kern.o by tc
    2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
    with a cgroup fd
    3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
    dropped because of a match on the cgroup

    Most of the lines in test_cgrp2_tc.sh are boilerplate
    to set up the cgroup/bpf-fs/net-devices/netns, etc. It is
    not bulletproof on errors but should work well enough and
    give enough debug info if things do not go well.

    Signed-off-by: Martin KaFai Lau
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Tejun Heo
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Martin KaFai Lau