02 May, 2017

3 commits


01 May, 2017

1 commit

  • llvm 4.0 and above generates code like the following:
    ....
    440: (b7) r1 = 15
    441: (05) goto pc+73
    515: (79) r6 = *(u64 *)(r10 -152)
    516: (bf) r7 = r10
    517: (07) r7 += -112
    518: (bf) r2 = r7
    519: (0f) r2 += r1
    520: (71) r1 = *(u8 *)(r8 +0)
    521: (73) *(u8 *)(r2 +45) = r1
    ....
    and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
    This is because the verifier marks register r2 as an unknown value
    after insn #519, even though r2 is a stack pointer and r1 holds a
    constant value.

    Teach verifier to recognize "stack_ptr + imm" and
    "stack_ptr + reg with const val" as valid stack_ptr with new offset.

    Signed-off-by: Yonghong Song
    Acked-by: Martin KaFai Lau
    Acked-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Yonghong Song
     

29 Apr, 2017

3 commits

  • To overcome bugs as described and fixed in 89087c456fb5 ("bpf: Fix
    values type used in test_maps"), provide a generic BPF_DECLARE_PERCPU()
    and bpf_percpu() accessor macro for all percpu map values used in
    tests.

    Declaring variables works as follows (also works for structs):

    BPF_DECLARE_PERCPU(uint32_t, my_value);

    They can then be accessed normally as uint32_t type through:

    bpf_percpu(my_value, cpu)

    For example:

    bpf_percpu(my_value, 0)++;

    Implicitly, we make sure that the passed type is allocated and aligned
    by gcc on at least an 8-byte boundary, so that it works together with
    the map lookup/update syscall for percpu maps. We use it as a usage
    example in test_maps, so that others are free to adapt this into their
    code when necessary.
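    A minimal sketch of how such a macro pair can be built (modeled on
    the selftests' bpf_util.h; the real version sizes the array with
    bpf_num_possible_cpus(), a fixed count is assumed here so the
    example stands alone):

```c
#include <assert.h>
#include <stdint.h>

/* Fixed CPU count is an assumption for this sketch; the real macro
 * uses bpf_num_possible_cpus(). */
#define NR_CPUS_EXAMPLE 4

/* Each per-cpu slot is forced into an 8-byte-aligned struct so the
 * array layout matches what the percpu map syscalls expect. */
#define __bpf_percpu_val_align __attribute__((__aligned__(8)))
#define BPF_DECLARE_PERCPU(type, name)			\
	struct { type v; } __bpf_percpu_val_align	\
		name[NR_CPUS_EXAMPLE]
#define bpf_percpu(name, cpu) name[(cpu)].v

BPF_DECLARE_PERCPU(uint32_t, my_value);
```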

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Dave reported that on sparc test_progs generates buggy swapped
    eth->h_proto protocol comparisons:

    10: (15) if r3 == 0xdd86 goto pc+9
    R0=imm2,min_value=2,max_value=2 R1=pkt(id=0,off=0,r=14) R2=pkt_end R3=inv
    R4=pkt(id=0,off=14,r=14) R5=inv56 R10=fp

    This is due to the unconditional ...

    #define htons __builtin_bswap16
    #define ntohs __builtin_bswap16

    ... in test_progs. Make use of asm/byteorder.h, use __constant_htons()
    where possible, and only perform the bswap16 on little endian in the
    non-constant case.
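    The idea can be sketched in portable C (the actual fix uses the
    kernel's asm/byteorder.h helpers; my_htons below is a stand-in
    name):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the fixed helper: swap only on little endian.  The
 * unconditional __builtin_bswap16 turned ETH_P_IPV6 (0x86dd) into
 * 0xdd86 on big-endian sparc, where the value is already in network
 * byte order. */
static uint16_t my_htons(uint16_t x)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return __builtin_bswap16(x);
#else
	return x;
#endif
}
```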

    Fixes: 6882804c916b ("selftests/bpf: add a test for overlapping packet range checks")
    Fixes: 37821613626e ("selftests/bpf: add l4 load balancer test based on sched_cls")
    Reported-by: David S. Miller
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Add several test cases around ldimm64, fp arithmetic and direct
    packet access.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

25 Apr, 2017

4 commits


23 Apr, 2017

1 commit


22 Apr, 2017

3 commits

  • Both conflicts were simple overlapping changes.

    In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
    conversion of the driver in net-next to use in-netdev stats.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Maps of per-cpu type have their value element size adjusted to 8 if it
    is specified smaller during various map operations.

    This makes test_maps fail as a 32-bit binary; in fact, the kernel
    writes past the end of the value array on the user's stack.

    To be quite honest, I think the kernel should reject creation of a
    per-cpu map that doesn't have a value size of at least 8 if that's
    what the kernel is going to silently adjust to later.

    If the user passed something smaller, it is the result of a sizeof()
    calculation based upon the type they will actually use (just like in
    this testcase code) in later calls to the map operations.
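    A sketch of the sizing rule implied above (helper name is
    illustrative; the kernel copies per-cpu values in 8-byte-rounded
    slots, so user buffers must be sized accordingly):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper: the kernel rounds the per-cpu value size up
 * to a multiple of 8 when copying, so a user buffer sized with a
 * plain sizeof(uint32_t) * ncpus is too small and gets overrun. */
static size_t percpu_copy_size(size_t value_size, unsigned int ncpus)
{
	size_t slot = (value_size + 7) & ~(size_t)7;

	return slot * ncpus;
}
```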

    Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
    Signed-off-by: David S. Miller
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov

    David Miller
     
  • Add napi_id access to __sk_buff for socket filter program types, tc
    program types and other bpf_convert_ctx_access() users. Having access
    to skb->napi_id is useful for per RX queue listener siloing, f.e.
    in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
    used, meaning SO_REUSEPORT enabled listeners can then select the
    corresponding socket at SYN time already [1]. The skb is marked via
    skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).

    Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d4339028b35
    ("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
    the NAPI ID associated with the queue for steering, which requires a
    prior sk_mark_napi_id() after the socket was looked up.

    Semantics for the __sk_buff napi_id access are similar, meaning if
    skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
    then an invalid napi_id of 0 is returned to the program, otherwise a
    valid non-zero napi_id.
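    The described read semantics, sketched as plain C
    (MIN_NAPI_ID_EXAMPLE is a placeholder; the kernel derives its
    MIN_NAPI_ID boundary internally):

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder boundary; the real MIN_NAPI_ID is defined by the
 * kernel (ids below it are sender_cpu values, not NAPI ids). */
#define MIN_NAPI_ID_EXAMPLE 0x100

/* What a program reading __sk_buff napi_id observes: out-of-range
 * ids read as the invalid id 0, valid ids are passed through. */
static uint32_t skb_napi_id_read(uint32_t raw)
{
	return raw < MIN_NAPI_ID_EXAMPLE ? 0 : raw;
}
```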

    [1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf

    Suggested-by: Eric Dumazet
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

21 Apr, 2017

1 commit

  • 'psock_fanout' has been failing since commit 4d7b9dc1f36a9 ("tools:
    psock_lib: harden socket filter used by psock tests"). That commit
    changed the CBPF filter to examine the full ethernet frame, and was
    tested on 'psock_tpacket' which uses SOCK_RAW. But 'psock_fanout' was
    also using this same CBPF in two places, for filtering and fanout, on a
    SOCK_DGRAM socket.

    Change 'psock_fanout' to use SOCK_RAW so that the CBPF program used with
    SO_ATTACH_FILTER can examine the entire frame. Create a new CBPF
    program for use with PACKET_FANOUT_DATA which ignores the header, as it
    cannot see the ethernet header.

    Tested: Ran tools/testing/selftests/net/psock_{fanout,tpacket} 10 times,
    and they all passed.

    Fixes: 4d7b9dc1f36a9 ("tools: psock_lib: harden socket filter used by psock tests")
    Signed-off-by: Mike Maloney
    Signed-off-by: David S. Miller

    Mike Maloney
     

20 Apr, 2017

1 commit


19 Apr, 2017

3 commits

  • Pull ftrace testcase update from Steven Rostedt:
    "While testing my development branch, without the fix for the pid use
    after free bug, the selftest that Namhyung added triggers it. I
    figured it would be good to add the test for the bug after the fix,
    such that it does not exist without the fix.

    I added another patch that lets the test only test part of the pid
    filtering, and ignores the function-fork (filtering on children as
    well) if the function-fork feature does not exist. This feature is
    added by Namhyung just before he added this test. But since the test
    tests both with and without the feature, it would be good to let it
    not fail if the feature does not exist"

    * tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    selftests: ftrace: Add check for function-fork before running pid filter test
    selftests: ftrace: Add a testcase for function PID filter

    Linus Torvalds
     
  • Have the func-filter-pid test check for the function-fork option before
    testing it. It can still test the pid filtering, but will stop before
    testing the function-fork option for children inheriting the pids.
    This allows the test to be added before the function-fork feature, but
    after the fix for a bug that the test can trigger.

    Cc: Namhyung Kim
    Cc: Shuah Khan
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Like the event pid filtering test, add a function pid filtering test
    with the new "function-fork" option. It also runs the test on an
    instance directory so that it can verify the pid filtering bug on
    instances.

    Link: http://lkml.kernel.org/r/20170417024430.21194-5-namhyung@kernel.org

    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Shuah Khan
    Signed-off-by: Namhyung Kim
    Signed-off-by: Steven Rostedt (VMware)

    Namhyung Kim
     

18 Apr, 2017

3 commits

  • After doing map_perf_test with a much bigger
    BPF_F_NO_COMMON_LRU map, the perf report shows a
    lot of time spent in rotating the inactive list (i.e.
    __bpf_lru_list_rotate_inactive):
    > map_perf_test 32 8 10000 1000000 | awk '{sum += $3}END{print sum}'
    19644783 (19M/s)
    > map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
    6283930 (6.28M/s)

    An element being on the inactive list usually means it is not in the
    cache. Hence, there is a need to tune the PERCPU_NR_SCANS value.

    This patch finds a better number of elements to
    scan during each list rotation. The PERCPU_NR_SCANS (which
    is defined the same as PERCPU_FREE_TARGET) decreases
    from 16 elements to 4 elements. This change only
    affects the BPF_F_NO_COMMON_LRU map.

    The test_lru_dist does not show meaningful difference
    between 16 and 4. Our production L4 load balancer which uses
    the LRU map for conntrack-ing also shows little change in cache
    hit rate. Since both benchmark and production data show no
    cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4.
    We can consider making it configurable if we find a usecase
    later that shows another value works better and/or use
    a different rotation strategy.

    After this change:
    > map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
    9240324 (9.2M/s)

    i.e. 6.28M/s -> 9.2M/s

    The test_lru_dist has not shown meaningful difference:
    > test_lru_dist zipf.100k.a1_01.out 4000 1:
    nr_misses: 31575 (Before) vs 31566 (After)

    > test_lru_dist zipf.100k.a0_01.out 40000 1
    nr_misses: 67036 (Before) vs 67031 (After)

    Signed-off-by: Martin KaFai Lau
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • This patch does the following cleanup in test_lru_map.c:
    1) Fix indentation (Replace spaces by tabs)
    2) Remove redundant BPF_F_NO_COMMON_LRU test
    3) Simplify some comments

    Signed-off-by: Martin KaFai Lau
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • test_lru_sanity3 is not applicable to BPF_F_NO_COMMON_LRU.
    It just happens to work when PERCPU_FREE_TARGET == 16.

    This patch:
    1) Disable test_lru_sanity3 for BPF_F_NO_COMMON_LRU
    2) Add test_lru_sanity6 to test list rotation for
    the BPF_F_NO_COMMON_LRU map.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

16 Apr, 2017

1 commit


15 Apr, 2017

1 commit

  • Pull perf fixes from Thomas Gleixner:
    "Two small fixes for perf:

    - the move to support cross arch annotation introduced per arch
    initialization requirements, fulfill them for s/390 (Christian
    Borntraeger)

    - add the missing initialization to the LBR entries to avoid exposing
    random or stale data"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
    perf annotate s390: Fix perf annotate error -95 (4.10 regression)

    Linus Torvalds
     

14 Apr, 2017

1 commit

  • When debugging the JIT on an embedded platform or cross build
    environment, libbfd may not be available, making it impossible to run
    bpf_jit_disasm natively.

    Add an option to emit a binary image of the JIT code to a file. This
    file can then be disassembled offline. Typical usage in this case
    might be (pasting mips64 dmesg output into the cat command):

    $ cat > jit.raw
    $ bpf_jit_disasm -f jit.raw -O jit.bin
    $ mips64-linux-gnu-objdump -D -b binary -m mips:isa64r2 -EB jit.bin

    Signed-off-by: David Daney
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    David Daney
     

13 Apr, 2017

8 commits

  • The switch that conditionally sets CPUPOWER_CAP_HAS_TURBO_RATIO and
    CPUPOWER_CAP_IS_SNB flags is missing a break, so all cores get both
    flags set and an assumed base clock of 100 MHz for turbo values.
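    A minimal reproduction of the bug class (model numbers and flag
    values below are invented for illustration, not cpupower's actual
    table):

```c
#include <assert.h>

#define CAP_HAS_TURBO_RATIO	0x1
#define CAP_IS_SNB		0x2

/* Invented model table for illustration.  Deleting the first break
 * reproduces the bug: the SNB case falls through and every matching
 * model picks up the flags of the case below it as well. */
static int caps_for_model(int model)
{
	int caps = 0;

	switch (model) {
	case 0x2a:			/* SNB-class model */
		caps |= CAP_IS_SNB | CAP_HAS_TURBO_RATIO;
		break;			/* the break the fix adds */
	case 0x3c:			/* later model */
		caps |= CAP_HAS_TURBO_RATIO;
		break;
	}
	return caps;
}
```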

    Reported-by: GSR
    Tested-by: GSR
    References: https://bugs.debian.org/859978
    Fixes: 8fb2e440b223 (cpupower: Show Intel turbo ratio support via ...)
    Signed-off-by: Ben Hutchings
    Signed-off-by: Rafael J. Wysocki

    Ben Hutchings
     
  • Pull turbostat utility fixes for v4.11 from Len Brown.

    * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
    tools/power turbostat: update version number
    tools/power turbostat: fix impossibly large CPU%c1 value
    tools/power turbostat: turbostat.8 add missing column definitions
    tools/power turbostat: update HWP dump to decimal from hex
    tools/power turbostat: enable package THERM_INTERRUPT dump
    tools/power turbostat: show missing Core and GFX power on SKL and KBL
    tools/power turbostat: bugfix: GFXMHz column not changing

    Rafael J. Wysocki
     
  • Signed-off-by: Len Brown

    Len Brown
     
  • Most CPUs do not have a hardware c1 counter,
    and so turbostat derives c1 residency:

    c1 = TSC - MPERF - other_core_cstate_counters

    As it is not possible to atomically read these counters,
    measurement jitter can cause this calculation to "go negative"
    when very close to 0. Turbostat detects that case and
    simply prints c1 = 0.00%.

    But that check neglected to account for systems where the TSC
    crystal clock domain and the MPERF BCLK domain differ by
    a small amount. That allowed very small negative c1 numbers
    to escape this check and be printed as huge positive numbers.

    This code begs for a bit of cleanup, but this patch
    is the minimal change to fix the issue.
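    The wrap-around can be shown with a clamped sketch (derive_c1 is
    an illustrative stand-in, not turbostat's code):

```c
#include <assert.h>

/* Sketch of the failure mode.  c1 is derived from non-atomic counter
 * reads as tsc - mperf - other; when the true value is ~0 and the
 * TSC and MPERF clock domains differ slightly, the unsigned
 * subtraction can wrap to an impossibly large count, so any
 * would-be-negative result must be clamped to 0 first. */
static unsigned long long derive_c1(unsigned long long tsc,
				    unsigned long long mperf,
				    unsigned long long other)
{
	if (mperf + other >= tsc)	/* would go negative: clamp */
		return 0;
	return tsc - mperf - other;
}
```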

    Signed-off-by: Len Brown

    Len Brown
     
  • Add GFX%rc6 and GFXMHz to the column descriptions section
    of the turbostat man page.

    Signed-off-by: Doug Smythies
    Signed-off-by: Len Brown

    Doug Smythies
     
  • Syntax only.

    The HWP CAPABILITIES and REQUEST ratios are more easily
    viewed in decimal -- just multiply by 100 and you get MHz...

    new:
    cpu0: MSR_HWP_CAPABILITIES: 0x010c1b23 (high 35 guar 27 eff 12 low 1)
    cpu0: MSR_HWP_REQUEST: 0x80002301 (min 1 max 35 des 0 epp 0x80 window 0x0 pkg 0x0)

    old:
    cpu0: MSR_HWP_CAPABILITIES: 0x010c1b23 (high 0x23 guar 0x1b eff 0xc low 0x1)
    cpu0: MSR_HWP_REQUEST: 0x80002301 (min 0x1 max 0x23 des 0x0 epp 0x80 window 0x0 pkg 0x0)
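    The capability fields in that dump decode as plain byte extracts
    (layout per the IA32_HWP_CAPABILITIES register; struct and helper
    names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative decode of MSR_HWP_CAPABILITIES: bits 7:0 highest,
 * 15:8 guaranteed, 23:16 most-efficient, 31:24 lowest ratio.
 * Multiplying a ratio by the 100 MHz bus clock gives MHz. */
struct hwp_caps {
	unsigned int high, guar, eff, low;
};

static struct hwp_caps hwp_decode(uint32_t msr)
{
	struct hwp_caps c = {
		.high = msr & 0xff,
		.guar = (msr >> 8) & 0xff,
		.eff  = (msr >> 16) & 0xff,
		.low  = (msr >> 24) & 0xff,
	};
	return c;
}
```

    Feeding it the value from the example (0x010c1b23) reproduces the
    "high 35 guar 27 eff 12 low 1" line above.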

    Signed-off-by: Len Brown

    Len Brown
     
  • cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00641400 (100 C)
    cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x884b0800 (25 C)
    cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00000003 (100 C, 100 C)

    Enable the same per-core output, but hide it behind --debug
    because it is too verbose on big systems.

    Signed-off-by: Len Brown

    Len Brown
     
  • While the current SDM is silent on the matter, the Core and GFX
    RAPL power meters on SKL and KBL appear to work -- so show them.

    Reported-by: Yaroslav Isakov
    Signed-off-by: Len Brown

    Len Brown
     

12 Apr, 2017

1 commit


10 Apr, 2017

1 commit


09 Apr, 2017

1 commit

  • Pull powerpc fixes from Michael Ellerman:
    "Some more powerpc fixes for 4.11:

    Headed to stable:

    - disable HFSCR[TM] if TM is not supported, fixes a potential host
    kernel crash triggered by a hostile guest, but only in
    configurations that no one uses

    - don't try to fix up misaligned load-with-reservation instructions

    - fix flush_(d|i)cache_range() called from modules on little endian
    kernels

    - add missing global TLB invalidate if cxl is active

    - fix missing preempt_disable() in crc32c-vpmsum

    And a fix for selftests build changes that went in this release:

    - selftests/powerpc: Fix standalone powerpc build

    Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran,
    Paul Mackerras"

    * tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
    powerpc/mm: Add missing global TLB invalidate if cxl is active
    powerpc/64: Fix flush_(d|i)cache_range() called from modules
    powerpc: Don't try to fix up misaligned load-with-reservation instructions
    powerpc: Disable HFSCR[TM] if TM is not supported
    selftests/powerpc: Fix standalone powerpc build

    Linus Torvalds
     

07 Apr, 2017

2 commits

  • Since 4.10, perf annotate exits on s390 with an "unknown error -95".
    Turns out that commit 786c1b51844d ("perf annotate: Start supporting
    cross arch annotation") added a hard requirement for architecture
    support when objdump is used, but only provided x86 and arm support.
    Meanwhile power was added, so let's add s390 as well.

    While at it make sure to implement the branch and jump types.

    Signed-off-by: Christian Borntraeger
    Cc: Andreas Krebbel
    Cc: Hendrik Brueckner
    Cc: Martin Schwidefsky
    Cc: Peter Zijlstra
    Cc: linux-s390
    Cc: stable@kernel.org # v4.10+
    Fixes: 786c1b51844 "perf annotate: Start supporting cross arch annotation"
    Link: http://lkml.kernel.org/r/1491465112-45819-2-git-send-email-borntraeger@de.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Christian Borntraeger
     
  • fix artifact of merge resolution

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

06 Apr, 2017

1 commit