31 Mar, 2020

1 commit

  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Continued user-access cleanups in the futex code.

    - percpu-rwsem rewrite that uses its own waitqueue and atomic_t
    instead of an embedded rwsem. This addresses a couple of
    weaknesses, but the primary motivation was complications on the -rt
    kernel.

    - Introduce raw lock nesting detection on lockdep
    (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal
    lock differences. This too originates from -rt.

    - Reuse lockdep zapped chain_hlocks entries, to conserve RAM
    footprint on distro-ish kernels running into the "BUG:
    MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep
    chain-entries pool.

    - Misc cleanups, smaller fixes and enhancements - see the changelog
    for details"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
    fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
    thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
    Documentation/locking/locktypes: Minor copy editor fixes
    Documentation/locking/locktypes: Further clarifications and wordsmithing
    m68knommu: Remove mm.h include from uaccess_no.h
    x86: get rid of user_atomic_cmpxchg_inatomic()
    generic arch_futex_atomic_op_inuser() doesn't need access_ok()
    x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
    x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
    objtool: whitelist __sanitizer_cov_trace_switch()
    [parisc, s390, sparc64] no need for access_ok() in futex handling
    sh: no need of access_ok() in arch_futex_atomic_op_inuser()
    futex: arch_futex_atomic_op_inuser() calling conventions change
    completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
    lockdep: Add posixtimer context tracing bits
    lockdep: Annotate irq_work
    lockdep: Add hrtimer context tracing bits
    lockdep: Introduce wait-type checks
    completion: Use simple wait queues
    sched/swait: Prepare usage in completions
    ...

    Linus Torvalds
     

28 Mar, 2020

1 commit

  • it's not really different from e.g. __sanitizer_cov_trace_cmp4();
    as it is, the switches that generate an array of labels get
    rejected by objtool, while a slightly different set of cases
    that gets compiled into a series of comparisons is accepted.

    Signed-off-by: Al Viro

    Al Viro
     

26 Mar, 2020

15 commits

  • In preparation to adding a vmlinux.o specific pass, rearrange some
    code. No functional changes intended.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.924304616@infradead.org

    Peter Zijlstra
     
  • Perf shows there is significant time in find_rela_by_dest(); this is
    because we have to iterate the address space per byte, looking for
    relocation entries.

    Optimize this by reducing the address space granularity.

    This reduces objtool on vmlinux.o runtime from 4.8 to 4.4 seconds.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.861321325@infradead.org

    Peter Zijlstra
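
One way to picture the granularity reduction described above is a table keyed by "stride" (offset shifted right by a few bits) instead of by exact byte offset, so a range lookup probes one bucket per stride rather than one per byte. The following is only a sketch under that assumption: the 16-byte stride, the fixed bucket count, and the simplified `struct rela` and `struct rela_table` are hypothetical and are not taken from objtool.

```c
#include <stddef.h>

#define STRIDE_BITS 4	/* hypothetical: 16-byte granularity */
#define NBUCKETS    64	/* hypothetical, small fixed table */

struct rela {
	unsigned long offset;	/* location the relocation applies to */
	struct rela *next;	/* chain within one bucket */
};

struct rela_table {
	struct rela *bucket[NBUCKETS];
};

/* Insert a relocation into the bucket of the stride that contains it. */
static void rela_add(struct rela_table *t, struct rela *r)
{
	size_t b = (r->offset >> STRIDE_BITS) % NBUCKETS;

	r->next = t->bucket[b];
	t->bucket[b] = r;
}

/*
 * Return the lowest-offset relocation in [offset, offset + len), or NULL.
 * The outer loop advances one stride at a time instead of one byte.
 */
static struct rela *find_rela_by_dest_range(const struct rela_table *t,
					    unsigned long offset,
					    unsigned long len)
{
	unsigned long end = offset + len;

	for (unsigned long s = offset >> STRIDE_BITS; (s << STRIDE_BITS) < end; s++) {
		struct rela *best = NULL;

		for (struct rela *r = t->bucket[s % NBUCKETS]; r; r = r->next) {
			if ((r->offset >> STRIDE_BITS) != s)
				continue;	/* other stride sharing this bucket */
			if (r->offset < offset || r->offset >= end)
				continue;	/* outside the requested range */
			if (!best || r->offset < best->offset)
				best = r;
		}
		if (best)
			return best;
	}
	return NULL;
}
```

With relocations at 0x13, 0x25 and 0x28, a lookup over [0x14, 0x34) visits two buckets and returns the entry at 0x25.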
     
  • Perf shows we spend a measurable amount of time cleaning up
    right before we exit anyway. Avoid the needless work and just
    terminate.

    This reduces objtool on vmlinux.o runtime from 5.4s to 4.8s

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.800720170@infradead.org

    Peter Zijlstra
     
  • Perf showed that __hash_init() is a significant portion of
    read_sections(), so instead of doing a per section rela_hash, use an
    elf-wide rela_hash.

    Statistics show us there are about 1.1 million relas, so size it
    accordingly.

    This reduces the objtool on vmlinux.o runtime to a third, from 15 to 5
    seconds.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.739153726@infradead.org

    Peter Zijlstra
     
  • Perf showed that find_symbol_by_name() takes time; add a symbol name
    hash.

    This shaves another second off of objtool on vmlinux.o runtime, down
    to 15 seconds.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.676865656@infradead.org

    Peter Zijlstra
     
  • Perf shows we're spending a lot of time in find_insn() and the
    statistics show we have around 3.2 million instructions. Increase the
    hash table size to reduce the bucket load from around 50 to 3.

    This shaves about 2s off of objtool on vmlinux.o runtime, down to 16s.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.617882545@infradead.org

    Peter Zijlstra
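
The bucket-load figures quoted above follow from simple arithmetic. The sketch below assumes the table grew from 2^16 to 2^20 buckets; those widths are an assumption chosen because they reproduce the quoted loads of roughly 50 and 3, and they are not stated in the commit message.

```c
#include <assert.h>
#include <limits.h>

/* Average chain length for 'entries' items spread over 2^bits buckets. */
static double bucket_load(unsigned long entries, unsigned int bits)
{
	assert(bits < sizeof(unsigned long) * CHAR_BIT);
	return (double)entries / (double)(1UL << bits);
}
```

For 3,200,000 instructions this gives about 48.8 entries per bucket at 16 bits and about 3.05 at 20 bits.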
     
  • For consistency; we have:

    find_symbol_by_offset() / find_symbol_containing()
    find_func_by_offset() / find_containing_func()

    fix that.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.558470724@infradead.org

    Peter Zijlstra
     
  • All of:

    read_symbols(), find_symbol_by_offset(), find_symbol_containing(),
    find_containing_func()

    do a linear search of the symbols. Add an RB tree to make it go
    faster.

    This about halves objtool runtime on vmlinux.o, from 34s to 18s.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.499016559@infradead.org

    Peter Zijlstra
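
The commit above replaces a linear scan with an RB tree. To keep the illustration short, the sketch below swaps in a different technique with the same logarithmic lookup cost: a binary search over an array sorted by offset. The simplified `struct symbol` and the function signature are hypothetical stand-ins, not objtool's definitions.

```c
#include <stddef.h>

struct symbol {
	const char *name;
	unsigned long offset;	/* start offset within the section */
	unsigned long len;	/* size in bytes */
};

/*
 * Return the symbol whose [offset, offset + len) range contains 'offset',
 * or NULL. 'syms' must be sorted by offset and must not overlap.
 */
static const struct symbol *find_symbol_containing(const struct symbol *syms,
						   size_t n,
						   unsigned long offset)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct symbol *s = &syms[mid];

		if (offset < s->offset)
			hi = mid;		/* look in the lower half */
		else if (offset >= s->offset + s->len)
			lo = mid + 1;		/* look in the upper half */
		else
			return s;		/* offset falls inside s */
	}
	return NULL;
}
```

A tree is the better fit when symbols are added incrementally, as they are while an ELF file is being read; the sorted array only illustrates why the lookup stops being linear.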
     
  • In order to avoid yet another linear search of (20k) sections, add a
    name based hash.

    This reduces objtool runtime on vmlinux.o by some 10s to around 35s.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.440174280@infradead.org

    Peter Zijlstra
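
A name-based hash of the kind described above can be sketched as a chained hash table keyed by the section name. Everything here is illustrative: the structures are simplified stand-ins, the table size is hypothetical, and the djb2 string hash is used only because it is short, not because objtool uses it.

```c
#include <stddef.h>
#include <string.h>

#define NAME_HASH_BITS 8	/* hypothetical table size: 2^8 buckets */
#define NAME_HASH_SIZE (1u << NAME_HASH_BITS)

struct section {
	const char *name;
	struct section *hash_next;	/* next section in the same bucket */
};

struct elf {
	struct section *section_hash[NAME_HASH_SIZE];
};

/* djb2 string hash, reduced to a bucket index. */
static unsigned int name_hash(const char *s)
{
	unsigned int h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h & (NAME_HASH_SIZE - 1);
}

/* Link a section into the bucket selected by its name. */
static void elf_add_section(struct elf *elf, struct section *sec)
{
	unsigned int b = name_hash(sec->name);

	sec->hash_next = elf->section_hash[b];
	elf->section_hash[b] = sec;
}

/* Walk a single bucket instead of every section in the file. */
static struct section *find_section_by_name(const struct elf *elf,
					    const char *name)
{
	struct section *sec;

	for (sec = elf->section_hash[name_hash(name)]; sec; sec = sec->hash_next)
		if (!strcmp(sec->name, name))
			return sec;
	return NULL;
}
```

With around 20k sections, each lookup compares against the handful of names in one bucket rather than against all of them.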
     
  • In order to avoid a linear search (over 20k entries), add a
    section_hash to the elf object.

    This reduces objtool on vmlinux.o from a few minutes to around 45
    seconds.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.381249993@infradead.org

    Peter Zijlstra
     
  • Have it print a few numbers which can be used to size the hashtables.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.321381240@infradead.org

    Peter Zijlstra
     
  • The symbol index is object wide, not per section, so it makes no sense
    to have the symbol_hash be part of the section object. By moving it to
    the elf object we avoid the linear sections iteration.

    This reduces the runtime of objtool on vmlinux.o from over 3 hours (I
    gave up) to a few minutes. The defconfig vmlinux.o has around 20k
    sections.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.261852348@infradead.org

    Peter Zijlstra
     
  • Now that func_for_each_insn() is available, rename
    func_for_each_insn_all(). This gets us:

    sym_for_each_insn() - iterate on symbol offset/len
    func_for_each_insn() - iterate on insn->func

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.083720147@infradead.org

    Peter Zijlstra
     
  • There is func_for_each_insn() and func_for_each_insn_all(); they both
    iterate the instructions, but the first uses symbol offset/length
    while the second uses insn->func.

    Rename func_for_each_insn() to sym_for_each_insn() because it iterates
    on symbol information.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160924.024341229@infradead.org

    Peter Zijlstra
     
  • Trivial 'cleanup' to save one indentation level and match
    validate_call().

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20200324160923.963996225@infradead.org

    Peter Zijlstra
     

21 Feb, 2020

2 commits

  • A recent clang change, combined with a binutils bug, can trigger a
    situation where a ".Lprintk$local" STT_NOTYPE symbol gets created at the
    same offset as the "printk" STT_FUNC symbol. This confuses objtool:

    kernel/printk/printk.o: warning: objtool: ignore_loglevel_setup()+0x10: can't find call dest symbol at .text+0xc67

    Improve the call destination detection by looking specifically for an
    STT_FUNC symbol.

    Reported-by: Nick Desaulniers
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Borislav Petkov
    Tested-by: Nick Desaulniers
    Tested-by: Nathan Chancellor
    Link: https://github.com/ClangBuiltLinux/linux/issues/872
    Link: https://sourceware.org/bugzilla/show_bug.cgi?id=25551
    Link: https://lkml.kernel.org/r/0a7ee320bc0ea4469bd3dc450a7b4725669e0ea9.1581997059.git.jpoimboe@redhat.com

    Josh Poimboeuf
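
The idea of looking specifically for an STT_FUNC symbol can be sketched as a filter on the symbol type when several symbols share an offset. The structure and the linear scan below are simplified assumptions made for illustration; only the STT_NOTYPE and STT_FUNC values come from the ELF specification.

```c
#include <stddef.h>

/* ELF symbol types, values as defined by the ELF specification. */
#define STT_NOTYPE 0
#define STT_FUNC   2

struct symbol {
	const char *name;
	unsigned long offset;
	unsigned char type;
};

/*
 * Return the function symbol that starts exactly at 'offset', skipping
 * symbols of any other type at the same offset. Returns NULL if none.
 */
static const struct symbol *find_func_by_offset(const struct symbol *syms,
						size_t n,
						unsigned long offset)
{
	for (size_t i = 0; i < n; i++)
		if (syms[i].offset == offset && syms[i].type == STT_FUNC)
			return &syms[i];
	return NULL;
}
```

Given a ".Lprintk$local" STT_NOTYPE symbol and a "printk" STT_FUNC symbol at the same offset, the lookup returns "printk" regardless of their order in the table.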
     
  • Clang has the ability to create a switch table which is not a jump
    table, but is rather a table of string pointers. This confuses objtool,
    because it sees the relocations for the string pointers and assumes
    they're part of a jump table:

    drivers/ata/sata_dwc_460ex.o: warning: objtool: sata_dwc_bmdma_start_by_tag()+0x3a2: can't find switch jump table
    net/ceph/messenger.o: warning: objtool: ceph_con_workfn()+0x47c: can't find switch jump table

    Make objtool's find_jump_table() smart enough to distinguish between a
    switch jump table (which has relocations to text addresses in the same
    function as the original instruction) and other anonymous rodata (which
    may have relocations to elsewhere).

    Reported-by: Nick Desaulniers
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Borislav Petkov
    Tested-by: Nick Desaulniers
    Tested-by: Nathan Chancellor
    Link: https://github.com/ClangBuiltLinux/linux/issues/485
    Link: https://lkml.kernel.org/r/263f6aae46d33da0b86d7030ced878cb5cab1788.1581997059.git.jpoimboe@redhat.com

    Josh Poimboeuf
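
The distinction drawn above, between a switch jump table and other anonymous rodata, can be sketched as a predicate over the table's relocations. This is a hypothetical illustration of the stated rule and need not match how find_jump_table() is implemented; the types and the helper name are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

enum target_sec { TARGET_TEXT, TARGET_OTHER };	/* hypothetical classification */

struct reloc {
	enum target_sec sec;	/* section the relocation points into */
	unsigned long addend;	/* offset within that section */
};

struct func {
	unsigned long offset;	/* start of the function in .text */
	unsigned long len;	/* size of the function in bytes */
};

/*
 * Treat a table as a switch jump table only if it is non-empty and every
 * entry points at text inside the function holding the indirect jump.
 */
static bool is_switch_jump_table(const struct reloc *table, size_t n,
				 const struct func *func)
{
	if (n == 0)
		return false;
	for (size_t i = 0; i < n; i++) {
		if (table[i].sec != TARGET_TEXT)
			return false;	/* e.g. a table of string pointers */
		if (table[i].addend < func->offset ||
		    table[i].addend >= func->offset + func->len)
			return false;	/* text, but in another function */
	}
	return true;
}
```

A table of string pointers fails the first check because its relocations do not target text at all.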
     

11 Feb, 2020

3 commits

  • Relocations in alternative code can be dangerous, because the code is
    copy/pasted to the text section after relocations have been resolved,
    which can corrupt PC-relative addresses.

    However, relocations might be acceptable in some cases, depending on the
    architecture. For example, the x86 alternatives code manually fixes up
    the target addresses for PC-relative jumps and calls.

    So disallow relocations in alternative code, except where the x86 arch
    code allows it.

    This code may need to be tweaked for other arches when objtool gets
    support for them.

    Suggested-by: Linus Torvalds
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Borislav Petkov
    Reviewed-by: Julien Thierry
    Link: https://lkml.kernel.org/r/7b90b68d093311e4e8f6b504a9e1c758fd7e0002.1581359535.git.jpoimboe@redhat.com

    Josh Poimboeuf
     
  • There are several places where objtool tests for a non-dynamic (aka
    direct) jump. Move the check to a helper function.

    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Borislav Petkov
    Reviewed-by: Julien Thierry
    Link: https://lkml.kernel.org/r/9b8b438df918276315e4765c60d2587f3c7ad698.1581359535.git.jpoimboe@redhat.com

    Josh Poimboeuf
     
  • When objtool encounters a fatal error, it usually means the binary is
    corrupt or otherwise broken in some way. Up until now, such errors were
    just treated as warnings which didn't fail the kernel build.

    However, objtool is now stable enough that if a fatal error is
    discovered, it most likely means something is seriously wrong and it
    should fail the kernel build.

    Note that this doesn't apply to "normal" objtool warnings; only fatal
    ones.

    Suggested-by: Borislav Petkov
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Borislav Petkov
    Reviewed-by: Julien Thierry
    Link: https://lkml.kernel.org/r/f18c3743de0fef673d49dd35760f26bdef7f6fc3.1581359535.git.jpoimboe@redhat.com

    Josh Poimboeuf
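
The policy described above, where fatal errors fail the build and ordinary warnings do not, amounts to a small mapping from the check result to the process exit status. The sketch below assumes a hypothetical convention in which a negative result means a fatal error and a non-negative result is a warning count; the helper name is invented.

```c
#include <stdlib.h>

/*
 * Hypothetical convention for this sketch:
 *   result < 0  : fatal error (for example, a corrupt object file)
 *   result >= 0 : number of ordinary warnings already printed
 */
static int exit_status_for(int result)
{
	if (result < 0)
		return EXIT_FAILURE;	/* fatal: make the build fail */
	return EXIT_SUCCESS;		/* warnings alone do not fail the build */
}
```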
     

22 Jan, 2020

2 commits

  • Building objtool with ARCH=x86_64 fails with:

    $ make ARCH=x86_64 -C tools/objtool
    ...
    CC arch/x86/decode.o
    arch/x86/decode.c:10:22: fatal error: asm/insn.h: No such file or directory
    #include <asm/insn.h>
    ^
    compilation terminated.
    mv: cannot stat ‘arch/x86/.decode.o.tmp’: No such file or directory
    make[2]: *** [arch/x86/decode.o] Error 1
    ...

    The root cause is that the command-line variable 'ARCH' cannot be
    overridden. It can be replaced by 'SRCARCH', which is defined in
    'tools/scripts/Makefile.arch'.

    Signed-off-by: Shile Zhang
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Ingo Molnar
    Reviewed-by: Kamalesh Babulal
    Link: https://lore.kernel.org/r/d5d11370ae116df6c653493acd300ec3d7f5e925.1579543924.git.jpoimboe@redhat.com

    Shile Zhang
     
  • The sync-check.sh script prints out the path due to a "cd -" at the end
    of the script, even on silent builds. This isn't even needed, since the
    script is executed in our build instead of sourced (so it won't change
    the working directory of the surrounding build anyway).

    Just remove the cd to make the build silent.

    Fixes: 2ffd84ae973b ("objtool: Update sync-check.sh from perf's check-headers.sh")
    Signed-off-by: Olof Johansson
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/cb002857fafa8186cfb9c3e43fb62e4108a1bab9.1579543924.git.jpoimboe@redhat.com

    Olof Johansson
     

27 Nov, 2019

2 commits

  • Pull perf updates from Ingo Molnar:
    "The main kernel side changes in this cycle were:

    - Various Intel-PT updates and optimizations (Alexander Shishkin)

    - Prohibit kprobes on Xen/KVM emulate prefixes (Masami Hiramatsu)

    - Add support for LSM and SELinux checks to control access to the
    perf syscall (Joel Fernandes)

    - Misc other changes, optimizations, fixes and cleanups - see the
    shortlog for details.

    There were numerous tooling changes as well - 254 non-merge commits.
    Here are the main changes - too many to list in detail:

    - Enhancements to core tooling infrastructure, perf.data, libperf,
    libtraceevent, event parsing, vendor events, Intel PT, callchains,
    BPF support and instruction decoding.

    - There were updates to the following tools:

    perf annotate
    perf diff
    perf inject
    perf kvm
    perf list
    perf maps
    perf parse
    perf probe
    perf record
    perf report
    perf script
    perf stat
    perf test
    perf trace

    - And a lot of other changes: please see the shortlog and Git log for
    more details"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (279 commits)
    perf parse: Fix potential memory leak when handling tracepoint errors
    perf probe: Fix spelling mistake "addrees" -> "address"
    libtraceevent: Fix memory leakage in copy_filter_type
    libtraceevent: Fix header installation
    perf intel-bts: Does not support AUX area sampling
    perf intel-pt: Add support for decoding AUX area samples
    perf intel-pt: Add support for recording AUX area samples
    perf pmu: When using default config, record which bits of config were changed by the user
    perf auxtrace: Add support for queuing AUX area samples
    perf session: Add facility to peek at all events
    perf auxtrace: Add support for dumping AUX area samples
    perf inject: Cut AUX area samples
    perf record: Add aux-sample-size config term
    perf record: Add support for AUX area sampling
    perf auxtrace: Add support for AUX area sample recording
    perf auxtrace: Move perf_evsel__find_pmu()
    perf record: Add a function to test for kernel support for AUX area sampling
    perf tools: Add kernel AUX area sampling definitions
    perf/core: Make the mlock accounting simple again
    perf report: Jump to symbol source view from total cycles view
    ...

    Linus Torvalds
     
  • Pull x86 asm updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Cross-arch changes to move the linker sections for NOTES and
    EXCEPTION_TABLE into the RO_DATA area, where they belong on most
    architectures. (Kees Cook)

    - Switch the x86 linker fill byte from 0x90 (NOP) to 0xcc (INT3), to
    trap jumps into the middle of those padding areas instead of
    sliding execution. (Kees Cook)

    - A thorough cleanup of symbol definitions within x86 assembler code.
    The rather randomly named macros got streamlined around a
    (hopefully) straightforward naming scheme:

    SYM_START(name, linkage, align...)
    SYM_END(name, sym_type)

    SYM_FUNC_START(name)
    SYM_FUNC_END(name)

    SYM_CODE_START(name)
    SYM_CODE_END(name)

    SYM_DATA_START(name)
    SYM_DATA_END(name)

    etc - with about three times of these basic primitives with some
    label, local symbol or attribute variant, expressed via postfixes.

    No change in functionality intended. (Jiri Slaby)

    - Misc other changes, cleanups and smaller fixes"

    * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
    x86/entry/64: Remove pointless jump in paranoid_exit
    x86/entry/32: Remove unused resume_userspace label
    x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
    m68k: Convert missed RODATA to RO_DATA
    x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
    x86/mm: Report actual image regions in /proc/iomem
    x86/mm: Report which part of kernel image is freed
    x86/mm: Remove redundant address-of operators on addresses
    xtensa: Move EXCEPTION_TABLE to RO_DATA segment
    powerpc: Move EXCEPTION_TABLE to RO_DATA segment
    parisc: Move EXCEPTION_TABLE to RO_DATA segment
    microblaze: Move EXCEPTION_TABLE to RO_DATA segment
    ia64: Move EXCEPTION_TABLE to RO_DATA segment
    h8300: Move EXCEPTION_TABLE to RO_DATA segment
    c6x: Move EXCEPTION_TABLE to RO_DATA segment
    arm64: Move EXCEPTION_TABLE to RO_DATA segment
    alpha: Move EXCEPTION_TABLE to RO_DATA segment
    x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
    x86/vmlinux: Actually use _etext for the end of the text segment
    vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
    ...

    Linus Torvalds
     

28 Oct, 2019

1 commit

  • The new check_zeroed_user() function uses variable shifts inside of a
    user_access_begin()/user_access_end() section and that results in GCC
    emitting __ubsan_handle_shift_out_of_bounds() calls, even though
    through value range analysis it would be able to see that the UB in
    question is impossible.

    Annotate and whitelist this UBSAN function; continued use of
    user_access_begin()/user_access_end() will undoubtedly result in
    further uses of this function.

    Reported-by: Randy Dunlap
    Tested-by: Randy Dunlap
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Randy Dunlap
    Acked-by: Christian Brauner
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephen Rothwell
    Cc: Thomas Gleixner
    Cc: cyphar@cyphar.com
    Cc: keescook@chromium.org
    Cc: linux@rasmusvillemoes.dk
    Fixes: f5a1a536fa14 ("lib: introduce copy_struct_from_user() helper")
    Link: https://lkml.kernel.org/r/20191021131149.GA19358@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

18 Oct, 2019

1 commit

  • Decode Xen's and KVM's emulate-prefix signatures in the x86 insn decoder.
    Although it is called a "prefix", it is not actually an x86 instruction
    prefix, so this adds an insn.emulate_prefix_size field instead of reusing
    insn.prefixes.

    If the x86 decoder finds one of the special instruction sequences,
    XEN_EMULATE_PREFIX or 'ud2a; .ascii "kvm"', it just counts the
    length, sets insn.emulate_prefix_size and folds it into the next
    instruction. In other words, the signature and the next instruction
    are treated as a single instruction.

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: x86@kernel.org
    Cc: Boris Ostrovsky
    Cc: Ingo Molnar
    Cc: Stefano Stabellini
    Cc: Andrew Cooper
    Cc: Borislav Petkov
    Cc: xen-devel@lists.xenproject.org
    Cc: Randy Dunlap
    Link: https://lkml.kernel.org/r/156777564986.25081.4964537658500952557.stgit@devnote2

    Masami Hiramatsu
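
Recognising such a signature comes down to comparing the leading bytes of the instruction stream against a known sequence and reporting its length. The sketch below assumes both signatures are the two-byte ud2a opcode (0x0f 0x0b) followed by the ASCII tag "xen" or "kvm"; the byte arrays and the helper are illustrative and are not the decoder's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Assumed encodings: ud2a (0x0f 0x0b) followed by a three-byte ASCII tag. */
static const unsigned char xen_prefix[] = { 0x0f, 0x0b, 'x', 'e', 'n' };
static const unsigned char kvm_prefix[] = { 0x0f, 0x0b, 'k', 'v', 'm' };

/*
 * Return the size of the emulate-prefix signature at the start of 'buf',
 * or 0 if the buffer does not start with one.
 */
static size_t emulate_prefix_size(const unsigned char *buf, size_t len)
{
	if (len >= sizeof(xen_prefix) &&
	    !memcmp(buf, xen_prefix, sizeof(xen_prefix)))
		return sizeof(xen_prefix);
	if (len >= sizeof(kvm_prefix) &&
	    !memcmp(buf, kvm_prefix, sizeof(kvm_prefix)))
		return sizeof(kvm_prefix);
	return 0;
}
```

A decoder built this way would skip the returned number of bytes, decode the instruction that follows, and report the combined length as one instruction.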
     

01 Oct, 2019

1 commit

  • Fix the following warning seen on GCC 7.3:
    kunit/test-test.o: warning: objtool: kunit_test_unsuccessful_try() falls through to next function kunit_test_catch()

    kunit_try_catch_throw is a function added in the following patch in this
    series; it allows KUnit, a unit testing framework for the kernel, to
    bail out of a broken test. As a consequence, it is a new __noreturn
    function that objtool thinks is broken (as seen above). So fix this
    warning by adding kunit_try_catch_throw to objtool's noreturn list.

    Reported-by: kbuild test robot
    Signed-off-by: Brendan Higgins
    Acked-by: Josh Poimboeuf
    Link: https://www.spinics.net/lists/linux-kbuild/msg21708.html
    Cc: Peter Zijlstra
    Signed-off-by: Shuah Khan

    Brendan Higgins
     

25 Sep, 2019

1 commit

  • Explicitly check kvm_rebooting in kvm_spurious_fault() prior to invoking
    BUG(), as opposed to assuming the caller has already done so. Letting
    kvm_spurious_fault() be called "directly" will allow VMX to better
    optimize its low level assembly flows.

    As a happy side effect, kvm_spurious_fault() no longer needs to be
    marked as a dead end since it doesn't unconditionally BUG().

    Acked-by: Paolo Bonzini
    Cc: Josh Poimboeuf
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     

17 Sep, 2019

1 commit

  • Pull perf updates from Ingo Molnar:
    "Kernel side changes:

    - Improved kprobes robustness

    - Intel PEBS support for PT hardware tracing

    - Other Intel PT improvements: high order pages memory footprint
    reduction and various related cleanups

    - Misc cleanups

    The perf tooling side has been very busy in this cycle, with over 300
    commits. This is an incomplete high-level summary of the many
    improvements done by over 30 developers:

    - Lots of updates to the following tools:

    'perf c2c'
    'perf config'
    'perf record'
    'perf report'
    'perf script'
    'perf test'
    'perf top'
    'perf trace'

    - Updates to libperf and libtraceevent, and a consolidation of the
    proliferation of x86 instruction decoder libraries.

    - Vendor event updates for Intel and PowerPC CPUs,

    - Updates to hardware tracing tooling for ARM and Intel CPUs,

    - ... and lots of other changes and cleanups - see the shortlog and
    Git log for details"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (322 commits)
    kprobes: Prohibit probing on BUG() and WARN() address
    perf/x86: Make more stuff static
    x86, perf: Fix the dependency of the x86 insn decoder selftest
    objtool: Ignore intentional differences for the x86 insn decoder
    objtool: Update sync-check.sh from perf's check-headers.sh
    perf build: Ignore intentional differences for the x86 insn decoder
    perf intel-pt: Use shared x86 insn decoder
    perf intel-pt: Remove inat.c from build dependency list
    perf: Update .gitignore file
    objtool: Move x86 insn decoder to a common location
    perf metricgroup: Support multiple events for metricgroup
    perf metricgroup: Scale the metric result
    perf pmu: Change convert_scale from static to global
    perf symbols: Move mem_info and branch_info out of symbol.h
    perf auxtrace: Uninline functions that touch perf_session
    perf tools: Remove needless evlist.h include directives
    perf tools: Remove needless evlist.h include directives
    perf tools: Remove needless thread_map.h include directives
    perf tools: Remove needless thread.h include directives
    perf tools: Remove needless map.h include directives
    ...

    Linus Torvalds
     

10 Sep, 2019

1 commit

  • If the build user has the CFLAGS variable set in their environment,
    objtool blindly appends to it, which can cause unexpected behavior.

    Clobber CFLAGS to ensure consistent objtool compilation behavior.

    Reported-by: Valdis Kletnieks
    Tested-by: Valdis Kletnieks
    Signed-off-by: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/83a276df209962e6058fcb6c615eef9d401c21bc.1567121311.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     

01 Sep, 2019

3 commits

  • Since we need to build this on !x86, we need to explicitly use the x86
    files, not things like asm/insn.h, so we intentionally differ from the
    master copy in the kernel sources. Add -I diff directives to ignore just
    these differences when checking for drift.

    Acked-by: Josh Poimboeuf
    Link: http://lore.kernel.org/lkml/20190830193109.p7jagidsrahoa4pn@treble
    Acked-by: Masami Hiramatsu
    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/n/tip-j965m9b7xtdc83em3twfkh9o@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • To allow using the -I trick that will be needed for checking the x86
    insn decoder files.

    Without the specific -I lines we still get the same warnings as before:

    $ make -C tools/objtool/ clean ; make -C tools/objtool/
    make: Entering directory '/home/acme/git/perf/tools/objtool'
    CLEAN objtool
    find -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
    rm -f arch/x86/inat-tables.c fixdep

    LD objtool-in.o
    make[1]: Leaving directory '/home/acme/git/perf/tools/objtool'
    Warning: Kernel ABI header at 'tools/arch/x86/include/asm/inat.h' differs from latest version at 'arch/x86/include/asm/inat.h'
    diff -u tools/arch/x86/include/asm/inat.h arch/x86/include/asm/inat.h
    Warning: Kernel ABI header at 'tools/arch/x86/include/asm/insn.h' differs from latest version at 'arch/x86/include/asm/insn.h'
    diff -u tools/arch/x86/include/asm/insn.h arch/x86/include/asm/insn.h
    Warning: Kernel ABI header at 'tools/arch/x86/lib/inat.c' differs from latest version at 'arch/x86/lib/inat.c'
    diff -u tools/arch/x86/lib/inat.c arch/x86/lib/inat.c
    Warning: Kernel ABI header at 'tools/arch/x86/lib/insn.c' differs from latest version at 'arch/x86/lib/insn.c'
    diff -u tools/arch/x86/lib/insn.c arch/x86/lib/insn.c
    /home/acme/git/perf/tools/objtool
    LINK objtool
    make: Leaving directory '/home/acme/git/perf/tools/objtool'
    $

    The next patch will add the -I lines for those files.

    Acked-by: Josh Poimboeuf
    Link: http://lore.kernel.org/lkml/20190830193109.p7jagidsrahoa4pn@treble
    Acked-by: Masami Hiramatsu
    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/n/tip-vu3p38mnxlwd80rlsnjkqcf2@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • The kernel tree has three identical copies of the x86 instruction
    decoder. Two of them are in the tools subdir.

    The tools subdir is supposed to be completely standalone and separate
    from the kernel. So having at least one copy of the kernel decoder in
    the tools subdir is unavoidable. However, we don't need *two* of them.

    Move objtool's copy of the decoder to a shared location, so that perf
    will also be able to use it.

    Signed-off-by: Josh Poimboeuf
    Reviewed-by: Masami Hiramatsu
    Acked-by: Peter Zijlstra (Intel)
    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: x86@kernel.org
    Link: http://lore.kernel.org/lkml/55b486b88f6bcd0c9a2a04b34f964860c8390ca8.1567118001.git.jpoimboe@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Josh Poimboeuf
     

25 Jul, 2019

1 commit

  • A clang build reported an (obvious) double CLAC while a GCC build did not;
    it turns out that objtool only re-visits instructions if the first visit
    was with AC=0. If OTOH the first visit was with AC=1, it completely ignores
    any subsequent visit, even when it has AC=0.

    Fix this by using a visited mask instead of a boolean, and (explicitly)
    mark the AC state.

    $ ./objtool check -b --no-fp --retpoline --uaccess drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o
    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0x22: redundant UACCESS disable
    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xea: (alt)
    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0xffffffffffffffff: (branch)
    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xd9: (alt)
    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xb2: (branch)
    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0x39: (branch)
    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0x0:

    Reported-by: Thomas Gleixner
    Reported-by: Sedat Dilek
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Tested-by: Nathan Chancellor
    Tested-by: Nick Desaulniers
    Tested-by: Sedat Dilek
    Link: https://github.com/ClangBuiltLinux/linux/issues/617
    Link: https://lkml.kernel.org/r/5359166aad2d53f3145cd442d83d0e5115e0cd17.1564007838.git.jpoimboe@redhat.com

    Peter Zijlstra
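
The change from a boolean to a visited mask can be sketched as one bit per AC state, so that a visit with AC=1 no longer hides a later visit with AC=0. The bit names, the simplified `struct instruction`, and the helper below are hypothetical.

```c
#include <stdbool.h>

#define VISITED_AC0 0x01	/* hypothetical: already validated with AC=0 */
#define VISITED_AC1 0x02	/* hypothetical: already validated with AC=1 */

struct instruction {
	unsigned char visited;	/* mask of AC states already validated */
};

/*
 * Record a visit in the given AC state. Returns true if the instruction
 * had not yet been seen in that state and therefore needs validating.
 */
static bool visit(struct instruction *insn, bool uaccess)
{
	unsigned char bit = uaccess ? VISITED_AC1 : VISITED_AC0;

	if (insn->visited & bit)
		return false;
	insn->visited |= bit;
	return true;
}
```

With a plain boolean, the second call in the sequence visit(AC=1), visit(AC=0) would be skipped; with the mask, both states are validated exactly once.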
     

19 Jul, 2019

4 commits

  • A Clang-built kernel is showing the following warning:

    arch/x86/kernel/platform-quirks.o: warning: objtool: x86_early_init_platform_quirks()+0x84: unreachable instruction

    That corresponds to this code:

    7e: 0f 85 00 00 00 00 jne 84
    80: R_X86_64_PC32 __x86_indirect_thunk_r11-0x4
    84: c3 retq

    This is a conditional retpoline sibling call, which is now possible
    thanks to retpolines. Objtool hasn't seen that before. It's
    incorrectly interpreting the conditional jump as an unconditional
    dynamic jump.

    Reported-by: Nick Desaulniers
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Tested-by: Nick Desaulniers
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/30d4c758b267ef487fb97e6ecb2f148ad007b554.1563413318.git.jpoimboe@redhat.com

    Josh Poimboeuf
     
  • This makes it easier to add new instruction types. Also it's hopefully
    more robust since the compiler should warn about out-of-range enums.

    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Tested-by: Nick Desaulniers
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/0740e96af0d40e54cfd6a07bf09db0fbd10793cd.1563413318.git.jpoimboe@redhat.com

    Josh Poimboeuf
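
One way an enum helps the compiler catch mistakes is through switch statements without a default label: GCC's -Wswitch (enabled by -Wall) warns when an enumerator is not handled. The sketch below uses a small set of instruction types chosen for illustration; it is not the full list, and the helper is hypothetical.

```c
#include <stdbool.h>

/* A subset of instruction types, chosen for illustration. */
enum insn_type {
	INSN_JUMP_CONDITIONAL,
	INSN_JUMP_UNCONDITIONAL,
	INSN_JUMP_DYNAMIC,
	INSN_CALL,
	INSN_RETURN,
	INSN_OTHER,
};

/*
 * No 'default' label on purpose: adding a new enumerator without
 * handling it here draws a -Wswitch warning.
 */
static bool is_jump(enum insn_type type)
{
	switch (type) {
	case INSN_JUMP_CONDITIONAL:
	case INSN_JUMP_UNCONDITIONAL:
	case INSN_JUMP_DYNAMIC:
		return true;
	case INSN_CALL:
	case INSN_RETURN:
	case INSN_OTHER:
		return false;
	}
	return false;	/* not reached for valid enumerators */
}
```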
     
  • In one rare case, Clang generated the following code:

    5ca: 83 e0 21 and $0x21,%eax
    5cd: b9 04 00 00 00 mov $0x4,%ecx
    5d2: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
    5d5: R_X86_64_32S .rodata+0x38

    which uses the corresponding jump table relocations:

    000000000038 000200000001 R_X86_64_64 0000000000000000 .text + 834
    000000000040 000200000001 R_X86_64_64 0000000000000000 .text + 5d9
    000000000048 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000050 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000058 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000060 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000068 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000070 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000078 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000080 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000088 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000090 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000098 000200000001 R_X86_64_64 0000000000000000 .text + b96
    0000000000a0 000200000001 R_X86_64_64 0000000000000000 .text + b96
    0000000000a8 000200000001 R_X86_64_64 0000000000000000 .text + b96
    0000000000b0 000200000001 R_X86_64_64 0000000000000000 .text + b96
    0000000000b8 000200000001 R_X86_64_64 0000000000000000 .text + b96
    0000000000c0 000200000001 R_X86_64_64 0000000000000000 .text + b96
    0000000000c8 000200000001 R_X86_64_64 0000000000000000 .text + b96
    0000000000d0 000200000001 R_X86_64_64 0000000000000000 .text + b96
    0000000000d8 000200000001 R_X86_64_64 0000000000000000 .text + b96
    0000000000e0 000200000001 R_X86_64_64 0000000000000000 .text + b96
    0000000000e8 000200000001 R_X86_64_64 0000000000000000 .text + b96
    0000000000f0 000200000001 R_X86_64_64 0000000000000000 .text + b96
    0000000000f8 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000100 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000108 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000110 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000118 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000120 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000128 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000130 000200000001 R_X86_64_64 0000000000000000 .text + b96
    000000000138 000200000001 R_X86_64_64 0000000000000000 .text + 82f
    000000000140 000200000001 R_X86_64_64 0000000000000000 .text + 828

    Since %eax was masked with 0x21, only the first two and the last two
    entries are possible.
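
    The arithmetic behind that claim can be checked with a short sketch.
    It assumes nothing beyond the listing above: a mask of 0x21, 8-byte
    table entries, and a table starting at .rodata+0x38.

```python
MASK = 0x21          # from `and $0x21,%eax`
TABLE_BASE = 0x38    # first relocation offset in .rodata
ENTRY_SIZE = 8       # scale factor in `jmpq *0x0(,%rax,8)`
LAST_ENTRY = 0x140   # last relocation offset in the listing

# Only bits 0 and 5 survive the mask, so 64 inputs cover every case.
reachable = sorted({value & MASK for value in range(64)})
offsets = [hex(TABLE_BASE + ENTRY_SIZE * index) for index in reachable]
total_entries = (LAST_ENTRY - TABLE_BASE) // ENTRY_SIZE + 1

print(reachable)      # table indices that can actually be used
print(offsets)        # the matching .rodata offsets
print(total_entries)  # number of entries the compiler emitted
```

    The four offsets are exactly the rows that do not point at
    .text + b96 in the relocation listing.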

    Objtool doesn't actually emulate all the code, so it isn't smart enough
    to know that the middle entries are unreachable. They point to the
    NOP padding area after the end of the function, so objtool segfaulted
    when it tried to dereference a NULL insn->func.

    After this fix, objtool still gives an "unreachable" error because it
    stops reading the jump table when it encounters the bad addresses:

    /home/jpoimboe/objtool-tests/adm1275.o: warning: objtool: adm1275_probe()+0x828: unreachable instruction

    While the above code is technically correct, it's very wasteful of
    memory -- it uses 34 jump table entries when only 4 are needed. It's
    also not possible for objtool to validate this type of switch table
    because the unused entries point outside the function and objtool has no
    way of determining if that's intentional. Hopefully the Clang folks can
    fix it.

    Reported-by: Arnd Bergmann
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Tested-by: Nick Desaulniers
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/a9db88eec4f1ca089e040989846961748238b6d8.1563413318.git.jpoimboe@redhat.com

    Josh Poimboeuf
     
  • This fixes objtool for both a GCC issue and a Clang issue:

    1) GCC issue:

    kernel/bpf/core.o: warning: objtool: ___bpf_prog_run()+0x8d5: sibling call from callable instruction with modified stack frame

    With CONFIG_RETPOLINE=n, GCC is doing the following optimization in
    ___bpf_prog_run().

    Before:

    select_insn:
        jmp *jumptable(,%rax,8)
        ...
    ALU64_ADD_X:
        ...
        jmp select_insn
    ALU_ADD_X:
        ...
        jmp select_insn

    After:

    select_insn:
        jmp *jumptable(, %rax, 8)
        ...
    ALU64_ADD_X:
        ...
        jmp *jumptable(, %rax, 8)
    ALU_ADD_X:
        ...
        jmp *jumptable(, %rax, 8)

    This confuses objtool. It has never seen multiple indirect jump
    sites which use the same jump table.

    For GCC switch tables, the only way of detecting the size of a table
    is by continuing to scan for more tables. The size of the previous
    table can only be determined after another switch table is found, or
    when the scan reaches the end of the function.

    That logic was reused for C jump tables, and was based on the
    assumption that each jump table only has a single jump site. The
    above optimization breaks that assumption.

    2) Clang issue:

    drivers/usb/misc/sisusbvga/sisusb.o: warning: objtool: sisusb_write_mem_bulk()+0x588: can't find switch jump table

    With clang 9, code can be generated where a function contains two
    indirect jump instructions which use the same switch table.

    The fix is the same for both issues: split the jump table parsing into
    two passes.

    In the first pass, locate the heads of all switch tables for the
    function and mark their locations.

    In the second pass, parse the switch tables and add them.
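
    The two passes can be sketched roughly as follows. This is a
    simplified model, not objtool's implementation: the function names,
    the dictionary of relocations and the sample offsets are all
    hypothetical, and a table is assumed to end at the next table head
    or where contiguous entries stop.

```python
def find_table_heads(jump_sites):
    """Pass 1: collect the table offset referenced by each indirect jump.

    Several jump sites may refer to the same table, so heads are
    de-duplicated.
    """
    return sorted({table_offset for _site, table_offset in jump_sites})


def parse_tables(relocs, heads, entry_size=8):
    """Pass 2: read each table until the next head or a gap in the entries."""
    head_set = set(heads)
    tables = {}
    for head in heads:
        targets = []
        offset = head
        while offset in relocs and (offset == head or offset not in head_set):
            targets.append(relocs[offset])
            offset += entry_size
        tables[head] = targets
    return tables


# Hypothetical data: (jump site, table offset) pairs; two sites share table 0x00.
jump_sites = [(0x10, 0x00), (0x40, 0x00), (0x80, 0x18)]
# Hypothetical relocations: table entry offset -> jump target.
relocs = {0x00: 0x100, 0x08: 0x110, 0x10: 0x120, 0x18: 0x200, 0x20: 0x210}

heads = find_table_heads(jump_sites)
tables = parse_tables(relocs, heads)
targets_by_site = {site: tables[table] for site, table in jump_sites}
print(targets_by_site)
```

    Since every head is known before any table is parsed, a table's size
    no longer depends on each table having exactly one jump site.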

    Fixes: e55a73251da3 ("bpf: Fix ORC unwinding in non-JIT BPF code")
    Reported-by: Randy Dunlap
    Reported-by: Arnd Bergmann
    Signed-off-by: Jann Horn
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Tested-by: Nick Desaulniers
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/e995befaada9d4d8b2cf788ff3f566ba900d2b4d.1563413318.git.jpoimboe@redhat.com

    Co-developed-by: Josh Poimboeuf

    Jann Horn