02 Sep, 2020

4 commits

  • Implementation of ORC requires some definitions that are currently
    provided by the target architecture headers. Do not depend on these
    definitions when the orc subcommand is not implemented.

    This avoids requiring arches with no orc implementation to provide dummy
    orc definitions.

    Signed-off-by: Julien Thierry
    Reviewed-by: Miroslav Benes
    Signed-off-by: Josh Poimboeuf

    Julien Thierry
     
  • Orc generation is only done for text sections, but some instructions
    can be found in non-text sections (e.g. .discard.text sections).

    Skip setting the orc data for such instructions, since their whole
    sections will be skipped for orc generation.

    Reviewed-by: Miroslav Benes
    Signed-off-by: Julien Thierry
    Signed-off-by: Josh Poimboeuf

    Julien Thierry
     
  • Now that the objtool_file can be obtained outside of the check function,
    the orc generation builtin no longer requires check to explicitly call
    its orc-related functions.

    Signed-off-by: Julien Thierry
    Reviewed-by: Miroslav Benes
    Signed-off-by: Josh Poimboeuf

    Julien Thierry
     
  • The objtool_file structure can be used by different subcommands; in
    fact it already is, by check and orc.

    Provide a function that initializes objtool_file, which builtins can call
    themselves instead of relying on check to do the setup for them and
    explicitly hand them the objtool_file.

    Reviewed-by: Miroslav Benes
    Signed-off-by: Julien Thierry
    Signed-off-by: Josh Poimboeuf

    Julien Thierry
     

01 Sep, 2020

18 commits

  • Replace many of the indirect calls with static_call().

    The average PMI time, as measured by perf_sample_event_took()*:

    PRE: 3283.03 [ns]
    POST: 3145.12 [ns]

    Which is a ~138 [ns] win per PMI, or a ~4.2% decrease.

    [*] on an IVB-EP, using: 'perf record -a -e cycles -- make O=defconfig-build/ -j80'

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Cc: Linus Torvalds
    Link: https://lore.kernel.org/r/20200818135805.338001015@infradead.org

    Peter Zijlstra
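
    A minimal sketch of the pattern, with hypothetical names (this is not the
    actual x86_pmu code): an indirect call through a driver ops table is
    replaced by a static call whose target is patched once when the driver is
    selected, so each PMI takes a direct call instead of an indirect one.

        #include <linux/static_call.h>

        struct pt_regs;

        struct pmu_driver {
                int (*handle_irq)(struct pt_regs *regs);
        };

        static int default_handle_irq(struct pt_regs *regs)
        {
                return 0;
        }

        DEFINE_STATIC_CALL(pmu_handle_irq, default_handle_irq);

        static void pmu_select_driver(const struct pmu_driver *drv)
        {
                /* Patch the call site(s)/trampoline to target drv->handle_irq. */
                static_call_update(pmu_handle_irq, drv->handle_irq);
        }

        static int pmu_pmi_handler(struct pt_regs *regs)
        {
                /* Direct call after patching; no indirect branch per PMI. */
                return static_call(pmu_handle_irq)(regs);
        }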
     
  • Currently the tracepoint site will iterate a vector and issue indirect
    calls to however many handlers are registered (i.e. however long the
    vector is).

    Using static_call() it is possible to optimize this for the common
    case of only having a single handler registered. In this case the
    static_call() can directly call this handler. Otherwise, if the vector
    is longer than 1, call a function that iterates the whole vector like
    the current code.

    [peterz: updated to new interface]

    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Cc: Linus Torvalds
    Link: https://lore.kernel.org/r/20200818135805.279421092@infradead.org

    Steven Rostedt (VMware)
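
    A toy illustration in plain C (not the kernel's tracepoint macros) of the
    dispatch policy described: with exactly one probe registered, the call
    site targets it directly; otherwise it targets an iterator that walks the
    whole vector. static_call() lets the kernel do this retargeting by
    patching the call instruction itself.

        #include <stddef.h>

        typedef void (*probe_fn)(void *data, int arg);

        struct probe { probe_fn func; void *data; };

        static struct probe probes[8];
        static int nr_probes;

        static void iterate_all(void *unused, int arg)
        {
                for (int i = 0; i < nr_probes; i++)
                        probes[i].func(probes[i].data, arg);
        }

        /* Stand-ins for the static_call target and its argument. */
        static probe_fn active = iterate_all;
        static void *active_data;

        static void retarget(void)
        {
                if (nr_probes == 1) {
                        active = probes[0].func;        /* common case */
                        active_data = probes[0].data;
                } else {
                        active = iterate_all;           /* 0 or >1 probes */
                        active_data = NULL;
                }
        }

        static void tracepoint_hit(int arg)
        {
                active(active_data, arg);
        }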
     
  • In order to use static_call() to wire up x86_pmu, we need to
    initialize earlier, specifically before memory allocation works; copy
    some of the tricks from jump_label to enable this.

    Primarily we overload key->next to store a sites pointer when there
    are no modules; this avoids having to use kmalloc() to initialize the
    sites and allows us to run much earlier.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Steven Rostedt (VMware)
    Link: https://lore.kernel.org/r/20200818135805.220737930@infradead.org

    Peter Zijlstra
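
    A toy illustration of the trick described; the field names and layout are
    illustrative, not the kernel's actual static_call_key. One pointer-sized
    field is reused either for the per-module list or, before modules exist,
    as a direct pointer to the built-in sites, with a low tag bit telling the
    two apart so no kmalloc() is needed at early boot.

        #include <stddef.h>
        #include <stdint.h>

        struct site;            /* built-in call sites */
        struct mod_entry;       /* per-module list node */

        struct key {
                void *func;
                uintptr_t next; /* bit 0 set: struct site *, clear: struct mod_entry * */
        };

        static inline int key_has_sites(const struct key *k)
        {
                return k->next & 1;
        }

        static inline struct site *key_sites(const struct key *k)
        {
                return key_has_sites(k) ?
                        (struct site *)(k->next & ~(uintptr_t)1) : NULL;
        }

        static inline void key_set_sites(struct key *k, struct site *s)
        {
                k->next = (uintptr_t)s | 1;     /* no allocation at early boot */
        }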
     
  • Verify the text we're about to change is as we expect it to be.

    Requested-by: Steven Rostedt

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/20200818135805.161974981@infradead.org

    Peter Zijlstra
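
    A sketch of the check described (illustrative, not the exact kernel code):
    before rewriting a call site, compare the bytes currently there against
    what we expect to be replacing, and warn rather than silently corrupting
    kernel text on a mismatch.

        #include <linux/bug.h>
        #include <linux/string.h>
        #include <linux/types.h>

        static bool expected_text(void *addr, const void *expect, size_t len)
        {
                if (memcmp(addr, expect, len) == 0)
                        return true;

                WARN_ONCE(1, "unexpected static_call text at %pS\n", addr);
                return false;
        }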
     
  • GCC can turn our static_call(name)(args...) into a tail call, in which
    case we get a JMP.d32 into the trampoline (which then does a further
    tail-call).

    Teach objtool to recognise and mark these in .static_call_sites and
    adjust the code patching to deal with this.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Cc: Linus Torvalds
    Link: https://lore.kernel.org/r/20200818135805.101186767@infradead.org

    Peter Zijlstra
     
  • Extend the static_call infrastructure to optimize the following common
    pattern:

    if (func_ptr)
            func_ptr(args...)

    For the trampoline (which is in effect a tail-call), we patch the
    JMP.d32 into a RET, which then directly consumes the trampoline call.

    For the in-line sites we replace the CALL with a NOP5.

    NOTE: this is 'obviously' limited to functions with a 'void' return type.

    NOTE: DEFINE_STATIC_COND_CALL() only requires a typename, as opposed
    to a full function.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Cc: Linus Torvalds
    Link: https://lore.kernel.org/r/20200818135805.042977182@infradead.org

    Peter Zijlstra
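
    A minimal sketch of the before/after shape. The definition and call-site
    macros below are the names this facility has in current mainline
    (DEFINE_STATIC_CALL_NULL() and static_call_cond()); the commit text calls
    the definition macro DEFINE_STATIC_COND_CALL(). The hook name and type
    are made up for illustration.

        #include <linux/static_call.h>

        /* Before: a nullable global hook behind an indirect call. */
        static void (*audit_hook)(int event);

        static void do_event_indirect(int event)
        {
                if (audit_hook)
                        audit_hook(event);
        }

        /* After: a conditional static call. An empty in-line site becomes a
         * NOP and an empty out-of-line trampoline a RET, as described above. */
        typedef void audit_fn(int event);

        DEFINE_STATIC_CALL_NULL(audit_event, audit_fn);

        static void do_event(int event)
        {
                static_call_cond(audit_event)(event);
        }

        static void register_audit_hook(void (*fn)(int event))
        {
                static_call_update(audit_event, fn);
        }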
     
  • Future patches will need to poke a RET instruction; provide the
    infrastructure required for this.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Steven Rostedt (VMware)
    Cc: Masami Hiramatsu
    Link: https://lore.kernel.org/r/20200818135804.982214828@infradead.org

    Peter Zijlstra
     
  • Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/20200818135804.922581202@infradead.org

    Peter Zijlstra
     
  • Add the inline static call implementation for x86-64. The generated code
    is identical to the out-of-line case, except we move the trampoline into
    its own section.

    Objtool uses the trampoline naming convention to detect all the call
    sites. It then annotates those call sites in the .static_call_sites
    section.

    During boot (and module init), the call sites are patched to call
    directly into the destination function. The temporary trampoline is
    then no longer used.

    [peterz: merged trampolines, put trampoline in section]

    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Cc: Linus Torvalds
    Link: https://lore.kernel.org/r/20200818135804.864271425@infradead.org

    Josh Poimboeuf
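
    Roughly what a generated x86-64 trampoline looks like for a hypothetical
    key "my_key" whose current target is my_func() (illustrative only; the
    real macro also emits symbol type/size directives). The __SCT__ prefix
    and the dedicated section are what objtool keys off to find the call
    sites it records in .static_call_sites.

        asm(".pushsection .static_call.text, \"ax\"    \n"
            ".align 4                                   \n"
            ".globl __SCT__my_key                       \n"
            "__SCT__my_key:                             \n"
            "       jmp     my_func                     \n"
            ".popsection                                \n");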
     
  • Add the x86 out-of-line static call implementation. For each key, a
    permanent trampoline is created which is the destination for all static
    calls for the given key. The trampoline has a direct jump which gets
    patched by static_call_update() when the destination function changes.

    [peterz: fixed trampoline, rewrote patching code]

    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Cc: Linus Torvalds
    Link: https://lore.kernel.org/r/20200818135804.804315175@infradead.org

    Josh Poimboeuf
     
  • Similar to how we disallow kprobes on any other dynamic text
    (ftrace/jump_label), also disallow kprobes on inline static_call()s.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/20200818135804.744920586@infradead.org

    Peter Zijlstra
     
  • Add infrastructure for an arch-specific CONFIG_HAVE_STATIC_CALL_INLINE
    option, which is a faster version of CONFIG_HAVE_STATIC_CALL. At
    runtime, the static call sites are patched directly, rather than using
    the out-of-line trampolines.

    Compared to out-of-line static calls, the performance benefits are more
    modest, but still measurable. Steven Rostedt did some tracepoint
    measurements:

    https://lkml.kernel.org/r/20181126155405.72b4f718@gandalf.local.home

    This code is heavily inspired by the jump label code (aka "static
    jumps"), as some of the concepts are very similar.

    For more details, see the comments in include/linux/static_call.h.

    [peterz: simplified interface; merged trampolines]

    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Steven Rostedt (VMware)
    Cc: Linus Torvalds
    Link: https://lore.kernel.org/r/20200818135804.684334440@infradead.org

    Josh Poimboeuf
     
  • Static calls are a replacement for global function pointers. They use
    code patching to allow direct calls to be used instead of indirect
    calls. They give the flexibility of function pointers, but with
    improved performance. This is especially important for cases where
    retpolines would otherwise be used, as retpolines can significantly
    impact performance.

    The concept and code are an extension of previous work done by Ard
    Biesheuvel and Steven Rostedt:

    https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheuvel@linaro.org
    https://lkml.kernel.org/r/20181006015110.653946300@goodmis.org

    There are two implementations, depending on arch support:

    1) out-of-line: patched trampolines (CONFIG_HAVE_STATIC_CALL)
    2) basic function pointers

    For more details, see the comments in include/linux/static_call.h.

    [peterz: simplified interface]

    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Steven Rostedt (VMware)
    Cc: Linus Torvalds
    Link: https://lore.kernel.org/r/20200818135804.623259796@infradead.org

    Josh Poimboeuf
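
    A short usage sketch of the API described above; the key and function
    names are hypothetical.

        #include <linux/static_call.h>

        static int func_a(int arg1, int arg2) { return arg1 + arg2; }
        static int func_b(int arg1, int arg2) { return arg1 * arg2; }

        /* Define a key 'my_call', initially routed to func_a(). */
        DEFINE_STATIC_CALL(my_call, func_a);

        static int do_call(int x, int y)
        {
                /* A direct call (inline) or a call through a patched
                 * trampoline (out-of-line), never an indirect call. */
                return static_call(my_call)(x, y);
        }

        static void switch_to_b(void)
        {
                /* Re-patch the call site(s)/trampoline to target func_b(). */
                static_call_update(my_call, &func_b);
        }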
     
  • The __ADDRESSABLE() macro uses the __LINE__ macro to create a temporary
    symbol which has a unique name. However, if the macro is used multiple
    times from within another macro, the line number will always be the
    same, resulting in duplicate symbols.

    Make the temporary symbols truly unique by using __UNIQUE_ID instead of
    __LINE__.

    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Acked-by: Ard Biesheuvel
    Link: https://lore.kernel.org/r/20200818135804.564436253@infradead.org

    Josh Poimboeuf
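
    A plain-C toy showing the failure mode and the fix: pasting with __LINE__
    collides when a macro expands twice from the same source line, while
    __COUNTER__ (which __UNIQUE_ID() is built on) increments on every
    expansion.

        #define PASTE_(a, b) a##b
        #define PASTE(a, b)  PASTE_(a, b)

        #define DECL_BY_LINE()    static int PASTE(sym_, __LINE__)
        #define DECL_BY_COUNTER() static int PASTE(sym_, __COUNTER__)

        /* #define TWICE() DECL_BY_LINE(); DECL_BY_LINE()
         *   -> both expansions paste the same line number: redefinition error. */
        #define TWICE() DECL_BY_COUNTER(); DECL_BY_COUNTER()    /* unique names */

        TWICE();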
     
  • Nothing ensures the module exists while we're iterating
    mod->jump_entries in __jump_label_mod_text_reserved(); take a module
    reference to ensure the module sticks around.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Steven Rostedt (VMware)
    Link: https://lore.kernel.org/r/20200818135804.504501338@infradead.org

    Peter Zijlstra
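
    A hedged sketch of the shape of the fix (simplified; check_entries() is a
    hypothetical stand-in for the real range check): pin the module whose
    jump entries are about to be walked so it cannot be unloaded
    mid-iteration, and drop the reference afterwards.

        #include <linux/jump_label.h>
        #include <linux/module.h>

        static int check_entries(struct jump_entry *s, struct jump_entry *e,
                                 void *start, void *end);

        static int mod_text_reserved(void *start, void *end)
        {
                struct module *mod;
                int ret;

                preempt_disable();
                mod = __module_text_address((unsigned long)start);
                if (mod && !try_module_get(mod))
                        mod = NULL;             /* module already going away */
                preempt_enable();

                if (!mod)
                        return 0;

                /* Safe: the reference keeps mod->jump_entries around. */
                ret = check_entries(mod->jump_entries,
                                    mod->jump_entries + mod->num_jump_entries,
                                    start, end);

                module_put(mod);
                return ret;
        }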
     
  • Now that notifiers got unbroken, use the proper interface to handle
    notifier errors and propagate them.

    There were already MODULE_STATE_COMING notifiers that failed; notably:

    - jump_label_module_notifier()
    - tracepoint_module_notify()
    - bpf_event_notify()

    By propagating this error, we fix those users.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Miroslav Benes
    Acked-by: Jessica Yu
    Acked-by: Josh Poimboeuf
    Link: https://lore.kernel.org/r/20200818135804.444372853@infradead.org

    Peter Zijlstra
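
    A hedged sketch of what propagating the error looks like at a COMING
    notification site (simplified; the surrounding module-loader code is
    omitted). The _robust chain has already replayed the GOING event to the
    callbacks that succeeded, so the caller only translates and returns the
    error.

        #include <linux/module.h>
        #include <linux/notifier.h>

        static BLOCKING_NOTIFIER_HEAD(module_notify_list);

        static int notify_module_coming(struct module *mod)
        {
                int err;

                err = blocking_notifier_call_chain_robust(&module_notify_list,
                                                          MODULE_STATE_COMING,
                                                          MODULE_STATE_GOING,
                                                          mod);
                return notifier_to_errno(err);
        }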
     
  • While auditing all module notifiers I noticed a whole bunch of failures
    wrt the return value. Notifiers have 'special' return semantics.

    As is, NOTIFY_DONE vs NOTIFY_OK is a bit vague; but notifier_from_errno(0)
    results in NOTIFY_OK, and NOTIFY_DONE has a comment that says "Don't
    care".

    From this I've used NOTIFY_DONE when the function completely ignores
    the callback and notifier_to_errno() isn't used.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Mathieu Desnoyers
    Reviewed-by: Joel Fernandes (Google)
    Reviewed-by: Robert Richter
    Acked-by: Steven Rostedt (VMware)
    Link: https://lore.kernel.org/r/20200818135804.385360407@infradead.org

    Peter Zijlstra
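
    A hedged sketch of the convention as described; my_setup() is a
    hypothetical handler body. Return NOTIFY_DONE when the event is simply
    not of interest, and use notifier_from_errno() when a real result has to
    be reported, so 0 becomes NOTIFY_OK and a negative errno becomes an error
    the caller can recover with notifier_to_errno().

        #include <linux/module.h>
        #include <linux/notifier.h>

        static int my_setup(void *data);

        static int my_module_notify(struct notifier_block *nb,
                                    unsigned long action, void *data)
        {
                if (action != MODULE_STATE_COMING)
                        return NOTIFY_DONE;     /* "don't care" */

                return notifier_from_errno(my_setup(data));
        }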
     
  • The current notifiers have the following error handling pattern all
    over the place:

        int err, nr;

        err = __foo_notifier_call_chain(&chain, val_up, v, -1, &nr);
        if (err & NOTIFY_STOP_MASK)
                __foo_notifier_call_chain(&chain, val_down, v, nr-1, NULL);

    And aside from the endless repetition thereof, it is broken. Consider
    blocking notifiers; both calls take and drop the rwsem, which means
    that the notifier list can change in between the two calls, making @nr
    meaningless.

    Fix this by replacing all the __foo_notifier_call_chain() functions
    with foo_notifier_call_chain_robust() that embeds the above pattern,
    but ensures it is inside a single lock region.

    Note: I switched atomic_notifier_call_chain_robust() to use
    the spinlock, since RCU cannot provide the guarantee
    required for the recovery.

    Note: software_resume() error handling was broken afaict.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Acked-by: Rafael J. Wysocki
    Link: https://lore.kernel.org/r/20200818135804.325626653@infradead.org

    Peter Zijlstra
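
    A sketch of what the _robust variant embeds, simplified from the pattern
    above: run the "up" event, and on failure replay the "down" event for
    only the callbacks that already ran, with the caller holding the chain's
    lock across both passes so the call count stays meaningful.

        #include <linux/notifier.h>

        /* notifier_call_chain() is the existing internal iterator that stops
         * after nr_to_call callbacks and reports how many were called. */
        static int notifier_call_chain(struct notifier_block **nl,
                                       unsigned long val, void *v,
                                       int nr_to_call, int *nr_calls);

        static int notifier_call_chain_robust(struct notifier_block **nl,
                                              unsigned long val_up,
                                              unsigned long val_down, void *v)
        {
                int ret, nr = 0;

                ret = notifier_call_chain(nl, val_up, v, -1, &nr);
                if (ret & NOTIFY_STOP_MASK)
                        notifier_call_chain(nl, val_down, v, nr - 1, NULL);

                return ret;
        }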
     

31 Aug, 2020

7 commits

  • Linus Torvalds
     
  • Pull crypto fixes from Herbert Xu:

    - fix regression in af_alg that affects iwd

    - restore polling delay in qat

    - fix double free in ingenic on error path

    - fix potential build failure in sa2ul due to missing Kconfig dependency

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: af_alg - Work around empty control messages without MSG_MORE
    crypto: sa2ul - add Kconfig selects to fix build error
    crypto: ingenic - Drop kfree for memory allocated with devm_kzalloc
    crypto: qat - add delay before polling mailbox

    Linus Torvalds
     
  • Pull x86 fixes from Thomas Gleixner:
    "Three interrupt related fixes for X86:

    - Move disabling of the local APIC after invoking fixup_irqs() to
    ensure that interrupts which are incoming are noted in the IRR and
    not ignored.

    - Unbreak affinity setting.

    The rework of the entry code reused the regular exception entry
    code for device interrupts. The vector number is pushed into the
    errorcode slot on the stack which is then lifted into an argument
    and set to -1 because that's regs->orig_ax which is used in quite
    some places to check whether the entry came from a syscall.

    But it was overlooked that orig_ax is used in the affinity cleanup
    code to validate whether the interrupt has arrived on the new
    target. It turned out that this vector check is pointless because
    interrupts are never moved from one vector to another on the same
    CPU. That check is a historical leftover from the time when x86
    supported multi-CPU affinities, but is no longer needed with the now
    strict single-CPU affinity. Famous last words ...

    - Add a missing check for an empty cpumask into the matrix allocator.

    The affinity change added a warning to catch the case where an
    interrupt is moved on the same CPU to a different vector. This
    triggers because a request with an empty cpumask still returns an
    assignment from the allocator, as the allocator uses for_each_cpu()
    without checking the cpumask for being empty. The historical
    inconsistent for_each_cpu() behaviour of ignoring the cpumask and
    unconditionally claiming that CPU0 is in the mask struck again.
    Sigh.

    plus a new entry into the MAINTAINER file for the HPE/UV platform"

    * tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq/matrix: Deal with the sillyness of for_each_cpu() on UP
    x86/irq: Unbreak interrupt affinity setting
    x86/hotplug: Silence APIC only after all interrupts are migrated
    MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers

    Linus Torvalds
     
  • Pull irq fixes from Thomas Gleixner:
    "A set of fixes for interrupt chip drivers:

    - Revert the platform driver conversion of interrupt chip drivers as
    it turned out to create more problems than it solves.

    - Fix a trivial typo in the new module helpers which made probing
    reliably fail.

    - Small fixes in the STM32 and MIPS Ingenic drivers

    - The TI firmware rework which had badly managed dependencies and had
    to wait post rc1"

    * tag 'irq-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/ingenic: Leave parent IRQ unmasked on suspend
    irqchip/stm32-exti: Avoid losing interrupts due to clearing pending bits by mistake
    irqchip: Revert modular support for drivers using IRQCHIP_PLATFORM_DRIVER helperse
    irqchip: Fix probing deferal when using IRQCHIP_PLATFORM_DRIVER helpers
    arm64: dts: k3-am65: Update the RM resource types
    arm64: dts: k3-am65: ti-sci-inta/intr: Update to latest bindings
    arm64: dts: k3-j721e: ti-sci-inta/intr: Update to latest bindings
    irqchip/ti-sci-inta: Add support for INTA directly connecting to GIC
    irqchip/ti-sci-inta: Do not store TISCI device id in platform device id field
    dt-bindings: irqchip: Convert ti, sci-inta bindings to yaml
    dt-bindings: irqchip: ti, sci-inta: Update docs to support different parent.
    irqchip/ti-sci-intr: Add support for INTR being a parent to INTR
    dt-bindings: irqchip: Convert ti, sci-intr bindings to yaml
    dt-bindings: irqchip: ti, sci-intr: Update bindings to drop the usage of gic as parent
    firmware: ti_sci: Add support for getting resource with subtype
    firmware: ti_sci: Drop unused structure ti_sci_rm_type_map
    firmware: ti_sci: Drop the device id to resource type translation

    Linus Torvalds
     
  • Pull scheduler fix from Thomas Gleixner:
    "A single fix for the scheduler:

    - Make is_idle_task() __always_inline to prevent the compiler from
    putting it out of line into the wrong section because it's used
    inside noinstr sections"

    * tag 'sched-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Use __always_inline on is_idle_task()

    Linus Torvalds
     
  • Pull locking fixes from Thomas Gleixner:
    "A set of fixes for lockdep, tracing and RCU:

    - Prevent recursion by using raw_cpu_* operations

    - Fixup the interrupt state in the cpu idle code to be consistent

    - Push rcu_idle_enter/exit() invocations deeper into the idle path so
    that the lock operations are inside the RCU watching sections

    - Move trace_cpu_idle() into generic code so it's called before RCU
    goes idle.

    - Handle raw_local_irq* vs. local_irq* operations correctly

    - Move the tracepoints out from under the lockdep recursion handling
    which turned out to be fragile and inconsistent"

    * tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    lockdep,trace: Expose tracepoints
    lockdep: Only trace IRQ edges
    mips: Implement arch_irqs_disabled()
    arm64: Implement arch_irqs_disabled()
    nds32: Implement arch_irqs_disabled()
    locking/lockdep: Cleanup
    x86/entry: Remove unused THUNKs
    cpuidle: Move trace_cpu_idle() into generic code
    cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic
    sched,idle,rcu: Push rcu_idle deeper into the idle path
    cpuidle: Fixup IRQ state
    lockdep: Use raw_cpu_*() for per-cpu variables

    Linus Torvalds
     
  • Pull cifs fix from Steve French:
    "DFS fix for referral problem when using SMB1"

    * tag '5.9-rc2-smb-fix' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: fix check of tcon dfs in smb1

    Linus Torvalds