23 Jul, 2019

7 commits

  • After previous changes the suspend-to-idle code flow can be
    integrated more tightly with the generic system suspend code flow
    by making suspend_enter() call s2idle_loop() later and removing
    the direct invocations of dpm_noirq_begin(),
    dpm_noirq_suspend_devices(), dpm_noirq_end(), and
    dpm_noirq_resume_devices() from the latter, so do that.

    This change is not expected to alter functionality.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Thomas Gleixner

    Rafael J. Wysocki
     
  • After commit 33e4f80ee69b ("ACPI / PM: Ignore spurious SCI wakeups
    from suspend-to-idle") the "noirq" phases of device suspend and
    resume may run multiple times during suspend-to-idle, if there
    are spurious system wakeup events while suspended. However, this
    is complicated and fragile and actually unnecessary.

    The main reason for doing this is that on some systems the EC may
    signal system wakeup events (power button events, for example) as
    well as events that should not cause the system to resume (spurious
    system wakeup events). Thus, in order to determine whether or not
    a given event signaled by the EC while suspended is a proper system
    wakeup one, the EC GPE needs to be dispatched and to start with that
    was achieved by allowing the ACPI SCI action handler to run, which
    was only possible after calling resume_device_irqs().

    However, dispatching the EC GPE this way turned out to take too much
    time in some cases and some EC events might be missed due to that, so
    commit 68e22011856f ("ACPI: EC: Dispatch the EC GPE directly on
    s2idle wake") started to dispatch the EC GPE right after a wakeup
    event has been detected, so in fact the full ACPI SCI action handler
    doesn't need to run any more to deal with the wakeups coming from the
    EC.

    Use this observation to simplify the suspend-to-idle control flow
    so that the "noirq" phases of device suspend and resume are each
    run only once in every suspend-to-idle cycle, which is reported to
    significantly reduce power drawn by some systems when suspended to
    idle (by allowing them to reach a deep platform-wide low-power state
    through the suspend-to-idle flow). [What appears to happen is that
    the "noirq" resume of devices after a spurious EC wakeup brings some
    devices into a state in which they prevent the platform from reaching
    the deep low-power state going forward, even after a subsequent
    "noirq" suspend phase, and on some systems the EC triggers such
    wakeups already when the "noirq" suspend of devices is running for
    the first time in the given suspend/resume cycle, so the platform
    cannot reach the deep low-power state at all.]

    First, make acpi_s2idle_wake() use the acpi_ec_dispatch_gpe() return
    value to determine whether or not the wakeup may have been triggered
    by the EC (in which case the system wakeup is canceled and ACPI
    events are processed in order to determine whether or not the event
    is a proper system wakeup one) and use rearm_wake_irq() (introduced
    by a previous change) in it to rearm the ACPI SCI for system wakeup
    detection in case the system will remain suspended.

    Second, drop acpi_s2idle_sync(), which is not needed any more, and
    the corresponding global platform suspend-to-idle callback.

    Next, drop the pm_wakeup_pending() check (which is an optimization
    only) from __device_suspend_noirq() to prevent it from returning
    errors on system wakeups occurring before the "noirq" phase of
    device suspend is complete (as in the case of suspend-to-idle it is
    not known whether or not these wakeups are spurious at that point),
    in order to avoid having to carry out a "noirq" resume of devices
    on a spurious system wakeup.

    Finally, change the code flow in s2idle_loop() to (1) run the
    "noirq" suspend of devices once before starting the loop, (2) check
    for spurious EC wakeups (via the platform ->wake callback) for the
    first time before calling s2idle_enter(), and (3) run the "noirq"
    resume of devices once after leaving the loop.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Thomas Gleixner

    Rafael J. Wysocki
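
The reordered flow described in this commit (one "noirq" suspend before the loop, a wake check before idling, one "noirq" resume after the loop) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every function and variable below is a hypothetical stand-in rather than a kernel symbol, and the wake check is modeled as simply cancelling a pending wakeup that came from a spurious event.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Every name here is a hypothetical stand-in, not a kernel symbol. */
static char trace[64];          /* records the order of the steps */
static int spurious_left;       /* spurious wakeups still to arrive */
static bool wakeup_pending;     /* a wakeup has been signaled */
static bool last_was_spurious;  /* the pending wakeup was spurious */

static void log_step(char c)
{
    size_t n = strlen(trace);

    assert(n + 1 < sizeof(trace));
    trace[n] = c;
    trace[n + 1] = '\0';
}

static void noirq_suspend(void) { log_step('S'); }
static void noirq_resume(void)  { log_step('R'); }

/* Idle until the next event, which is either spurious or genuine. */
static void idle_enter(void)
{
    log_step('i');
    wakeup_pending = true;
    last_was_spurious = spurious_left > 0;
    if (last_was_spurious)
        spurious_left--;
}

/* Stand-in for the platform ->wake callback: cancel spurious wakeups. */
static void platform_wake(void)
{
    if (wakeup_pending && last_was_spurious)
        wakeup_pending = false;
}

/* Returns the recorded step order for a given number of spurious events. */
static const char *s2idle_loop_sketch(int spurious)
{
    trace[0] = '\0';
    spurious_left = spurious;
    wakeup_pending = false;
    last_was_spurious = false;

    noirq_suspend();            /* (1) once, before starting the loop */
    for (;;) {
        platform_wake();        /* (2) first checked before idling */
        if (wakeup_pending)
            break;
        idle_enter();
    }
    noirq_resume();             /* (3) once, after leaving the loop */
    return trace;
}
```

In this model, `s2idle_loop_sketch(2)` records one `S`, three `i` steps (two spurious events and one genuine one) and one `R`, so the "noirq" phases run exactly once however many spurious wakeups occur.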
     
  • The role of the s2idle_wakeup variable is to cause
    acpi_pm_wakeup_event() and acpi_pm_notify_handler() to
    increment pm_abort_suspend and trigger a wakeup from
    suspend-to-idle in case the ACPI SCI wakeup was canceled
    by acpi_s2idle_wake().

    However, for this purpose it need not be set in acpi_s2idle_wake()
    and cleared in acpi_s2idle_sync(), respectively. In fact, it
    may be set as early as in acpi_s2idle_prepare() and cleared as
    late as in acpi_s2idle_restore(), so do that to allow subsequent
    changes to be simpler.

    This change is not expected to alter functionality.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Thomas Gleixner

    Rafael J. Wysocki
     
  • It is not actually guaranteed that pm_abort_suspend will be
    nonzero when pm_system_cancel_wakeup() is called, which may lead to
    subtle issues, so make it use atomic_dec_if_positive() instead of
    atomic_dec() for safety's sake.

    Fixes: 33e4f80ee69b ("ACPI / PM: Ignore spurious SCI wakeups from suspend-to-idle")
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Thomas Gleixner

    Rafael J. Wysocki
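
To show why the conditional decrement is the safer choice, here is a minimal userspace sketch built on C11 atomics. It is not the kernel implementation: `dec_if_positive()` and `abort_count` are hypothetical stand-ins for atomic_dec_if_positive() and pm_abort_suspend, written on the assumption that the helper returns the old value minus one whether or not it decremented.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Userspace stand-in for atomic_dec_if_positive(): decrement *v only if
 * the result would stay non-negative. Returns the old value minus one
 * in either case, so a negative return means "nothing was decremented".
 */
static int dec_if_positive(atomic_int *v)
{
    int old = atomic_load(v);

    /* On failure the compare-and-swap reloads 'old', so just retry. */
    while (old > 0 && !atomic_compare_exchange_weak(v, &old, old - 1))
        ;
    return old - 1;
}

/* Hypothetical counter standing in for pm_abort_suspend. */
static atomic_int abort_count;

/* Cancelling a wakeup must never drive the counter below zero. */
static void cancel_wakeup_sketch(void)
{
    dec_if_positive(&abort_count);
    assert(atomic_load(&abort_count) >= 0);
}
```

With a plain decrement, calling the cancel path while the counter is already zero would leave it at -1, which later nonzero checks would treat as a pending abort. With the conditional version the extra call is a no-op.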
     
  • On some systems, if suspend-to-idle is used, the EC may signal system
    wakeup events (power button events, for example) as well as events
    that should not cause the system to resume and acpi_ec_dispatch_gpe()
    needs to be called to determine whether or not the system should
    resume then. In particular, if acpi_ec_dispatch_gpe() doesn't detect
    any EC events at all, the system should remain suspended, so it is
    useful to know when that is the case.

    For this reason, make acpi_ec_dispatch_gpe() return a bool value
    indicating whether or not any EC events have been detected by it.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Thomas Gleixner

    Rafael J. Wysocki
     
  • In some cases it is useful to know whether or not the
    acpi_ev_detect_gpe() called by acpi_dispatch_gpe() has found
    the GPE to be active, so return the return value of it (whose
    data type is u32) from the latter.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Thomas Gleixner

    Rafael J. Wysocki
     
  • Introduce a new function, rearm_wake_irq(), allowing a wakeup IRQ
    to be armed for system wakeup detection again without running any
    action handlers associated with it after it has been armed for
    wakeup detection and triggered.

    That is useful for IRQs, like ACPI SCI, that may deliver wakeup
    as well as non-wakeup interrupts when armed for system wakeup
    detection. In those cases, it may be possible to determine whether
    or not the delivered interrupt is a system wakeup one without
    running the entire action handler (or handlers, if the IRQ is
    shared) for the IRQ, and if the interrupt turns out to be a
    non-wakeup one, the IRQ can be rearmed with the help of the
    new function.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Thomas Gleixner

    Rafael J. Wysocki
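
The arm, trigger, rearm cycle described in this commit can be pictured with a toy state machine. This is a simplified model under assumptions, not the kernel's IRQ descriptor handling: `struct wake_irq_sketch` and the three helpers are hypothetical names, and the model assumes that an armed IRQ is disarmed and disabled when it fires, without any action handler running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a wakeup IRQ (not struct irq_desc). */
struct wake_irq_sketch {
    bool armed;        /* armed for system wakeup detection */
    bool enabled;      /* interrupt line currently enabled */
    bool triggered;    /* fired while armed, not yet rearmed */
    int handler_runs;  /* number of times action handlers ran */
};

static void arm_wake_irq_sketch(struct wake_irq_sketch *irq)
{
    irq->armed = true;
    irq->enabled = true;
    irq->triggered = false;
}

/*
 * An interrupt arrives. If the IRQ is armed, it is disarmed and disabled
 * and a wakeup is reported (return true) without running any handler.
 */
static bool trigger_sketch(struct wake_irq_sketch *irq)
{
    if (!irq->enabled)
        return false;
    if (!irq->armed) {
        irq->handler_runs++;   /* normal, non-suspended delivery */
        return false;
    }
    irq->armed = false;
    irq->enabled = false;
    irq->triggered = true;
    return true;
}

/* Re-arm after a non-wakeup interrupt, still without running handlers. */
static void rearm_wake_irq_sketch(struct wake_irq_sketch *irq)
{
    if (!irq->triggered)
        return;                /* only valid after armed-and-triggered */
    irq->triggered = false;
    irq->armed = true;
    irq->enabled = true;
}
```

In this model, after `arm_wake_irq_sketch()` and one `trigger_sketch()` the IRQ is left disarmed, so a second event would go unnoticed. Calling `rearm_wake_irq_sketch()` restores wakeup detection while `handler_runs` stays at zero.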
     

22 Jul, 2019

7 commits

  • Linus Torvalds
     
  • Pull Devicetree fixes from Rob Herring:
    "Fix several warnings/errors in validation of binding schemas"

    * tag 'devicetree-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
    dt-bindings: pinctrl: stm32: Fix missing 'clocks' property in examples
    dt-bindings: iio: ad7124: Fix dtc warnings in example
    dt-bindings: iio: avia-hx711: Fix avdd-supply typo in example
    dt-bindings: pinctrl: aspeed: Fix AST2500 example errors
    dt-bindings: pinctrl: aspeed: Fix 'compatible' schema errors
    dt-bindings: riscv: Limit cpus schema to only check RiscV 'cpu' nodes
    dt-bindings: Ensure child nodes are of type 'object'

    Linus Torvalds
     
  • Pull vfs documentation typo fix from Al Viro.

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    typo fix: it's d_make_root, not d_make_inode...

    Linus Torvalds
     
  • Pull cifs fixes from Steve French:
    "Two fixes for stable, one that had dependency on earlier patch in this
    merge window and can now go in, and a perf improvement in SMB3 open"

    * tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: update internal module number
    cifs: flush before set-info if we have writeable handles
    smb3: optimize open to not send query file internal info
    cifs: copy_file_range needs to strip setuid bits and update timestamps
    CIFS: fix deadlock in cached root handling

    Linus Torvalds
     
  • Commit b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops
    method") incorrectly changed the check of the dma_ops_alloc_iova()
    return value in map_sg(), which causes a crash under memory pressure
    because dma_ops_alloc_iova() never returns DMA_MAPPING_ERROR on
    failure but 0, so the error handling is all wrong.

    kernel BUG at drivers/iommu/iova.c:801!
    Workqueue: kblockd blk_mq_run_work_fn
    RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0
    Call Trace:
    free_cpu_cached_iovas+0xbd/0x150
    alloc_iova_fast+0x8c/0xba
    dma_ops_alloc_iova.isra.6+0x65/0xa0
    map_sg+0x8c/0x2a0
    scsi_dma_map+0xc6/0x160
    pqi_aio_submit_io+0x1f6/0x440 [smartpqi]
    pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi]
    scsi_queue_rq+0x79c/0x1200
    blk_mq_dispatch_rq_list+0x4dc/0xb70
    blk_mq_sched_dispatch_requests+0x249/0x310
    __blk_mq_run_hw_queue+0x128/0x200
    blk_mq_run_work_fn+0x27/0x30
    process_one_work+0x522/0xa10
    worker_thread+0x63/0x5b0
    kthread+0x1d2/0x1f0
    ret_from_fork+0x22/0x40

    Fixes: b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops method")
    Signed-off-by: Qian Cai
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Qian Cai
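
The mismatch described in this commit, comparing against an all-ones error sentinel when the allocator reports failure as 0, can be reproduced in a few lines. This is a sketch under assumptions and not the driver code: the type, the macro and all three functions are hypothetical stand-ins, with the macro mirroring the all-ones value that DMA_MAPPING_ERROR uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t dma_addr_sketch_t;

/* Stand-in for DMA_MAPPING_ERROR: all bits set. */
#define MAPPING_ERROR_SKETCH (~(dma_addr_sketch_t)0)

/* Hypothetical allocator that, like dma_ops_alloc_iova(), returns 0 on failure. */
static dma_addr_sketch_t alloc_iova_sketch(bool have_space)
{
    return have_space ? 0x1000 : 0;
}

/* Broken check: a failed allocation (0) is never equal to the sentinel. */
static int map_sg_broken_sketch(bool have_space)
{
    dma_addr_sketch_t addr = alloc_iova_sketch(have_space);

    if (addr == MAPPING_ERROR_SKETCH)
        return -1;
    return 0;   /* on failure this carries on with address 0 */
}

/* Fixed check: test for the value the allocator really returns. */
static int map_sg_fixed_sketch(bool have_space)
{
    dma_addr_sketch_t addr = alloc_iova_sketch(have_space);

    if (!addr)
        return -1;
    return 0;
}
```

Here `map_sg_broken_sketch(false)` reports success although the allocation failed, while `map_sg_fixed_sketch(false)` returns an error. In the real driver the undetected failure is what leads to the crash shown in the trace above.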
     
  • The hexagon implementation of pte_alloc_one(), pte_alloc_one_kernel(),
    pte_free_kernel() and pte_free() is identical to the generic one except
    for the lack of __GFP_ACCOUNT for the user PTE allocation.

    Switch hexagon to use the generic versions of these functions.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Pull NTB updates from Jon Mason:
    "New feature to add support for NTB virtual MSI interrupts, the ability
    to test and use this feature in the NTB transport layer.

    Also, bug fixes for the AMD and Switchtec drivers, as well as some
    general patches"

    * tag 'ntb-5.3' of git://github.com/jonmason/ntb: (22 commits)
    NTB: Describe the ntb_msi_test client in the documentation.
    NTB: Add MSI interrupt support to ntb_transport
    NTB: Add ntb_msi_test support to ntb_test
    NTB: Introduce NTB MSI Test Client
    NTB: Introduce MSI library
    NTB: Rename ntb.c to support multiple source files in the module
    NTB: Introduce functions to calculate multi-port resource index
    NTB: Introduce helper functions to calculate logical port number
    PCI/switchtec: Add module parameter to request more interrupts
    PCI/MSI: Support allocating virtual MSI interrupts
    ntb_hw_switchtec: Fix setup MW with failure bug
    ntb_hw_switchtec: Skip unnecessary re-setup of shared memory window for crosslink case
    ntb_hw_switchtec: Remove redundant steps of switchtec_ntb_reinit_peer() function
    NTB: correct ntb_dev_ops and ntb_dev comment typos
    NTB: amd: Silence shift wrapping warning in amd_ntb_db_vector_mask()
    ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev()
    NTB: ntb_transport: Ensure qp->tx_mw_dma_addr is initaliazed
    NTB: ntb_hw_amd: set peer limit register
    NTB: ntb_perf: Clear stale values in doorbell and command SPAD register
    NTB: ntb_perf: Disable NTB link after clearing peer XLAT registers
    ...

    Linus Torvalds
     

21 Jul, 2019

19 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • Now that examples are validated against the DT schema, an error with
    required 'clocks' property missing is exposed:

    Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \
    pinctrl@40020000: gpio@0: 'clocks' is a required property
    Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \
    pinctrl@50020000: gpio@1000: 'clocks' is a required property
    Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \
    pinctrl@50020000: gpio@2000: 'clocks' is a required property

    Add the missing 'clocks' properties to the examples to fix the errors.

    Fixes: 2c9239c125f0 ("dt-bindings: pinctrl: Convert stm32 pinctrl bindings to json-schema")
    Cc: Linus Walleij
    Cc: Maxime Coquelin
    Cc: linux-gpio@vger.kernel.org
    Cc: linux-stm32@st-md-mailman.stormreply.com
    Acked-by: Alexandre TORGUE
    Signed-off-by: Rob Herring

    Rob Herring
     
  • With the conversion to DT schema, the examples are now compiled with
    dtc. The ad7124 binding example has the following warning:

    Documentation/devicetree/bindings/iio/adc/adi,ad7124.example.dts:19.11-21: \
    Warning (reg_format): /example-0/adc@0:reg: property has invalid length (4 bytes) (#address-cells == 1, #size-cells == 1)

    There are default #size-cells and #address-cells values of 1 for
    examples. Examples needing different values, such as this one on a
    SPI bus, need to provide a SPI bus parent node.

    Fixes: 26ae15e62d3c ("Convert AD7124 bindings documentation to YAML format.")

    Cc: Jonathan Cameron
    Cc: linux-iio@vger.kernel.org
    Signed-off-by: Rob Herring

    Rob Herring
     
  • Now that examples are validated against the DT schema, a typo in
    avia-hx711 example generates a warning:

    Documentation/devicetree/bindings/iio/adc/avia-hx711.example.dt.yaml: weight: 'avdd-supply' is a required property

    Fix the typo.

    Fixes: 5150ec3fe125 ("avia-hx711.yaml: transform DT binding to YAML")
    Cc: Andreas Klinger
    Cc: Jonathan Cameron
    Cc: linux-iio@vger.kernel.org
    Signed-off-by: Rob Herring

    Rob Herring
     
  • The schema examples are now validated against the schema itself. The
    AST2500 pinctrl schema has a couple of errors:

    Documentation/devicetree/bindings/pinctrl/aspeed,ast2500-pinctrl.example.dt.yaml: \
    example-0: $nodename:0: 'example-0' does not match '^(bus|soc|axi|ahb|apb)(@[0-9a-f]+)?$'
    Documentation/devicetree/bindings/pinctrl/aspeed,ast2500-pinctrl.example.dt.yaml: \
    pinctrl: aspeed,external-nodes: [[1, 2]] is too short

    Fixes: 0a617de16730 ("dt-bindings: pinctrl: aspeed: Convert AST2500 bindings to json-schema")
    Cc: Andrew Jeffery
    Cc: Linus Walleij
    Cc: Joel Stanley
    Cc: linux-aspeed@lists.ozlabs.org
    Cc: linux-gpio@vger.kernel.org
    Cc: linux-arm-kernel@lists.infradead.org
    Acked-by: Andrew Jeffery
    Signed-off-by: Rob Herring

    Rob Herring
     
  • The Aspeed pinctrl schemas have errors in the 'compatible' schema:

    Documentation/devicetree/bindings/pinctrl/aspeed,ast2400-pinctrl.yaml: \
    properties:compatible:enum: ['aspeed', 'ast2400-pinctrl', 'aspeed', 'g4-pinctrl'] has non-unique elements
    Documentation/devicetree/bindings/pinctrl/aspeed,ast2500-pinctrl.yaml: \
    properties:compatible:enum: ['aspeed', 'ast2500-pinctrl', 'aspeed', 'g5-pinctrl'] has non-unique elements

    Flow style sequences have to be quoted if the values contain ','. Fix
    this by using the more common one line per entry formatting.

    Fixes: 0a617de16730 ("dt-bindings: pinctrl: aspeed: Convert AST2500 bindings to json-schema")
    Fixes: 07457937bb5c ("dt-bindings: pinctrl: aspeed: Convert AST2400 bindings to json-schema")
    Cc: Andrew Jeffery
    Cc: Linus Walleij
    Cc: Joel Stanley
    Cc: linux-aspeed@lists.ozlabs.org
    Cc: linux-gpio@vger.kernel.org
    Cc: linux-arm-kernel@lists.infradead.org
    Acked-by: Andrew Jeffery
    Signed-off-by: Rob Herring

    Rob Herring
     
  • Matching on the 'cpus' node was a bad choice because the schema is
    incorrectly applied to non-RiscV cpus nodes. As we now have a common cpus
    schema which checks the general structure, it is also redundant to do so
    in the Risc-V CPU schema.

    The downside is one could conceivably mix different architecture's cpu
    nodes or have typos in the compatible string. The latter problem pretty
    much exists for every schema.

    Acked-by: Paul Walmsley
    Signed-off-by: Rob Herring

    Rob Herring
     
  • Properties which are child node definitions need to have an explicit
    type. Otherwise, a matching (DT) property can silently match when an
    error is desired. Fix this up tree-wide. Once this is fixed, the
    meta-schema will enforce this on any child node definitions.

    Cc: Chen-Yu Tsai
    Cc: David Woodhouse
    Cc: Brian Norris
    Cc: Marek Vasut
    Cc: Richard Weinberger
    Cc: Vignesh Raghavendra
    Cc: Linus Walleij
    Cc: Maxime Coquelin
    Cc: linux-mtd@lists.infradead.org
    Cc: linux-gpio@vger.kernel.org
    Cc: linux-stm32@st-md-mailman.stormreply.com
    Cc: linux-spi@vger.kernel.org
    Acked-by: Miquel Raynal
    Acked-by: Maxime Ripard
    Acked-by: Mark Brown
    Acked-by: Alexandre TORGUE
    Signed-off-by: Rob Herring

    Rob Herring
     
  • Pull more input updates from Dmitry Torokhov:

    - Apple SPI keyboard and trackpad driver for newer Macs

    - ALPS driver will ignore trackpoint-only devices to give the
    trackpoint driver a chance to handle them properly

    - another Lenovo is switched over to SMbus from PS/2

    - assorted driver fixups.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: alps - fix a mismatch between a condition check and its comment
    Input: psmouse - fix build error of multiple definition
    Input: applespi - remove set but not used variables 'sts'
    Input: add Apple SPI keyboard and trackpad driver
    Input: alps - don't handle ALPS cs19 trackpoint-only device
    Input: hyperv-keyboard - remove dependencies on PAGE_SIZE for ring buffer
    Input: adp5589 - initialize GPIO controller parent device
    Input: iforce - remove empty multiline comments
    Input: synaptics - fix misuse of strlcpy
    Input: auo-pixcir-ts - switch to using devm_add_action_or_reset()
    Input: gtco - bounds check collection indent level
    Input: mtk-pmic-keys - add of_node_put() before return
    Input: sun4i-lradc-keys - add of_node_put() before return
    Input: synaptics - whitelist Lenovo T580 SMBus intertouch

    Linus Torvalds
     
  • Pull dma-mapping fixes from Christoph Hellwig:
    "Fix various regressions:

    - force unencrypted dma-coherent buffers if encryption bit can't fit
    into the dma coherent mask (Tom Lendacky)

    - avoid limiting request size if swiotlb is not used (me)

    - fix swiotlb handling in dma_direct_sync_sg_for_cpu/device (Fugang
    Duan)"

    * tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mapping:
    dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device
    dma-direct: only limit the mapping size if swiotlb could be used
    dma-mapping: add a dma_addressing_limited helper
    dma-direct: Force unencrypted DMA under SME for certain DMA masks

    Linus Torvalds
     
  • Pull x86 fixes from Thomas Gleixner:
    "A set of x86 specific fixes and updates:

    - The CR2 corruption fixes which store CR2 early in the entry code
    and hand the stored address to the fault handlers.

    - Revert a forgotten leftover of the dropped FSGSBASE series.

    - Plug a memory leak in the boot code.

    - Make the Hyper-V assist functionality robust by zeroing the shadow
    page.

    - Remove a useless check for dead processes with LDT

    - Update paravirt and VMware maintainers entries.

    - A few cleanup patches addressing various compiler warnings"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/entry/64: Prevent clobbering of saved CR2 value
    x86/hyper-v: Zero out the VP ASSIST PAGE on allocation
    x86, boot: Remove multiple copy of static function sanitize_boot_params()
    x86/boot/compressed/64: Remove unused variable
    x86/boot/efi: Remove unused variables
    x86/mm, tracing: Fix CR2 corruption
    x86/entry/64: Update comments and sanity tests for create_gap
    x86/entry/64: Simplify idtentry a little
    x86/entry/32: Simplify common_exception
    x86/paravirt: Make read_cr2() CALLEE_SAVE
    MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE
    x86/process: Delete useless check for dead process with LDT
    x86: math-emu: Hide clang warnings for 16-bit overflow
    x86/e820: Use proper booleans instead of 0/1
    x86/apic: Silence -Wtype-limits compiler warnings
    x86/mm: Free sme_early_buffer after init
    x86/boot: Fix memory leak in default_get_smp_config()
    Revert "x86/ptrace: Prevent ptrace from clearing the FS/GS selector" and fix the test

    Linus Torvalds
     
  • Pull perf tooling updates from Thomas Gleixner:
    "A set of perf improvements and fixes:

    perf db-export:
    - Improvements in how COMM details are exported to databases for post
    processing and use in the sql-viewer.py UI.

    - Export switch events to the database.

    BPF:
    - Bump rlimit(MEMLOCK) for 'perf test bpf' and 'perf trace', just
    like selftests/bpf/bpf_rlimit.h does, which makes errors due to
    exhaustion of this limit, which are kinda cryptic (EPERM sometimes)
    less frequent.

    perf version:
    - Fix segfault due to missing OPT_END(), noticed on PowerPC.

    perf vendor events:
    - Add JSON files for IBM s/390 machine type 8561.

    perf cs-etm (ARM):
    - Fix two cases of error returns not being done properly: Invalid
    ERR_PTR() use and loss of propagation error codes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
    perf version: Fix segfault due to missing OPT_END()
    perf vendor events s390: Add JSON files for machine type 8561
    perf cs-etm: Return errcode in cs_etm__process_auxtrace_info()
    perf cs-etm: Remove errnoeous ERR_PTR() usage in cs_etm__process_auxtrace_info
    perf scripts python: export-to-postgresql.py: Export switch events
    perf scripts python: export-to-sqlite.py: Export switch events
    perf db-export: Export switch events
    perf db-export: Factor out db_export__threads()
    perf script: Add scripting operation process_switch()
    perf scripts python: exported-sql-viewer.py: Use new 'has_calls' column
    perf scripts python: exported-sql-viewer.py: Remove redundant semi-colons
    perf scripts python: export-to-postgresql.py: Add has_calls column to comms table
    perf scripts python: export-to-sqlite.py: Add has_calls column to comms table
    perf db-export: Also export thread's current comm
    perf db-export: Factor out db_export__comm()
    perf scripts python: export-to-postgresql.py: Export comm details
    perf scripts python: export-to-sqlite.py: Export comm details
    perf db-export: Export comm details
    perf db-export: Fix a white space issue in db_export__sample()
    perf db-export: Move export__comm_thread into db_export__sample()
    ...

    Linus Torvalds
     
  • Pull core fixes from Thomas Gleixner:

    - A collection of objtool fixes which address recent fallout partially
    exposed by newer toolchains, clang, BPF and general code changes.

    - Force USER_DS for user stack traces

    [ Note: the "objtool fixes" are not all to objtool itself, but for
    kernel code that triggers objtool warnings.

    Things like missing function size annotations, or code that confuses
    the unwinder etc. - Linus]

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
    objtool: Support conditional retpolines
    objtool: Convert insn type to enum
    objtool: Fix seg fault on bad switch table entry
    objtool: Support repeated uses of the same C jump table
    objtool: Refactor jump table code
    objtool: Refactor sibling call detection logic
    objtool: Do frame pointer check before dead end check
    objtool: Change dead_end_function() to return boolean
    objtool: Warn on zero-length functions
    objtool: Refactor function alias logic
    objtool: Track original function across branches
    objtool: Add mcsafe_handle_tail() to the uaccess safe list
    bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
    x86/uaccess: Remove redundant CLACs in getuser/putuser error paths
    x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail()
    x86/uaccess: Remove ELF function annotation from copy_user_handle_tail()
    x86/head/64: Annotate start_cpu0() as non-callable
    x86/entry: Fix thunk function ELF sizes
    x86/kvm: Don't call kvm_spurious_fault() from .fixup
    x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2
    ...

    Linus Torvalds
     
  • Pull smp fix from Thomas Gleixner:
    "Add warnings to the smp function calls so callers from wrong contexts
    get detected"

    * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    smp: Warn on function calls from softirq context

    Linus Torvalds
     
  • Pull CONFIG_PREEMPT_RT stub config from Thomas Gleixner:
    "The real-time preemption patch set exists for almost 15 years now and
    while the vast majority of infrastructure and enhancements have found
    their way into the mainline kernel, the final integration of RT is
    still missing.

    Over the course of the last few years, we have worked on reducing the
    intrusiveness of the RT patches by refactoring kernel infrastructure
    to be more real-time friendly. Almost all of these changes were
    beneficial to the mainline kernel on their own, so there was no
    objection to integrating them.

    Though except for the still ongoing printk refactoring, the remaining
    changes which are required to make RT a first class mainline citizen
    are no longer arguable as immediately beneficial for the mainline
    kernel. Most of them are either reordering code flows or adding RT
    specific functionality.

    But this now has hit a wall and turned into a classic chicken-and-egg
    problem:

    Maintainers are rightfully wary vs. these changes as they only make
    sense if the final integration of RT into the mainline kernel takes
    place.

    Adding CONFIG_PREEMPT_RT aims to solve this as a clear sign that RT
    will be fully integrated into the mainline kernel. The final
    integration of the missing bits and pieces will be of course done with
    the same careful approach as we have used in the past.

    While I'm aware that you are not entirely enthusiastic about that, I
    think that RT should receive the same treatment as any other widely
    used out of tree functionality, which we have accepted into mainline
    over the years.

    RT has become the de-facto standard real-time enhancement and is
    shipped by enterprise, embedded and community distros. It's in use
    throughout a wide range of industries: telecommunications, industrial
    automation, professional audio, medical devices, data acquisition,
    automotive - just to name a few major use cases.

    RT development is backed by a Linux Foundation project which is
    supported by major stakeholders of this technology. The funding will
    continue over the actual inclusion into mainline to make sure that the
    functionality is neither introducing regressions, regressing itself,
    nor becomes subject to bitrot. There is also a lively user community
    around RT, so contrary to the grim situation 5 years ago, it's
    a healthy project.

    As RT is still a good vehicle to exercise rarely used code paths and
    to detect hard to trigger issues, you could at least view it as a QA
    tool if nothing else"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT

    Linus Torvalds
     
  • Pull more KVM updates from Paolo Bonzini:
    "Mostly bugfixes, but also:

    - s390 support for KVM selftests

    - LAPIC timer offloading to housekeeping CPUs

    - Extend an s390 optimization for overcommitted hosts to all
    architectures

    - Debugging cleanups and improvements"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
    KVM: x86: Add fixed counters to PMU filter
    KVM: nVMX: do not use dangling shadow VMCS after guest reset
    KVM: VMX: dump VMCS on failed entry
    KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
    KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup
    KVM: Boost vCPUs that are delivering interrupts
    KVM: selftests: Remove superfluous define from vmx.c
    KVM: SVM: Fix detection of AMD Errata 1096
    KVM: LAPIC: Inject timer interrupt via posted interrupt
    KVM: LAPIC: Make lapic timer unpinned
    KVM: x86/vPMU: reset pmc->counter to 0 for pmu fixed_counters
    KVM: nVMX: Ignore segment base for VMX memory operand when segment not FS or GS
    kvm: x86: ioapic and apic debug macros cleanup
    kvm: x86: some tsc debug cleanup
    kvm: vmx: fix coccinelle warnings
    x86: kvm: avoid constant-conversion warning
    x86: kvm: avoid -Wsometimes-uninitized warning
    KVM: x86: expose AVX512_BF16 feature to guest
    KVM: selftests: enable pgste option for the linker on s390
    KVM: selftests: Move kvm_create_max_vcpus test to generic code
    ...

    Linus Torvalds
     
  • Pull SCSI fixes from James Bottomley:
    "This is the final round of mostly small fixes in our initial submit.

    It's mostly minor fixes and driver updates. The only change of note is
    adding a virt_boundary_mask to the SCSI host and host template to
    parametrise this for NVMe devices instead of having them do a call in
    slave_alloc. It's a fairly straightforward conversion except in the
    two NVMe handling drivers that didn't set it, which now have a virtual
    infinity parameter added"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits)
    scsi: megaraid_sas: set an unlimited max_segment_size
    scsi: mpt3sas: set an unlimited max_segment_size for SAS 3.0 HBAs
    scsi: IB/srp: set virt_boundary_mask in the scsi host
    scsi: IB/iser: set virt_boundary_mask in the scsi host
    scsi: storvsc: set virt_boundary_mask in the scsi host template
    scsi: ufshcd: set max_segment_size in the scsi host template
    scsi: core: take the DMA max mapping size into account
    scsi: core: add a host / host template field for the virt boundary
    scsi: core: Fix race on creating sense cache
    scsi: sd_zbc: Fix compilation warning
    scsi: libfc: fix null pointer dereference on a null lport
    scsi: zfcp: fix GCC compiler warning emitted with -Wmaybe-uninitialized
    scsi: zfcp: fix request object use-after-free in send path causing wrong traces
    scsi: zfcp: fix request object use-after-free in send path causing seqno errors
    scsi: megaraid_sas: Update driver version to 07.710.50.00
    scsi: megaraid_sas: Add module parameter for FW Async event logging
    scsi: megaraid_sas: Enable msix_load_balance for Invader and later controllers
    scsi: megaraid_sas: Fix calculation of target ID
    scsi: lpfc: reduce stack size with CONFIG_GCC_PLUGIN_STRUCTLEAK_VERBOSE
    scsi: devinfo: BLIST_TRY_VPD_PAGES for SanDisk Cruzer Blade
    ...

    Linus Torvalds
     
  • Pull more Kbuild updates from Masahiro Yamada:

    - match the directory structure of the linux-libc-dev package to that
    of Debian-based distributions

    - fix incorrect include/config/auto.conf generation when Kconfig
    creates it along with the .config file

    - remove misleading $(AS) from documents

    - clean up precious tag files by distclean instead of mrproper

    - add a new coccinelle patch for devm_platform_ioremap_resource
    migration

    - refactor module-related scripts to read modules.order instead of
    $(MODVERDIR)/*.mod files to get the list of created modules

    - remove MODVERDIR

    - update list of header compile-test

    - add -fcf-protection=none flag to avoid conflict with the retpoline
    flags when CONFIG_RETPOLINE=y

    - misc cleanups

    * tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
    kbuild: add -fcf-protection=none when using retpoline flags
    kbuild: update compile-test header list for v5.3-rc1
    kbuild: split out *.mod out of {single,multi}-used-m rules
    kbuild: remove 'prepare1' target
    kbuild: remove the first line of *.mod files
    kbuild: create *.mod with full directory path and remove MODVERDIR
    kbuild: export_report: read modules.order instead of .tmp_versions/*.mod
    kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.mod
    kbuild: modsign: read modules.order instead of $(MODVERDIR)/*.mod
    kbuild: modinst: read modules.order instead of $(MODVERDIR)/*.mod
    scsi: remove pointless $(MODVERDIR)/$(obj)/53c700.ver
    kbuild: remove duplication from modules.order in sub-directories
    kbuild: get rid of kernel/ prefix from in-tree modules.{order,builtin}
    kbuild: do not create empty modules.order in the prepare stage
    coccinelle: api: add devm_platform_ioremap_resource script
    kbuild: compile-test headers listed in header-test-m as well
    kbuild: remove unused hostcc-option
    kbuild: remove tag files by distclean instead of mrproper
    kbuild: add --hash-style= and --build-id unconditionally
    kbuild: get rid of misleading $(AS) from documents
    ...

    Linus Torvalds
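The switch from per-module `*.mod` files to a single `modules.order` file can be pictured with a small reader. This is only a sketch, not the actual Kbuild scripts (which are shell and Makefile fragments): it assumes `modules.order` holds one module path per line ending in `.ko`, and the helper names `parse_modules_order` and `module_names` are hypothetical.

```python
from pathlib import PurePosixPath


def parse_modules_order(text):
    """Return the module paths listed in modules.order text.

    Assumption: one path per line. Blank lines and repeated entries
    are skipped, and the original order is preserved.
    """
    seen = set()
    modules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line in seen:
            continue
        seen.add(line)
        modules.append(line)
    return modules


def module_names(text):
    """Return bare module names (file name without the .ko suffix)."""
    return [PurePosixPath(path).stem for path in parse_modules_order(text)]


# Illustrative content only; the paths are made up for this example.
SAMPLE = """\
drivers/scsi/53c700.ko
fs/nfs/nfs.ko
drivers/scsi/53c700.ko
"""

if __name__ == "__main__":
    for name in module_names(SAMPLE):
        print(name)
```

Reading one ordered list instead of globbing a directory of `*.mod` files is what makes a separate `MODVERDIR` unnecessary for discovering which modules were built.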
     
  • Pull dcache and mountpoint updates from Al Viro:
    "Saner handling of refcounts to mountpoints.

    Transfer the counting reference from struct mount ->mnt_mountpoint
    over to struct mountpoint ->m_dentry. That allows us to get rid of the
    convoluted games with ordering of mount shutdowns.

    The cost is in teaching shrink_dcache_{parent,for_umount} to cope with
    mixed-filesystem shrink lists, which we'll also need for the Slab
    Movable Objects patchset"

    * 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    switch the remnants of releasing the mountpoint away from fs_pin
    get rid of detach_mnt()
    make struct mountpoint bear the dentry reference to mountpoint, not struct mount
    Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
    fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
    __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
    nfs: dget_parent() never returns NULL
    ceph: don't open-code the check for dead lockref

    Linus Torvalds
     

20 Jul, 2019

7 commits

  • The recent fix for CR2 corruption introduced a new way to reliably corrupt
    the saved CR2 value.

    CR2 is saved early in the entry code in RDX, which is the third argument to
    the fault handling functions. But that fix missed that enter_from_user_mode()
    can be called between saving CR2 and invoking the fault handler. RDX is a
    caller-saved register, so the invoked function can freely clobber it, with
    the obvious consequences.

    The TRACE_IRQS_OFF call is safe as it calls through the thunk which
    preserves RDX, but TRACE_IRQS_OFF_DEBUG is not because it also calls into
    C-code outside of the thunk.

    Store CR2 in R12 instead, which is a callee-saved register, and move R12 to
    RDX just before calling the fault handler.

    Fixes: a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption")
    Reported-by: Sean Christopherson
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907201020540.1782@nanos.tec.linutronix.de

    Thomas Gleixner
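The difference between keeping CR2 in RDX and keeping it in R12 can be shown with a toy register model. This is a simulation under stated assumptions, not the entry assembly: registers are dictionary entries, `call_c_function` is a hypothetical stand-in for any C call such as enter_from_user_mode(), and the callee is modelled as overwriting every caller-saved register of the x86-64 System V calling convention.

```python
# Caller-saved registers in the x86-64 System V calling convention.
CALLER_SAVED = ("rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11")

GARBAGE = 0xDEAD  # arbitrary value standing in for whatever the callee leaves behind


def call_c_function(regs):
    """Model a call into C code.

    The callee is allowed to overwrite any caller-saved register.
    Callee-saved registers (r12 here) are left unchanged.
    """
    for name in CALLER_SAVED:
        regs[name] = GARBAGE


def fault_entry(cr2, keep_in):
    """Model the fault entry path and return what the handler sees in RDX.

    keep_in names the register that holds CR2 across the intermediate call.
    """
    regs = {name: 0 for name in CALLER_SAVED + ("r12",)}
    regs[keep_in] = cr2           # save CR2 early in the entry code
    call_c_function(regs)         # intermediate call before the fault handler
    regs["rdx"] = regs[keep_in]   # third argument to the fault handler
    return regs["rdx"]


if __name__ == "__main__":
    print(hex(fault_entry(0x1234, "rdx")))  # CR2 lost across the call
    print(hex(fault_entry(0x1234, "r12")))  # CR2 preserved
```

In the model, the value parked in RDX is replaced by the callee's garbage, while the value parked in R12 reaches the handler intact, which is the reason the fix moves R12 to RDX only at the last moment.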
     
  • It's clearly documented that smp function calls cannot be invoked from
    softirq handling context. Unfortunately nothing enforces that or emits a
    warning.

    A single function call can be invoked from softirq context only via
    smp_call_function_single_async().

    The only legit context is task context, so add a warning to that effect.

    Reported-by: luferry
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190718160601.GP3402@hirez.programming.kicks-ass.net

    Peter Zijlstra
     
  • Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist
    fixed counters.

    Signed-off-by: Eric Hankland
    [No need to check padding fields for zero. - Paolo]
    Signed-off-by: Paolo Bonzini

    Eric Hankland
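How a whitelist or blacklist might apply to fixed counters can be sketched as a bitmap check. This is an illustration only: it assumes the filter carries an action plus a bitmap with one bit per fixed counter, and the function name, the string actions, and the "no filter means everything is permitted" rule are assumptions made for this example, not the KVM interface itself.

```python
def fixed_counter_permitted(action, fixed_counter_bitmap, idx):
    """Decide whether fixed counter `idx` may be programmed.

    action: "allow" (bitmap is a whitelist), "deny" (bitmap is a
            blacklist), or None when no filter is installed.
    fixed_counter_bitmap: integer with bit `idx` set when fixed
            counter `idx` is listed in the filter.
    """
    if action is None:
        return True  # assumption: without a filter nothing is restricted
    listed = bool((fixed_counter_bitmap >> idx) & 1)
    if action == "allow":
        return listed
    if action == "deny":
        return not listed
    raise ValueError(f"unknown filter action: {action!r}")


if __name__ == "__main__":
    bitmap = 0b101  # fixed counters 0 and 2 are listed
    for idx in range(3):
        print(idx,
              fixed_counter_permitted("allow", bitmap, idx),
              fixed_counter_permitted("deny", bitmap, idx))
```

With the same bitmap, the two actions give opposite answers for every counter, so a single bitmap field is enough to express either policy.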
     
  • If a KVM guest is reset while running a nested guest, free_nested will
    disable the shadow VMCS execution control in the vmcs01. However,
    on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
    the VMCS12 to the shadow VMCS which has since been freed.

    This causes a vmptrld of a NULL pointer on my machine, but Jan reports
    that the host hangs altogether. Let's see how much this trivial patch fixes.

    Reported-by: Jan Kiszka
    Cc: Liran Alon
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • This is useful for debugging, and is ratelimited nowadays.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
    If perf_event creation fails for any reason in the host perf
    subsystem, nothing is logged for the corresponding guest event,
    which may cause abnormal sampling data in the guest's results. In debug
    mode, this message helps to understand the state of the vPMC, so the
    number of occurrences is not capped, but it must not spam the log.

    Suggested-by: Joe Perches
    Signed-off-by: Like Xu
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    Like Xu
     
  • Use kvm_vcpu_wake_up() in kvm_s390_vcpu_wakeup().

    Suggested-by: Paolo Bonzini
    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Cc: Christian Borntraeger
    Signed-off-by: Wanpeng Li
    Signed-off-by: Paolo Bonzini

    Wanpeng Li