04 Apr, 2020

4 commits

  • Pull pci updates from Bjorn Helgaas:
    "Enumeration:

    - Revert sysfs "rescan" renames that broke apps (Kelsey Skunberg)

    - Add more 32 GT/s link speed decoding and improve the implementation
    (Yicong Yang)

    Resource management:

    - Add support for sizing programmable host bridge apertures and fix a
    related alpha Nautilus regression (Ivan Kokshaysky)

    Interrupts:

    - Add boot interrupt quirk mechanism for Xeon chipsets and document
    boot interrupts (Sean V Kelley)

    PCIe native device hotplug:

    - When possible, disable in-band presence detect and use PDS
    (Alexandru Gagniuc)

    - Add DMI table for devices that don't use in-band presence detection
    but don't advertise that correctly (Stuart Hayes)

    - Fix hang when powering slots up/down via sysfs (Lukas Wunner)

    - Fix an MSI interrupt race (Stuart Hayes)

    Virtualization:

    - Add ACS quirks for Zhaoxin devices (Raymond Pang)

    Error handling:

    - Add Error Disconnect Recover (EDR) support so firmware can report
    devices disconnected via DPC and we can try to recover (Kuppuswamy
    Sathyanarayanan)

    Peer-to-peer DMA:

    - Add Intel Sky Lake-E Root Ports B, C, D to the whitelist (Andrew
    Maier)

    ASPM:

    - Reduce severity of common clock config message (Chris Packham)

    - Clear the correct bits when enabling L1 substates, so we don't go
    to the wrong state (Yicong Yang)

    Endpoint framework:

    - Replace EPF linkup ops with notifier call chain and improve locking
    (Kishon Vijay Abraham I)

    - Fix concurrent memory allocation in OB address region (Kishon Vijay
    Abraham I)

    - Move PF function number assignment to EPC core to support multiple
    function creation methods (Kishon Vijay Abraham I)

    - Fix issue with clearing configfs "start" entry (Kunihiko Hayashi)

    - Fix issue with endpoint MSI-X ignoring BAR Indicator and Table
    Offset (Kishon Vijay Abraham I)

    - Add support for testing DMA transfers (Kishon Vijay Abraham I)

    - Add support for testing > 10 endpoint devices (Kishon Vijay Abraham I)

    - Add support for tests to clear IRQ (Kishon Vijay Abraham I)

    - Add common DT schema for endpoint controllers (Kishon Vijay Abraham I)

    Amlogic Meson PCIe controller driver:

    - Add DT bindings for AXG PCIe PHY, shared MIPI/PCIe analog PHY (Remi
    Pommarel)

    - Add Amlogic AXG PCIe PHY, AXG MIPI/PCIe analog PHY drivers (Remi
    Pommarel)

    Cadence PCIe controller driver:

    - Add Root Complex/Endpoint DT schema for Cadence PCIe (Kishon Vijay
    Abraham I)

    Intel VMD host bridge driver:

    - Add two VMD Device IDs that require bus restriction mode (Sushma
    Kalakota)

    Mobiveil PCIe controller driver:

    - Refactor and modularize mobiveil driver (Hou Zhiqiang)

    - Add support for Mobiveil GPEX Gen4 host (Hou Zhiqiang)

    Microsoft Hyper-V host bridge driver:

    - Add support for Hyper-V PCI protocol version 1.3 and
    PCI_BUS_RELATIONS2 (Long Li)

    - Refactor to prepare for virtual PCI on non-x86 architectures (Boqun
    Feng)

    - Fix memory leak in hv_pci_probe()'s error path (Dexuan Cui)

    NVIDIA Tegra PCIe controller driver:

    - Use pci_parse_request_of_pci_ranges() (Rob Herring)

    - Add support for endpoint mode and related DT updates (Vidya Sagar)

    - Reduce -EPROBE_DEFER error message log level (Thierry Reding)

    Qualcomm PCIe controller driver:

    - Restrict class fixup to specific Qualcomm devices (Bjorn Andersson)

    Synopsys DesignWare PCIe controller driver:

    - Refactor core initialization code for endpoint mode (Vidya Sagar)

    - Fix endpoint MSI-X to use correct table address (Kishon Vijay
    Abraham I)

    TI DRA7xx PCIe controller driver:

    - Fix MSI IRQ handling (Vignesh Raghavendra)

    TI Keystone PCIe controller driver:

    - Allow AM654 endpoint to raise MSI-X interrupt (Kishon Vijay Abraham I)

    Miscellaneous:

    - Quirk ASMedia XHCI USB to avoid "PME# from D0" defect (Kai-Heng
    Feng)

    - Use ioremap(), not phys_to_virt(), for platform ROM to fix video
    ROM mapping with CONFIG_HIGHMEM (Mikel Rychliski)"

    * tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (96 commits)
    misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
    PCI: tegra: Print -EPROBE_DEFER error message at debug level
    misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
    misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
    tools: PCI: Add 'e' to clear IRQ
    misc: pci_endpoint_test: Add ioctl to clear IRQ
    misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
    PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
    PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
    PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
    misc: pci_endpoint_test: Add support to get DMA option from userspace
    tools: PCI: Add 'd' command line option to support DMA
    misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
    PCI: endpoint: functions/pci-epf-test: Print throughput information
    PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
    PCI: pciehp: Fix MSI interrupt race
    PCI: pciehp: Fix indefinite wait on sysfs requests
    PCI: endpoint: Fix clearing start entry in configfs
    PCI: tegra: Add support for PCIe endpoint mode in Tegra194
    PCI: sysfs: Revert "rescan" file renames
    ...

    Linus Torvalds
     
  • Pull char/misc driver updates from Greg KH:
    "Here is the big set of char/misc/other driver patches for 5.7-rc1.

    Lots of things in here, and it's later than expected due to some
    reverts to resolve some reported issues. All is now clean with no
    reported problems in linux-next.

    Included in here is:
    - interconnect updates
    - mei driver updates
    - uio updates
    - nvmem driver updates
    - soundwire updates
    - binderfs updates
    - coresight updates
    - habanalabs updates
    - mhi new bus type and core
    - extcon driver updates
    - some Kconfig cleanups
    - other small misc driver cleanups and updates

    As mentioned, all have been in linux-next for a while, and with the
    last two reverts, all is calm and good"

    * tag 'char-misc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (174 commits)
    Revert "driver core: platform: Initialize dma_parms for platform devices"
    Revert "amba: Initialize dma_parms for amba devices"
    amba: Initialize dma_parms for amba devices
    driver core: platform: Initialize dma_parms for platform devices
    bus: mhi: core: Drop the references to mhi_dev in mhi_destroy_device()
    bus: mhi: core: Initialize bhie field in mhi_cntrl for RDDM capture
    bus: mhi: core: Add support for reading MHI info from device
    misc: rtsx: set correct pcr_ops for rts522A
    speakup: misc: Use dynamic minor numbers for speakup devices
    mei: me: add cedar fork device ids
    coresight: do not use the BIT() macro in the UAPI header
    Documentation: provide IBM contacts for embargoed hardware
    nvmem: core: remove nvmem_sysfs_get_groups()
    nvmem: core: use is_bin_visible for permissions
    nvmem: core: use device_register and device_unregister
    nvmem: core: add root_only member to nvmem device struct
    extcon: axp288: Add wakeup support
    extcon: Mark extcon_get_edev_name() function as exported symbol
    extcon: palmas: Hide error messages if gpio returns -EPROBE_DEFER
    dt-bindings: extcon: usbc-cros-ec: convert extcon-usbc-cros-ec.txt to yaml format
    ...

    Linus Torvalds
     
  • Pull SPDX updates from Greg KH:
    "Here are three SPDX patches for 5.7-rc1.

    One fixes up the SPDX tag for a single driver, while the other two go
    through the tree and add SPDX tags for all of the .gitignore files as
    needed.

    Nothing too complex, but you will get a merge conflict with your
    current tree, that should be trivial to handle (one file modified by
    two things, one file deleted.)

    All three of these have been in linux-next for a while, with no
    reported issues other than the merge conflict"

    * tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
    ASoC: MT6660: make spdxcheck.py happy
    .gitignore: add SPDX License Identifier
    .gitignore: remove too obvious comments

    Linus Torvalds
     
  • Pull cgroup updates from Tejun Heo:

    - Christian extended clone3 so that processes can be spawned into
    cgroups directly.

    This is not only neat in terms of semantics but also avoids grabbing
    the global cgroup_threadgroup_rwsem for migration.

    - Daniel added !root xattr support to cgroupfs.

    Userland already uses xattrs on cgroupfs for bookkeeping. This will
    allow delegated cgroups to support such usages.

    - Prateek tried to make cpuset hotplug handling synchronous but that
    led to possible deadlock scenarios. Reverted.

    - Other minor changes including release_agent_path handling cleanup.
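
    As a rough illustration of the clone3 extension in the first item, the
    sketch below only builds the argument block a caller would pass to
    clone3(); it does not issue the syscall. The field layout and the
    CLONE_INTO_CGROUP value follow include/uapi/linux/sched.h as of 5.7; the
    names CloneArgs and make_clone_args are hypothetical helpers, and
    cgroup_fd is assumed to be a file descriptor opened on the target cgroup
    directory.

```python
import ctypes
import signal

# Flag value from include/uapi/linux/sched.h (added in 5.7).
CLONE_INTO_CGROUP = 0x200000000


class CloneArgs(ctypes.Structure):
    """Mirror of struct clone_args; every member is a 64-bit value."""
    _fields_ = [(name, ctypes.c_uint64) for name in (
        "flags", "pidfd", "child_tid", "parent_tid", "exit_signal",
        "stack", "stack_size", "tls", "set_tid", "set_tid_size",
        "cgroup",  # new in this release: fd of the destination cgroup
    )]


def make_clone_args(cgroup_fd):
    """Build arguments asking the kernel to place the child in cgroup_fd."""
    args = CloneArgs()                       # all members start at zero
    args.flags = CLONE_INTO_CGROUP
    args.exit_signal = int(signal.SIGCHLD)   # behave like a plain fork()
    args.cgroup = cgroup_fd
    return args
```

    Because the child starts life in the target cgroup, no later migration
    step is needed, which is why cgroup_threadgroup_rwsem is not taken for
    migration.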

    * 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    docs: cgroup-v1: Document the cpuset_v2_mode mount option
    Revert "cpuset: Make cpuset hotplug synchronous"
    cgroupfs: Support user xattrs
    kernfs: Add option to enable user xattrs
    kernfs: Add removed_size out param for simple_xattr_set
    kernfs: kvmalloc xattr value instead of kmalloc
    cgroup: Restructure release_agent_path handling
    selftests/cgroup: add tests for cloning into cgroups
    clone3: allow spawning processes into cgroups
    cgroup: add cgroup_may_write() helper
    cgroup: refactor fork helpers
    cgroup: add cgroup_get_from_file() helper
    cgroup: unify attach permission checking
    cpuset: Make cpuset hotplug synchronous
    cgroup.c: Use built-in RCU list checking
    kselftest/cgroup: add cgroup destruction test
    cgroup: Clean up css_set task traversal

    Linus Torvalds
     

03 Apr, 2020

12 commits

  • Pull kvm updates from Paolo Bonzini:
    "ARM:
    - GICv4.1 support

    - 32bit host removal

    PPC:
    - secure (encrypted) guests under the Protected Execution Framework
    ultravisor

    s390:
    - allow disabling GISA (hardware interrupt injection) and protected
    VMs/ultravisor support.

    x86:
    - New dirty bitmap flag that sets all bits in the bitmap when dirty
    page logging is enabled; this is faster because it doesn't require
    bulk modification of the page tables.

    - Initial work on making nested SVM event injection more similar to
    VMX, and less buggy.

    - Various cleanups to MMU code (though the big ones and related
    optimizations were delayed to 5.8). Instead of using cr3 in
    function names which occasionally means eptp, KVM too has
    standardized on "pgd".

    - A large refactoring of CPUID features, which now use an array that
    parallels the core x86_features.

    - Some removal of pointer chasing from kvm_x86_ops, which will also
    be switched to static calls as soon as they are available.

    - New Tigerlake CPUID features.

    - More bugfixes, optimizations and cleanups.

    Generic:
    - selftests: cleanups, new MMU notifier stress test, steal-time test

    - CSV output for kvm_stat"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
    x86/kvm: fix a missing-prototypes "vmread_error"
    KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
    KVM: VMX: Add a trampoline to fix VMREAD error handling
    KVM: SVM: Annotate svm_x86_ops as __initdata
    KVM: VMX: Annotate vmx_x86_ops as __initdata
    KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
    KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
    KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
    KVM: VMX: Configure runtime hooks using vmx_x86_ops
    KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
    KVM: x86: Move init-only kvm_x86_ops to separate struct
    KVM: Pass kvm_init()'s opaque param to additional arch funcs
    s390/gmap: return proper error code on ksm unsharing
    KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
    KVM: Fix out of range accesses to memslots
    KVM: X86: Micro-optimize IPI fastpath delay
    KVM: X86: Delay read msr data iff writes ICR MSR
    KVM: PPC: Book3S HV: Add a capability for enabling secure guests
    KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
    KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
    ...

    Linus Torvalds
     
  • Merge updates from Andrew Morton:
    "A large amount of MM, plenty more to come.

    Subsystems affected by this patch series:
    - tools
    - kthread
    - kbuild
    - scripts
    - ocfs2
    - vfs
    - mm: slub, kmemleak, pagecache, gup, swap, memcg, pagemap, mremap,
    sparsemem, kasan, pagealloc, vmscan, compaction, mempolicy,
    hugetlbfs, hugetlb"

    * emailed patches from Andrew Morton: (155 commits)
    include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP
    mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
    selftests/vm: fix map_hugetlb length used for testing read and write
    mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge()
    mm/hugetlb.c: clean code by removing unnecessary initialization
    hugetlb_cgroup: add hugetlb_cgroup reservation docs
    hugetlb_cgroup: add hugetlb_cgroup reservation tests
    hugetlb: support file_region coalescing again
    hugetlb_cgroup: support noreserve mappings
    hugetlb_cgroup: add accounting for shared mappings
    hugetlb: disable region_add file_region coalescing
    hugetlb_cgroup: add reservation accounting for private mappings
    mm/hugetlb_cgroup: fix hugetlb_cgroup migration
    hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
    hugetlb_cgroup: add hugetlb_cgroup reservation counter
    hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
    hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
    mm/memblock.c: remove redundant assignment to variable max_addr
    mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
    mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk()
    ...

    Linus Torvalds
     
  • Pull exec/proc updates from Eric Biederman:
    "This contains two significant pieces of work: the work to sort out
    proc_flush_task, and the work to solve a deadlock between strace and
    exec.

    Fixing proc_flush_task so that it no longer requires a persistent
    mount makes improvements to proc possible. The removal of the
    persistent mount solves an old regression that caused the hidepid
    mount option to only work on remount not on mount. The regression was
    found and reported by the Android folks. This further allows Alexey
    Gladkov's work making proc mount options specific to an individual
    mount of proc to move forward.

    The work on exec starts solving a long standing issue with exec that
    it takes mutexes of blocking userspace applications, which makes exec
    extremely deadlock prone. For the moment this adds a second mutex with
    a narrower scope that handles all of the easy cases, which makes the
    tricky cases easy to spot. With a little luck the code to solve those
    deadlocks will be ready by next merge window"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (25 commits)
    signal: Extend exec_id to 64bits
    pidfd: Use new infrastructure to fix deadlocks in execve
    perf: Use new infrastructure to fix deadlocks in execve
    proc: io_accounting: Use new infrastructure to fix deadlocks in execve
    proc: Use new infrastructure to fix deadlocks in execve
    kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve
    kernel: doc: remove outdated comment cred.c
    mm: docs: Fix a comment in process_vm_rw_core
    selftests/ptrace: add test cases for dead-locks
    exec: Fix a deadlock in strace
    exec: Add exec_update_mutex to replace cred_guard_mutex
    exec: Move exec_mmap right after de_thread in flush_old_exec
    exec: Move cleanup of posix timers on exec out of de_thread
    exec: Factor unshare_sighand out of de_thread and call it separately
    exec: Only compute current once in flush_old_exec
    pid: Improve the comment about waiting in zap_pid_ns_processes
    proc: Remove the now unnecessary internal mount of proc
    uml: Create a private mount of proc for mconsole
    uml: Don't consult current to find the proc_mnt in mconsole_proc
    proc: Use a list of inodes to flush from proc
    ...

    Linus Torvalds
     
  • Add a new command line option 'e' to invoke "PCITEST_CLEAR_IRQ"
    ioctl. This can be used to clear the irqs set using the 'i' option.

    Signed-off-by: Kishon Vijay Abraham I
    Signed-off-by: Lorenzo Pieralisi

    Kishon Vijay Abraham I
     
  • Add a new command line option 'd' to use DMA for data transfers.
    It should be used with read, write or copy commands.

    Signed-off-by: Kishon Vijay Abraham I
    Signed-off-by: Lorenzo Pieralisi
    Tested-by: Alan Mikhak

    Kishon Vijay Abraham I
     
  • Commit fa7b9a805c79 ("tools/selftest/vm: allow choosing mem size and page
    size in map_hugetlb") added the possibility to change the size of memory
    mapped for the test, but left the read and write test using the default
    value. This is unnoticed when mapping a length greater than the default
    one, but segfaults otherwise.

    Fix read_bytes() and write_bytes() by giving them the real length.

    Also fix the call to munmap().

    Fixes: fa7b9a805c79 ("tools/selftest/vm: allow choosing mem size and page size in map_hugetlb")
    Signed-off-by: Christophe Leroy
    Signed-off-by: Andrew Morton
    Reviewed-by: Leonardo Bras
    Cc: Michael Ellerman
    Cc: Shuah Khan
    Cc:
    Link: http://lkml.kernel.org/r/9a404a13c871c4bd0ba9ede68f69a1225180dd7e.1580978385.git.christophe.leroy@c-s.fr
    Signed-off-by: Linus Torvalds

    Christophe Leroy
     
  • The tests use both shared and private mapped hugetlb memory, and monitor
    the hugetlb usage counter as well as the hugetlb reservation counter.
    They test different configurations such as hugetlb memory usage via
    hugetlbfs, or MAP_HUGETLB, or shmget/shmat, and with and without
    MAP_POPULATE.

    Also add test for hugetlb reservation reparenting, since this is a subtle
    issue.

    Signed-off-by: Mina Almasry
    Signed-off-by: Andrew Morton
    Tested-by: Sandipan Das [powerpc64]
    Acked-by: Mike Kravetz
    Cc: Sandipan Das
    Cc: David Rientjes
    Cc: Greg Thelen
    Cc: Shakeel Butt
    Cc: Shuah Khan
    Link: http://lkml.kernel.org/r/20200211213128.73302-8-almasrymina@google.com
    Signed-off-by: Linus Torvalds

    Mina Almasry
     
  • It was noticed that mlock2 tests are failing after 9c4e6b1a7027f ("mm,
    mlock, vmscan: no more skipping pagevecs") because the patch has changed
    the timing on when the page is added to the unevictable LRU list and thus
    gains the unevictable page flag.

    The test was just too dependent on the implementation details which were
    true at the time when it was introduced. Page flags and the timing when
    they are set is something no userspace should ever depend on. The test
    should be testing only for the user observable contract of the tested
    syscalls. Those are defined pretty well for the mlock and there are other
    means for testing them. In fact this is already done and testing for page
    flags can be safely dropped to achieve the aimed purpose. Present bits
    can be checked by /proc/<pid>/smaps RSS field and the locking state by
    VmFlags although I would argue that Locked: field would be more
    appropriate.

    Drop all the page flag machinery and considerably simplify the test. This
    should be more robust for future kernel changes while checking the
    promised contract is still valid.
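
    The smaps-based check described above can be sketched as follows. This is
    an illustrative parser, not the selftest's actual code: parse_smaps,
    is_locked_and_resident and the sample text are hypothetical. It assumes
    that the "lo" mnemonic in VmFlags marks a locked mapping and that an Rss
    equal to the mapping size means every page is present.

```python
import re

# A mapping header looks like "7f0000000000-7f0000002000 rw-p ...".
HEADER = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s")


def parse_smaps(text):
    """Split smaps text into one dict per mapping."""
    mappings = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {
                "start": int(match.group(1), 16),
                "end": int(match.group(2), 16),
                "fields": {},   # numeric fields, in kB
                "flags": [],    # two-letter VmFlags mnemonics
            }
            mappings.append(current)
            continue
        if current is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        if key == "VmFlags":
            current["flags"] = value.split()
        else:
            parts = value.split()
            if parts and parts[0].isdigit():
                current["fields"][key] = int(parts[0])
    return mappings


def is_locked_and_resident(mapping):
    """True if the mapping is flagged locked and fully present."""
    size_kb = (mapping["end"] - mapping["start"]) // 1024
    return "lo" in mapping["flags"] and mapping["fields"].get("Rss") == size_kb


SAMPLE = """\
7f0000000000-7f0000002000 rw-p 00000000 00:00 0
Size:                  8 kB
Rss:                   8 kB
Locked:                8 kB
VmFlags: rd wr mr mw me lo ac
7f0000002000-7f0000003000 rw-p 00000000 00:00 0
Size:                  4 kB
Rss:                   0 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac
"""
```

    On a live system the same functions could be fed the contents of
    /proc/self/smaps after an mlock2() call.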

    Fixes: 9c4e6b1a7027f ("mm, mlock, vmscan: no more skipping pagevecs")
    Reported-by: Rafael Aquini
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Acked-by: Rafael Aquini
    Cc: Shakeel Butt
    Cc: Eric B Munson
    Cc: Shuah Khan
    Cc:
    Link: http://lkml.kernel.org/r/20200324154218.GS19542@dhcp22.suse.cz
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Add a few simple self tests for the new flag MREMAP_DONTUNMAP, they are
    simple smoke tests which also demonstrate the behavior.

    [akpm@linux-foundation.org: convert eight-spaces to hard tabs]
    [bgeffon@google.com: v7]
    Link: http://lkml.kernel.org/r/20200221174248.244748-2-bgeffon@google.com
    [akpm@linux-foundation.org: coding style fixes]
    Signed-off-by: Brian Geffon
    Signed-off-by: Andrew Morton
    Cc: Vlastimil Babka
    Cc: "Michael S . Tsirkin"
    Cc: Brian Geffon
    Cc: Arnd Bergmann
    Cc: Andy Lutomirski
    Cc: Will Deacon
    Cc: Andrea Arcangeli
    Cc: Sonny Rao
    Cc: Minchan Kim
    Cc: Joel Fernandes
    Cc: Yu Zhao
    Cc: Jesse Barnes
    Cc: Nathan Chancellor
    Cc: Florian Weimer
    Cc: "Kirill A . Shutemov"
    Cc: Lokesh Gidra
    Link: http://lkml.kernel.org/r/20200218173221.237674-2-bgeffon@google.com
    Signed-off-by: Linus Torvalds

    Brian Geffon
     
  • It's good to have basic unit test coverage of the new FOLL_PIN behavior.
    Fortunately, the gup_benchmark unit test is extremely fast (a few
    milliseconds), so adding it to the run_vmtests suite is going to cause no
    noticeable change in running time.

    So, add two new invocations to run_vmtests:

    1) Run gup_benchmark with normal get_user_pages().

    2) Run gup_benchmark with pin_user_pages(). This is much like the
    first call, except that it sets FOLL_PIN.

    Running these two in quick succession also provides a visual comparison of
    the running times, which is convenient.

    The new invocations are fairly early in the run_vmtests script, because
    with test suites, it's usually preferable to put the shorter, faster tests
    first, all other things being equal.

    Signed-off-by: John Hubbard
    Signed-off-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: Kirill A. Shutemov
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Jason Gunthorpe
    Cc: Jonathan Corbet
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Shuah Khan
    Cc: Vlastimil Babka
    Link: http://lkml.kernel.org/r/20200211001536.1027652-11-jhubbard@nvidia.com
    Signed-off-by: Linus Torvalds

    John Hubbard
     
  • Up until now, gup_benchmark supported testing of the following kernel
    functions:

    * get_user_pages(): via the '-U' command line option
    * get_user_pages_longterm(): via the '-L' command line option
    * get_user_pages_fast(): as the default (no options required)

    Add test coverage for the new corresponding pin_*() functions:

    * pin_user_pages_fast(): via the '-a' command line option
    * pin_user_pages(): via the '-b' command line option

    Also, add an option for clarity: '-u' for what is now (still) the default
    choice: get_user_pages_fast().

    Also, for the commands that set FOLL_PIN, verify that the pages really are
    dma-pinned, via the new page_maybe_dma_pinned() routine. Those commands are:

    PIN_FAST_BENCHMARK : calls pin_user_pages_fast()
    PIN_BENCHMARK : calls pin_user_pages()

    In between the calls to pin_*() and unpin_user_pages(), check each page:
    if page_maybe_dma_pinned() returns false, then WARN and return.

    Do this outside of the benchmark timestamps, so that it doesn't affect
    reported times.

    Signed-off-by: John Hubbard
    Signed-off-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Jason Gunthorpe
    Cc: Jonathan Corbet
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Shuah Khan
    Cc: Vlastimil Babka
    Link: http://lkml.kernel.org/r/20200211001536.1027652-10-jhubbard@nvidia.com
    Signed-off-by: Linus Torvalds

    John Hubbard
     
  • A recent change to the netlink code: 6e237d099fac ("netlink: Relax attr
    validation for fixed length types") logs a warning when programs send
    messages with invalid attributes (e.g., wrong length for a u32). Yafang
    reported this error message for tools/accounting/getdelays.c.

    send_cmd() is wrongly adding 1 to the attribute length. As noted in
    include/uapi/linux/netlink.h nla_len should be NLA_HDRLEN + payload
    length, so drop the +1.
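
    The length rule can be shown with a small sketch. It packs one netlink
    attribute the way the uapi header describes it: a 4-byte header (u16
    nla_len, u16 nla_type), the payload, then padding to a 4-byte boundary
    that is not counted in nla_len. The helper names and the attribute type
    used in the usage note are hypothetical and are not taken from
    getdelays.c.

```python
import struct

NLA_ALIGNTO = 4


def nla_align(length):
    """Round length up to the next multiple of NLA_ALIGNTO."""
    return (length + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


# Header is two native-endian u16 values: nla_len and nla_type.
NLA_HDRLEN = nla_align(struct.calcsize("=HH"))


def pack_attr(nla_type, payload):
    """Serialise one attribute; nla_len excludes the trailing padding."""
    nla_len = NLA_HDRLEN + len(payload)      # no extra +1
    header = struct.pack("=HH", nla_len, nla_type)
    padding = b"\0" * (nla_align(nla_len) - nla_len)
    return header + payload + padding
```

    For a u32 payload, pack_attr(1, struct.pack("=I", 1234)) carries an
    nla_len of 8; with the stray +1 the same attribute would have announced
    9, which is the kind of wrong length the stricter validation warns about.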

    Fixes: 9e06d3f9f6b1 ("per task delay accounting taskstats interface: documentation fix")
    Reported-by: Yafang Shao
    Signed-off-by: David Ahern
    Signed-off-by: Andrew Morton
    Tested-by: Yafang Shao
    Cc: Johannes Berg
    Cc: Shailabh Nagar
    Cc:
    Link: http://lkml.kernel.org/r/20200327173111.63922-1-dsahern@kernel.org
    Signed-off-by: Linus Torvalds

    David Ahern
     

02 Apr, 2020

3 commits

  • Pull XArray updates from Matthew Wilcox:

    - Fix two bugs which affected multi-index entries larger than 2^26
    indices

    - Fix some documentation

    - Remove unused IDA macros

    - Add a small optimisation for tiny configurations

    - Fix a bug which could cause an RCU walker to terminate a marked walk
    early

    * tag 'xarray-5.7' of git://git.infradead.org/users/willy/linux-dax:
    xarray: Fix early termination of xas_for_each_marked
    radix tree test suite: Support kmem_cache alignment
    XArray: Optimise xas_sibling() if !CONFIG_XARRAY_MULTI
    ida: remove abandoned macros
    XArray: Fix incorrect comment in header file
    XArray: Fix xas_pause for large multi-index entries
    XArray: Fix xa_find_next for large multi-index entries

    Linus Torvalds
     
  • …kernel/git/shuah/linux-kselftest

    Pull kunit updates from Shuah Khan:
    "This kunit update consists of:

    - debugfs support for displaying kunit test suite results.

    This is especially useful for module-loaded tests to allow
    disentangling of test result display from other dmesg events.
    CONFIG_KUNIT_DEBUGFS enables/disables the debugfs support.

    - Several fixes and improvements to kunit framework and tool"
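
    A minimal sketch of reading a suite's results through the debugfs layout
    named in this update, /sys/kernel/debug/kunit/<suite>/results. The helper
    names and the root parameter are hypothetical; on a real system debugfs
    has to be mounted, CONFIG_KUNIT_DEBUGFS enabled, and the file is normally
    readable only by root.

```python
from pathlib import Path

# Location given in the kunit debugfs commit title.
KUNIT_DEBUGFS_ROOT = Path("/sys/kernel/debug/kunit")


def results_path(suite, root=KUNIT_DEBUGFS_ROOT):
    """Return the path of the results file for one test suite."""
    return Path(root) / suite / "results"


def read_results(suite, root=KUNIT_DEBUGFS_ROOT):
    """Return the suite's results text, or None if the file is absent."""
    try:
        return results_path(suite, root).read_text()
    except FileNotFoundError:
        return None
```

    Keeping the root as a parameter makes it possible to point the helper at
    a copied directory tree when the results are inspected offline.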

    * tag 'linux-kselftest-kunit-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
    kunit: tool: add missing test data file content
    kunit: update documentation to describe debugfs representation
    kunit: subtests should be indented 4 spaces according to TAP
    kunit: add log test
    kunit: add debugfs /sys/kernel/debug/kunit/<suite>/results display
    Documentation: kunit: Make the KUnit documentation less UML-specific
    Fix linked-list KUnit test when run multiple times
    kunit: kunit_tool: Allow .kunitconfig to disable config items
    kunit: Always print actual pointer values in asserts
    kunit: add --make_options
    kunit: Run all KUnit tests through allyesconfig
    kunit: kunit_parser: make parser more robust

    Linus Torvalds
     
  • …/git/shuah/linux-kselftest

    Pull kselftest update from Shuah Khan:
    "This kselftest update consists of:

    - resctrl_tests for resctrl file system. resctrl isn't included in
    the default TARGETS list in kselftest Makefile. It can be run
    manually.

    - Kselftest harness improvements.

    - Kselftest framework and individual test fixes to support runs on
    Kernel CI rings and other environments that use relocatable build
    and install features.

    - Minor cleanups and typo fixes"

    * tag 'linux-kselftest-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
    selftests: enforce local header dependency in lib.mk
    selftests: Fix memfd to support relocatable build (O=objdir)
    selftests: Fix seccomp to support relocatable build (O=objdir)
    selftests/harness: Handle timeouts cleanly
    selftests/harness: Move test child waiting logic
    selftests: android: Fix custom install from skipping test progs
    selftests: android: ion: Fix ionmap_test compile error
    selftests: Fix kselftest O=objdir build from cluttering top level objdir
    selftests/seccomp: Adjust test fixture counts
    selftests/ftrace: Fix typo in trigger-multihist.tc
    selftests/timens: Remove duplicated include <time.h>
    selftests/resctrl: fix spelling mistake "Errror" -> "Error"
    selftests/resctrl: Add the test in MAINTAINERS
    selftests/resctrl: Disable MBA and MBM tests for AMD
    selftests/resctrl: Use cache index3 id for AMD schemata masks
    selftests/resctrl: Add vendor detection mechanism
    selftests/resctrl: Add Cache Allocation Technology (CAT) selftest
    selftests/resctrl: Add Cache QoS Monitoring (CQM) selftest
    selftests/resctrl: Add MBA test
    selftests/resctrl: Add MBM test
    ...

    Linus Torvalds
     

01 Apr, 2020

3 commits

  • Pull networking updates from David Miller:
    "Highlights:

    1) Fix the iwlwifi regression, from Johannes Berg.

    2) Support BSS coloring and 802.11 encapsulation offloading in
    hardware, from John Crispin.

    3) Fix some potential Spectre issues in qtnfmac, from Sergey
    Matyukevich.

    4) Add TTL decrement action to openvswitch, from Matteo Croce.

    5) Allow parallelization through flow_action setup by not taking the
    RTNL mutex, from Vlad Buslov.

    6) A lot of zero-length array to flexible-array conversions, from
    Gustavo A. R. Silva.

    7) Align XDP statistics names across several drivers for consistency,
    from Lorenzo Bianconi.

    8) Add various pieces of infrastructure for offloading conntrack, and
    make use of it in mlx5 driver, from Paul Blakey.

    9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.

    10) Lots of parallelization improvements during configuration changes
    in mlxsw driver, from Ido Schimmel.

    11) Add support to devlink for generic packet traps, which report
    packets dropped during ACL processing. And use them in mlxsw
    driver. From Jiri Pirko.

    12) Support bcmgenet on ACPI, from Jeremy Linton.

    13) Make BPF compatible with RT, from Thomas Gleixner, Alexei
    Starovoitov, and yours truly.

    14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.

    15) Fix sysfs permissions when network devices change namespaces, from
    Christian Brauner.

    16) Add a flags element to ethtool_ops so that drivers can more simply
    indicate which coalescing parameters they actually support, and
    therefore the generic layer can validate the user's ethtool
    request. Use this in all drivers, from Jakub Kicinski.

    17) Offload FIFO qdisc in mlxsw, from Petr Machata.

    18) Support UDP sockets in sockmap, from Lorenz Bauer.

    19) Fix stretch ACK bugs in several TCP congestion control modules,
    from Pengcheng Yang.

    20) Support virtual functions in octeontx2 driver, from Tomasz
    Duszynski.

    21) Add region operations for devlink and use it in ice driver to dump
    NVM contents, from Jacob Keller.

    22) Add support for hw offload of MACSEC, from Antoine Tenart.

    23) Add support for BPF programs that can be attached to LSM hooks,
    from KP Singh.

    24) Support for multiple paths, path managers, and counters in MPTCP.
    From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
    and others.

    25) More progress on adding the netlink interface to ethtool, from
    Michal Kubecek"

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
    net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
    cxgb4/chcr: nic-tls stats in ethtool
    net: dsa: fix oops while probing Marvell DSA switches
    net/bpfilter: remove superfluous testing message
    net: macb: Fix handling of fixed-link node
    net: dsa: ksz: Select KSZ protocol tag
    netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
    net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
    net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
    net: stmmac: create dwmac-intel.c to contain all Intel platform
    net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
    net: dsa: bcm_sf2: Add support for matching VLAN TCI
    net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
    net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
    net: dsa: bcm_sf2: Disable learning for ASP port
    net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
    net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
    net: dsa: b53: Restore VLAN entries upon (re)configuration
    net: dsa: bcm_sf2: Fix overflow checks
    hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
    ...

    Linus Torvalds
     
  • Pull x86 platform driver updates from Andy Shevchenko:

    - Fix for improper handling of fan_boost_mode in sysfs for ASUS
    laptops.

    - On newer ASUS laptops the 1st battery is named differently, here is a
    fix.

    - Fix Lex 2I385SW to allow both network cards to be used.

    - The power integrated circuit driver for Surface 3 has been added.

    - Refactor and clean up of Intel PMC driver and enable it on Intel
    Jasper Lake.

    - Clean up of Dell RBU driver.

    - Big update for Intel Speed Select technology support tool and driver.

    * tag 'platform-drivers-x86-v5.7-1' of git://git.infradead.org/linux-platform-drivers-x86: (75 commits)
    platform/x86: surface3_power: Fix always true condition in mshw0011_space_handler()
    platform/x86: surface3_power: Fix Kconfig section ordering
    platform/x86: surface3_power: Add missed headers
    platform/x86: surface3_power: Reformat GUID assignment
    platform/x86: surface3_power: Drop useless macro ACPI_PTR()
    platform/x86: surface3_power: Prefix POLL_INTERVAL with SURFACE_3
    platform/x86: surface3_power: Simplify mshw0011_adp_psr() to one liner
    platform/x86: surface3_power: Use dev_err() instead of pr_err()
    platform/x86: surface3_power: Drop unused structure definition
    platform/x86: surface3_power: MSHW0011 rev-eng implementation
    platform/x86: intel_pmc_core: Make pmc_core_substate_res_show() generic
    platform/x86: intel_pmc_core: Make pmc_core_lpm_display() generic for platforms that support sub-states
    tools/power/x86/intel-speed-select: Fix a typo in error message
    tools/power/x86/intel-speed-select: Update version
    tools/power/x86/intel-speed-select: Avoid duplicate Package strings for json
    tools/power/x86/intel-speed-select: Add display for enabled cpus count
    tools/power/x86/intel-speed-select: Print friendly warning for bad command line
    tools/power/x86/intel-speed-select: Fix avx options for turbo-freq feature
    tools/power/x86/intel-speed-select: Improve CLX commands
    tools/power/x86/intel-speed-select: Show error for invalid CPUs in the options
    ...

    Linus Torvalds
     
  • Pull misc x86 updates from Ingo Molnar:

    - extend the decoder maps with CET instructions

    - fix !vDSO corner cases

    * 'x86-misc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/tests: Add CET instructions to the new instructions test
    x86/insn: Add Control-flow Enforcement (CET) instructions to the opcode map
    selftests/x86/ptrace_syscall_32: Fix no-vDSO segfault
    selftests/x86/vdso: Fix no-vDSO segfaults

    Linus Torvalds
     

31 Mar, 2020

18 commits

  • KVM/arm updates for Linux 5.7

    - GICv4.1 support
    - 32bit host removal

    Paolo Bonzini
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull x86 entry code updates from Thomas Gleixner:

    - Convert the 32bit syscalls to be pt_regs based which removes the
    requirement to push all 6 potential arguments onto the stack and
    consolidates the interface with the 64bit variant

    - The first small portion of the exception and syscall related entry
    code consolidation which aims to address the recently discovered
    issues vs. RCU, int3, NMI and some other exceptions which can
    interrupt any context. The bulk of the changes is still work in
    progress and aimed for 5.8.

    - A few lockdep namespace cleanups which have been applied into this
    branch to keep the prerequisites for the ongoing work confined.
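
    The first item above says 32bit syscall arguments are taken from
    pt_regs instead of being pushed onto the stack. The following is only
    a rough model of that idea: the PtRegs class and make_wrapper() helper
    are hypothetical, and the register order used (bx, cx, dx, si, di, bp)
    is the i386 syscall argument convention.

```python
from dataclasses import dataclass


@dataclass
class PtRegs:
    """Toy stand-in for the saved user registers at syscall entry."""
    bx: int = 0
    cx: int = 0
    dx: int = 0
    si: int = 0
    di: int = 0
    bp: int = 0


# i386 passes up to six syscall arguments in these registers, in this order.
ARG_ORDER = ("bx", "cx", "dx", "si", "di", "bp")


def make_wrapper(handler, nargs):
    """Wrap a handler so it pulls only the arguments it needs from regs."""
    def wrapper(regs):
        args = [getattr(regs, name) for name in ARG_ORDER[:nargs]]
        return handler(*args)
    return wrapper


# A two-argument handler never looks at dx, si, di or bp.
sys_add = make_wrapper(lambda a, b: a + b, 2)
result = sys_add(PtRegs(bx=2, cx=3, dx=99))
```

    Every wrapper has the same single-argument signature, which is what
    lets the 32bit and 64bit tables share one calling interface.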

    * tag 'x86-entry-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
    x86/entry: Fix build error x86 with !CONFIG_POSIX_TIMERS
    lockdep: Rename trace_{hard,soft}{irq_context,irqs_enabled}()
    lockdep: Rename trace_softirqs_{on,off}()
    lockdep: Rename trace_hardirq_{enter,exit}()
    x86/entry: Rename ___preempt_schedule
    x86: Remove unneeded includes
    x86/entry: Drop asmlinkage from syscalls
    x86/entry/32: Enable pt_regs based syscalls
    x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments
    x86/entry/32: Rename 32-bit specific syscalls
    x86/entry/32: Clean up syscall_32.tbl
    x86/entry: Remove ABI prefixes from functions in syscall tables
    x86/entry/64: Add __SYSCALL_COMMON()
    x86/entry: Remove syscall qualifier support
    x86/entry/64: Remove ptregs qualifier from syscall table
    x86/entry: Move max syscall number calculation to syscallhdr.sh
    x86/entry/64: Split X32 syscall table into its own file
    x86/entry/64: Move sys_ni_syscall stub to common.c
    x86/entry/64: Use syscall wrappers for x32_rt_sigreturn
    x86/entry: Refactor SYS_NI macros
    ...

    Linus Torvalds
     
  • Add test cases that verify that each registered packet trap policer:

    * Honors the imposed limitations of rate and burst size
    * Is able to police trapped packets to the specified rate
    * Is able to police trapped packets to the specified burst size
    * Can be unbound from its trap group

    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Add test cases for packet trap policer set / show commands as well as
    for the binding of these policers to packet trap groups.

    Both good and bad flows are tested for maximum coverage.

    v2:
    * Add test case with new 'fail_trap_policer_set' knob
    * Add test case for partially modified trap group

    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Add selftests to exercise FD-based cgroup BPF program attachments and their
    intermixing with legacy cgroup BPF attachments. Auto-detachment and program
    replacement (both unconditional and cmpxchg-like) are tested as well.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200330030001.2312810-5-andriin@fb.com

    Andrii Nakryiko
     
  • Add bpf_program__attach_cgroup(), which uses BPF_LINK_CREATE subcommand to
    create an FD-based kernel bpf_link. Also add low-level bpf_link_create() API.

    If expected_attach_type is not specified explicitly with
    bpf_program__set_expected_attach_type(), libbpf will try to determine proper
    attach type from BPF program's section definition.

    Also add support for bpf_link's underlying BPF program replacement:
    - unconditional through high-level bpf_link__update_program() API;
    - cmpxchg-like with specifying expected current BPF program through
    low-level bpf_link_update() API.
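
    The two replacement modes listed above differ only in whether the
    caller states which program it expects to be attached. A toy model of
    that difference, assuming nothing about libbpf's real signatures (the
    Link class and both helper functions are hypothetical):

```python
class Link:
    """Toy stand-in for a link that currently points at one program."""

    def __init__(self, prog):
        self.prog = prog


def update_unconditional(link, new_prog):
    """Replace whatever program is attached."""
    link.prog = new_prog
    return True


def update_if_expected(link, new_prog, expected_old):
    """cmpxchg-like: replace only if the current program is the expected one."""
    if link.prog is not expected_old:
        return False  # someone else changed the link first; leave it alone
    link.prog = new_prog
    return True


prog_a, prog_b, prog_c = object(), object(), object()
link = Link(prog_a)
swapped = update_if_expected(link, prog_b, expected_old=prog_a)
stale = update_if_expected(link, prog_c, expected_old=prog_a)
```

    The conditional form lets two updaters race safely: the one holding a
    stale expectation fails instead of silently overwriting the other.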

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200330030001.2312810-4-andriin@fb.com

    Andrii Nakryiko
     
  • Implement new sub-command to attach cgroup BPF programs and return FD-based
    bpf_link back on success. bpf_link, once attached to cgroup, cannot be
    replaced, except by owner having its FD. Cgroup bpf_link supports only
    BPF_F_ALLOW_MULTI semantics. Both link-based and prog-based BPF_F_ALLOW_MULTI
    attachments can be freely intermixed.

    To prevent bpf_cgroup_link from keeping cgroup alive past the point when no
    BPF program can be executed, implement auto-detachment of link. When
    cgroup_bpf_release() is called, all attached bpf_links are forced to release
    cgroup refcounts, but they leave bpf_link otherwise active and allocated, as
    well as still owning underlying bpf_prog. This is because user-space might
    still have FDs open and active, so bpf_link as a user-referenced object can't
    be freed yet. Once last active FD is closed, bpf_link will be freed and
    underlying bpf_prog refcount will be dropped. But cgroup refcount won't be
    touched, because cgroup is released already.

    The inherent race between bpf_cgroup_link release (from closing last FD) and
    cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So
    the only additional check required is when bpf_cgroup_link attempts to detach
    itself from cgroup. At that time we need to check whether there is still
    cgroup associated with that link. And if not, exit with success, because
    bpf_cgroup_link was already successfully detached.
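
    The ordering described above can be sketched as a small model. This is
    an illustration under simplifying assumptions, not the kernel code:
    the classes, the single integer refcount, and the threading.Lock that
    stands in for cgroup_mutex are all hypothetical.

```python
import threading

cgroup_mutex = threading.Lock()  # stand-in for the kernel's cgroup_mutex


class Cgroup:
    def __init__(self):
        self.refcount = 1  # reference held by whoever created the cgroup
        self.links = []


class CgroupLink:
    def __init__(self, cgroup, prog):
        self.cgroup = cgroup
        self.prog = prog
        self.freed = False
        cgroup.refcount += 1  # an attached link pins its cgroup
        cgroup.links.append(self)


def cgroup_bpf_release(cgroup):
    """Auto-detach: drop cgroup refs but leave the links allocated."""
    with cgroup_mutex:
        for link in cgroup.links:
            link.cgroup = None
            cgroup.refcount -= 1
        cgroup.links.clear()


def link_release(link):
    """Called when the last FD is closed; returns 0 on success."""
    with cgroup_mutex:
        if link.cgroup is not None:
            # Normal detach: the cgroup is still alive.
            link.cgroup.links.remove(link)
            link.cgroup.refcount -= 1
            link.cgroup = None
        # Otherwise the link was auto-detached already: nothing to undo.
    link.prog = None  # drop the program reference
    link.freed = True
    return 0


cg = Cgroup()
link = CgroupLink(cg, "prog")
cgroup_bpf_release(cg)          # cgroup goes away first
still_owns_prog = link.prog     # link survives and keeps its program
status = link_release(link)     # closing the FD later still succeeds
```

    Because both paths take the same lock, the only state a releasing link
    has to inspect is whether its cgroup pointer is still set.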

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: Roman Gushchin
    Link: https://lore.kernel.org/bpf/20200330030001.2312810-2-andriin@fb.com

    Andrii Nakryiko
     
  • Pull perf updates from Ingo Molnar:
    "The main changes in this cycle were:

    Kernel side changes:

    - A couple of x86/cpu cleanups and changes were grandfathered in due
    to patch dependencies. These clean up the set of CPU model/family
    matching macros with a consistent namespace and C99 initializer
    style.

    - A bunch of updates to various low level PMU drivers:
    * AMD Family 19h L3 uncore PMU
    * Intel Tiger Lake uncore support
    * misc fixes to LBR TOS sampling

    - optprobe fixes

    - perf/cgroup: optimize cgroup event sched-in processing

    - misc cleanups and fixes

    Tooling side changes are to:

    - perf {annotate,expr,record,report,stat,test}

    - perl scripting

    - libapi, libperf and libtraceevent

    - vendor events on Intel and S390, ARM cs-etm

    - Intel PT updates

    - Documentation changes and updates to core facilities

    - misc cleanups, fixes and other enhancements"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits)
    cpufreq/intel_pstate: Fix wrong macro conversion
    x86/cpu: Cleanup the now unused CPU match macros
    hwrng: via_rng: Convert to new X86 CPU match macros
    crypto: Convert to new CPU match macros
    ASoC: Intel: Convert to new X86 CPU match macros
    powercap/intel_rapl: Convert to new X86 CPU match macros
    PCI: intel-mid: Convert to new X86 CPU match macros
    mmc: sdhci-acpi: Convert to new X86 CPU match macros
    intel_idle: Convert to new X86 CPU match macros
    extcon: axp288: Convert to new X86 CPU match macros
    thermal: Convert to new X86 CPU match macros
    hwmon: Convert to new X86 CPU match macros
    platform/x86: Convert to new CPU match macros
    EDAC: Convert to new X86 CPU match macros
    cpufreq: Convert to new X86 CPU match macros
    ACPI: Convert to new X86 CPU match macros
    x86/platform: Convert to new CPU match macros
    x86/kernel: Convert to new CPU match macros
    x86/kvm: Convert to new CPU match macros
    x86/perf/events: Convert to new CPU match macros
    ...

    Linus Torvalds
     
  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Continued user-access cleanups in the futex code.

    - percpu-rwsem rewrite that uses its own waitqueue and atomic_t
    instead of an embedded rwsem. This addresses a couple of
    weaknesses, but the primary motivation was complications on the -rt
    kernel.

    - Introduce raw lock nesting detection on lockdep
    (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal
    lock differences. This too originates from -rt.

    - Reuse lockdep zapped chain_hlocks entries, to conserve RAM
    footprint on distro-ish kernels running into the "BUG:
    MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep
    chain-entries pool.

    - Misc cleanups, smaller fixes and enhancements - see the changelog
    for details"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
    fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
    thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
    Documentation/locking/locktypes: Minor copy editor fixes
    Documentation/locking/locktypes: Further clarifications and wordsmithing
    m68knommu: Remove mm.h include from uaccess_no.h
    x86: get rid of user_atomic_cmpxchg_inatomic()
    generic arch_futex_atomic_op_inuser() doesn't need access_ok()
    x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
    x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
    objtool: whitelist __sanitizer_cov_trace_switch()
    [parisc, s390, sparc64] no need for access_ok() in futex handling
    sh: no need of access_ok() in arch_futex_atomic_op_inuser()
    futex: arch_futex_atomic_op_inuser() calling conventions change
    completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
    lockdep: Add posixtimer context tracing bits
    lockdep: Annotate irq_work
    lockdep: Add hrtimer context tracing bits
    lockdep: Introduce wait-type checks
    completion: Use simple wait queues
    sched/swait: Prepare usage in completions
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Make kfree_rcu() use kfree_bulk() for added performance

    - RCU updates

    - Callback-overload handling updates

    - Tasks-RCU KCSAN and sparse updates

    - Locking torture test and RCU torture test updates

    - Documentation updates

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
    rcu: Make rcu_barrier() account for offline no-CBs CPUs
    rcu: Mark rcu_state.gp_seq to detect concurrent writes
    Documentation/memory-barriers: Fix typos
    doc: Add rcutorture scripting to torture.txt
    doc/RCU/rcu: Use https instead of http if possible
    doc/RCU/rcu: Use absolute paths for non-rst files
    doc/RCU/rcu: Use ':ref:' for links to other docs
    doc/RCU/listRCU: Update example function name
    doc/RCU/listRCU: Fix typos in a example code snippets
    doc/RCU/Design: Remove remaining HTML tags in ReST files
    doc: Add some more RCU list patterns in the kernel
    rcutorture: Set KCSAN Kconfig options to detect more data races
    rcutorture: Manually clean up after rcu_barrier() failure
    rcutorture: Make rcu_torture_barrier_cbs() post from corresponding CPU
    rcuperf: Measure memory footprint during kfree_rcu() test
    rcutorture: Annotation lockless accesses to rcu_torture_current
    rcutorture: Add READ_ONCE() to rcu_torture_count and rcu_torture_batch
    rcutorture: Fix stray access to rcu_fwd_cb_nodelay
    rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race
    rcutorture: Make kvm-find-errors.sh abort on bad directory
    ...

    Linus Torvalds
     
  • Pull objtool updates from Ingo Molnar:
    "The biggest changes in this cycle were the vmlinux.o optimizations by
    Peter Zijlstra, which are preparatory and optimization work to run
    objtool against the much richer vmlinux.o object file, to perform
    new, whole-program section based logic. That work exposed a handful
    of problems with the existing code, which fixes and optimizations are
    merged here. The complete 'vmlinux.o and noinstr' work is still work
    in progress, targeted for v5.8.

    There's also assorted fixes and enhancements from Josh Poimboeuf.

    In particular I'd like to draw attention to commit 644592d328370,
    which turns fatal objtool errors into failed kernel builds. This
    behavior is IMO now justified on multiple grounds (it's easy currently
    to not notice an essentially corrupted kernel build), and the commit
    has been in -next testing for several weeks, but there could still be
    build failures with old or weird toolchains. Should that be widespread
    or high profile enough then I'd suggest a quick revert, to not hold up
    the merge window"

    * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
    objtool: Re-arrange validate_functions()
    objtool: Optimize find_rela_by_dest_range()
    objtool: Delete cleanup()
    objtool: Optimize read_sections()
    objtool: Optimize find_symbol_by_name()
    objtool: Resize insn_hash
    objtool: Rename find_containing_func()
    objtool: Optimize find_symbol_*() and read_symbols()
    objtool: Optimize find_section_by_name()
    objtool: Optimize find_section_by_index()
    objtool: Add a statistics mode
    objtool: Optimize find_symbol_by_index()
    x86/kexec: Make relocate_kernel_64.S objtool clean
    x86/kexec: Use RIP relative addressing
    objtool: Rename func_for_each_insn_all()
    objtool: Rename func_for_each_insn()
    objtool: Introduce validate_return()
    objtool: Improve call destination function detection
    objtool: Fix clang switch table edge case
    objtool: Add relocation check for alternative sections
    ...

    Linus Torvalds
     
  • Pull power management updates from Rafael Wysocki:
    "These clean up and rework the PM QoS API, address a suspend-to-idle
    wakeup regression on some ACPI-based platforms, clean up and extend a
    few cpuidle drivers, update multiple cpufreq drivers and cpufreq
    documentation, and fix a number of issues in devfreq and several other
    things all over.

    Specifics:

    - Clean up and rework the PM QoS API to simplify the code and reduce
    the size of it (Rafael Wysocki).

    - Fix a suspend-to-idle wakeup regression on Dell XPS13 9370 and
    similar platforms where the USB plug/unplug events are handled by
    the EC (Rafael Wysocki).

    - Clean up the intel_idle and PSCI cpuidle drivers (Rafael Wysocki,
    Ulf Hansson).

    - Extend the haltpoll cpuidle driver so that it can be forced to run
    on some systems where it refused to load (Maciej Szmigiero).

    - Convert several cpufreq documents to the .rst format and move the
    legacy driver documentation into one common file (Mauro Carvalho
    Chehab, Rafael Wysocki).

    - Update several cpufreq drivers:

    * Extend and fix the imx-cpufreq-dt driver (Anson Huang).

    * Improve the -EPROBE_DEFER handling and fix unwanted CPU
    overclocking on i.MX6ULL in imx6q-cpufreq (Anson Huang,
    Christoph Niedermaier).

    * Add support for Krait based SoCs to the qcom driver (Ansuel
    Smith).

    * Add support for OPP_PLUS to ti-cpufreq (Lokesh Vutla).

    * Add platform specific intermediate callbacks support to
    cpufreq-dt and update the imx6q driver (Peng Fan).

    * Simplify and consolidate some pieces of the intel_pstate
    driver and update its documentation (Rafael Wysocki, Alex
    Hung).

    - Fix several devfreq issues:

    * Remove unneeded extern keyword from a devfreq header file and
    use the DEVFREQ_GOV_UPDATE_INTERVAL event name instead of
    DEVFREQ_GOV_INTERVAL (Chanwoo Choi).

    * Fix the handling of dev_pm_qos_remove_request() result
    (Leonard Crestez).

    * Use constant name for userspace governor (Pierre Kuo).

    * Get rid of doc warnings and fix a typo (Christophe JAILLET).

    - Use built-in RCU list checking in some places in the PM core to
    avoid false-positive RCU usage warnings (Madhuparna Bhowmik).

    - Add explicit READ_ONCE()/WRITE_ONCE() annotations to low-level PM
    QoS routines (Qian Cai).

    - Fix removal of wakeup sources to avoid NULL pointer dereferences in
    a corner case (Neeraj Upadhyay).

    - Clean up the handling of hibernate compat ioctls and fix the
    related documentation (Eric Biggers).

    - Update the idle_inject power capping driver to use variable-length
    arrays instead of zero-length arrays (Gustavo Silva).

    - Fix list format in a PM QoS document (Randy Dunlap).

    - Make the cpufreq stats module use scnprintf() to avoid potential
    buffer overflows (Takashi Iwai).

    - Add pm_runtime_get_if_active() to PM-runtime API (Sakari Ailus).

    - Allow no domain-idle-states DT property in generic PM domains (Ulf
    Hansson).

    - Fix a broken y-axis scale in the intel_pstate_tracer utility (Doug
    Smythies)"
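
    The scnprintf() item above rests on a difference in return values:
    snprintf() returns the length the output would have had, while
    scnprintf() returns what was actually stored. A small model of why
    that matters when an offset is accumulated across calls (the helper
    names are hypothetical and only the return values are modelled, not
    the buffer contents):

```python
def snprintf_ret(size, text):
    """C99 snprintf(): length that would have been written, ignoring size."""
    return len(text)


def scnprintf_ret(size, text):
    """Kernel scnprintf(): characters actually stored, excluding the NUL."""
    if size == 0:
        return 0
    return min(len(text), size - 1)


def accumulate(ret_fn, bufsize, pieces):
    """Mimic the 'len += xprintf(buf + len, size - len, ...)' idiom."""
    offset = 0
    for piece in pieces:
        remaining = max(bufsize - offset, 0)
        offset += ret_fn(remaining, piece)
    return offset


pieces = ["abcdef", "ghijkl"]
overshoot = accumulate(snprintf_ret, 8, pieces)   # offset runs past the buffer
bounded = accumulate(scnprintf_ret, 8, pieces)    # offset stays inside it
```

    In C the overshooting offset would then be used as a buffer index or
    as a length handed back to the caller, which is the overflow risk.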

    * tag 'pm-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (78 commits)
    cpufreq: intel_pstate: Simplify intel_pstate_cpu_init()
    tools/power/x86/intel_pstate_tracer: fix a broken y-axis scale
    ACPI: PM: s2idle: Refine active GPEs check
    ACPICA: Allow acpi_any_gpe_status_set() to skip one GPE
    PM: sleep: wakeup: Skip wakeup_source_sysfs_remove() if device is not there
    PM / devfreq: Get rid of some doc warnings
    PM / devfreq: Fix handling dev_pm_qos_remove_request result
    PM / devfreq: Fix a typo in a comment
    PM / devfreq: Change to DEVFREQ_GOV_UPDATE_INTERVAL event name
    PM / devfreq: Remove unneeded extern keyword
    PM / devfreq: Use constant name of userspace governor
    ACPI: PM: s2idle: Fix comment in acpi_s2idle_prepare_late()
    cpufreq: qcom: Add support for krait based socs
    cpufreq: imx6q-cpufreq: Improve the logic of -EPROBE_DEFER handling
    cpufreq: Use scnprintf() for avoiding potential buffer overflow
    cpuidle: psci: Split psci_dt_cpu_init_idle()
    PM / Domains: Allow no domain-idle-states DT property in genpd when parsing
    PM / hibernate: Remove unnecessary compat ioctl overrides
    PM: hibernate: fix docs for ioctls that return loff_t via pointer
    Documentation: intel_pstate: update links for references
    ...

    Linus Torvalds
     
    It's possible to have divergent ALU32 and ALU64 bounds when using JMP32
    instructions and ALU64 arithmetic operations. Sometimes clang will
    even generate this code. Because the case is a bit tricky, let's add
    a specific test for it.

    Here is a pseudocode asm version to illustrate the idea:

    r0 = 0xffffffff00000001;
    if w0 > 1 goto %l[fail];
    r0 += 1
    if w0 > 2 goto %l[fail]
    exit

    The intent here is the verifier will fail the load if the 32bit bounds
    are not tracked correctly through ALU64 op. Similarly we can check the
    64bit bounds are correctly zero extended after ALU32 ops.

    r0 = 0xffffffff00000001;
    w0 += 1
    if r0 > 3 goto %l[fail];
    exit

    The above will fail if we do not correctly zero extend 64bit bounds
    after 32bit op.
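
    To see why neither branch to "fail" is ever taken at run time, the two
    programs above can be executed on concrete values. This sketch models
    only the runtime semantics of 64-bit and 32-bit adds, not the
    verifier's range tracking; the helper names are made up for the
    example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def alu64_add(reg, imm):
    """64-bit add: the whole register wraps modulo 2**64."""
    return (reg + imm) & MASK64


def alu32_add(reg, imm):
    """32-bit add: compute on the low half, result is zero-extended."""
    return ((reg & MASK32) + imm) & MASK32


def w(reg):
    """Read the 32-bit subregister view (w0) of a 64-bit register (r0)."""
    return reg & MASK32


def first_program():
    r0 = 0xFFFFFFFF00000001
    if w(r0) > 1:              # JMP32 compare sees only the low 32 bits
        return "fail"
    r0 = alu64_add(r0, 1)      # ALU64 op; upper bits stay set
    if w(r0) > 2:
        return "fail"
    return "exit"


def second_program():
    r0 = 0xFFFFFFFF00000001
    r0 = alu32_add(r0, 1)      # ALU32 op clears the upper 32 bits
    if r0 > 3:                 # 64-bit compare on the zero-extended value
        return "fail"
    return "exit"
```

    A verifier that loses the 32-bit bounds across the 64-bit add, or that
    forgets the zero extension, would consider "fail" reachable and reject
    a program that is in fact safe.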

    Signed-off-by: John Fastabend
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/158560430155.10843.514209255758200922.stgit@john-Precision-5820-Tower

    John Fastabend
     
    After the changes to add update_reg_bounds after ALU ops and 32-bit
    bounds tracking, truncation of a boundary-crossing range will fail
    earlier and with a different error message. The test error trace is now
    the following:

    11: (17) r1 -= 2147483584
    12: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
    R1_w=invP(id=0,smin_value=-2147483584,smax_value=63)
    R10=fp0 fp-8_w=mmmmmmmm
    12: (17) r1 -= 2147483584
    13: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
    R1_w=invP(id=0,
    umin_value=18446744069414584448,umax_value=18446744071562068095,
    var_off=(0xffffffff00000000; 0xffffffff))
    R10=fp0 fp-8_w=mmmmmmmm
    13: (77) r1 >>= 8
    14: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
    R1_w=invP(id=0,
    umin_value=72057594021150720,umax_value=72057594029539328,
    var_off=(0xffffffff000000; 0xffffff),
    s32_min_value=-16777216,s32_max_value=-1,
    u32_min_value=-16777216)
    R10=fp0 fp-8_w=mmmmmmmm
    14: (0f) r0 += r1
    value 72057594021150720 makes map_value pointer be out of bounds

    Because we now have 'umin_value == umax_value', where previously
    'umin_value != umax_value', we can fail earlier, noting
    that the pointer addition is out of bounds.

    Signed-off-by: John Fastabend
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/158560428103.10843.6316594510312781186.stgit@john-Precision-5820-Tower

    John Fastabend
     
  • With current ALU32 subreg handling and retval refine fix from last
    patches we see an expected failure in test_verifier. With verbose
    verifier state being printed at each step for clarity we have the
    following relevant lines [I omit register states that are not
    necessarily useful to see failure cause],

    #101/p bpf_get_stack return R0 within range FAIL
    Failed to load prog 'Success'!
    [..]
    14: (85) call bpf_get_stack#67
    R0_w=map_value(id=0,off=0,ks=8,vs=48,imm=0)
    R3_w=inv48
    15:
    R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
    15: (b7) r1 = 0
    16:
    R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
    R1_w=inv0
    16: (bf) r8 = r0
    17:
    R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
    R1_w=inv0
    R8_w=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
    17: (67) r8 <<= 32
    18: (c7) r8 s>>= 32
    19
    R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
    R1_w=inv0
    R8_w=inv(id=0,smin_value=-2147483648,
    smax_value=2147483647,
    var32_off=(0x0; 0xffffffff))
    19: (cd) if r1 s< r8 goto pc+16
    R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
    R1_w=inv0
    R8_w=inv(id=0,smin_value=-2147483648,
    smax_value=0,
    var32_off=(0x0; 0xffffffff))
    20:
    R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
    R1_w=inv0
    R8_w=inv(id=0,smin_value=-2147483648,
    smax_value=0,
    R9=inv48
    20: (1f) r9 -= r8
    21: (bf) r2 = r7
    22:
    R2_w=map_value(id=0,off=0,ks=8,vs=48,imm=0)
    22: (0f) r2 += r8
    value -2147483648 makes map_value pointer be out of bounds

    After the call to bpf_get_stack() on line 14 and some moves, we have at
    line 16 an r8 bound with max_value 48 but an unknown min value. This is
    to be expected: the bpf_get_stack call can only return a max of the
    input size but is free to return any negative error in the 32-bit
    register space. The C helper returns an int, so it will use the lower
    32 bits.

    Lines 17 and 18 clear the top 32 bits with a left/right shift but use
    ARSH, so we still have a worst-case min bound before line 19 of
    -2147483648. At this point the signed check 'r1 s< r8', meant to protect
    the addition on line 22 where the dst reg is a map_value pointer, may
    very well return true with a large negative number. Then the final line
    22 will detect this as an invalid operation and fail the program. What
    we want to do is proceed only if r8 is a positive non-error value. So
    change 'r1 s< r8' to 'r1 s> r8' so that we jump if r8 is negative.
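
    The left shift followed by an arithmetic right shift sign-extends the
    low 32 bits, which is why a negative int return value stays negative
    in the 64-bit register. A concrete sketch of that arithmetic, using
    hypothetical helper names and -14 as an example error value:

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit pattern as a two's complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def lsh64(reg, shift):
    """64-bit logical left shift (r8 <<= shift)."""
    return (reg << shift) & MASK64


def arsh64(reg, shift):
    """64-bit arithmetic right shift (r8 s>>= shift); copies the sign bit."""
    return (to_signed64(reg) >> shift) & MASK64


def sign_extend_low32(reg):
    """Model 'r8 <<= 32; r8 s>>= 32' and return the signed result."""
    return to_signed64(arsh64(lsh64(reg, 32), 32))


success = sign_extend_low32(48)              # a valid length stays 48
error = sign_extend_low32(0xFFFFFFF2)        # low 32 bits hold -14
worst_case = sign_extend_low32(0x80000000)   # most negative 32-bit int
```

    The worst case is the -2147483648 lower bound that appears in the
    trace, so the branch guarding the pointer addition has to exclude
    negative values explicitly.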

    Next we will throw an error because we access past the end of the map
    value. The map value size is 48 and sizeof(struct test_val) is 48, so
    we walk off the end of the map value on the second call to
    bpf_get_stack(). Fix this by changing sizeof(struct test_val) to
    24 by using 'sizeof(struct test_val) / 2'. After this everything passes
    as expected.

    Signed-off-by: John Fastabend
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/158560426019.10843.3285429543232025187.stgit@john-Precision-5820-Tower

    John Fastabend
     
  • Before this series the verifier would clamp return bounds of
    bpf_get_stack() to [0, X] and this led the verifier to believe
    that a JMP_JSLT 0 would be false and so would prune that path.

    The result is anything hidden behind that JSLT would be unverified.
    Add a test to catch this case by hiding a goto pc-1 behind the
    check which will cause an infinite loop if not rejected.
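
    The pruning problem can be shown with a toy range analysis: given
    signed bounds for a register, decide which outcomes of a signed
    less-than branch are possible. This is a simplified model for
    illustration, not the verifier's implementation, and the function name
    is hypothetical.

```python
def jslt_imm_outcomes(smin, smax, imm):
    """Return the possible outcomes of 'if reg s< imm' for reg in [smin, smax]."""
    outcomes = set()
    if smin < imm:
        outcomes.add("taken")        # some value in range is below imm
    if smax >= imm:
        outcomes.add("fallthrough")  # some value in range is not below imm
    return outcomes


# Bounds wrongly clamped to [0, 48]: the error path looks unreachable.
clamped = jslt_imm_outcomes(0, 48, 0)

# Bounds that allow a negative 32-bit error: both paths must be verified.
correct = jslt_imm_outcomes(-2**31, 48, 0)
```

    With the clamped bounds, an analysis like this never visits the
    "taken" path, so whatever sits behind it, such as the goto pc-1 loop
    in the new test, would go unchecked.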

    Signed-off-by: John Fastabend
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/158560423908.10843.11783152347709008373.stgit@john-Precision-5820-Tower

    John Fastabend
     
  • Pull spi and regulator updates from Mark Brown:
    "At one point in the release cycle I managed to fat finger things and
    apply some SPI fixes onto a regulator branch and merge that into the
    SPI tree, then pull in a change shared with the MTD tree moving the
    Mediatek quadspi driver over to become the Mediatek spi-nor driver in
    the SPI tree.

    This has made a mess which I only just noticed while preparing this
    and I can't see a sensible way to unpick things due to other
    subsequent merge commits especially the pull from MTD so it looks like
    the most sensible thing to do is give up and combine the two pull
    requests.

    Fortunately both subsystems were fairly quiet this cycle, the
    highlights are:

    regulator:

    - Support for Monolithic Power Systems MP5416, MP8867 and MP8869
    and Qualcomm PMI8994 and SMB208.

    SPI:

    - Lots of enhancements for spi-fsl-dspi, including XSPI mode support,
    from Vladimir Oltean.

    - Support for Amlogic Meson G12A, IBM FSI, Mediatek spi-nor (moved
    from MTD), NXP i.MX8Mx, Rockchip PX30, RK3308 and RK3328, and
    Qualcomm Atheros AR934x/QCA95xx"

    * tag 'regulator-spi-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc: (118 commits)
    spi: efm32: Convert to use GPIO descriptors
    regulator: qcom_smd: Add pmi8994 regulator support
    regulator: da9063: Fix get_mode() functions to read sleep field
    spi: spi-fsl-lpspi: Replace zero-length array with flexible-array member
    spi: spi-s3c24xx: Replace zero-length array with flexible-array member
    spi: stm32: Fix comments compilation warnings
    spi: atmel-quadspi: Add verbose debug facilities to monitor register accesses
    spi: spi-fsl-dspi: Add support for LS1028A
    spi: spi-fsl-dspi: Move invariant configs out of dspi_transfer_one_message
    spi: spi-fsl-dspi: Fix interrupt-less DMA mode taking an XSPI code path
    spi: spi-fsl-dspi: Avoid NULL pointer in dspi_slave_abort for non-DMA mode
    spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion
    spi: spi-fsl-dspi: Protect against races on dspi->words_in_flight
    spi: spi-fsl-dspi: Avoid reading more data than written in EOQ mode
    spi: spi-fsl-dspi: Fix bits-per-word acceleration in DMA mode
    spi: spi-fsl-dspi: Fix little endian access to PUSHR CMD and TXDATA
    spi: spi-fsl-dspi: Don't access reserved fields in SPI_MCR
    regulator: driver.h: fix regulator_map_* function names
    regulator: da9063: fix suspend
    spi: mxs: Drop GPIO includes
    ...

    Linus Torvalds