06 Jul, 2021

2 commits

  • Pull USB / Thunderbolt updates from Greg KH:
    "Here is the big set of USB and Thunderbolt patches for 5.14-rc1.

    Nothing major here, just lots of little changes for new hardware and
    features. Highlights are:

    - more USB 4 support added to the thunderbolt core

    - build warning fixes all over the place

    - usb-serial driver updates and new device support

    - mtu3 driver updates

    - gadget driver updates

    - dwc3 driver updates

    - dwc2 driver updates

    - isp1760 host driver updates

    - musb driver updates

    - lots of other tiny things.

    Full details are in the shortlog.

    All of these have been in linux-next for a while now with no reported
    issues"

    * tag 'usb-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (223 commits)
    phy: qcom-qusb2: Add configuration for SM4250 and SM6115
    dt-bindings: phy: qcom,qusb2: document sm4250/6115 compatible
    dt-bindings: usb: qcom,dwc3: Add bindings for sm6115/4250
    USB: cdc-acm: blacklist Heimann USB Appset device
    usb: xhci-mtk: allow multiple Start-Split in a microframe
    usb: ftdi-elan: remove redundant continue statement in a while-loop
    usb: class: cdc-wdm: return the correct errno code
    xhci: remove redundant continue statement
    usb: dwc3: Fix debugfs creation flow
    usb: gadget: hid: fix error return code in hid_bind()
    usb: gadget: eem: fix echo command packet response issue
    usb: gadget: f_hid: fix endianness issue with descriptors
    Revert "USB: misc: Add onboard_usb_hub driver"
    Revert "of/platform: Add stubs for of_platform_device_create/destroy()"
    Revert "usb: host: xhci-plat: Create platform device for onboard hubs in probe()"
    Revert "arm64: dts: qcom: sc7180-trogdor: Add nodes for onboard USB hub"
    xhci: solve a double free problem while doing s4
    xhci: handle failed buffer copy to URB sg list and fix a W=1 copiler warning
    xhci: Add adaptive interrupt rate for isoch TRBs with XHCI_AVOID_BEI quirk
    xhci: Remove unused defines for ERST_SIZE and ERST_ENTRIES
    ...

    Linus Torvalds
     
  • Pull tty / serial updates from Greg KH:
    "Here is the big set of tty and serial driver patches for 5.14-rc1.

    A bit more than normal, but nothing major, lots of cleanups.
    Highlights are:

    - lots of tty api cleanups and mxser driver cleanups from Jiri

    - build warning fixes

    - various serial driver updates

    - coding style cleanups

    - various tty driver minor fixes and updates

    - removal of the broken and disabled r3964 line discipline (finally!)

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'tty-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (227 commits)
    serial: mvebu-uart: remove unused member nb from struct mvebu_uart
    arm64: dts: marvell: armada-37xx: Fix reg for standard variant of UART
    dt-bindings: mvebu-uart: fix documentation
    serial: mvebu-uart: correctly calculate minimal possible baudrate
    serial: mvebu-uart: do not allow changing baudrate when uartclk is not available
    serial: mvebu-uart: fix calculation of clock divisor
    tty: make linux/tty_flip.h self-contained
    serial: Prefer unsigned int to bare use of unsigned
    serial: 8250: 8250_omap: Fix possible interrupt storm on K3 SoCs
    serial: qcom_geni_serial: use DT aliases according to DT bindings
    Revert "tty: serial: Add UART driver for Cortina-Access platform"
    tty: serial: Add UART driver for Cortina-Access platform
    MAINTAINERS: add me back as mxser maintainer
    mxser: Documentation, fix typos
    mxser: Documentation, make the docs up-to-date
    mxser: Documentation, remove traces of callout device
    mxser: introduce mxser_16550A_or_MUST helper
    mxser: rename flags to old_speed in mxser_set_serial_info
    mxser: use port variable in mxser_set_serial_info
    mxser: access info->MCR under info->slock
    ...

    Linus Torvalds
     

05 Jul, 2021

1 commit

  • …git/paulmck/linux-rcu

    Pull RCU updates from Paul McKenney:

    - Bitmap parsing support for "all" as an alias for all bits

    - Documentation updates

    - Miscellaneous fixes, including some that overlap into mm and lockdep

    - kvfree_rcu() updates

    - mem_dump_obj() updates, with acks from one of the slab-allocator
    maintainers

    - RCU NOCB CPU updates, including limited deoffloading

    - SRCU updates

    - Tasks-RCU updates

    - Torture-test updates

    * 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits)
    tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline
    rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states
    rcu: Add missing __releases() annotation
    rcu: Remove obsolete rcu_read_unlock() deadlock commentary
    rcu: Improve comments describing RCU read-side critical sections
    rcu: Create an unrcu_pointer() to remove __rcu from a pointer
    srcu: Early test SRCU polling start
    rcu: Fix various typos in comments
    rcu/nocb: Unify timers
    rcu/nocb: Prepare for fine-grained deferred wakeup
    rcu/nocb: Only cancel nocb timer if not polling
    rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
    rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup
    rcu/nocb: Allow de-offloading rdp leader
    rcu/nocb: Directly call __wake_nocb_gp() from bypass timer
    rcu: Don't penalize priority boosting when there is nothing to boost
    rcu: Point to documentation of ordering guarantees
    rcu: Make rcu_gp_cleanup() be noinline for tracing
    rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs
    rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP
    ...

    Linus Torvalds
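
The "all" alias for bit lists mentioned in the RCU pull can be illustrated with a small user-space sketch. This is not the kernel's parser: the function name `parse_bit_list` and the supported syntax (comma-separated numbers and `lo-hi` ranges, plus `all`) are assumptions chosen for illustration, and the kernel's own list format accepts more forms than are handled here.

```python
def parse_bit_list(spec: str, nbits: int) -> set:
    """Parse a bit list such as "0-3,8" or "all" into a set of bit numbers.

    Hypothetical helper: a simplified model of a bit-list parser in which
    "all" is an alias for every bit in a bitmap of size nbits.
    """
    spec = spec.strip()
    if spec == "all":
        return set(range(nbits))

    bits = set()
    for chunk in spec.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        first, sep, last = chunk.partition("-")
        lo = int(first)
        hi = int(last) if sep else lo
        if lo > hi or hi >= nbits:
            raise ValueError(f"range {chunk!r} is outside 0..{nbits - 1}")
        bits.update(range(lo, hi + 1))
    return bits
```

With an 8-bit map, `parse_bit_list("all", 8)` selects every bit, which is the same result as spelling out `0-7`.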
     

04 Jul, 2021

1 commit

  • Pull tracing updates from Steven Rostedt:

    - Added option for per CPU threads to the hwlat tracer

    - Have hwlat tracer handle hotplug CPUs

    - New tracer: osnoise, that detects latency caused by interrupts,
    softirqs and scheduling of other tasks.

    - Added timerlat tracer that creates a thread and measures in detail
    what sources of latency it has for wake ups.

    - Removed the "success" field of the sched_wakeup trace event. This has
    been hardcoded as "1" since 2015, no tooling should be looking at it
    now. If one exists, we can revert this commit, fix that tool and try
    to remove it again in the future.

    - tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids.

    - New boot command line option "tp_printk_stop", as tp_printk causes
    trace events to write to console. When user space starts, this can
    easily live lock the system. Having a boot option to stop just after
    boot up is useful to prevent that from happening.

    - Have ftrace_dump_on_oops boot command line option take numbers that
    match the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops.

    - Bootconfig clean ups, fixes and enhancements.

    - New ktest script that tests bootconfig options.

    - Add tracepoint_probe_register_may_exist() to register a tracepoint
    without triggering a WARN*() if it already exists. BPF has a path
    from user space that can do this. All other paths are considered a
    bug.

    - Small clean ups and fixes

    * tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (49 commits)
    tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
    tracing: Simplify & fix saved_tgids logic
    treewide: Add missing semicolons to __assign_str uses
    tracing: Change variable type as bool for clean-up
    trace/timerlat: Fix indentation on timerlat_main()
    trace/osnoise: Make 'noise' variable s64 in run_osnoise()
    tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
    tracing: Fix spelling in osnoise tracer "interferences" -> "interference"
    Documentation: Fix a typo on trace/osnoise-tracer
    trace/osnoise: Fix return value on osnoise_init_hotplug_support
    trace/osnoise: Make interval u64 on osnoise_main
    trace/osnoise: Fix 'no previous prototype' warnings
    tracing: Have osnoise_main() add a quiescent state for task rcu
    seq_buf: Make trace_seq_putmem_hex() support data longer than 8
    seq_buf: Fix overflow in seq_buf_putmem_hex()
    trace/osnoise: Support hotplug operations
    trace/hwlat: Support hotplug operations
    trace/hwlat: Protect kdata->kthread with get/put_online_cpus
    trace: Add timerlat tracer
    trace: Add osnoise tracer
    ...

    Linus Torvalds
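
The ftrace_dump_on_oops item above can be sketched as a small mapping from boot argument to dump mode. The sketch assumes the documented sysctl meanings (1 dumps the buffers of all CPUs, 2 dumps only the CPU that triggered the oops) and the existing `orig_cpu` keyword. The function and constant names are hypothetical, and this is a user-space model, not the kernel's parsing code.

```python
# Hypothetical names; the values mirror the numbers shown in
# /proc/sys/kernel/ftrace_dump_on_oops (assumed: 0 = off, 1 = all, 2 = one CPU).
DUMP_NONE, DUMP_ALL, DUMP_ORIG = 0, 1, 2


def parse_ftrace_dump_on_oops(arg: str) -> int:
    """Map a boot argument such as "ftrace_dump_on_oops=2" to a dump mode."""
    name, sep, value = arg.partition("=")
    if name != "ftrace_dump_on_oops":
        raise ValueError(f"unexpected parameter: {name!r}")
    # A bare option (or "=1") means: dump the buffers of all CPUs.
    if not sep or value in ("", "1"):
        return DUMP_ALL
    # "2" is accepted as the numeric spelling of "orig_cpu".
    if value in ("2", "orig_cpu"):
        return DUMP_ORIG
    raise ValueError(f"unsupported value: {value!r}")
```

The point of the change is that the number a user reads back from the sysctl file can be given on the command line unchanged.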
     

03 Jul, 2021

2 commits

  • Pull iommu updates from Joerg Roedel:

    - SMMU Updates from Will Deacon:

      - SMMUv3:
        - Support stalling faults for platform devices
        - Decrease default sizes for the event and PRI queues
      - SMMUv2:
        - Support for a new '->probe_finalize' hook, needed by Nvidia
        - Even more Qualcomm compatible strings
        - Avoid Adreno TTBR1 quirk for DB820C platform

    - Intel VT-d updates from Lu Baolu:

      - Convert Intel IOMMU to use sva_lib helpers in iommu core
      - ftrace and debugfs support for page fault handling
      - Support asynchronous nested capabilities
      - Various misc cleanups

    - Support for new VIOT ACPI table to make the VirtIO IOMMU
    available on x86

    - Add the amd_iommu=force_enable command line option to enable
    the IOMMU on platforms where it is known to cause problems

    - Support for version 2 of the Rockchip IOMMU

    - Various smaller fixes, cleanups and refactorings

    * tag 'iommu-updates-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (66 commits)
    iommu/virtio: Enable x86 support
    iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()
    ACPI: Add driver for the VIOT table
    ACPI: Move IOMMU setup code out of IORT
    ACPI: arm64: Move DMA setup operations out of IORT
    iommu/vt-d: Fix dereference of pointer info before it is null checked
    iommu: Update "iommu.strict" documentation
    iommu/arm-smmu: Check smmu->impl pointer before dereferencing
    iommu/arm-smmu-v3: Remove unnecessary oom message
    iommu/arm-smmu: Fix arm_smmu_device refcount leak in address translation
    iommu/arm-smmu: Fix arm_smmu_device refcount leak when arm_smmu_rpm_get fails
    iommu/vt-d: Fix linker error on 32-bit
    iommu/vt-d: No need to typecast
    iommu/vt-d: Define counter explicitly as unsigned int
    iommu/vt-d: Remove unnecessary braces
    iommu/vt-d: Removed unused iommu_count in dmar domain
    iommu/vt-d: Use bitfields for DMAR capabilities
    iommu/vt-d: Use DEVICE_ATTR_RO macro
    iommu/vt-d: Fix out-bounds-warning in intel/svm.c
    iommu/vt-d: Add PRQ handling latency sampling
    ...

    Linus Torvalds
     
  • Merge more updates from Andrew Morton:
    "190 patches.

    Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
    vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
    migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
    zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
    core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
    signals, exec, kcov, selftests, compress/decompress, and ipc"

    * emailed patches from Andrew Morton: (190 commits)
    ipc/util.c: use binary search for max_idx
    ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
    ipc: use kmalloc for msg_queue and shmid_kernel
    ipc sem: use kvmalloc for sem_undo allocation
    lib/decompressors: remove set but not used variabled 'level'
    selftests/vm/pkeys: exercise x86 XSAVE init state
    selftests/vm/pkeys: refill shadow register after implicit kernel write
    selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
    selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
    kcov: add __no_sanitize_coverage to fix noinstr for all architectures
    exec: remove checks in __register_bimfmt()
    x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
    hfsplus: report create_date to kstat.btime
    hfsplus: remove unnecessary oom message
    nilfs2: remove redundant continue statement in a while-loop
    kprobes: remove duplicated strong free_insn_page in x86 and s390
    init: print out unknown kernel parameters
    checkpatch: do not complain about positive return values starting with EPOLL
    checkpatch: improve the indented label test
    checkpatch: scripts/spdxcheck.py now requires python3
    ...

    Linus Torvalds
     

02 Jul, 2021

1 commit

  • Pull cgroup updates from Tejun Heo:

    - cgroup.kill is added which implements atomic killing of the whole
    subtree.

    Down the line, this should be able to replace the multiple userland
    implementations of "keep killing till empty".

    - PSI can now be turned off at boot time to avoid overhead for
    configurations which don't care about PSI.

    * 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: make per-cgroup pressure stall tracking configurable
    cgroup: Fix kernel-doc
    cgroup: inline cgroup_task_freeze()
    tests/cgroup: test cgroup.kill
    tests/cgroup: move cg_wait_for(), cg_prepare_for_wait()
    tests/cgroup: use cgroup.kill in cg_killall()
    docs/cgroup: add entry for cgroup.kill
    cgroup: introduce cgroup.kill

    Linus Torvalds
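
From user space, the cgroup.kill interface described above amounts to a single write. The sketch below assumes a cgroup v2 hierarchy and that writing "1" to the `cgroup.kill` file of a non-root cgroup kills every process in that subtree. The helper name `kill_cgroup` and the example path are hypothetical.

```python
import os


def kill_cgroup(cgroup_dir: str) -> None:
    """Ask the kernel to kill every process in the cgroup subtree.

    cgroup_dir is the directory of a non-root cgroup v2 group, for example
    "/sys/fs/cgroup/mygroup" (hypothetical path). A single write replaces
    the userland loop of "signal every PID, re-read cgroup.procs, repeat".
    """
    with open(os.path.join(cgroup_dir, "cgroup.kill"), "w") as f:
        f.write("1")
```

Because the kernel handles the whole subtree in one operation, a process that forks during the kill cannot escape the way it can between two passes of a userland loop.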
     

01 Jul, 2021

9 commits

  • Now that the feature is fully implemented (the faulting path hooks exist
    so userspace is notified, and the ioctl to resolve such faults is
    available), advertise this as a supported feature.

    Link: https://lkml.kernel.org/r/20210503180737.2487560-6-axelrasmussen@google.com
    Signed-off-by: Axel Rasmussen
    Acked-by: Hugh Dickins
    Acked-by: Peter Xu
    Cc: Alexander Viro
    Cc: Andrea Arcangeli
    Cc: Brian Geffon
    Cc: "Dr . David Alan Gilbert"
    Cc: Jerome Glisse
    Cc: Joe Perches
    Cc: Kirill A. Shutemov
    Cc: Lokesh Gidra
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Mina Almasry
    Cc: Oliver Upton
    Cc: Shaohua Li
    Cc: Shuah Khan
    Cc: Stephen Rothwell
    Cc: Wang Qing
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Axel Rasmussen
     
  • Export the PTE/PMD status of uffd-wp to pagemap too.

    Link: https://lkml.kernel.org/r/20210428225030.9708-6-peterx@redhat.com
    Signed-off-by: Peter Xu
    Cc: Alexander Viro
    Cc: Andrea Arcangeli
    Cc: Axel Rasmussen
    Cc: Brian Geffon
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Joe Perches
    Cc: Kirill A. Shutemov
    Cc: Lokesh Gidra
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Mina Almasry
    Cc: Oliver Upton
    Cc: Shaohua Li
    Cc: Shuah Khan
    Cc: Stephen Rothwell
    Cc: Wang Qing
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Xu
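
A reader of /proc/PID/pagemap could pick up the exported uffd-wp status roughly as follows. The bit positions are taken from the kernel's pagemap documentation as an assumption (bit 57 for uffd-wp, bit 55 soft-dirty, bit 62 swapped, bit 63 present, bits 0-54 the PFN when present), and the function names are hypothetical.

```python
import os
import struct

# Assumed bit layout of one 64-bit pagemap entry (see lead-in).
PM_PFN_MASK = (1 << 55) - 1
PM_SOFT_DIRTY = 1 << 55
PM_UFFD_WP = 1 << 57
PM_SWAP = 1 << 62
PM_PRESENT = 1 << 63


def decode_pagemap_entry(raw: bytes) -> dict:
    """Decode one 8-byte pagemap entry (native byte order) into its flags."""
    (entry,) = struct.unpack("=Q", raw)
    present = bool(entry & PM_PRESENT)
    return {
        "present": present,
        "swapped": bool(entry & PM_SWAP),
        "soft_dirty": bool(entry & PM_SOFT_DIRTY),
        "uffd_wp": bool(entry & PM_UFFD_WP),
        # The PFN field is only meaningful for present pages.
        "pfn": entry & PM_PFN_MASK if present else None,
    }


def read_pagemap_entry(pid: int, vaddr: int) -> dict:
    """Read and decode the pagemap entry covering vaddr in process pid."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * 8)  # one 8-byte entry per virtual page
        return decode_pagemap_entry(f.read(8))
```

`read_pagemap_entry` needs a Linux system and permission to read the target's pagemap; `decode_pagemap_entry` works on any 8-byte entry.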
     
  • With HUGETLB_PAGE_FREE_VMEMMAP, freeing the unused vmemmap pages
    associated with each HugeTLB page is off by default. Now that the
    vmemmap is PMD mapped, enabling this feature has no side effect when
    there are no HugeTLB pages in the system. Some users may want to enable
    the feature at compile time instead of on the boot command line, so add a
    config option that makes it default on for those who do not want to
    enable it via the command line.

    Link: https://lkml.kernel.org/r/20210616094915.34432-4-songmuchun@bytedance.com
    Signed-off-by: Muchun Song
    Cc: Chen Huang
    Cc: David Hildenbrand
    Cc: Jonathan Corbet
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc: Xiongchun Duan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Muchun Song
     
  • The preparation of splitting huge PMD mapping of vmemmap pages is ready,
    so switch the mapping from PTE to PMD.

    Link: https://lkml.kernel.org/r/20210616094915.34432-3-songmuchun@bytedance.com
    Signed-off-by: Muchun Song
    Reviewed-by: Mike Kravetz
    Cc: Chen Huang
    Cc: David Hildenbrand
    Cc: Jonathan Corbet
    Cc: Michal Hocko
    Cc: Oscar Salvador
    Cc: Xiongchun Duan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Muchun Song
     
  • The memory_hotplug.memmap_on_memory parameter is not compatible with
    hugetlb_free_vmemmap, so disable it when hugetlb_free_vmemmap is enabled.

    [akpm@linux-foundation.org: remove unneeded include, per Oscar]

    Link: https://lkml.kernel.org/r/20210510030027.56044-9-songmuchun@bytedance.com
    Signed-off-by: Muchun Song
    Acked-by: Mike Kravetz
    Cc: Alexander Viro
    Cc: Andy Lutomirski
    Cc: Anshuman Khandual
    Cc: Balbir Singh
    Cc: Barry Song
    Cc: Bodeddula Balasubramaniam
    Cc: Borislav Petkov
    Cc: Chen Huang
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: David Rientjes
    Cc: HORIGUCHI NAOYA
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Joao Martins
    Cc: Joerg Roedel
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: Miaohe Lin
    Cc: Michal Hocko
    Cc: Mina Almasry
    Cc: Oliver Neukum
    Cc: Oscar Salvador
    Cc: Paul E. McKenney
    Cc: Pawan Gupta
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Xiongchun Duan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Muchun Song
     
  • Add a kernel parameter hugetlb_free_vmemmap to enable the feature of
    freeing unused vmemmap pages associated with each hugetlb page on boot.

    We disable PMD mapping of vmemmap pages on x86-64 when this feature
    is enabled, because vmemmap_remap_free() depends on the vmemmap being
    base page mapped.

    Link: https://lkml.kernel.org/r/20210510030027.56044-8-songmuchun@bytedance.com
    Signed-off-by: Muchun Song
    Reviewed-by: Oscar Salvador
    Reviewed-by: Barry Song
    Reviewed-by: Miaohe Lin
    Tested-by: Chen Huang
    Tested-by: Bodeddula Balasubramaniam
    Reviewed-by: Mike Kravetz
    Cc: Alexander Viro
    Cc: Andy Lutomirski
    Cc: Anshuman Khandual
    Cc: Balbir Singh
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: David Rientjes
    Cc: HORIGUCHI NAOYA
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Joao Martins
    Cc: Joerg Roedel
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Mina Almasry
    Cc: Oliver Neukum
    Cc: Paul E. McKenney
    Cc: Pawan Gupta
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Xiongchun Duan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Muchun Song
     
  • When we free a HugeTLB page to the buddy allocator, we need to allocate
    the vmemmap pages associated with it. However, we may not be able to
    allocate the vmemmap pages when the system is under memory pressure. In
    this case, we just refuse to free the HugeTLB page. This changes behavior
    in some corner cases as listed below:

    1) Failing to free a huge page triggered by the user (decrease nr_pages).

    User needs to try again later.

    2) Failing to free a surplus huge page when freed by the application.

    Try again later when freeing a huge page next time.

    3) Failing to dissolve a free huge page on ZONE_MOVABLE via
    offline_pages().

    This can happen when we have plenty of ZONE_MOVABLE memory, but
    not enough kernel memory to allocate vmemmap pages. We may even
    be able to migrate huge page contents, but will not be able to
    dissolve the source huge page. This will prevent an offline
    operation and is unfortunate as memory offlining is expected to
    succeed on movable zones. Users that depend on memory hotplug
    to succeed for movable zones should carefully consider whether the
    memory savings gained from this feature are worth the risk of
    possibly not being able to offline memory in certain situations.

    4) Failing to dissolve a huge page on CMA/ZONE_MOVABLE via
    alloc_contig_range() - once we have that handling in place. Mainly
    affects CMA and virtio-mem.

    Similar to 3). virtio-mem will handle migration errors gracefully.
    CMA might be able to fall back on other free areas within the CMA
    region.

    Vmemmap pages are allocated from the page freeing context. To keep
    those allocations from being disruptive (e.g. triggering the OOM killer),
    __GFP_NORETRY is used. hugetlb_lock is dropped for the allocation because
    a non-sleeping allocation would be too fragile and could fail too
    easily under memory pressure. GFP_ATOMIC and other modes that access
    memory reserves are not used because we want to avoid consuming reserves
    under heavy hugetlb freeing.

    [mike.kravetz@oracle.com: fix dissolve_free_huge_page use of tail/head page]
    Link: https://lkml.kernel.org/r/20210527231225.226987-1-mike.kravetz@oracle.com
    [willy@infradead.org: fix alloc_vmemmap_page_list documentation warning]
    Link: https://lkml.kernel.org/r/20210615200242.1716568-6-willy@infradead.org

    Link: https://lkml.kernel.org/r/20210510030027.56044-7-songmuchun@bytedance.com
    Signed-off-by: Muchun Song
    Signed-off-by: Mike Kravetz
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Mike Kravetz
    Reviewed-by: Oscar Salvador
    Cc: Alexander Viro
    Cc: Andy Lutomirski
    Cc: Anshuman Khandual
    Cc: Balbir Singh
    Cc: Barry Song
    Cc: Bodeddula Balasubramaniam
    Cc: Borislav Petkov
    Cc: Chen Huang
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: David Rientjes
    Cc: HORIGUCHI NAOYA
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Joao Martins
    Cc: Joerg Roedel
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: Miaohe Lin
    Cc: Michal Hocko
    Cc: Mina Almasry
    Cc: Oliver Neukum
    Cc: Paul E. McKenney
    Cc: Pawan Gupta
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Xiongchun Duan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Muchun Song
     
  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - Various DM persistent-data library improvements and fixes that
    benefit both the DM thinp and cache targets.

    - A few small DM kcopyd efficiency improvements.

    - Significant zoned related block core, DM core and DM zoned target
    changes that culminate with adding zoned append emulation (which is
    required to properly fix DM crypt's zoned support).

    - Various DM writecache target changes that improve efficiency. Adds an
    optional "metadata_only" feature that only promotes bios flagged with
    REQ_META. But the most significant improvement is writecache's
    ability to pause writeback, for a configurable time, if/when the
    working set is larger than the cache (and the cache is full) -- this
    ensures performance is no worse than the slower origin device.

    * tag 'for-5.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (35 commits)
    dm writecache: make writeback pause configurable
    dm writecache: pause writeback if cache full and origin being written directly
    dm io tracker: factor out IO tracker
    dm btree remove: assign new_root only when removal succeeds
    dm zone: fix dm_revalidate_zones() memory allocation
    dm ps io affinity: remove redundant continue statement
    dm writecache: add optional "metadata_only" parameter
    dm writecache: add "cleaner" and "max_age" to Documentation
    dm writecache: write at least 4k when committing
    dm writecache: flush origin device when writing and cache is full
    dm writecache: have ssd writeback wait if the kcopyd workqueue is busy
    dm writecache: use list_move instead of list_del/list_add in writecache_writeback()
    dm writecache: commit just one block, not a full page
    dm writecache: remove unused gfp_t argument from wc_add_block()
    dm crypt: Fix zoned block device support
    dm: introduce zone append emulation
    dm: rearrange core declarations for extended use from dm-zone.c
    block: introduce BIO_ZONE_WRITE_LOCKED bio flag
    block: introduce bio zone helpers
    block: improve handling of all zones reset operation
    ...

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:

    - disk events cleanup (Christoph)

    - gendisk and request queue allocation simplifications (Christoph)

    - bdev_disk_changed cleanups (Christoph)

    - IO priority improvements (Bart)

    - Chained bio completion trace fix (Edward)

    - blk-wbt fixes (Jan)

    - blk-wbt enable/disable fix (Zhang)

    - Scheduler dispatch improvements (Jan, Ming)

    - Shared tagset scheduler improvements (John)

    - BFQ updates (Paolo, Luca, Pietro)

    - BFQ lock inversion fix (Jan)

    - Documentation improvements (Kir)

    - CLONE_IO block cgroup fix (Tejun)

    - Removal of the ancient and deprecated block dump feature (zhangyi)

    - Discard merge fix (Ming)

    - Misc fixes or followup fixes (Colin, Damien, Dan, Long, Max, Thomas,
    Yang)

    * tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block: (129 commits)
    block: fix discard request merge
    block/mq-deadline: Remove a WARN_ON_ONCE() call
    blk-mq: update hctx->dispatch_busy in case of real scheduler
    blk: Fix lock inversion between ioc lock and bfqd lock
    bfq: Remove merged request already in bfq_requests_merged()
    block: pass a gendisk to bdev_disk_changed
    block: move bdev_disk_changed
    block: add the events* attributes to disk_attrs
    block: move the disk events code to a separate file
    block: fix trace completion for chained bio
    block/partitions/msdos: Fix typo inidicator -> indicator
    block, bfq: reset waker pointer with shared queues
    block, bfq: check waker only for queues with no in-flight I/O
    block, bfq: avoid delayed merge of async queues
    block, bfq: boost throughput by extending queue-merging times
    block, bfq: consider also creation time in delayed stable merge
    block, bfq: fix delayed stable merge check
    block, bfq: let also stably merged queues enjoy weight raising
    blk-wbt: make sure throttle is enabled properly
    blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled()
    ...

    Linus Torvalds
     

30 Jun, 2021

10 commits

  • Merge misc updates from Andrew Morton:
    "191 patches.

    Subsystems affected by this patch series: kthread, ia64, scripts,
    ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab,
    slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap,
    mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization,
    pagealloc, and memory-failure)"

    * emailed patches from Andrew Morton: (191 commits)
    mm,hwpoison: make get_hwpoison_page() call get_any_page()
    mm,hwpoison: send SIGBUS with error virutal address
    mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
    mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
    mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM
    mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
    docs: remove description of DISCONTIGMEM
    arch, mm: remove stale mentions of DISCONIGMEM
    mm: remove CONFIG_DISCONTIGMEM
    m68k: remove support for DISCONTIGMEM
    arc: remove support for DISCONTIGMEM
    arc: update comment about HIGHMEM implementation
    alpha: remove DISCONTIGMEM and NUMA
    mm/page_alloc: move free_the_page
    mm/page_alloc: fix counting of managed_pages
    mm/page_alloc: improve memmap_pages dbg msg
    mm: drop SECTION_SHIFT in code comments
    mm/page_alloc: introduce vm.percpu_pagelist_high_fraction
    mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
    mm/page_alloc: scale the number of pages that are batch freed
    ...

    Linus Torvalds
     
  • Pull ACPI updates from Rafael Wysocki:
    "These update the ACPICA code in the kernel to the 20210604 upstream
    revision, add preliminary support for the Platform Runtime Mechanism
    (PRM), address issues related to the handling of device dependencies
    in the ACPI device enumeration code, improve the tracking of ACPI
    power resource states, improve the ACPI support for suspend-to-idle on
    AMD systems, continue the unification of message printing in the ACPI
    code, address assorted issues and clean up the code in a number of
    places.

    Specifics:

    - Update ACPICA code in the kernel to upstream revision 20210604
    including the following changes:

      - Add defines for the CXL Host Bridge Structure and add the
      CFMWS structure definition to CEDT (Alison Schofield).
      - iASL: Finish support for the IVRS ACPI table (Bob Moore).
      - iASL: Add support for the SVKL table (Bob Moore).
      - iASL: Add full support for RGRT ACPI table (Bob Moore).
      - iASL: Add support for the BDAT ACPI table (Bob Moore).
      - iASL: add disassembler support for PRMT (Erik Kaneda).
      - Fix memory leak caused by _CID repair function (Erik Kaneda).
      - Add support for PlatformRtMechanism OpRegion (Erik Kaneda).
      - Add PRMT module header to facilitate parsing (Erik Kaneda).
      - Add _PLD panel positions (Fabian Wüthrich).
      - MADT: add Multiprocessor Wakeup Mailbox Structure and the SVKL
      table headers (Kuppuswamy Sathyanarayanan).
      - Use ACPI_FALLTHROUGH (Wei Ming Chen).

    - Add preliminary support for the Platform Runtime Mechanism (PRM) to
    allow the AML interpreter to call PRM functions (Erik Kaneda).

    - Address some issues related to the handling of device dependencies
    reported by _DEP in the ACPI device enumeration code and clean up
    some related pieces of it (Rafael Wysocki).

    - Improve the tracking of states of ACPI power resources (Rafael
    Wysocki).

    - Improve ACPI support for suspend-to-idle on AMD systems (Alex
    Deucher, Mario Limonciello, Pratik Vishwakarma).

    - Continue the unification and cleanup of message printing in the
    ACPI code (Hanjun Guo, Heiner Kallweit).

    - Fix possible buffer overrun issue with the description_show() sysfs
    attribute method (Krzysztof Wilczyński).

    - Improve the acpi_mask_gpe kernel command line parameter handling
    and clean up the core ACPI code related to sysfs (Andy Shevchenko,
    Baokun Li, Clayton Casciato).

    - Postpone bringing devices in the general ACPI PM domain to D0
    during resume from system-wide suspend until they are really needed
    (Dmitry Torokhov).

    - Make the ACPI processor driver fix up C-state latency if not
    ordered (Mario Limonciello).

    - Add support for identifying devices depending on the given one that
    are not its direct descendants with the help of _DEP (Daniel
    Scally).

    - Extend the checks related to ACPI IRQ overrides on x86 in order to
    avoid false-positives (Hui Wang).

    - Add battery DPTF participant for Intel SoCs (Sumeet Pawnikar).

    - Rearrange the ACPI fan driver and device power management code to
    use a common list of device IDs (Rafael Wysocki).

    - Fix clang CFI violation in the ACPI BGRT table parsing code and
    clean it up (Nathan Chancellor).

    - Add GPE-related quirks for some laptops to the EC driver (Chris
    Chiu, Zhang Rui).

    - Make the ACPI PPTT table parsing code populate the cache-id value
    if present in the firmware (James Morse).

    - Remove redundant clearing of context->ret.pointer from
    acpi_run_osc() (Hans de Goede).

    - Add missing acpi_put_table() in acpi_init_fpdt() (Jing Xiangfeng).

    - Make ACPI APEI handle ARM Processor Error CPER records like Memory
    Error ones to avoid user space task lockups (Xiaofei Tan).

    - Stop warning about disabled ACPI in APEI (Jon Hunter).

    - Fix fall-through warning for Clang in the SBSHC driver (Gustavo A.
    R. Silva).

    - Add custom DSDT file as Makefile prerequisite (Richard Fitzgerald).

    - Initialize local variable to avoid garbage being returned (Colin
    Ian King).

    - Simplify assorted pieces of code, address assorted coding style and
    documentation issues and comment typos (Baokun Li, Christophe
    JAILLET, Clayton Casciato, Liu Shixin, Shaokun Zhang, Wei Yongjun,
    Yang Li, Zhen Lei)"

    * tag 'acpi-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (97 commits)
    ACPI: PM: postpone bringing devices to D0 unless we need them
    ACPI: tables: Add custom DSDT file as makefile prerequisite
    ACPI: bgrt: Use sysfs_emit
    ACPI: bgrt: Fix CFI violation
    ACPI: EC: trust DSDT GPE for certain HP laptop
    ACPI: scan: Simplify acpi_table_events_fn()
    ACPI: PM: Adjust behavior for field problems on AMD systems
    ACPI: PM: s2idle: Add support for new Microsoft UUID
    ACPI: PM: s2idle: Add support for multiple func mask
    ACPI: PM: s2idle: Refactor common code
    ACPI: PM: s2idle: Use correct revision id
    ACPI: sysfs: Remove tailing return statement in void function
    ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
    ACPI: sysfs: Sort headers alphabetically
    ACPI: sysfs: Refactor param_get_trace_state() to drop dead code
    ACPI: sysfs: Unify pattern of memory allocations
    ACPI: sysfs: Allow bitmap list to be supplied to acpi_mask_gpe
    ACPI: sysfs: Make sparse happy about address space in use
    ACPI: scan: Fix race related to dropping dependencies
    ACPI: scan: Reorganize acpi_device_add()
    ...

    Linus Torvalds
     
  • Pull power management updates from Rafael Wysocki:
    "These add hybrid processors support to the intel_pstate driver and
    make it work with more processor models when HWP is disabled, make the
    intel_idle driver use special C6 idle state parameters when package
    C-states are disabled, add cooling support to the tegra30 devfreq
    driver, rework the TEO (timer events oriented) cpuidle governor,
    extend the OPP (operating performance points) framework to use the
    required-opps DT property in more cases, fix some issues and clean up
    a number of assorted pieces of code.

    Specifics:

    - Make intel_pstate support hybrid processors using abstract
    performance units in the HWP interface (Rafael Wysocki).

    - Add Icelake servers and Cometlake support in no-HWP mode to
    intel_pstate (Giovanni Gherdovich).

    - Make cpufreq_online() error path be consistent with the CPU device
    removal path in cpufreq (Rafael Wysocki).

    - Clean up 3 cpufreq drivers and the statistics code (Hailong Liu,
    Randy Dunlap, Shaokun Zhang).

    - Make intel_idle use special idle state parameters for C6 when
    package C-states are disabled (Chen Yu).

    - Rework the TEO (timer events oriented) cpuidle governor to address
    some theoretical shortcomings in it (Rafael Wysocki).

    - Drop unneeded semicolon from the TEO governor (Wan Jiabing).

    - Modify the runtime PM framework to accept unassigned suspend and
    resume callback pointers (Ulf Hansson).

    - Improve pm_runtime_get_sync() documentation (Krzysztof Kozlowski).

    - Improve device performance states support in the generic power
    domains (genpd) framework (Ulf Hansson).

    - Fix some documentation issues in genpd (Yang Yingliang).

    - Make the operating performance points (OPP) framework use the
    required-opps DT property in use cases that are not related to
    genpd (Hsin-Yi Wang).

    - Make lazy_link_required_opp_table() use list_del_init instead of
    list_del/INIT_LIST_HEAD (Yang Yingliang).

    - Simplify wake IRQs handling in the core system-wide sleep support
    code and clean up some coding style inconsistencies in it (Tian
    Tao, Zhen Lei).

    - Add cooling support to the tegra30 devfreq driver and improve its
    DT bindings (Dmitry Osipenko).

    - Fix some assorted issues in the devfreq core and drivers (Chanwoo
    Choi, Dong Aisheng, YueHaibing)"

    * tag 'pm-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (39 commits)
    PM / devfreq: passive: Fix get_target_freq when not using required-opp
    cpufreq: Make cpufreq_online() call driver->offline() on errors
    opp: Allow required-opps to be used for non genpd use cases
    cpuidle: teo: remove unneeded semicolon in teo_select()
    dt-bindings: devfreq: tegra30-actmon: Add cooling-cells
    dt-bindings: devfreq: tegra30-actmon: Convert to schema
    PM / devfreq: userspace: Use DEVICE_ATTR_RW macro
    PM: runtime: Clarify documentation when callbacks are unassigned
    PM: runtime: Allow unassigned ->runtime_suspend|resume callbacks
    PM: runtime: Improve path in rpm_idle() when no callback
    PM: hibernate: remove leading spaces before tabs
    PM: sleep: remove trailing spaces and tabs
    PM: domains: Drop/restore performance state votes for devices at runtime PM
    PM: domains: Return early if perf state is already set for the device
    PM: domains: Split code in dev_pm_genpd_set_performance_state()
    cpuidle: teo: Use kerneldoc documentation in admin-guide
    cpuidle: teo: Rework most recent idle duration values treatment
    cpuidle: teo: Change the main idle state selection logic
    cpuidle: teo: Cosmetic modification of teo_select()
    cpuidle: teo: Cosmetic modifications of teo_update()
    ...

    Linus Torvalds
     
  • Pull timer updates from Thomas Gleixner:
    "Time and clocksource/clockevent related updates:

    Core changes:

    - Infrastructure to support per CPU "broadcast" devices for per CPU
    clockevent devices which stop in deep idle states. This allows us
    to utilize the more efficient architected timer on certain ARM SoCs
    for normal operation instead of permanently using the slow to
    access SoC specific clockevent device.

    - Print the name of the broadcast/wakeup device in /proc/timer_list

    - Make the clocksource watchdog more robust against delays between
    reading the current active clocksource and the watchdog
    clocksource. Such delays can be caused by NMIs, SMIs and vCPU
    preemption.

    Handle this by reading the watchdog clocksource twice, i.e. before
    and after reading the current active clocksource. If the two
    watchdog reads show an excessive time delta, the read sequence
    is repeated up to 3 times.

    - Improve the debug output and add a test module for the watchdog
    mechanism.

    - Reimplementation of the venerable time64_to_tm() function with a
    faster and significantly smaller version. Straight from the source,
    i.e. the author of the related research paper contributed this!

    Driver changes:

    - No new drivers, not even new device tree bindings!

    - Fixes, improvements and cleanups all over the place"

    * tag 'timers-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
    time/kunit: Add missing MODULE_LICENSE()
    time: Improve performance of time64_to_tm()
    clockevents: Use list_move() instead of list_del()/list_add()
    clocksource: Print deviation in nanoseconds when a clocksource becomes unstable
    clocksource: Provide kernel module to test clocksource watchdog
    clocksource: Reduce clocksource-skew threshold
    clocksource: Limit number of CPUs checked for clock synchronization
    clocksource: Check per-CPU clock synchronization when marked unstable
    clocksource: Retry clock read if long delays detected
    clockevents: Add missing parameter documentation
    clocksource/drivers/timer-ti-dm: Drop unnecessary restore
    clocksource/arm_arch_timer: Improve Allwinner A64 timer workaround
    clocksource/drivers/arm_global_timer: Remove duplicated argument in arm_global_timer
    clocksource/drivers/arm_global_timer: Make symbol 'gt_clk_rate_change_nb' static
    arm: zynq: don't disable CONFIG_ARM_GLOBAL_TIMER due to CONFIG_CPU_FREQ anymore
    clocksource/drivers/arm_global_timer: Implement rate compensation whenever source clock changes
    clocksource/drivers/ingenic: Rename unreasonable array names
    clocksource/drivers/timer-ti-dm: Save and restore timer TIOCP_CFG
    clocksource/drivers/mediatek: Ack and disable interrupts on suspend
    clocksource/drivers/samsung_pwm: Constify source IO memory
    ...

    Linus Torvalds
     
  • Remove description of DISCONTIGMEM from the "Memory Models" document and
    update VM sysctl description so that it won't mention DISCONTIGMEM.

    Link: https://lkml.kernel.org/r/20210608091316.3622-8-rppt@kernel.org
    Signed-off-by: Mike Rapoport
    Acked-by: Arnd Bergmann
    Reviewed-by: David Hildenbrand
    Cc: Geert Uytterhoeven
    Cc: Ivan Kokshaysky
    Cc: Jonathan Corbet
    Cc: Matt Turner
    Cc: Richard Henderson
    Cc: Vineet Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • This introduces a new sysctl vm.percpu_pagelist_high_fraction. It is
    similar to the old vm.percpu_pagelist_fraction. The old sysctl increased
    both pcp->batch and pcp->high with the higher pcp->high potentially
    reducing zone->lock contention. However, the higher pcp->batch value also
    potentially increased allocation latency while the PCP was refilled. This
    sysctl only adjusts pcp->high so that zone->lock contention is potentially
    reduced but allocation latency during a PCP refill remains the same.

    # grep -E "high:|batch" /proc/zoneinfo | tail -2
    high: 649
    batch: 63

    # sysctl vm.percpu_pagelist_high_fraction=8
    # grep -E "high:|batch" /proc/zoneinfo | tail -2
    high: 35071
    batch: 63

    # sysctl vm.percpu_pagelist_high_fraction=64
    high: 4383
    batch: 63

    # sysctl vm.percpu_pagelist_high_fraction=0
    high: 649
    batch: 63
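
    The sample output above is consistent with pcp->high scaling inversely
    with the fraction while pcp->batch stays fixed: 35071 / 8 is 4383 in
    integer arithmetic. A minimal sketch of that relationship, assuming that
    high is the zone's managed pages divided by the fraction and then split
    across the CPUs local to the zone. The function name and its inputs are
    hypothetical, and the real kernel code may apply further limits:

```c
#include <assert.h>

/*
 * Illustrative only: not the kernel's implementation.
 * managed_pages and nr_local_cpus are assumed inputs; fraction is the
 * value written to vm.percpu_pagelist_high_fraction (non-zero here,
 * since 0 selects the kernel's default sizing, which is not modelled).
 */
static unsigned long pcp_high_sketch(unsigned long managed_pages,
				     unsigned int fraction,
				     unsigned int nr_local_cpus)
{
	assert(fraction > 0 && nr_local_cpus > 0);

	/* A larger fraction gives a smaller high; batch is never touched. */
	return managed_pages / fraction / nr_local_cpus;
}
```

    For example, pcp_high_sketch(280570, 8, 1) yields 35071 and
    pcp_high_sketch(280570, 64, 1) yields 4383. The 280570 figure is a
    made-up input chosen to line up with the sample output, not a measured
    value.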

    [mgorman@techsingularity.net: fix documentation]
    Link: https://lkml.kernel.org/r/20210528151010.GQ30378@techsingularity.net

    Link: https://lkml.kernel.org/r/20210525080119.5455-7-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Dave Hansen
    Acked-by: Vlastimil Babka
    Cc: Hillf Danton
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Patch series "Calculate pcp->high based on zone sizes and active CPUs", v2.

    The per-cpu page allocator (PCP) is meant to reduce contention on the zone
    lock but the sizing of batch and high is archaic and takes neither the
    zone size nor the number of CPUs local to a zone into account. With larger
    zones and more CPUs per node, the contention is getting worse.
    Furthermore, the fact that vm.percpu_pagelist_fraction adjusts both batch
    and high values means that the sysctl can reduce zone lock contention but
    also increase allocation latencies.

    This series disassociates pcp->high from pcp->batch and then scales
    pcp->high based on the size of the local zone with limited impact to
    reclaim and accounting for active CPUs but leaves pcp->batch static. It
    also adapts the number of pages that can be on the pcp list based on
    recent freeing patterns.

    The motivation is partially to adjust to larger memory sizes but is also
    driven by the fact that large batches of page freeing via release_pages()
    often show zone contention as a major part of the problem. Another is a
    bug report based on an older kernel where a multi-terabyte process can
    take several minutes to exit. A workaround was to use
    vm.percpu_pagelist_fraction to increase the pcp->high value but testing
    indicated that a production workload could not use the same values because
    of an increase in allocation latencies. Unfortunately, I cannot reproduce
    this test case myself as the multi-terabyte machines are in active use but
    it should alleviate the problem.

    The series aims to address both and partially acts as a pre-requisite.
    pcp only works with order-0 which is useless for SLUB (when using high
    orders) and THP (unconditionally). To store high-order pages on PCP, the
    pcp->high values need to be increased first.

    This patch (of 6):

    The vm.percpu_pagelist_fraction is used to increase the batch and high
    limits for the per-cpu page allocator (PCP). The intent behind the sysctl
    is to reduce zone lock acquisition when allocating/freeing pages but it
    has a problem. While it can decrease contention, it can also increase
    latency on the allocation side due to unreasonably large batch sizes.
    This leads to games where an administrator adjusts
    percpu_pagelist_fraction on the fly to work around contention and
    allocation latency problems.

    This series aims to alleviate the problems with zone lock contention while
    avoiding the allocation-side latency problems. For the purposes of
    review, it's easier to remove this sysctl now and reintroduce a similar
    sysctl later in the series that deals only with pcp->high.

    Link: https://lkml.kernel.org/r/20210525080119.5455-1-mgorman@techsingularity.net
    Link: https://lkml.kernel.org/r/20210525080119.5455-2-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Dave Hansen
    Acked-by: Vlastimil Babka
    Cc: Hillf Danton
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • The macro PAGE_REPORTING_MIN_ORDER is defined as the page reporting
    threshold. It can't be adjusted at runtime.

    This introduces a variable (@page_reporting_order) to replace the macro
    (PAGE_REPORTING_MIN_ORDER). MAX_ORDER is assigned to it initially,
    meaning the page reporting is disabled. It will be specified by the driver
    if a valid one is provided. Otherwise, it will fall back to @pageblock_order.
    It's also exported so that the page reporting order can be adjusted at
    runtime.
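
    The selection rule described above can be sketched as follows. This is
    an illustration only: the DEMO_* constants, the function name and the
    exact validity range for the driver-supplied order are assumptions, not
    taken from the patch.

```c
#include <assert.h>

#define DEMO_MAX_ORDER       11	/* hypothetical stand-in for MAX_ORDER */
#define DEMO_PAGEBLOCK_ORDER  9	/* hypothetical stand-in for pageblock_order */

/*
 * Return the page reporting order to use once a driver registers.
 * current_order == DEMO_MAX_ORDER means "reporting disabled / not yet set".
 * A driver order is treated as valid when 0 < order <= DEMO_MAX_ORDER
 * (an assumption made for this sketch).
 */
static unsigned int demo_pick_reporting_order(unsigned int current_order,
					      unsigned int driver_order)
{
	assert(current_order <= DEMO_MAX_ORDER);

	/* An order already set (e.g. adjusted at runtime) is left alone. */
	if (current_order != DEMO_MAX_ORDER)
		return current_order;

	if (driver_order > 0 && driver_order <= DEMO_MAX_ORDER)
		return driver_order;

	/* No valid driver-provided order: fall back to the pageblock order. */
	return DEMO_PAGEBLOCK_ORDER;
}
```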

    Link: https://lkml.kernel.org/r/20210625014710.42954-3-gshan@redhat.com
    Signed-off-by: Gavin Shan
    Suggested-by: David Hildenbrand
    Reviewed-by: Alexander Duyck
    Cc: Anshuman Khandual
    Cc: Catalin Marinas
    Cc: "Michael S. Tsirkin"
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     
  • "watchdog/%u" threads have been replaced by cpu_stop_work. The current
    description is extremely misleading.

    Link: https://lkml.kernel.org/r/1619687073-24686-5-git-send-email-wangqing@vivo.com
    Signed-off-by: Wang Qing
    Reviewed-by: Petr Mladek
    Cc: "Guilherme G. Piccoli"
    Cc: Joe Perches
    Cc: Jonathan Corbet
    Cc: Kees Cook
    Cc: Mauro Carvalho Chehab
    Cc: Qais Yousef
    Cc: Randy Dunlap
    Cc: Santosh Sivaraj
    Cc: Stephen Kitt
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wang Qing
     
  • "watchdog/%u" threads have been replaced by cpu_stop_work. The current
    description is extremely misleading.

    Link: https://lkml.kernel.org/r/1619687073-24686-4-git-send-email-wangqing@vivo.com
    Signed-off-by: Wang Qing
    Reviewed-by: Petr Mladek
    Cc: "Guilherme G. Piccoli"
    Cc: Joe Perches
    Cc: Jonathan Corbet
    Cc: Kees Cook
    Cc: Mauro Carvalho Chehab
    Cc: Qais Yousef
    Cc: Randy Dunlap
    Cc: Santosh Sivaraj
    Cc: Stephen Kitt
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wang Qing
     

29 Jun, 2021

7 commits

  • * pm-cpufreq:
    cpufreq: Make cpufreq_online() call driver->offline() on errors
    cpufreq: loongson2: Remove unused linux/sched.h headers
    cpufreq: sh: Remove unused linux/sched.h headers
    cpufreq: stats: Clean up local variable in cpufreq_stats_create_table()
    cpufreq: intel_pstate: hybrid: Fix build with CONFIG_ACPI unset
    cpufreq: sc520_freq: add 'fallthrough' to one case
    cpufreq: intel_pstate: Add Cometlake support in no-HWP mode
    cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode
    cpufreq: intel_pstate: hybrid: CPU-specific scaling factor
    cpufreq: intel_pstate: hybrid: Avoid exposing two global attributes

    * pm-cpuidle:
    cpuidle: teo: remove unneeded semicolon in teo_select()
    cpuidle: teo: Use kerneldoc documentation in admin-guide
    cpuidle: teo: Rework most recent idle duration values treatment
    cpuidle: teo: Change the main idle state selection logic
    cpuidle: teo: Cosmetic modification of teo_select()
    cpuidle: teo: Cosmetic modifications of teo_update()
    intel_idle: Adjust the SKX C6 parameters if PC6 is disabled

    Rafael J. Wysocki
     
  • Pull pstore updates from Kees Cook:
    "Use normal block device I/O path for pstore/blk. (Christoph Hellwig,
    Kees Cook, Pu Lehui)"

    * tag 'pstore-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    pstore/blk: Include zone in pstore_device_info
    pstore/blk: Fix kerndoc and redundancy on blkdev param
    pstore/blk: Use the normal block device I/O path
    pstore/blk: Move verify_size() macro out of function
    pstore/blk: Improve failure reporting

    Linus Torvalds
     
  • Pull documentation updates from Jonathan Corbet:
    "This was a reasonably active cycle for documentation; this includes:

    - Some kernel-doc cleanups. That script is still a regex onslaught from
    hell, but it has gotten a little better.

    - Improvements to the checkpatch docs, which are also used by the
    tool itself.

    - A major update to the pathname lookup documentation.

    - Elimination of :doc: markup, since our automarkup magic can create
    references from filenames without all the extra noise.

    - The flurry of Chinese translation activity continues.

    Plus, of course, the usual collection of updates, typo fixes, and
    warning fixes"

    * tag 'docs-5.14' of git://git.lwn.net/linux: (115 commits)
    docs: path-lookup: use bare function() rather than literals
    docs: path-lookup: update symlink description
    docs: path-lookup: update get_link() ->follow_link description
    docs: path-lookup: update WALK_GET, WALK_PUT desc
    docs: path-lookup: no get_link()
    docs: path-lookup: update i_op->put_link and cookie description
    docs: path-lookup: i_op->follow_link replaced with i_op->get_link
    docs: path-lookup: Add macro name to symlink limit description
    docs: path-lookup: remove filename_mountpoint
    docs: path-lookup: update do_last() part
    docs: path-lookup: update path_mountpoint() part
    docs: path-lookup: update path_to_nameidata() part
    docs: path-lookup: update follow_managed() part
    docs: Makefile: Use CONFIG_SHELL not SHELL
    docs: Take a little noise out of the build process
    docs: x86: avoid using ReST :doc:`foo` markup
    docs: virt: kvm: s390-pv-boot.rst: avoid using ReST :doc:`foo` markup
    docs: userspace-api: landlock.rst: avoid using ReST :doc:`foo` markup
    docs: trace: ftrace.rst: avoid using ReST :doc:`foo` markup
    docs: trace: coresight: coresight.rst: avoid using ReST :doc:`foo` markup
    ...

    Linus Torvalds
     
  • Pull media updates from Mauro Carvalho Chehab:

    - V4L2 core control API was split into separate files

    - New RC maps: tango and tc-90405

    - Hantro driver got support for G2/HEVC decoder

    - av7110 is moving to staging, together with some legacy APIs

    - several cleanups related to compat_ioctl32 code

    - Move the MPEG-2 stateless control type out of staging

    - Address several issues with RPM get logic on media drivers

    - Lots of cleanups, bug fixes and improvements.

    * tag 'media/v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (394 commits)
    media: s5p-mfc: Fix display delay control creation
    media: mtk-vpu: on suspend, read/write regs only if vpu is running
    media: video-mux: Skip dangling endpoints
    media: Fix Media Controller API config checks
    media: i2c: rdacm20: Re-work ov10635 reset
    media: i2c: rdacm20: Check return values
    media: i2c: rdacm20: Report camera module name
    media: i2c: rdacm20: Enable noise immunity
    media: i2c: rdacm20: Embed 'serializer' field
    media: i2c: rdacm21: Power up OV10640 before OV490
    media: i2c: rdacm21: Fix OV10640 powerup
    media: i2c: rdacm21: Add delay after OV490 reset
    media: i2c: max9271: Introduce wake_up() function
    media: i2c: max9271: Check max9271_write() return
    media: i2c: max9286: Rework comments in .bound()
    media: i2c: max9286: Define high channel amplitude
    media: i2c: max9286: Cache channel amplitude
    media: i2c: max9286: Rename reverse_channel_mv
    media: i2c: max9286: Adjust parameters indent
    media: hantro: add support for Rockchip RK3036
    ...

    Linus Torvalds
     
  • Commit 95b88f4d71cb953e02206be3c757083601391a0f ("dm writecache: pause
    writeback if cache full and origin being written directly") introduced
    code that pauses cache flushing if we are issuing writes directly to the
    origin.

    Improve that initial commit by making the timeout code configurable
    (via the option "pause_writeback"). Also change the default from 1s to
    3s because it performed better.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     
  • Pull x86 splitlock updates from Ingo Molnar:

    - Add the "ratelimit:N" parameter to the split_lock_detect= boot
    option, to rate-limit the generation of bus-lock exceptions.

    This is both easier on system resources and kinder to offending
    applications than the current policy of outright killing them.

    - Document the split-lock detection feature and its parameters.

    * tag 'x86-splitlock-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    Documentation/x86: Add ratelimit in buslock.rst
    Documentation/admin-guide: Add bus lock ratelimit
    x86/bus_lock: Set rate limit for bus lock
    Documentation/x86: Add buslock.rst

    Linus Torvalds
     
  • Pull x86 cleanups from Ingo Molnar:
    "Misc cleanups & removal of obsolete code"

    * tag 'x86-cleanups-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/sgx: Correct kernel-doc's arg name in sgx_encl_release()
    doc: Remove references to IBM Calgary
    x86/setup: Document that Windows reserves the first MiB
    x86/crash: Remove crash_reserve_low_1M()
    x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
    x86/alternative: Align insn bytes vertically
    x86: Fix leftover comment typos
    x86/asm: Simplify __smp_mb() definition
    x86/alternatives: Make the x86nops[] symbol static

    Linus Torvalds
     

26 Jun, 2021

2 commits


25 Jun, 2021

1 commit


22 Jun, 2021

4 commits

  • When the clocksource watchdog marks a clock as unstable, this might
    be due to that clock being unstable or it might be due to delays that
    happen to occur between the reads of the two clocks. It would be good
    to have a way of testing the clocksource watchdog's ability to
    distinguish between these two causes of clock skew and instability.

    Therefore, provide a new clocksource-wdtest module selected by a new
    TEST_CLOCKSOURCE_WATCHDOG Kconfig option. This module has a single module
    parameter named "holdoff" that provides the number of seconds of delay
    before testing should start, which defaults to zero when built as a module
    and to 10 seconds when built directly into the kernel. Very large systems
    that boot slowly may need to increase the value of this module parameter.

    This module uses hand-crafted clocksource structures to do its testing,
    thus avoiding messing up timing for the rest of the kernel and for user
    applications. This module first verifies that the ->uncertainty_margin
    fields of the clocksource structures are set sanely. It then tests the
    delay-detection capability of the clocksource watchdog, increasing the
    number of consecutive delays injected, first provoking console messages
    complaining about the delays and finally forcing a clock-skew event.
    Unexpected test results cause at least one WARN_ON_ONCE() console splat.
    If there are no splats, the test has passed. Finally, it fuzzes the
    value returned from a clocksource to test the clocksource watchdog's
    ability to detect time skew.

    This module checks the state of its clocksource after each test, and
    uses WARN_ON_ONCE() to emit a console splat if there are any failures.
    This should enable all types of test frameworks to detect any such
    failures.

    This facility is intended for diagnostic use only, and should be avoided
    on production systems.

    Reported-by: Chris Mason
    Suggested-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Thomas Gleixner
    Tested-by: Feng Tang
    Link: https://lore.kernel.org/r/20210527190124.440372-5-paulmck@kernel.org

    Paul E. McKenney
     
  • Currently, if skew is detected on a clock marked CLOCK_SOURCE_VERIFY_PERCPU,
    that clock is checked on all CPUs. This is thorough, but might not be
    what you want on a system with a few tens of CPUs, let alone a few hundred
    of them.

    Therefore, by default check only up to eight randomly chosen CPUs. Also
    provide a new clocksource.verify_n_cpus kernel boot parameter. A value of
    -1 says to check all of the CPUs, and a non-negative value says to randomly
    select that number of CPUs, without concern about selecting the same CPU
    multiple times. However, make use of a cpumask so that a given CPU will be
    checked at most once.
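
    A minimal userspace sketch of that selection rule, using a bool array as
    a stand-in for a cpumask. The function name and parameters are
    hypothetical, and the kernel code may differ in detail (for example in
    how it treats the CPU doing the checking):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Mark the CPUs to check in chosen[] (a stand-in for a cpumask).
 * n < 0 selects every CPU. Otherwise make n random picks; picks may
 * collide, so fewer than n distinct CPUs can end up selected, but each
 * CPU is marked, and therefore checked, at most once.
 * Returns the number of distinct CPUs selected.
 */
static int choose_cpus_to_check(int n, int nr_cpus, bool chosen[])
{
	int i, count = 0;

	assert(nr_cpus > 0);

	for (i = 0; i < nr_cpus; i++)
		chosen[i] = (n < 0);
	if (n < 0)
		return nr_cpus;

	for (i = 0; i < n; i++) {
		int cpu = rand() % nr_cpus;

		if (!chosen[cpu]) {
			chosen[cpu] = true;
			count++;
		}
	}
	return count;
}
```

    Tolerating repeated picks keeps the selection loop trivial; the mask is
    what guarantees that no CPU is visited twice.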

    Suggested-by: Thomas Gleixner # For verify_n_cpus=1.
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Thomas Gleixner
    Acked-by: Feng Tang
    Link: https://lore.kernel.org/r/20210527190124.440372-3-paulmck@kernel.org

    Paul E. McKenney
     
  • When the clocksource watchdog marks a clock as unstable, this might be due
    to that clock being unstable or it might be due to delays that happen to
    occur between the reads of the two clocks. Yes, interrupts are disabled
    across those two reads, but there is no shortage of things that can delay
    interrupts-disabled regions of code ranging from SMI handlers to vCPU
    preemption. It would be good to have some indication as to why the clock
    was marked unstable.

    Therefore, re-read the watchdog clock on either side of the read from the
    clock under test. If the watchdog clock shows an excessive time delta
    between its pair of reads, the reads are retried.

    The maximum number of retries is specified by a new kernel boot parameter
    clocksource.max_cswd_read_retries, which defaults to three, that is, up to
    four reads, one initial and up to three retries. If more than one retry
    was required, a message is printed on the console (the occasional single
    retry is expected behavior, especially in guest OSes). If the maximum
    number of retries is exceeded, the clock under test will be marked
    unstable. However, the probability of this happening due to various sorts
    of delays is quite small. In addition, the reason (clock-read delays) for
    the unstable marking will be apparent.
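
    The read-and-retry sequence described above can be sketched in userspace
    C as follows. All names here (read_ns_fn, wd_read_pair, fake_time,
    fake_read) are hypothetical, and the threshold is passed in as a plain
    parameter; this is an illustration of the idea, not the kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clock-read callback; returns a time in nanoseconds. */
typedef uint64_t (*read_ns_fn)(void *ctx);

enum wd_result { WD_READ_OK, WD_READ_DELAYED };

/*
 * Read the watchdog, the clock under test, then the watchdog again.
 * If the two watchdog reads are more than max_delay_ns apart, the sample
 * is treated as disturbed and the sequence is retried, for at most
 * max_retries retries (so max_retries + 1 attempts in total).
 */
static enum wd_result wd_read_pair(read_ns_fn read_wd, read_ns_fn read_cs,
				   void *ctx, uint64_t max_delay_ns,
				   int max_retries, uint64_t *wd_ns,
				   uint64_t *cs_ns, int *retries_used)
{
	int nretries;

	assert(max_retries >= 0);

	for (nretries = 0; nretries <= max_retries; nretries++) {
		uint64_t wd_before = read_wd(ctx);
		uint64_t cs_now = read_cs(ctx);
		uint64_t wd_after = read_wd(ctx);

		if (wd_after - wd_before <= max_delay_ns) {
			*wd_ns = wd_before;
			*cs_ns = cs_now;
			*retries_used = nretries;
			return WD_READ_OK;
		}
	}

	/* Every attempt was delayed: the caller would mark the clock unstable. */
	*retries_used = max_retries;
	return WD_READ_DELAYED;
}

/* Demo-only time source: each read advances time by a scripted step. */
struct fake_time {
	uint64_t now_ns;
	const uint64_t *steps;
	int nsteps;
	int next;
};

static uint64_t fake_read(void *ctx)
{
	struct fake_time *ft = ctx;

	if (ft->next < ft->nsteps)
		ft->now_ns += ft->steps[ft->next++];
	return ft->now_ns;
}
```

    A caller would log a message when retries_used is greater than one, and
    would report clock-read delays as the reason when WD_READ_DELAYED is
    returned.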

    Reported-by: Chris Mason
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Thomas Gleixner
    Acked-by: Feng Tang
    Link: https://lore.kernel.org/r/20210527190124.440372-1-paulmck@kernel.org

    Paul E. McKenney
     
  • Introduce an rq-qos policy that assigns an I/O priority to requests based
    on blk-cgroup configuration settings. This policy has the following
    advantages over the ioprio_set() system call:
    - This policy is cgroup based so it has all the advantages of cgroups.
    - While ioprio_set() does not affect page cache writeback I/O, this rq-qos
    controller affects page cache writeback I/O for filesystems that support
    associating a cgroup with writeback I/O. See also
    Documentation/admin-guide/cgroup-v2.rst.

    Cc: Damien Le Moal
    Cc: Hannes Reinecke
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Johannes Thumshirn
    Cc: Himanshu Madhani
    Signed-off-by: Bart Van Assche
    Link: https://lore.kernel.org/r/20210618004456.7280-5-bvanassche@acm.org
    Signed-off-by: Jens Axboe

    Bart Van Assche