08 Mar, 2020

1 commit

  • Merge Linux stable release v5.4.24 into imx_5.4.y

    * tag 'v5.4.24': (3306 commits)
    Linux 5.4.24
    blktrace: Protect q->blk_trace with RCU
    kvm: nVMX: VMWRITE checks unsupported field before read-only field
    ...

    Signed-off-by: Jason Liu

    Conflicts:
    arch/arm/boot/dts/imx6sll-evk.dts
    arch/arm/boot/dts/imx7ulp.dtsi
    arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
    drivers/clk/imx/clk-composite-8m.c
    drivers/gpio/gpio-mxc.c
    drivers/irqchip/Kconfig
    drivers/mmc/host/sdhci-of-esdhc.c
    drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
    drivers/net/can/flexcan.c
    drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
    drivers/net/ethernet/mscc/ocelot.c
    drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
    drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
    drivers/net/phy/realtek.c
    drivers/pci/controller/mobiveil/pcie-mobiveil-host.c
    drivers/perf/fsl_imx8_ddr_perf.c
    drivers/tee/optee/shm_pool.c
    drivers/usb/cdns3/gadget.c
    kernel/sched/cpufreq.c
    net/core/xdp.c
    sound/soc/fsl/fsl_esai.c
    sound/soc/fsl/fsl_sai.c
    sound/soc/sof/core.c
    sound/soc/sof/imx/Kconfig
    sound/soc/sof/loader.c

    Jason Liu
     

04 Mar, 2020

2 commits


18 Jan, 2020

1 commit


09 Jan, 2020

1 commit

  • commit a7583e72a5f22470d3e6fd3b6ba912892242339f upstream.

    The commit 0f27cff8597d ("ACPI: sysfs: Make ACPI GPE mask kernel
    parameter cover all GPEs") says:
    "Use a bitmap of size 0xFF instead of a u64 for the GPE mask so 256
    GPEs can be masked"

    But the masking of GPE 0xFF is not supported and the check condition
    "gpe > ACPI_MASKABLE_GPE_MAX" is not valid because the type of gpe is
    u8.

    So modify the macro ACPI_MASKABLE_GPE_MAX to 0x100, and drop the "gpe >
    ACPI_MASKABLE_GPE_MAX" check. In addition, update the docs "Format" for
    acpi_mask_gpe parameter.
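
A minimal sketch of why the range check could never fire, assuming the GPE number is held in an 8-bit unsigned type as the message states. The helper names (gpe_exceeds, mask_gpe, gpe_is_masked) and the bitmap layout are hypothetical and are not the kernel's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MASKABLE_GPE_COUNT 0x100u  /* 256 GPEs, numbered 0x00..0xFF */

/* One bit per GPE (hypothetical layout, for illustration only). */
static unsigned char gpe_mask[MASKABLE_GPE_COUNT / 8];

/* Range check in the style of "gpe > MAX": an 8-bit value can never exceed 0xFF. */
static bool gpe_exceeds(uint8_t gpe, unsigned int max)
{
    return gpe > max;
}

/* With 0x100 bits available, every 8-bit GPE number is a valid index. */
static void mask_gpe(uint8_t gpe)
{
    assert(gpe / 8u < sizeof(gpe_mask));
    gpe_mask[gpe / 8u] |= (unsigned char)(1u << (gpe % 8u));
}

static bool gpe_is_masked(uint8_t gpe)
{
    return ((gpe_mask[gpe / 8u] >> (gpe % 8u)) & 1u) != 0;
}
```

Calling gpe_exceeds(0xFF, 0xFF) returns false, as it does for every other 8-bit input, which is why dropping the check loses nothing once the bitmap covers GPE 0xFF.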

    Fixes: 0f27cff8597d ("ACPI: sysfs: Make ACPI GPE mask kernel parameter cover all GPEs")
    Signed-off-by: Yunfeng Ye
    [ rjw: Use u16 as gpe data type in acpi_gpe_apply_masked_gpes() ]
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Yunfeng Ye
     

18 Dec, 2019

1 commit


16 Dec, 2019

1 commit

  • This is the 5.4.3 stable release

    Conflicts:
    drivers/cpufreq/imx-cpufreq-dt.c
    drivers/spi/spi-fsl-qspi.c

    The conflicts are minor and were fixed during the merge. The imx-cpufreq-dt.c
    conflict is a one-line code-style change; the upstream version is used, with no
    functional change.

    The spi-fsl-qspi.c conflict is minor and comes from merging the upstream fix c69b17da53b2
    spi: spi-fsl-qspi: Clear TDH bits in FLSHCR register

    After the merge, a basic boot sanity test and a basic qspi test were done on i.MX.

    Signed-off-by: Jason Liu

    Jason Liu
     

29 Nov, 2019

1 commit

  • commit 64870ed1b12e235cfca3f6c6da75b542c973ff78 upstream.

    For MDS vulnerable processors with TSX support, enabling either MDS or
    TAA mitigations will enable the use of VERW to flush internal processor
    buffers at the right code path. IOW, they are either both mitigated
    or both not. However, if the command line options are inconsistent,
    the vulnerabilities sysfs files may not report the mitigation status
    correctly.

    For example, with only the "mds=off" option:

    vulnerabilities/mds:Vulnerable; SMT vulnerable
    vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable

    The mds vulnerabilities file has wrong status in this case. Similarly,
    the taa vulnerability file will be wrong with mds mitigation on, but
    taa off.

    Change taa_select_mitigation() to sync up the two mitigation status
    and have them turned off if both "mds=off" and "tsx_async_abort=off"
    are present.

    Update documentation to emphasize the fact that both "mds=off" and
    "tsx_async_abort=off" have to be specified together for processors that
    are affected by both TAA and MDS to be effective.
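
The rule described above can be sketched as follows, assuming a processor affected by both MDS and TAA. The struct and function names are hypothetical and much simpler than the state kept in cpu/bugs.c:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the two command line options. */
struct mitigation_request {
    bool mds_off;  /* "mds=off" was given */
    bool taa_off;  /* "tsx_async_abort=off" was given */
};

/* Hypothetical view of what the sysfs files should report. */
struct mitigation_state {
    bool mds_mitigated;
    bool taa_mitigated;
};

/*
 * Both issues rely on the same VERW buffer clearing, so the clearing is
 * turned off only when both options ask for it.
 */
static struct mitigation_state resolve_mitigations(struct mitigation_request req)
{
    bool verw_enabled = !(req.mds_off && req.taa_off);
    struct mitigation_state st = { verw_enabled, verw_enabled };

    /* Invariant from the text: they are either both mitigated or both not. */
    assert(st.mds_mitigated == st.taa_mitigated);
    return st;
}
```

With only "mds=off" in this sketch, both fields stay true, which matches the intent that a single option is not enough to disable the shared mitigation.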

    [ bp: Massage and add kernel-parameters.txt change too. ]

    Fixes: 1b42f017415b ("x86/speculation/taa: Add mitigation for TSX Async Abort")
    Signed-off-by: Waiman Long
    Signed-off-by: Borislav Petkov
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jiri Kosina
    Cc: Jonathan Corbet
    Cc: Josh Poimboeuf
    Cc: linux-doc@vger.kernel.org
    Cc: Mark Gross
    Cc:
    Cc: Pawan Gupta
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tim Chen
    Cc: Tony Luck
    Cc: Tyler Hicks
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20191115161445.30809-2-longman@redhat.com
    Signed-off-by: Greg Kroah-Hartman

    Waiman Long
     

28 Nov, 2019

3 commits


05 Nov, 2019

2 commits

  • Add the initial ITLB_MULTIHIT documentation.

    [ tglx: Add it to the index so it gets actually built. ]

    Signed-off-by: Antonio Gomez Iglesias
    Signed-off-by: Nelson D'Souza
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Thomas Gleixner

    Gomez Iglesias, Antonio
     
  • The page table pages corresponding to broken down large pages are zapped in
    FIFO order, so that the large page can potentially be recovered, if it is
    no longer being used for execution. This removes the performance penalty
    for walking deeper EPT page tables.

    By default, one large page will last about one hour once the guest
    reaches a steady state.

    Signed-off-by: Junaid Shahid
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Thomas Gleixner

    Junaid Shahid
     

04 Nov, 2019

1 commit

  • With some Intel processors, putting the same virtual address in the TLB
    as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit
    and cause the processor to issue a machine check resulting in a CPU lockup.

    Unfortunately when EPT page tables use huge pages, it is possible for a
    malicious guest to cause this situation.

    Add a knob to mark huge pages as non-executable. When the nx_huge_pages
    parameter is enabled (and we are using EPT), all huge pages are marked as
    NX. If the guest attempts to execute in one of those pages, the page is
    broken down into 4K pages, which are then marked executable.
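
A toy model of the knob described above, assuming a 2 MiB huge page that splits into 512 pages of 4 KiB. The struct mapping type and both functions are hypothetical and only illustrate the state change, not the KVM MMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SMALL_PAGES_PER_HUGE_PAGE 512u  /* 2 MiB / 4 KiB */

/* Hypothetical description of one guest mapping. */
struct mapping {
    bool huge;              /* mapped as a single huge page */
    bool exec;              /* execution permitted */
    unsigned int nr_pages;  /* number of page table entries backing it */
};

/* Install a huge page; with the knob enabled it is marked non-executable. */
static void map_huge(struct mapping *m, bool nx_huge_pages)
{
    assert(m != NULL);
    m->huge = true;
    m->nr_pages = 1;
    m->exec = !nx_huge_pages;
}

/* Guest tried to execute from the mapping: split an NX huge page into 4K pages. */
static void handle_exec_fault(struct mapping *m)
{
    assert(m != NULL);
    if (m->huge && !m->exec) {
        m->huge = false;
        m->nr_pages = SMALL_PAGES_PER_HUGE_PAGE;
        m->exec = true;  /* the small pages are marked executable */
    }
}
```

In this model a mapping is never both huge and executable while the knob is on, which is the property that keeps the two page sizes from coexisting for executed addresses.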

    This is not an issue for shadow paging (except nested EPT), because then
    the host is in control of TLB flushes and the problematic situation cannot
    happen. With nested EPT, the nested guest can again cause problems, so
    shadow and direct EPT are treated in the same way.

    [ tglx: Fixup default to auto and massage wording a bit ]

    Originally-by: Junaid Shahid
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Thomas Gleixner

    Paolo Bonzini
     

28 Oct, 2019

3 commits

  • Add the documentation for TSX Async Abort. Include the description of
    the issue, how to check the mitigation state, how to control the
    mitigation, and guidance for system administrators.

    [ bp: Add proper SPDX tags, touch ups by Josh and me. ]

    Co-developed-by: Antonio Gomez Iglesias

    Signed-off-by: Pawan Gupta
    Signed-off-by: Antonio Gomez Iglesias
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Mark Gross
    Reviewed-by: Tony Luck
    Reviewed-by: Josh Poimboeuf

    Pawan Gupta
     
  • Platforms which are not affected by X86_BUG_TAA may want the TSX feature
    enabled. Add an "auto" option to the TSX cmdline parameter. With tsx=auto,
    disable TSX when X86_BUG_TAA is present; otherwise enable TSX.
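
A minimal sketch of the on/off/auto decision, assuming the option value has already been extracted from the command line. The enum, parse_tsx() and tsx_enabled() are hypothetical names, and mapping a missing or unrecognised value to "off" is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of the requested TSX state. */
enum tsx_request { TSX_REQ_OFF, TSX_REQ_ON, TSX_REQ_AUTO };

/* Parse the value of "tsx="; NULL or an unknown string is treated as off (assumption). */
static enum tsx_request parse_tsx(const char *arg)
{
    if (arg != NULL && strcmp(arg, "on") == 0)
        return TSX_REQ_ON;
    if (arg != NULL && strcmp(arg, "auto") == 0)
        return TSX_REQ_AUTO;
    return TSX_REQ_OFF;
}

/* Decide whether TSX ends up enabled, given whether the CPU has X86_BUG_TAA. */
static bool tsx_enabled(enum tsx_request req, bool cpu_has_taa_bug)
{
    assert(req == TSX_REQ_OFF || req == TSX_REQ_ON || req == TSX_REQ_AUTO);

    switch (req) {
    case TSX_REQ_ON:
        return true;
    case TSX_REQ_AUTO:
        return !cpu_has_taa_bug;  /* auto: enable only on unaffected CPUs */
    default:
        return false;
    }
}
```

For example, tsx_enabled(parse_tsx("auto"), true) yields false on an affected CPU, while the same request on an unaffected CPU yields true.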

    More details on X86_BUG_TAA can be found here:
    https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html

    [ bp: Extend the arg buffer to accommodate "auto\0". ]

    Signed-off-by: Pawan Gupta
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Tony Luck
    Reviewed-by: Josh Poimboeuf

    Pawan Gupta
     
  • Add a kernel cmdline parameter "tsx" to control the Transactional
    Synchronization Extensions (TSX) feature. On CPUs that support TSX
    control, use "tsx=on|off" to enable or disable TSX. Not specifying this
    option is equivalent to "tsx=off". This is because on certain processors
    TSX may be used as a part of a speculative side channel attack.

    Carve out the TSX controlling functionality into a separate compilation
    unit because TSX is a CPU feature while the TSX async abort control
    machinery will go to cpu/bugs.c.

    [ bp: - Massage, shorten and clear the arg buffer.
    - Clarifications of the tsx= possible options - Josh.
    - Expand on TSX_CTRL availability - Pawan. ]

    Signed-off-by: Pawan Gupta
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf

    Pawan Gupta
     

13 Oct, 2019

1 commit


08 Oct, 2019

2 commits

  • cgroup v2 introduces two memory protection thresholds: memory.low
    (best-effort) and memory.min (hard protection). While they generally do
    what they say on the tin, there is a limitation in their implementation
    that makes them difficult to use effectively: that cliff behaviour often
    manifests when they become eligible for reclaim. This patch implements
    more intuitive and usable behaviour, where we gradually mount more
    reclaim pressure as cgroups further and further exceed their protection
    thresholds.

    This cliff edge behaviour happens because we only choose whether or not
    to reclaim based on whether the memcg is within its protection limits
    (see the use of mem_cgroup_protected in shrink_node), but we don't vary
    our reclaim behaviour based on this information. Imagine the following
    timeline, with the numbers the lruvec size in this zone:

    1. memory.low=1000000, memory.current=999999. 0 pages may be scanned.
    2. memory.low=1000000, memory.current=1000000. 0 pages may be scanned.
    3. memory.low=1000000, memory.current=1000001. 1000001* pages may be
    scanned. (?!)

    * Of course, we won't usually scan all available pages in the zone even
    without this patch because of scan control priority, over-reclaim
    protection, etc. However, as shown by the tests at the end, these
    techniques don't sufficiently throttle such an extreme change in input,
    so cliff-like behaviour isn't really averted by their existence alone.

    Here's an example of how this plays out in practice. At Facebook, we are
    trying to protect various workloads from "system" software, like
    configuration management tools, metric collectors, etc (see this[0] case
    study). In order to find a suitable memory.low value, we start by
    determining the expected memory range within which the workload will be
    comfortable operating. This isn't an exact science -- memory usage deemed
    "comfortable" will vary over time due to user behaviour, differences in
    composition of work, etc, etc. As such we need to ballpark memory.low,
    but doing this is currently problematic:

    1. If we end up setting it too low for the workload, it won't have
    *any* effect (see discussion above). The group will receive the full
    weight of reclaim and won't have any priority while competing with the
    less important system software, as if we had no memory.low configured
    at all.

    2. Because of this behaviour, we end up erring on the side of setting
    it too high, such that the comfort range is reliably covered. However,
    protected memory is completely unavailable to the rest of the system,
    so we might cause undue memory and IO pressure there when we *know* we
    have some elasticity in the workload.

    3. Even if we get the value totally right, smack in the middle of the
    comfort zone, we get extreme jumps between no pressure and full
    pressure that cause unpredictable pressure spikes in the workload due
    to the current binary reclaim behaviour.

    With this patch, we can set it to our ballpark estimation without too much
    worry. Any undesirable behaviour, such as too much or too little reclaim
    pressure on the workload or system will be proportional to how far our
    estimation is off. This means we can set memory.low much more
    conservatively and thus waste less resources *without* the risk of the
    workload falling off a cliff if we overshoot.
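
The difference between the binary and the proportional behaviour can be sketched as below. This is a simplified model under stated assumptions, not the code in mm/vmscan.c: the function names are hypothetical, and scan priority, minimum scan sizes and memory.min are ignored:

```c
#include <assert.h>

/* Binary behaviour: nothing is scanned until usage exceeds memory.low, then everything is. */
static unsigned long scan_binary(unsigned long lruvec_size,
                                 unsigned long low, unsigned long usage)
{
    return usage > low ? lruvec_size : 0;
}

/*
 * Proportional behaviour: the scan target grows with the fraction of usage
 * that lies above the protection threshold.
 */
static unsigned long scan_proportional(unsigned long lruvec_size,
                                       unsigned long low, unsigned long usage)
{
    if (usage <= low)
        return 0;

    assert(usage > 0);  /* follows from usage > low */
    return lruvec_size -
           (unsigned long)((unsigned long long)lruvec_size * low / usage);
}
```

Applied to step 3 of the timeline above (memory.low=1000000, memory.current=1000001, lruvec size 1000001), scan_binary() makes all 1000001 pages eligible while scan_proportional() makes 1 page eligible.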

    As a more abstract technical description, this unintuitive behaviour
    results in having to give high-priority workloads a large protection
    buffer on top of their expected usage to function reliably, as otherwise
    we have abrupt periods of dramatically increased memory pressure which
    hamper performance. Having to set these thresholds so high wastes
    resources and generally works against the principle of work conservation.
    In addition, having proportional memory reclaim behaviour has other
    benefits. Most notably, before this patch it's basically mandatory to set
    memory.low to a higher than desirable value because otherwise as soon as
    you exceed memory.low, all protection is lost, and all pages are eligible
    to scan again. By contrast, having a gradual ramp in reclaim pressure
    means that you now still get some protection when thresholds are exceeded,
    which means that one can now be more comfortable setting memory.low to
    lower values without worrying that all protection will be lost. This is
    important because workingset size is really hard to know exactly,
    especially with variable workloads, so at least getting *some* protection
    if your workingset size grows larger than you expect increases user
    confidence in setting memory.low without a huge buffer on top being
    needed.

    Thanks a lot to Johannes Weiner and Tejun Heo for their advice and
    assistance in thinking about how to make this work better.

    In testing these changes, I intended to verify that:

    1. Changes in page scanning become gradual and proportional instead of
    binary.

    To test this, I experimented stepping further and further down
    memory.low protection on a workload that floats around 19G workingset
    when under memory.low protection, watching page scan rates for the
    workload cgroup:

    +------------+-----------------+--------------------+--------------+
    | memory.low | test (pgscan/s) | control (pgscan/s) | % of control |
    +------------+-----------------+--------------------+--------------+
    |        21G |               0 |                  0 |          N/A |
    |        17G |             867 |               3799 |          23% |
    |        12G |            1203 |               3543 |          34% |
    |         8G |            2534 |               3979 |          64% |
    |         4G |            3980 |               4147 |          96% |
    |          0 |            3799 |               3980 |          95% |
    +------------+-----------------+--------------------+--------------+

    As you can see, the test kernel (containing this patch) ramps up page
    scanning significantly more gradually than the control kernel (without
    this patch).

    2. More gradual ramp up in reclaim aggression doesn't result in
    premature OOMs.

    To test this, I wrote a script that slowly increments the number of
    pages held by stress(1)'s --vm-keep mode until a production system
    entered severe overall memory contention. This script runs in a highly
    protected slice taking up the majority of available system memory.
    Watching vmstat revealed that page scanning continued essentially
    nominally between test and control, without causing forward reclaim
    progress to become arrested.

    [0]: https://facebookmicrosites.github.io/cgroup2/docs/overview.html#case-study-the-fbtax2-project

    [akpm@linux-foundation.org: reflow block comments to fit in 80 cols]
    [chris@chrisdown.name: handle cgroup_disable=memory when getting memcg protection]
    Link: http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name
    Link: http://lkml.kernel.org/r/20190124014455.GA6396@chrisdown.name
    Signed-off-by: Chris Down
    Acked-by: Johannes Weiner
    Reviewed-by: Roman Gushchin
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Dennis Zhou
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Down
     
  • Currently execution of panic() continues until Xen's panic notifier
    (xen_panic_event()) is called at which point we make a hypercall that
    never returns.

    This means that any notifier that is supposed to be called later as
    well as significant part of panic() code (such as pstore writes from
    kmsg_dump()) is never executed.

    There is no reason for xen_panic_event() to be this last point in
    execution since panic()'s emergency_restart() will call into
    xen_emergency_restart() from where we can perform our hypercall.

    Nevertheless, we will provide xen_legacy_crash boot option that will
    preserve original behavior during crash. This option could be used,
    for example, if running kernel dumper (which happens after panic
    notifiers) is undesirable.

    Reported-by: James Dingwall
    Signed-off-by: Boris Ostrovsky
    Reviewed-by: Juergen Gross

    Boris Ostrovsky
     

28 Sep, 2019

1 commit

  • Pull kernel lockdown mode from James Morris:
    "This is the latest iteration of the kernel lockdown patchset, from
    Matthew Garrett, David Howells and others.

    From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.

    There are two major changes since this was last proposed for mainline:

    - Separating lockdown from EFI secure boot. Background discussion is
    covered here: https://lwn.net/Articles/751061/

    - Implementation as an LSM, with a default stackable lockdown LSM
    module. This allows the lockdown feature to be policy-driven,
    rather than encoding an implicit policy within the mechanism.

    The new locked_down LSM hook is provided to allow LSMs to make a
    policy decision around whether kernel functionality that would allow
    tampering with or examining the runtime state of the kernel should be
    permitted.

    The included lockdown LSM provides an implementation with a simple
    policy intended for general purpose use. This policy provides a coarse
    level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

    Enable the kernel lockdown feature. If set to integrity, kernel features
    that allow userland to modify the running kernel are disabled. If set to
    confidentiality, kernel features that allow userland to extract
    confidential information from the kernel are also disabled.

    This may also be controlled via /sys/kernel/security/lockdown and
    overridden by kernel configuration.
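
The two levels can be read as an ordered scale, since confidentiality also disables what integrity disables. A minimal sketch under that reading, with hypothetical names (enum lockdown_level, parse_lockdown(), is_blocked()) that are not the lockdown LSM's API, and with a missing or unknown value treated as "no lockdown" as an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical ordered levels: each level includes the restrictions of the one before. */
enum lockdown_level {
    LOCKDOWN_LEVEL_NONE,
    LOCKDOWN_LEVEL_INTEGRITY,
    LOCKDOWN_LEVEL_CONFIDENTIALITY
};

/* Parse the value of "lockdown="; NULL or an unknown string means no lockdown (assumption). */
static enum lockdown_level parse_lockdown(const char *arg)
{
    if (arg != NULL && strcmp(arg, "integrity") == 0)
        return LOCKDOWN_LEVEL_INTEGRITY;
    if (arg != NULL && strcmp(arg, "confidentiality") == 0)
        return LOCKDOWN_LEVEL_CONFIDENTIALITY;
    return LOCKDOWN_LEVEL_NONE;
}

/* A feature is blocked once the active level reaches the level that restricts it. */
static bool is_blocked(enum lockdown_level active, enum lockdown_level restricted_at)
{
    assert(restricted_at != LOCKDOWN_LEVEL_NONE);
    return active >= restricted_at;
}
```

Under this model, a feature restricted at the integrity level is blocked in both modes, while a feature restricted at the confidentiality level remains available with lockdown=integrity.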

    New or existing LSMs may implement finer-grained controls of the
    lockdown features. Refer to the lockdown_reason documentation in
    include/linux/security.h for details.

    The lockdown feature has had significant design feedback and review
    across many subsystems. This code has been in linux-next for some
    weeks, with a few fixes applied along the way.

    Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf
    when kernel lockdown is in confidentiality mode") is missing a
    Signed-off-by from its author. Matthew responded that he is providing
    this under category (c) of the DCO"

    * 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
    kexec: Fix file verification on S390
    security: constify some arrays in lockdown LSM
    lockdown: Print current->comm in restriction messages
    efi: Restrict efivar_ssdt_load when the kernel is locked down
    tracefs: Restrict tracefs when the kernel is locked down
    debugfs: Restrict debugfs when the kernel is locked down
    kexec: Allow kexec_file() with appropriate IMA policy when locked down
    lockdown: Lock down perf when in confidentiality mode
    bpf: Restrict bpf when kernel lockdown is in confidentiality mode
    lockdown: Lock down tracing and perf kprobes when in confidentiality mode
    lockdown: Lock down /proc/kcore
    x86/mmiotrace: Lock down the testmmiotrace module
    lockdown: Lock down module params that specify hardware parameters (eg. ioport)
    lockdown: Lock down TIOCSSERIAL
    lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
    acpi: Disable ACPI table override if the kernel is locked down
    acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
    ACPI: Limit access to custom_method when the kernel is locked down
    x86/msr: Restrict MSR access when the kernel is locked down
    x86: Lock down IO port access when the kernel is locked down
    ...

    Linus Torvalds
     

25 Sep, 2019

3 commits

  • Merge updates from Andrew Morton:

    - a few hot fixes

    - ocfs2 updates

    - almost all of -mm (slab-generic, slab, slub, kmemleak, kasan,
    cleanups, debug, pagecache, memcg, gup, pagemap, memory-hotplug,
    sparsemem, vmalloc, initialization, z3fold, compaction, mempolicy,
    oom-kill, hugetlb, migration, thp, mmap, madvise, shmem, zswap,
    zsmalloc)

    * emailed patches from Andrew Morton: (132 commits)
    mm/zsmalloc.c: fix a -Wunused-function warning
    zswap: do not map same object twice
    zswap: use movable memory if zpool support allocate movable memory
    zpool: add malloc_support_movable to zpool_driver
    shmem: fix obsolete comment in shmem_getpage_gfp()
    mm/madvise: reduce code duplication in error handling paths
    mm: mmap: increase sockets maximum memory size pgoff for 32bits
    mm/mmap.c: refine find_vma_prev() with rb_last()
    riscv: make mmap allocation top-down by default
    mips: use generic mmap top-down layout and brk randomization
    mips: replace arch specific way to determine 32bit task with generic version
    mips: adjust brk randomization offset to fit generic version
    mips: use STACK_TOP when computing mmap base address
    mips: properly account for stack randomization and stack guard gap
    arm: use generic mmap top-down layout and brk randomization
    arm: use STACK_TOP when computing mmap base address
    arm: properly account for stack randomization and stack guard gap
    arm64, mm: make randomization selected by generic topdown mmap layout
    arm64, mm: move generic mmap layout functions to mm
    arm64: consider stack randomization for mmap base only when necessary
    ...

    Linus Torvalds
     
  • Cgroup v1 memcg controller has exposed a dedicated kmem limit to users
    which turned out to be really a bad idea because there are paths which
    cannot shrink the kernel memory usage enough to get below the limit (e.g.
    because the accounted memory is not reclaimable). There are cases when
    the failure is not even allowed (e.g. __GFP_NOFAIL). This means that the
    kmem limit is in excess to the hard limit without any way to shrink and
    thus completely useless. OOM killer cannot be invoked to handle the
    situation because that would lead to a premature oom killing.

    As a result many places might see ENOMEM returning from kmalloc and result
    in unexpected errors. E.g. a global OOM killer when there is a lot of
    free memory because ENOMEM is translated into VM_FAULT_OOM in #PF path and
    therefore pagefault_out_of_memory would result in OOM killer.

    Please note that the kernel memory is still accounted to the overall limit
    along with the user memory so removing the kmem specific limit should
    still allow to contain kernel memory consumption. Unlike the kmem one,
    though, it invokes memory reclaim and targeted memcg oom killing if
    necessary.

    Start the deprecation process by crying to the kernel log. Let's see
    whether there are relevant usecases and simply return EINVAL in the
    second stage if nobody complains in a few releases.

    [akpm@linux-foundation.org: tweak documentation text]
    Link: http://lkml.kernel.org/r/20190911151612.GI4023@dhcp22.suse.cz
    Signed-off-by: Michal Hocko
    Reviewed-by: Shakeel Butt
    Cc: Johannes Weiner
    Cc: Vladimir Davydov
    Cc: Andrey Ryabinin
    Cc: Thomas Lindroth
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • The debug_pagealloc functionality is useful to catch buggy page allocator
    users that cause e.g. use after free or double free. When page
    inconsistency is detected, debugging is often simpler by knowing the call
    stack of process that last allocated and freed the page. When page_owner
    is also enabled, we record the allocation stack trace, but not freeing.

    This patch therefore adds recording of freeing process stack trace to page
    owner info, if both page_owner and debug_pagealloc are configured and
    enabled. With only page_owner enabled, this info is not useful for the
    memory leak debugging use case. dump_page() is adjusted to print the
    info. An example result of calling __free_pages() twice may look like
    this (note the page last free stack trace):

    BUG: Bad page state in process bash pfn:13d8f8
    page:ffffc31984f63e00 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0
    flags: 0x1affff800000000()
    raw: 01affff800000000 dead000000000100 dead000000000122 0000000000000000
    raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
    page dumped because: nonzero _refcount
    page_owner tracks the page as freed
    page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL)
    prep_new_page+0x143/0x150
    get_page_from_freelist+0x289/0x380
    __alloc_pages_nodemask+0x13c/0x2d0
    khugepaged+0x6e/0xc10
    kthread+0xf9/0x130
    ret_from_fork+0x3a/0x50
    page last free stack trace:
    free_pcp_prepare+0x134/0x1e0
    free_unref_page+0x18/0x90
    khugepaged+0x7b/0xc10
    kthread+0xf9/0x130
    ret_from_fork+0x3a/0x50
    Modules linked in:
    CPU: 3 PID: 271 Comm: bash Not tainted 5.3.0-rc4-2.g07a1a73-default+ #57
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
    Call Trace:
    dump_stack+0x85/0xc0
    bad_page.cold+0xba/0xbf
    rmqueue_pcplist.isra.0+0x6c5/0x6d0
    rmqueue+0x2d/0x810
    get_page_from_freelist+0x191/0x380
    __alloc_pages_nodemask+0x13c/0x2d0
    __get_free_pages+0xd/0x30
    __pud_alloc+0x2c/0x110
    copy_page_range+0x4f9/0x630
    dup_mmap+0x362/0x480
    dup_mm+0x68/0x110
    copy_process+0x19e1/0x1b40
    _do_fork+0x73/0x310
    __x64_sys_clone+0x75/0x80
    do_syscall_64+0x6e/0x1e0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f10af854a10
    ...

    Link: http://lkml.kernel.org/r/20190820131828.22684-5-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: Kirill A. Shutemov
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

24 Sep, 2019

1 commit

  • Pull PCI updates from Bjorn Helgaas:
    "Enumeration:

    - Consolidate _HPP/_HPX stuff in pci-acpi.c and simplify it
    (Krzysztof Wilczynski)

    - Fix incorrect PCIe device types and remove dev->has_secondary_link
    to simplify code that deals with upstream/downstream ports (Mika
    Westerberg)

    - After suspend, restore Resizable BAR size bits correctly for 1MB
    BARs (Sumit Saxena)

    - Enable PCI_MSI_IRQ_DOMAIN support for RISC-V (Wesley Terpstra)

    Virtualization:

    - Add ACS quirks for iProc PAXB (Abhinav Ratna), Amazon Annapurna
    Labs (Ali Saidi)

    - Move sysfs SR-IOV functions to iov.c (Kelsey Skunberg)

    - Remove group write permissions from sysfs sriov_numvfs,
    sriov_drivers_autoprobe (Kelsey Skunberg)

    Hotplug:

    - Simplify pciehp indicator control (Denis Efremov)

    Peer-to-peer DMA:

    - Allow P2P DMA between root ports for whitelisted bridges (Logan
    Gunthorpe)

    - Whitelist some Intel host bridges for P2P DMA (Logan Gunthorpe)

    - DMA map P2P DMA requests that traverse host bridge (Logan
    Gunthorpe)

    Amazon Annapurna Labs host bridge driver:

    - Add DT binding and controller driver (Jonathan Chocron)

    Hyper-V host bridge driver:

    - Fix hv_pci_dev->pci_slot use-after-free (Dexuan Cui)

    - Fix PCI domain number collisions (Haiyang Zhang)

    - Use instance ID bytes 4 & 5 as PCI domain numbers (Haiyang Zhang)

    - Fix build errors on non-SYSFS config (Randy Dunlap)

    i.MX6 host bridge driver:

    - Limit DBI register length (Stefan Agner)

    Intel VMD host bridge driver:

    - Fix config addressing issues (Jon Derrick)

    Layerscape host bridge driver:

    - Add bar_fixed_64bit property to endpoint driver (Xiaowei Bao)

    - Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC drivers separately
    (Xiaowei Bao)

    Mediatek host bridge driver:

    - Add MT7629 controller support (Jianjun Wang)

    Mobiveil host bridge driver:

    - Fix CPU base address setup (Hou Zhiqiang)

    - Make "num-lanes" property optional (Hou Zhiqiang)

    Tegra host bridge driver:

    - Fix OF node reference leak (Nishka Dasgupta)

    - Disable MSI for root ports to work around design problem (Vidya
    Sagar)

    - Add Tegra194 DT binding and controller support (Vidya Sagar)

    - Add support for sideband pins and slot regulators (Vidya Sagar)

    - Add PIPE2UPHY support (Vidya Sagar)

    Misc:

    - Remove unused pci_block_cfg_access() et al (Kelsey Skunberg)

    - Unexport pci_bus_get(), etc (Kelsey Skunberg)

    - Hide PM, VC, link speed, ATS, ECRC, PTM constants and interfaces in
    the PCI core (Kelsey Skunberg)

    - Clean up sysfs DEVICE_ATTR() usage (Kelsey Skunberg)

    - Mark expected switch fall-through (Gustavo A. R. Silva)

    - Propagate errors for optional regulators and PHYs (Thierry Reding)

    - Fix kernel command line resource_alignment parameter issues (Logan
    Gunthorpe)"

    * tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (112 commits)
    PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI
    arm64: tegra: Add PCIe slot supply information in p2972-0000 platform
    arm64: tegra: Add configuration for PCIe C5 sideband signals
    PCI: tegra: Add support to enable slot regulators
    PCI: tegra: Add support to configure sideband pins
    PCI: vmd: Fix shadow offsets to reflect spec changes
    PCI: vmd: Fix config addressing when using bus offsets
    PCI: dwc: Add validation that PCIe core is set to correct mode
    PCI: dwc: al: Add Amazon Annapurna Labs PCIe controller driver
    dt-bindings: PCI: Add Amazon's Annapurna Labs PCIe host bridge binding
    PCI: Add quirk to disable MSI-X support for Amazon's Annapurna Labs Root Port
    PCI/VPD: Prevent VPD access for Amazon's Annapurna Labs Root Port
    PCI: Add ACS quirk for Amazon Annapurna Labs root ports
    PCI: Add Amazon's Annapurna Labs vendor ID
    MAINTAINERS: Add PCI native host/endpoint controllers designated reviewer
    PCI: hv: Use bytes 4 and 5 from instance ID as the PCI domain numbers
    dt-bindings: PCI: tegra: Add PCIe slot supplies regulator entries
    dt-bindings: PCI: tegra: Add sideband pins configuration entries
    PCI: tegra: Add Tegra194 PCIe support
    PCI: Get rid of dev->has_secondary_link flag
    ...

    Linus Torvalds
     

22 Sep, 2019

1 commit

  • …device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - crypto and DM crypt advances that allow the crypto API to reclaim
    implementation details that do not belong in DM crypt. The wrapper
    template for ESSIV generation that was factored out will also be used
    by fscrypt in the future.

    - Add root hash pkcs#7 signature verification to the DM verity target.

    - Add a new "clone" DM target that allows for efficient remote
    replication of a device.

    - Enhance DM bufio's cache to be tailored to each client based on use.
    Clients that make heavy use of the cache get more of it, and those
    that use less have reduced cache usage.

    - Add a new DM_GET_TARGET_VERSION ioctl to allow userspace to query the
    version number of a DM target (even if the associated module isn't
    yet loaded).

    - Fix invalid memory access in DM zoned target.

    - Fix the max_discard_sectors limit advertised by the DM raid target;
    it was mistakenly storing the limit in bytes rather than sectors.

    - Small optimizations and cleanups in DM writecache target.

    - Various fixes and cleanups in DM core, DM raid1 and space map portion
    of DM persistent data library.

    * tag 'for-5.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
    dm: introduce DM_GET_TARGET_VERSION
    dm bufio: introduce a global cache replacement
    dm bufio: remove old-style buffer cleanup
    dm bufio: introduce a global queue
    dm bufio: refactor adjust_total_allocated
    dm bufio: call adjust_total_allocated from __link_buffer and __unlink_buffer
    dm: add clone target
    dm raid: fix updating of max_discard_sectors limit
    dm writecache: skip writecache_wait for pmem mode
    dm stats: use struct_size() helper
    dm crypt: omit parsing of the encapsulated cipher
    dm crypt: switch to ESSIV crypto API template
    crypto: essiv - create wrapper template for ESSIV generation
    dm space map common: remove check for impossible sm_find_free() return value
    dm raid1: use struct_size() with kzalloc()
    dm writecache: optimize performance by sorting the blocks for writeback_all
    dm writecache: add unlikely for getting two block with same LBA
    dm writecache: remove unused member pointer in writeback_struct
    dm zoned: fix invalid memory access
    dm verity: add root hash pkcs#7 signature verification
    ...

    Linus Torvalds
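
The max_discard_sectors fix in the summary above concerns a unit mix-up: the limit is counted in sectors, but a byte count was stored in it. The following is a minimal sketch of that kind of mistake, not the actual dm raid change. It relies on the kernel block layer's 512-byte sector unit; the 64 KiB value and the names `bytes_to_sectors`, `wrong_limit` and `right_limit` are hypothetical.

```python
SECTOR_SHIFT = 9                 # the kernel block layer counts in 512-byte sectors
SECTOR_SIZE = 1 << SECTOR_SHIFT  # 512


def bytes_to_sectors(nbytes: int) -> int:
    """Convert a byte count to whole 512-byte sectors, rounding down."""
    if nbytes < 0:
        raise ValueError("byte count must be non-negative")
    return nbytes >> SECTOR_SHIFT


# Hypothetical limit of 64 KiB, expressed in bytes.
limit_bytes = 64 * 1024

wrong_limit = limit_bytes                    # byte count placed in a sector-denominated field
right_limit = bytes_to_sectors(limit_bytes)  # the same limit expressed in sectors
```

Storing the byte value directly overstates the limit by a factor of SECTOR_SIZE, which is why the unit has to be converted before the limit is advertised.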
     

21 Sep, 2019

1 commit

  • Pull powerpc updates from Michael Ellerman:
    "This is a bit late, partly due to me travelling, and partly due to a
    power outage knocking out some of my test systems *while* I was
    travelling.

    - Initial support for running on a system with an Ultravisor, which
    is software that runs below the hypervisor and protects guests
    against some attacks by the hypervisor.

    - Support for building the kernel to run as a "Secure Virtual
    Machine", i.e. as a guest capable of running on a system with an
    Ultravisor.

    - Some changes to our DMA code on bare metal, to allow devices with
    medium sized DMA masks (> 32 && < 59 bits) to use more than 2GB of
    DMA space.

    - Support for firmware assisted crash dumps on bare metal (powernv).

    - Two series fixing bugs in and refactoring our PCI EEH code.

    - A large series refactoring our exception entry code to use gas
    macros, both to make it more readable and also enable some future
    optimisations.

    As well as many cleanups and other minor features & fixups.

    Thanks to: Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew
    Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual,
    Balbir Singh, Benjamin Herrenschmidt, Cédric Le Goater, Christophe
    JAILLET, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig,
    Claudio Carvalho, Daniel Axtens, David Gibson, David Hildenbrand,
    Desnes A. Nunes do Rosario, Ganesh Goudar, Gautham R. Shenoy, Greg
    Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari Bathini, Joakim
    Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras, Lianbo
    Jiang, Madhavan Srinivasan, Mahesh Salgaonkar,
    Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan
    Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
    O'Halloran, Qian Cai, Ram Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm,
    Sam Bobroff, Santosh Sivaraj, Segher Boessenkool, Sukadev Bhattiprolu,
    Thiago Bauermann, Thiago Jung Bauermann, Thomas Gleixner, Tom
    Lendacky, Vasant Hegde"

    * tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (264 commits)
    powerpc/mm/mce: Keep irqs disabled during lockless page table walk
    powerpc: Use ftrace_graph_ret_addr() when unwinding
    powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
    ftrace: Look up the address of return_to_handler() using helpers
    powerpc: dump kernel log before carrying out fadump or kdump
    docs: powerpc: Add missing documentation reference
    powerpc/xmon: Fix output of XIVE IPI
    powerpc/xmon: Improve output of XIVE interrupts
    powerpc/mm/radix: remove useless kernel messages
    powerpc/fadump: support holes in kernel boot memory area
    powerpc/fadump: remove RMA_START and RMA_END macros
    powerpc/fadump: update documentation about option to release opalcore
    powerpc/fadump: consider f/w load area
    powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file
    powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes
    powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP
    powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel
    powerpc/fadump: improve how crashed kernel's memory is reserved
    powerpc/fadump: consider reserved ranges while releasing memory
    powerpc/fadump: make crash memory ranges array allocation generic
    ...

    Linus Torvalds
     

19 Sep, 2019

1 commit

  • Pull tty/serial driver updates from Greg KH:
    "Even in this age, people are still making new serial port silicon,
    why...

    Anyway, here's the TTY and Serial driver update for 5.4-rc1. Lots of
    changes in here for a number of embedded serial port devices that are
    being worked on because people really like to see those console
    logs...

    Other than that, nothing major here, no core tty changes that anyone
    should care about.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'tty-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (125 commits)
    serial: tegra: Add PIO mode support
    serial: tegra: report clk rate errors
    serial: tegra: add support to adjust baud rate
    serial: tegra: DT for Adjusted baud rates
    serial: tegra: add support to use 8 bytes trigger
    serial: tegra: set maximum num of uart ports to 8
    serial: tegra: check for FIFO mode enabled status
    dt-binding: serial: tegra: add new chips
    serial: tegra: report error to upper tty layer
    serial: tegra: flush the RX fifo on frame error
    serial: tegra: avoid reg access when clk disabled
    serial: tegra: add support to ignore read
    serial: sprd: correct the wrong sequence of arguments
    dt-bindings: serial: Convert riscv,sifive-serial to json-schema
    serial: max310x: turn off transmitter before activating AutoCTS or auto transmitter flow control
    serial: max310x: Properly set flags in AutoCTS mode
    tty: serial: fix platform_no_drv_owner.cocci warnings
    dt-bindings: serial: Document Freescale LINFlexD UART
    serial: fsl_linflexuart: Update compatible string
    tty: n_gsm: avoid recursive locking with async port hangup
    ...

    Linus Torvalds
     

18 Sep, 2019

2 commits

  • Pull block updates from Jens Axboe:

    - Two NVMe pull requests:
      - ana log parse fix from Anton
      - nvme quirks support for Apple devices from Ben
      - fix missing bio completion tracing for multipath stack devices
      from Hannes and Mikhail
      - IP TOS settings for nvme rdma and tcp transports from Israel
      - rq_dma_dir cleanups from Israel
      - tracing for Get LBA Status command from Minwoo
      - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
      - Some consolidation between the fabrics transports for handling
      the CAP register
      - reset race with ns scanning fix for fabrics (move fabrics
      commands to a dedicated request queue with a different lifetime
      from the admin request queue)
      - controller reset and namespace scan races fixes
      - nvme discovery log change uevent support
      - naming improvements from Keith
      - multiple discovery controllers reject fix from James
      - some regular cleanups from various people

    - Series fixing (and re-fixing) null_blk debug printing and nr_devices
    checks (André)

    - A few pull requests from Song, with fixes from Andy, Guoqing,
    Guilherme, Neil, Nigel, and Yufen.

    - REQ_OP_ZONE_RESET_ALL support (Chaitanya)

    - Bio merge handling unification (Christoph)

    - Pick default elevator correctly for devices with special needs
    (Damien)

    - Block stats fixes (Hou)

    - Timeout and support devices nbd fixes (Mike)

    - Series fixing races around elevator switching and device add/remove
    (Ming)

    - sed-opal cleanups (Revanth)

    - Per device weight support for BFQ (Fam)

    - Support for blk-iocost, a new model that can properly account for the
    cost of IO workloads (Tejun)

    - blk-cgroup writeback fixes (Tejun)

    - paride queue init fixes (zhengbin)

    - blk_set_runtime_active() cleanup (Stanley)

    - Block segment mapping optimizations (Bart)

    - lightnvm fixes (Hans/Minwoo/YueHaibing)

    - Various little fixes and cleanups

    * tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
    null_blk: format pr_* logs with pr_fmt
    null_blk: match the type of parameter nr_devices
    null_blk: do not fail the module load with zero devices
    block: also check RQF_STATS in blk_mq_need_time_stamp()
    block: make rq sector size accessible for block stats
    bfq: Fix bfq linkage error
    raid5: use bio_end_sector in r5_next_bio
    raid5: remove STRIPE_OPS_REQ_PENDING
    md: add feature flag MD_FEATURE_RAID0_LAYOUT
    md/raid0: avoid RAID0 data corruption due to layout confusion.
    raid5: don't set STRIPE_HANDLE to stripe which is in batch list
    raid5: don't increment read_errors on EILSEQ return
    nvmet: fix a wrong error status returned in error log page
    nvme: send discovery log page change events to userspace
    nvme: add uevent variables for controller devices
    nvme: enable aen regardless of the presence of I/O queues
    nvme-fabrics: allow discovery subsystems accept a kato
    nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
    nvme: Remove redundant assignment of cq vector
    nvme: Assign subsys instance from first ctrl
    ...

    Linus Torvalds
     
  • Pull documentation updates from Jonathan Corbet:
    "It's a somewhat calmer cycle for docs this time, as the churn of the
    mass RST conversion is happily mostly behind us.

    - A new document on reproducible builds.

    - We finally got around to zapping the documentation for hardware
    support that was removed in 2004; one doesn't want to rush these
    things.

    - The usual assortment of fixes, typo corrections, etc"

    * tag 'docs-5.4' of git://git.lwn.net/linux: (67 commits)
    Documentation: kbuild: Add document about reproducible builds
    docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]
    Documentation: Add "earlycon=sbi" to the admin guide
    doc:lock: remove reference to clever use of read-write lock
    devices.txt: improve entry for comedi (char major 98)
    docs: mtd: Update spi nor reference driver
    doc: arm64: fix grammar dtb placed in no attributes region
    Documentation: sysrq: don't recommend 'S' 'U' before 'B'
    mailmap: Update email address for Quentin Perret
    docs: ftrace: clarify when tracing is disabled by the trace file
    docs: process: fix broken link
    Documentation/arm/samsung-s3c24xx: Remove stray U+FEFF character to fix title
    Documentation/arm/sa1100/assabet: Fix 'make assabet_defconfig' command
    Documentation/arm/sa1100: Remove some obsolete documentation
    docs/zh_CN: update Chinese howto.rst for latexdocs making
    Documentation: virt: Fix broken reference to virt tree's index
    docs: Fix typo on pull requests guide
    kernel-doc: Allow anonymous enum
    Documentation: sphinx: Don't parse socket() as identifier reference
    Documentation: sphinx: Add missing comma to list of strings
    ...

    Linus Torvalds
     

17 Sep, 2019

5 commits

  • Pull x86 platform-drivers updates from Andy Shevchenko:

    - ASUS WMI driver got a couple of updates, i.e. support of FAN is fixed
    for recent products and the charge threshold support has been added

    - Two unknown key events for Dell laptops are being ignored now to avoid
    spamming users with harmless messages

    - HP ZBook 17 G5 and ASUS Zenbook UX430UNR got accelerometer support.

    - Intel CherryTrail platforms had a regression with wake up. Now it's
    fixed

    - Intel PMC driver got fixed in order to work nicely in Xen
    environment

    - Intel Speed Select driver provides bucket vs core count relationship.
    Besides that, the tool has been updated for better output

    - The PrivacyGuard is enabled on Lenovo ThinkPad laptops

    - Three tablets - Trekstor Primebook C11B 2-in-1, Irbis TW90 and Chuwi
    Surbook Mini - got touchscreen support

    * tag 'platform-drivers-x86-v5.4-1' of git://git.infradead.org/linux-platform-drivers-x86: (53 commits)
    MAINTAINERS: Switch PDx86 subsystem status to Odd Fixes
    platform/x86: asus-wmi: Refactor charge threshold to use the battery hooking API
    platform/x86: asus-wmi: Rename CHARGE_THRESHOLD to RSOC
    platform/x86: asus-wmi: Reorder ASUS_WMI_CHARGE_THRESHOLD
    tools/power/x86/intel-speed-select: Display core count for bucket
    platform/x86: ISST: Allow additional TRL MSRs
    tools/power/x86/intel-speed-select: Fix memory leak
    tools/power/x86/intel-speed-select: Output success/failed for command output
    tools/power/x86/intel-speed-select: Output human readable CPU list
    tools/power/x86/intel-speed-select: Change turbo ratio output to maximum turbo frequency
    tools/power/x86/intel-speed-select: Switch output to MHz
    tools/power/x86/intel-speed-select: Simplify output for turbo-freq and base-freq
    tools/power/x86/intel-speed-select: Fix cpu-count output
    tools/power/x86/intel-speed-select: Fix help option typo
    tools/power/x86/intel-speed-select: Fix package typo
    tools/power/x86/intel-speed-select: Fix a read overflow in isst_set_tdp_level_msr()
    platform/x86: intel_int0002_vgpio: Use device_init_wakeup
    platform/x86: intel_int0002_vgpio: Fix wakeups not working on Cherry Trail
    platform/x86: compal-laptop: Initialize "value" in ec_read_u8()
    platform/x86: touchscreen_dmi: Add info for the Trekstor Primebook C11B 2-in-1
    ...

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:

    - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
    Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
    Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.

    As perf and the scheduler are getting bigger and more complex,
    document the status quo of current responsibilities and interests,
    and spread the review pain^H^H^H^H fun via an increase in the Cc:
    linecount generated by scripts/get_maintainer.pl. :-)

    - Add another series of patches that brings the -rt (PREEMPT_RT) tree
    closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
    into a new CONFIG_PREEMPTION category that will allow the eventual
    introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
    to go though.

    - Extend the CPU cgroup controller with uclamp.min and uclamp.max to
    allow the finer shaping of CPU bandwidth usage.

    - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).

    - Improve the behavior of high CPU count, high thread count
    applications running under cpu.cfs_quota_us constraints.

    - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.

    - Improve CPU isolation housekeeping CPU allocation NUMA locality.

    - Fix deadline scheduler bandwidth calculations and logic when cpusets
    rebuilds the topology, or when it gets deadline-throttled while it's
    being offlined.

    - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
    setscheduler() system calls without creating global serialization.
    Add new synchronization between cpuset topology-changing events and
    the deadline acceptance tests in setscheduler(), which were broken
    before.

    - Rework the active_mm state machine to be less confusing and more
    optimal.

    - Rework (simplify) the pick_next_task() slowpath.

    - Improve load-balancing on AMD EPYC systems.

    - ... and misc cleanups, smaller fixes and improvements - please see
    the Git log for more details.

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
    sched/psi: Correct overly pessimistic size calculation
    sched/fair: Speed-up energy-aware wake-ups
    sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
    sched/uclamp: Update CPU's refcount on TG's clamp changes
    sched/uclamp: Use TG's clamps to restrict TASK's clamps
    sched/uclamp: Propagate system defaults to the root group
    sched/uclamp: Propagate parent clamps
    sched/uclamp: Extend CPU's cgroup controller
    sched/topology: Improve load balancing on AMD EPYC systems
    arch, ia64: Make NUMA select SMP
    sched, perf: MAINTAINERS update, add submaintainers and reviewers
    sched/fair: Use rq_lock/unlock in online_fair_sched_group
    cpufreq: schedutil: fix equation in comment
    sched: Rework pick_next_task() slow-path
    sched: Allow put_prev_task() to drop rq->lock
    sched/fair: Expose newidle_balance()
    sched: Add task_struct pointer to sched_class::set_curr_task
    sched: Rework CPU hotplug task selection
    sched/{rt,deadline}: Fix set_next_task vs pick_next_task
    sched: Fix kerneldoc comment for ia64_set_curr_task
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "This cycle's RCU changes were:

    - A few more RCU flavor consolidation cleanups.

    - Updates to RCU's list-traversal macros improving lockdep usability.

    - Forward-progress improvements for no-CBs CPUs: Avoid ignoring
    incoming callbacks during grace-period waits.

    - Forward-progress improvements for no-CBs CPUs: Use ->cblist
    structure to take advantage of others' grace periods.

    - Also added a small commit that avoids needlessly inflicting
    scheduler-clock ticks on callback-offloaded CPUs.

    - Forward-progress improvements for no-CBs CPUs: Reduce contention on
    ->nocb_lock guarding ->cblist.

    - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass
    list to further reduce contention on ->nocb_lock guarding ->cblist.

    - Miscellaneous fixes.

    - Torture-test updates.

    - minor LKMM updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (86 commits)
    MAINTAINERS: Update from paulmck@linux.ibm.com to paulmck@kernel.org
    rcu: Don't include in rcutiny.h
    rcu: Allow rcu_do_batch() to dynamically adjust batch sizes
    rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload
    rcu/nocb: Reduce __call_rcu_nocb_wake() leaf rcu_node ->lock contention
    rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention
    rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks()
    rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake()
    rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed
    rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended
    rcu/nocb: Add bypass callback queueing
    rcu/nocb: Atomic ->len field in rcu_segcblist structure
    rcu/nocb: Unconditionally advance and wake for excessive CBs
    rcu/nocb: Reduce ->nocb_lock contention with separate ->nocb_gp_lock
    rcu/nocb: Reduce contention at no-CBs invocation-done time
    rcu/nocb: Reduce contention at no-CBs registry-time CB advancement
    rcu/nocb: Round down for number of no-CBs grace-period kthreads
    rcu/nocb: Avoid ->nocb_lock capture by corresponding CPU
    rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread
    rcu/nocb: Make __call_rcu_nocb_wake() safe for many callbacks
    ...

    Linus Torvalds
     
  • Pull ia64 updates from Tony Luck:
    "The big change here is removal of support for SGI Altix"

    * tag 'please-pull-ia64_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: (33 commits)
    genirq: remove the is_affinity_mask_valid hook
    ia64: remove CONFIG_SWIOTLB ifdefs
    ia64: remove support for machvecs
    ia64: move the screen_info setup to common code
    ia64: move the ROOT_DEV setup to common code
    ia64: rework iommu probing
    ia64: remove the unused sn_coherency_id symbol
    ia64: remove the SGI UV simulator support
    ia64: remove the zx1 swiotlb machvec
    ia64: remove CONFIG_ACPI ifdefs
    ia64: remove CONFIG_PCI ifdefs
    ia64: remove the hpsim platform
    ia64: remove now unused machvec indirections
    ia64: remove support for the SGI SN2 platform
    drivers: remove the SGI SN2 IOC4 base support
    drivers: remove the SGI SN2 IOC3 base support
    qla2xxx: remove SGI SN2 support
    qla1280: remove SGI SN2 support
    misc/sgi-xp: remove SGI SN2 support
    char/mspec: remove SGI SN2 support
    ...

    Linus Torvalds
     
  • Pull arm64 updates from Will Deacon:
    "Although there isn't tonnes of code in terms of line count, there are
    a fair few headline features which I've noted both in the tag and also
    in the merge commits when I pulled everything together.

    The part I'm most pleased with is that we had 35 contributors this
    time around, which feels like a big jump from the usual small group of
    core arm64 arch developers. Hopefully they all enjoyed it so much that
    they'll continue to contribute, but we'll see.

    It's probably worth highlighting that we've pulled in a branch from
    the risc-v folks which moves our CPU topology code out to where it can
    be shared with others.

    Summary:

    - 52-bit virtual addressing in the kernel

    - New ABI to allow tagged user pointers to be dereferenced by
    syscalls

    - Early RNG seeding by the bootloader

    - Improve robustness of SMP boot

    - Fix TLB invalidation in light of recent architectural
    clarifications

    - Support for i.MX8 DDR PMU

    - Remove direct LSE instruction patching in favour of static keys

    - Function error injection using kprobes

    - Support for the PPTT "thread" flag introduced by ACPI 6.3

    - Move PSCI idle code into proper cpuidle driver

    - Relaxation of implicit I/O memory barriers

    - Build with RELR relocations when toolchain supports them

    - Numerous cleanups and non-critical fixes"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (114 commits)
    arm64: remove __iounmap
    arm64: atomics: Use K constraint when toolchain appears to support it
    arm64: atomics: Undefine internal macros after use
    arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL
    arm64: asm: Kill 'asm/atomic_arch.h'
    arm64: lse: Remove unused 'alt_lse' assembly macro
    arm64: atomics: Remove atomic_ll_sc compilation unit
    arm64: avoid using hard-coded registers for LSE atomics
    arm64: atomics: avoid out-of-line ll/sc atomics
    arm64: Use correct ll/sc atomic constraints
    jump_label: Don't warn on __exit jump entries
    docs/perf: Add documentation for the i.MX8 DDR PMU
    perf/imx_ddr: Add support for AXI ID filtering
    arm64: kpti: ensure patched kernel text is fetched from PoU
    arm64: fix fixmap copy for 16K pages and 48-bit VA
    perf/smmuv3: Validate groups for global filtering
    perf/smmuv3: Validate group size
    arm64: Relax Documentation/arm64/tagged-pointers.rst
    arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F
    arm64: mm: Ignore spurious translation faults taken from the kernel
    ...

    Linus Torvalds
     

16 Sep, 2019

1 commit


14 Sep, 2019

2 commits


12 Sep, 2019

1 commit

  • Add the dm-clone target, which allows cloning of arbitrary block
    devices.

    dm-clone produces a one-to-one copy of an existing, read-only source
    device into a writable destination device: It presents a virtual block
    device which makes all data appear immediately, and redirects reads and
    writes accordingly.

    The main use case of dm-clone is to clone a potentially remote,
    high-latency, read-only, archival-type block device into a writable,
    fast, primary-type device for fast, low-latency I/O. The cloned device
    is visible/mountable immediately and the copy of the source device to
    the destination device happens in the background, in parallel with user
    I/O.

    When the cloning completes, the dm-clone table can be removed altogether
    and be replaced, e.g., by a linear table, mapping directly to the
    destination device.

    For further information and examples of how to use dm-clone, please read
    Documentation/admin-guide/device-mapper/dm-clone.rst

    Suggested-by: Vangelis Koukis
    Co-developed-by: Ilias Tsitsimpis
    Signed-off-by: Ilias Tsitsimpis
    Signed-off-by: Nikos Tsironis
    Signed-off-by: Mike Snitzer

    Nikos Tsironis
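
The behaviour described in the commit message above (data visible immediately, background copy in parallel with user I/O) can be illustrated with a toy in-memory model. This is a simplified sketch under stated assumptions, not dm-clone's implementation: the class `CloneModel`, its per-region `hydrated` flags, and the rule that every write replaces a whole region are all choices made for this example.

```python
class CloneModel:
    """Toy model of cloning a read-only source into a writable destination."""

    def __init__(self, source):
        self.source = tuple(source)                 # read-only source device
        self.dest = [None] * len(self.source)       # writable destination device
        self.hydrated = [False] * len(self.source)  # True once a region lives on dest

    def read(self, region):
        # Regions not yet copied are served from the source, so all data
        # appears to be present from the start.
        if self.hydrated[region]:
            return self.dest[region]
        return self.source[region]

    def write(self, region, data):
        # Assumption: a write replaces the whole region, so the source
        # contents of that region no longer need to be copied.
        self.dest[region] = data
        self.hydrated[region] = True

    def background_step(self):
        """Copy the next pending region; return False once nothing is left."""
        for region, done in enumerate(self.hydrated):
            if not done:
                self.dest[region] = self.source[region]
                self.hydrated[region] = True
                return True
        return False


clone = CloneModel(["a", "b", "c"])
clone.write(1, "B")            # user I/O before the copy has finished
first_read = clone.read(0)     # region 0 is still served from the source
while clone.background_step():
    pass                       # background copy runs to completion
```

Once every region is marked as copied, all reads and writes go to the destination alone, which corresponds to the point where the commit message says the clone table can be replaced by a linear mapping to the destination device.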
     

11 Sep, 2019

1 commit