08 Oct, 2020

1 commit

  • * tag 'v5.4.70': (3051 commits)
    Linux 5.4.70
    netfilter: ctnetlink: add a range check for l3/l4 protonum
    ep_create_wakeup_source(): dentry name can change under you...
    ...

    Conflicts:
    arch/arm/mach-imx/pm-imx6.c
    arch/arm64/boot/dts/freescale/imx8mm-evk.dts
    arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts
    drivers/crypto/caam/caamalg.c
    drivers/gpu/drm/imx/dw_hdmi-imx.c
    drivers/gpu/drm/imx/imx-ldb.c
    drivers/gpu/drm/imx/ipuv3/ipuv3-crtc.c
    drivers/mmc/host/sdhci-esdhc-imx.c
    drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
    drivers/net/ethernet/freescale/enetc/enetc.c
    drivers/net/ethernet/freescale/enetc/enetc_pf.c
    drivers/thermal/imx_thermal.c
    drivers/usb/cdns3/ep0.c
    drivers/xen/swiotlb-xen.c
    sound/soc/fsl/fsl_esai.c
    sound/soc/fsl/fsl_sai.c

    Signed-off-by: Jason Liu

    Jason Liu
     

07 Oct, 2020

1 commit

  • commit 2b8bd423614c595540eaadcfbc702afe8e155e50 upstream.

    Currently io_ticks is approximated by adding one at each start and end of
    requests if jiffies counter has changed. This works perfectly for requests
    shorter than a jiffy or if one of requests starts/ends at each jiffy.

    If disk executes just one request at a time and they are longer than two
    jiffies then only first and last jiffies will be accounted.

    Fix is simple: at the end of request add up into io_ticks jiffies passed
    since last update rather than just one jiffy.
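
    The accounting change can be illustrated with a small simulation. This
    is only a sketch, not the kernel code: io_ticks(), the event list and
    the request timings below are hypothetical, and jiffies are modelled as
    plain integers.

```python
def io_ticks(events, add_elapsed_at_end):
    """Accumulate io_ticks over (jiffy, is_end) events for one disk."""
    stamp = 0  # jiffy of the last io_ticks update
    ticks = 0
    for now, is_end in events:
        if now != stamp:
            # Old scheme: always add one jiffy.
            # Fixed scheme: at request end, add all jiffies since the last update.
            ticks += (now - stamp) if (is_end and add_elapsed_at_end) else 1
            stamp = now
    return ticks


# Hypothetical load: ten back-to-back requests, each spanning 12 jiffies,
# with only one request in flight at a time.
events = []
for i in range(10):
    start = 1 + 13 * i
    events += [(start, False), (start + 12, True)]

elapsed = events[-1][0]
old_ticks = io_ticks(events, add_elapsed_at_end=False)
new_ticks = io_ticks(events, add_elapsed_at_end=True)
print(f"old: {old_ticks}/{elapsed} jiffies, new: {new_ticks}/{elapsed} jiffies")
```

    In this model the old scheme counts only the first and last jiffy of
    each request, while the fixed scheme accounts for the whole busy time.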

    Example: common HDD executes random read 4k requests around 12ms.

    fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
    iostat -x 10 sdb

    Note changes of iostat's "%util" 8,43% -> 99,99% before/after patch:

    Before:

    Device:  rrqm/s  wrqm/s    r/s   w/s   rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
    sdb        0,00    0,00  82,60  0,00  330,40   0,00      8,00      0,96  12,09    12,09     0,00   1,02   8,43

    After:

    Device:  rrqm/s  wrqm/s    r/s   w/s   rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
    sdb        0,00    0,00  82,50  0,00  330,00   0,00      8,00      1,00  12,10    12,10     0,00  12,12  99,99

    Now io_ticks does not lose time between start and end of requests, but
    for queue-depth > 1 some I/O time between adjacent starts might be lost.

    For load estimation "%util" is not as useful as average queue length,
    but it clearly shows how often disk queue is completely empty.

    Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
    Signed-off-by: Konstantin Khlebnikov
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe
    From: "Banerjee, Debabrata"
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
     

19 Jun, 2020

1 commit

  • * tag 'v5.4.47': (2193 commits)
    Linux 5.4.47
    KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
    KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
    ...

    Conflicts:
    arch/arm/boot/dts/imx6qdl.dtsi
    arch/arm/mach-imx/Kconfig
    arch/arm/mach-imx/common.h
    arch/arm/mach-imx/suspend-imx6.S
    arch/arm64/boot/dts/freescale/imx8qxp-mek.dts
    arch/powerpc/include/asm/cacheflush.h
    drivers/cpufreq/imx6q-cpufreq.c
    drivers/dma/imx-sdma.c
    drivers/edac/synopsys_edac.c
    drivers/firmware/imx/imx-scu.c
    drivers/net/ethernet/freescale/fec.h
    drivers/net/ethernet/freescale/fec_main.c
    drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
    drivers/net/phy/phy_device.c
    drivers/perf/fsl_imx8_ddr_perf.c
    drivers/usb/cdns3/gadget.c
    drivers/usb/dwc3/gadget.c
    include/uapi/linux/dma-buf.h

    Signed-off-by: Jason Liu

    Jason Liu
     

11 Jun, 2020

3 commits

  • commit 3798cc4d106e91382bfe016caa2edada27c2bb3f upstream

    Make the docs match the code.

    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Josh Poimboeuf
     
  • commit 7222a1b5b87417f22265c92deea76a6aecd0fb0f upstream

    Add documentation for the SRBDS vulnerability and its mitigation.

    [ bp: Massage.
    jpoimboe: sysfs table strings. ]

    Signed-off-by: Mark Gross
    Signed-off-by: Borislav Petkov
    Reviewed-by: Tony Luck
    Reviewed-by: Josh Poimboeuf
    Signed-off-by: Greg Kroah-Hartman

    Mark Gross
     
  • commit 7e5b3c267d256822407a22fdce6afdf9cd13f9fb upstream

    SRBDS is an MDS-like speculative side channel that can leak bits from the
    random number generator (RNG) across cores and threads. New microcode
    serializes the processor access during the execution of RDRAND and
    RDSEED. This ensures that the shared buffer is overwritten before it is
    released for reuse.

    While it is present on all affected CPU models, the microcode mitigation
    is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the
    cases where TSX is not supported or has been disabled with TSX_CTRL.

    The mitigation is activated by default on affected processors and it
    increases latency for RDRAND and RDSEED instructions. Among other
    effects this will reduce throughput from /dev/urandom.

    * Enable administrator to configure the mitigation off when desired using
    either mitigations=off or srbds=off.

    * Export vulnerability status via sysfs

    * Rename file-scoped macros to apply for non-whitelist table initializations.

    [ bp: Massage,
    - s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g,
    - do not read arch cap MSR a second time in tsx_fused_off() - just pass it in,
    - flip check in cpu_set_bug_bits() to save an indentation level,
    - reflow comments.
    jpoimboe: s/Mitigated/Mitigation/ in user-visible strings
    tglx: Dropped the fused off magic for now
    ]

    Signed-off-by: Mark Gross
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Tony Luck
    Reviewed-by: Pawan Gupta
    Reviewed-by: Josh Poimboeuf
    Tested-by: Neelima Krishnan
    Signed-off-by: Greg Kroah-Hartman

    Mark Gross
     

29 Apr, 2020

1 commit

  • …t for high speed devices")

    commit 3155f4f40811c5d7e3c686215051acf504e05565 upstream.

    Commit bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for
    high speed devices") changed the way the hub driver enumerates
    high-speed devices. Instead of using the "new" enumeration scheme
    first and switching to the "old" scheme if that doesn't work, we start
    with the "old" scheme. In theory this is better because the "old"
    scheme is slightly faster -- it involves resetting the device only
    once instead of twice.

    However, for a long time Windows used only the "new" scheme. Zeng Tao
    said that Windows 8 and later use the "old" scheme for high-speed
    devices, but apparently there are some devices that don't like it.
    William Bader reports that the Ricoh webcam built into his Sony Vaio
    laptop not only doesn't enumerate under the "old" scheme, it gets hung
    up so badly that it won't then enumerate under the "new" scheme! Only
    a cold reset will fix it.

    Therefore we will revert the commit and go back to trying the "new"
    scheme first for high-speed devices.

    Reported-and-tested-by: William Bader <williambader@hotmail.com>
    Ref: https://bugzilla.kernel.org/show_bug.cgi?id=207219
    Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
    Fixes: bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for high speed devices")
    CC: Zeng Tao <prime.zeng@hisilicon.com>
    CC: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.2004221611230.11262-100000@iolanthe.rowland.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

    Alan Stern
     

23 Apr, 2020

1 commit


21 Mar, 2020

1 commit

  • [ Upstream commit 3f9e12e0df012c4a9a7fd7eb0d3ae69b459d6b2c ]

    In case the WDAT interface is broken, give the user an option to
    ignore it to let a native driver bind to the watchdog device instead.

    Signed-off-by: Jean Delvare
    Acked-by: Mika Westerberg
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Sasha Levin

    Jean Delvare
     

08 Mar, 2020

1 commit

  • Merge Linux stable release v5.4.24 into imx_5.4.y

    * tag 'v5.4.24': (3306 commits)
    Linux 5.4.24
    blktrace: Protect q->blk_trace with RCU
    kvm: nVMX: VMWRITE checks unsupported field before read-only field
    ...

    Signed-off-by: Jason Liu

    Conflicts:
    arch/arm/boot/dts/imx6sll-evk.dts
    arch/arm/boot/dts/imx7ulp.dtsi
    arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
    drivers/clk/imx/clk-composite-8m.c
    drivers/gpio/gpio-mxc.c
    drivers/irqchip/Kconfig
    drivers/mmc/host/sdhci-of-esdhc.c
    drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
    drivers/net/can/flexcan.c
    drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
    drivers/net/ethernet/mscc/ocelot.c
    drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
    drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
    drivers/net/phy/realtek.c
    drivers/pci/controller/mobiveil/pcie-mobiveil-host.c
    drivers/perf/fsl_imx8_ddr_perf.c
    drivers/tee/optee/shm_pool.c
    drivers/usb/cdns3/gadget.c
    kernel/sched/cpufreq.c
    net/core/xdp.c
    sound/soc/fsl/fsl_esai.c
    sound/soc/fsl/fsl_sai.c
    sound/soc/sof/core.c
    sound/soc/sof/imx/Kconfig
    sound/soc/sof/loader.c

    Jason Liu
     

04 Mar, 2020

2 commits


18 Jan, 2020

1 commit


09 Jan, 2020

1 commit

  • commit a7583e72a5f22470d3e6fd3b6ba912892242339f upstream.

    The commit 0f27cff8597d ("ACPI: sysfs: Make ACPI GPE mask kernel
    parameter cover all GPEs") says:
    "Use a bitmap of size 0xFF instead of a u64 for the GPE mask so 256
    GPEs can be masked"

    But the masking of GPE 0xFF is not supported and the check condition
    "gpe > ACPI_MASKABLE_GPE_MAX" is not valid because the type of gpe is
    u8.

    So modify the macro ACPI_MASKABLE_GPE_MAX to 0x100, and drop the "gpe >
    ACPI_MASKABLE_GPE_MAX" check. In addition, update the docs "Format" for
    acpi_mask_gpe parameter.
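
    Why the old check could never fire can be shown with a short model.
    This is a sketch under stated assumptions: wrap_u8(), old_check_rejects()
    and fits_bitmap() are hypothetical helpers that imitate u8 truncation and
    a bitmap bounds test, not functions from the ACPI code.

```python
U8_MAX = 0xFF
OLD_GPE_MAX = 0xFF   # old macro value, also used as the old bitmap size
NEW_GPE_MAX = 0x100  # new macro value, so bits 0..0xFF are available


def wrap_u8(value):
    """Model storing a value in a u8: only the low 8 bits survive."""
    return value & U8_MAX


def old_check_rejects(gpe):
    """Model of the dropped 'gpe > ACPI_MASKABLE_GPE_MAX' test on a u8."""
    return wrap_u8(gpe) > OLD_GPE_MAX


def fits_bitmap(gpe, bitmap_bits):
    """A bitmap with bitmap_bits bits can hold indices 0..bitmap_bits-1."""
    return 0 <= gpe < bitmap_bits


never_rejects = not any(old_check_rejects(g) for g in range(0x200))
print("old check ever rejects:", not never_rejects)
print("GPE 0xFF fits old bitmap:", fits_bitmap(0xFF, OLD_GPE_MAX))
print("GPE 0xFF fits new bitmap:", fits_bitmap(0xFF, NEW_GPE_MAX))
```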

    Fixes: 0f27cff8597d ("ACPI: sysfs: Make ACPI GPE mask kernel parameter cover all GPEs")
    Signed-off-by: Yunfeng Ye
    [ rjw: Use u16 as gpe data type in acpi_gpe_apply_masked_gpes() ]
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Yunfeng Ye
     

18 Dec, 2019

1 commit


16 Dec, 2019

1 commit

  • This is the 5.4.3 stable release

    Conflicts:
    drivers/cpufreq/imx-cpufreq-dt.c
    drivers/spi/spi-fsl-qspi.c

    The conflicts are minor and were fixed during the merge. The
    imx-cpufreq-dt.c conflict is a one-line code-style change; the upstream
    version is used, with no functional change.

    The spi-fsl-qspi.c conflict is minor and comes from merging the upstream
    fix c69b17da53b2 ("spi: spi-fsl-qspi: Clear TDH bits in FLSHCR register").

    After the merge, a basic boot sanity test and a basic qspi test were done
    on i.MX.

    Signed-off-by: Jason Liu

    Jason Liu
     

29 Nov, 2019

1 commit

  • commit 64870ed1b12e235cfca3f6c6da75b542c973ff78 upstream.

    For MDS vulnerable processors with TSX support, enabling either MDS or
    TAA mitigations will enable the use of VERW to flush internal processor
    buffers at the right code path. IOW, they are either both mitigated
    or both not. However, if the command line options are inconsistent,
    the vulnerabilities sysfs files may not report the mitigation status
    correctly.

    For example, with only the "mds=off" option:

    vulnerabilities/mds:Vulnerable; SMT vulnerable
    vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable

    The mds vulnerabilities file has wrong status in this case. Similarly,
    the taa vulnerability file will be wrong with mds mitigation on, but
    taa off.

    Change taa_select_mitigation() to sync up the two mitigation status
    and have them turned off if both "mds=off" and "tsx_async_abort=off"
    are present.

    Update documentation to emphasize the fact that both "mds=off" and
    "tsx_async_abort=off" have to be specified together for processors that
    are affected by both TAA and MDS to be effective.
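
    The rule described above can be sketched as a small predicate. This is
    an illustration only: verw_mitigation_off() is a hypothetical helper,
    it assumes a processor affected by both MDS and TAA, and it ignores
    every other command line option.

```python
def verw_mitigation_off(cmdline):
    """Return True only if both options that disable VERW flushing are present.

    Assumes a CPU affected by both MDS and TAA; with only one of the two
    options given, the shared VERW mitigation stays enabled.
    """
    params = cmdline.split()
    return "mds=off" in params and "tsx_async_abort=off" in params


for cmdline in ("quiet mds=off",
                "quiet tsx_async_abort=off",
                "quiet mds=off tsx_async_abort=off"):
    state = "off" if verw_mitigation_off(cmdline) else "on"
    print(f"{cmdline!r}: VERW mitigation {state}")
```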

    [ bp: Massage and add kernel-parameters.txt change too. ]

    Fixes: 1b42f017415b ("x86/speculation/taa: Add mitigation for TSX Async Abort")
    Signed-off-by: Waiman Long
    Signed-off-by: Borislav Petkov
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jiri Kosina
    Cc: Jonathan Corbet
    Cc: Josh Poimboeuf
    Cc: linux-doc@vger.kernel.org
    Cc: Mark Gross
    Cc:
    Cc: Pawan Gupta
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tim Chen
    Cc: Tony Luck
    Cc: Tyler Hicks
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20191115161445.30809-2-longman@redhat.com
    Signed-off-by: Greg Kroah-Hartman

    Waiman Long
     

28 Nov, 2019

3 commits


05 Nov, 2019

2 commits

  • Add the initial ITLB_MULTIHIT documentation.

    [ tglx: Add it to the index so it gets actually built. ]

    Signed-off-by: Antonio Gomez Iglesias
    Signed-off-by: Nelson D'Souza
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Thomas Gleixner

    Gomez Iglesias, Antonio
     
  • The page table pages corresponding to broken down large pages are zapped in
    FIFO order, so that the large page can potentially be recovered, if it is
    no longer being used for execution. This removes the performance penalty
    for walking deeper EPT page tables.

    By default, one large page will last about one hour once the guest
    reaches a steady state.

    Signed-off-by: Junaid Shahid
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Thomas Gleixner

    Junaid Shahid
     

04 Nov, 2019

1 commit

  • With some Intel processors, putting the same virtual address in the TLB
    as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit
    and cause the processor to issue a machine check resulting in a CPU lockup.

    Unfortunately when EPT page tables use huge pages, it is possible for a
    malicious guest to cause this situation.

    Add a knob to mark huge pages as non-executable. When the nx_huge_pages
    parameter is enabled (and we are using EPT), all huge pages are marked as
    NX. If the guest attempts to execute in one of those pages, the page is
    broken down into 4K pages, which are then marked executable.

    This is not an issue for shadow paging (except nested EPT), because then
    the host is in control of TLB flushes and the problematic situation cannot
    happen. With nested EPT, the nested guest can again cause problems, so
    shadow and direct EPT are treated in the same way.

    [ tglx: Fixup default to auto and massage wording a bit ]

    Originally-by: Junaid Shahid
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Thomas Gleixner

    Paolo Bonzini
     

28 Oct, 2019

3 commits

  • Add the documentation for TSX Async Abort. Include the description of
    the issue, how to check the mitigation state, how to control the
    mitigation, and guidance for system administrators.

    [ bp: Add proper SPDX tags, touch ups by Josh and me. ]

    Co-developed-by: Antonio Gomez Iglesias

    Signed-off-by: Pawan Gupta
    Signed-off-by: Antonio Gomez Iglesias
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Mark Gross
    Reviewed-by: Tony Luck
    Reviewed-by: Josh Poimboeuf

    Pawan Gupta
     
  • Platforms which are not affected by X86_BUG_TAA may want the TSX feature
    enabled. Add an "auto" option to the TSX cmdline parameter. With tsx=auto,
    TSX is disabled when X86_BUG_TAA is present and enabled otherwise.
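
    The resulting decision can be sketched as follows. This is a model, not
    the kernel implementation: tsx_enabled() is a hypothetical helper, and
    treating an unspecified parameter as "off" is an assumption taken from
    the description of the original tsx= parameter.

```python
def tsx_enabled(tsx_param, has_taa_bug):
    """Decide whether TSX ends up enabled for a given tsx= value.

    tsx_param is "on", "off", "auto" or None (parameter not given);
    has_taa_bug models X86_BUG_TAA being set for this CPU.
    """
    if tsx_param == "on":
        return True
    if tsx_param == "auto":
        return not has_taa_bug
    if tsx_param in (None, "off"):
        return False
    raise ValueError(f"unknown tsx= value: {tsx_param!r}")


for param in ("on", "off", "auto", None):
    for bug in (False, True):
        print(f"tsx={param}, TAA bug={bug}: enabled={tsx_enabled(param, bug)}")
```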

    More details on X86_BUG_TAA can be found here:
    https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html

    [ bp: Extend the arg buffer to accommodate "auto\0". ]

    Signed-off-by: Pawan Gupta
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Tony Luck
    Reviewed-by: Josh Poimboeuf

    Pawan Gupta
     
  • Add a kernel cmdline parameter "tsx" to control the Transactional
    Synchronization Extensions (TSX) feature. On CPUs that support TSX
    control, use "tsx=on|off" to enable or disable TSX. Not specifying this
    option is equivalent to "tsx=off". This is because on certain processors
    TSX may be used as a part of a speculative side channel attack.

    Carve out the TSX controlling functionality into a separate compilation
    unit because TSX is a CPU feature while the TSX async abort control
    machinery will go to cpu/bugs.c.

    [ bp: - Massage, shorten and clear the arg buffer.
    - Clarifications of the tsx= possible options - Josh.
    - Expand on TSX_CTRL availability - Pawan. ]

    Signed-off-by: Pawan Gupta
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf

    Pawan Gupta
     

13 Oct, 2019

1 commit


08 Oct, 2019

2 commits

  • cgroup v2 introduces two memory protection thresholds: memory.low
    (best-effort) and memory.min (hard protection). While they generally do
    what they say on the tin, there is a limitation in their implementation
    that makes them difficult to use effectively: that cliff behaviour often
    manifests when they become eligible for reclaim. This patch implements
    more intuitive and usable behaviour, where we gradually mount more
    reclaim pressure as cgroups further and further exceed their protection
    thresholds.

    This cliff edge behaviour happens because we only choose whether or not
    to reclaim based on whether the memcg is within its protection limits
    (see the use of mem_cgroup_protected in shrink_node), but we don't vary
    our reclaim behaviour based on this information. Imagine the following
    timeline, with the numbers the lruvec size in this zone:

    1. memory.low=1000000, memory.current=999999. 0 pages may be scanned.
    2. memory.low=1000000, memory.current=1000000. 0 pages may be scanned.
    3. memory.low=1000000, memory.current=1000001. 1000001* pages may be
    scanned. (?!)

    * Of course, we won't usually scan all available pages in the zone even
    without this patch because of scan control priority, over-reclaim
    protection, etc. However, as shown by the tests at the end, these
    techniques don't sufficiently throttle such an extreme change in input,
    so cliff-like behaviour isn't really averted by their existence alone.
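
    The difference between the binary and the proportional behaviour can be
    sketched with the numbers from the timeline. This is a simplified model,
    not the kernel's scan calculation: both functions are hypothetical, the
    lruvec size is taken to equal memory.current, and scan priority and
    other throttling are ignored.

```python
def scan_binary(low, current, lruvec_size):
    """Old behaviour: nothing below the threshold, everything above it."""
    return 0 if current <= low else lruvec_size


def scan_proportional(low, current, lruvec_size):
    """Sketch of proportional pressure: scan only the unprotected share."""
    if current <= low:
        return 0
    # Integer arithmetic: drop the fraction of the lruvec covered by memory.low.
    return lruvec_size - lruvec_size * low // current


low = 1000000
for current in (999999, 1000000, 1000001, 2000000):
    print(current,
          scan_binary(low, current, current),
          scan_proportional(low, current, current))
```

    In this model, exceeding memory.low by one page exposes one page to
    scanning instead of the whole lruvec, and the exposed share grows with
    the overage.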

    Here's an example of how this plays out in practice. At Facebook, we are
    trying to protect various workloads from "system" software, like
    configuration management tools, metric collectors, etc (see this[0] case
    study). In order to find a suitable memory.low value, we start by
    determining the expected memory range within which the workload will be
    comfortable operating. This isn't an exact science -- memory usage deemed
    "comfortable" will vary over time due to user behaviour, differences in
    composition of work, etc, etc. As such we need to ballpark memory.low,
    but doing this is currently problematic:

    1. If we end up setting it too low for the workload, it won't have
    *any* effect (see discussion above). The group will receive the full
    weight of reclaim and won't have any priority while competing with the
    less important system software, as if we had no memory.low configured
    at all.

    2. Because of this behaviour, we end up erring on the side of setting
    it too high, such that the comfort range is reliably covered. However,
    protected memory is completely unavailable to the rest of the system,
    so we might cause undue memory and IO pressure there when we *know* we
    have some elasticity in the workload.

    3. Even if we get the value totally right, smack in the middle of the
    comfort zone, we get extreme jumps between no pressure and full
    pressure that cause unpredictable pressure spikes in the workload due
    to the current binary reclaim behaviour.

    With this patch, we can set it to our ballpark estimation without too much
    worry. Any undesirable behaviour, such as too much or too little reclaim
    pressure on the workload or system will be proportional to how far our
    estimation is off. This means we can set memory.low much more
    conservatively and thus waste less resources *without* the risk of the
    workload falling off a cliff if we overshoot.

    As a more abstract technical description, this unintuitive behaviour
    results in having to give high-priority workloads a large protection
    buffer on top of their expected usage to function reliably, as otherwise
    we have abrupt periods of dramatically increased memory pressure which
    hamper performance. Having to set these thresholds so high wastes
    resources and generally works against the principle of work conservation.
    In addition, having proportional memory reclaim behaviour has other
    benefits. Most notably, before this patch it's basically mandatory to set
    memory.low to a higher than desirable value because otherwise as soon as
    you exceed memory.low, all protection is lost, and all pages are eligible
    to scan again. By contrast, having a gradual ramp in reclaim pressure
    means that you now still get some protection when thresholds are exceeded,
    which means that one can now be more comfortable setting memory.low to
    lower values without worrying that all protection will be lost. This is
    important because workingset size is really hard to know exactly,
    especially with variable workloads, so at least getting *some* protection
    if your workingset size grows larger than you expect increases user
    confidence in setting memory.low without a huge buffer on top being
    needed.

    Thanks a lot to Johannes Weiner and Tejun Heo for their advice and
    assistance in thinking about how to make this work better.

    In testing these changes, I intended to verify that:

    1. Changes in page scanning become gradual and proportional instead of
    binary.

    To test this, I experimented stepping further and further down
    memory.low protection on a workload that floats around 19G workingset
    when under memory.low protection, watching page scan rates for the
    workload cgroup:

    +------------+-----------------+--------------------+--------------+
    | memory.low | test (pgscan/s) | control (pgscan/s) | % of control |
    +------------+-----------------+--------------------+--------------+
    |        21G |               0 |                  0 |          N/A |
    |        17G |             867 |               3799 |          23% |
    |        12G |            1203 |               3543 |          34% |
    |         8G |            2534 |               3979 |          64% |
    |         4G |            3980 |               4147 |          96% |
    |          0 |            3799 |               3980 |          95% |
    +------------+-----------------+--------------------+--------------+

    As you can see, the test kernel (with a kernel containing this
    patch) ramps up page scanning significantly more gradually than the
    control kernel (without this patch).

    2. More gradual ramp up in reclaim aggression doesn't result in
    premature OOMs.

    To test this, I wrote a script that slowly increments the number of
    pages held by stress(1)'s --vm-keep mode until a production system
    entered severe overall memory contention. This script runs in a highly
    protected slice taking up the majority of available system memory.
    Watching vmstat revealed that page scanning continued essentially
    nominally between test and control, without causing forward reclaim
    progress to become arrested.

    [0]: https://facebookmicrosites.github.io/cgroup2/docs/overview.html#case-study-the-fbtax2-project

    [akpm@linux-foundation.org: reflow block comments to fit in 80 cols]
    [chris@chrisdown.name: handle cgroup_disable=memory when getting memcg protection]
    Link: http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name
    Link: http://lkml.kernel.org/r/20190124014455.GA6396@chrisdown.name
    Signed-off-by: Chris Down
    Acked-by: Johannes Weiner
    Reviewed-by: Roman Gushchin
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Dennis Zhou
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Down
     
  • Currently execution of panic() continues until Xen's panic notifier
    (xen_panic_event()) is called at which point we make a hypercall that
    never returns.

    This means that any notifier that is supposed to be called later as
    well as significant part of panic() code (such as pstore writes from
    kmsg_dump()) is never executed.

    There is no reason for xen_panic_event() to be this last point in
    execution since panic()'s emergency_restart() will call into
    xen_emergency_restart() from where we can perform our hypercall.

    Nevertheless, we will provide xen_legacy_crash boot option that will
    preserve original behavior during crash. This option could be used,
    for example, if running kernel dumper (which happens after panic
    notifiers) is undesirable.

    Reported-by: James Dingwall
    Signed-off-by: Boris Ostrovsky
    Reviewed-by: Juergen Gross

    Boris Ostrovsky
     

28 Sep, 2019

1 commit

  • Pull kernel lockdown mode from James Morris:
    "This is the latest iteration of the kernel lockdown patchset, from
    Matthew Garrett, David Howells and others.

    From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.

    There are two major changes since this was last proposed for mainline:

    - Separating lockdown from EFI secure boot. Background discussion is
    covered here: https://lwn.net/Articles/751061/

    - Implementation as an LSM, with a default stackable lockdown LSM
    module. This allows the lockdown feature to be policy-driven,
    rather than encoding an implicit policy within the mechanism.

    The new locked_down LSM hook is provided to allow LSMs to make a
    policy decision around whether kernel functionality that would allow
    tampering with or examining the runtime state of the kernel should be
    permitted.

    The included lockdown LSM provides an implementation with a simple
    policy intended for general purpose use. This policy provides a coarse
    level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

    Enable the kernel lockdown feature. If set to integrity, kernel features
    that allow userland to modify the running kernel are disabled. If set to
    confidentiality, kernel features that allow userland to extract
    confidential information from the kernel are also disabled.

    This may also be controlled via /sys/kernel/security/lockdown and
    overridden by kernel configuration.

    New or existing LSMs may implement finer-grained controls of the
    lockdown features. Refer to the lockdown_reason documentation in
    include/linux/security.h for details.

    The lockdown feature has had significant design feedback and review
    across many subsystems. This code has been in linux-next for some
    weeks, with a few fixes applied along the way.

    Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf
    when kernel lockdown is in confidentiality mode") is missing a
    Signed-off-by from its author. Matthew responded that he is providing
    this under category (c) of the DCO"

    * 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
    kexec: Fix file verification on S390
    security: constify some arrays in lockdown LSM
    lockdown: Print current->comm in restriction messages
    efi: Restrict efivar_ssdt_load when the kernel is locked down
    tracefs: Restrict tracefs when the kernel is locked down
    debugfs: Restrict debugfs when the kernel is locked down
    kexec: Allow kexec_file() with appropriate IMA policy when locked down
    lockdown: Lock down perf when in confidentiality mode
    bpf: Restrict bpf when kernel lockdown is in confidentiality mode
    lockdown: Lock down tracing and perf kprobes when in confidentiality mode
    lockdown: Lock down /proc/kcore
    x86/mmiotrace: Lock down the testmmiotrace module
    lockdown: Lock down module params that specify hardware parameters (eg. ioport)
    lockdown: Lock down TIOCSSERIAL
    lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
    acpi: Disable ACPI table override if the kernel is locked down
    acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
    ACPI: Limit access to custom_method when the kernel is locked down
    x86/msr: Restrict MSR access when the kernel is locked down
    x86: Lock down IO port access when the kernel is locked down
    ...

    Linus Torvalds
     

25 Sep, 2019

3 commits

  • Merge updates from Andrew Morton:

    - a few hot fixes

    - ocfs2 updates

    - almost all of -mm (slab-generic, slab, slub, kmemleak, kasan,
    cleanups, debug, pagecache, memcg, gup, pagemap, memory-hotplug,
    sparsemem, vmalloc, initialization, z3fold, compaction, mempolicy,
    oom-kill, hugetlb, migration, thp, mmap, madvise, shmem, zswap,
    zsmalloc)

    * emailed patches from Andrew Morton : (132 commits)
    mm/zsmalloc.c: fix a -Wunused-function warning
    zswap: do not map same object twice
    zswap: use movable memory if zpool support allocate movable memory
    zpool: add malloc_support_movable to zpool_driver
    shmem: fix obsolete comment in shmem_getpage_gfp()
    mm/madvise: reduce code duplication in error handling paths
    mm: mmap: increase sockets maximum memory size pgoff for 32bits
    mm/mmap.c: refine find_vma_prev() with rb_last()
    riscv: make mmap allocation top-down by default
    mips: use generic mmap top-down layout and brk randomization
    mips: replace arch specific way to determine 32bit task with generic version
    mips: adjust brk randomization offset to fit generic version
    mips: use STACK_TOP when computing mmap base address
    mips: properly account for stack randomization and stack guard gap
    arm: use generic mmap top-down layout and brk randomization
    arm: use STACK_TOP when computing mmap base address
    arm: properly account for stack randomization and stack guard gap
    arm64, mm: make randomization selected by generic topdown mmap layout
    arm64, mm: move generic mmap layout functions to mm
    arm64: consider stack randomization for mmap base only when necessary
    ...

    Linus Torvalds
     
  • Cgroup v1 memcg controller has exposed a dedicated kmem limit to users
    which turned out to be really a bad idea because there are paths which
    cannot shrink the kernel memory usage enough to get below the limit (e.g.
    because the accounted memory is not reclaimable). There are cases when
    the failure is even not allowed (e.g. __GFP_NOFAIL). This means that the
    kmem limit is in excess to the hard limit without any way to shrink and
    thus completely useless. OOM killer cannot be invoked to handle the
    situation because that would lead to a premature oom killing.

    As a result many places might see ENOMEM returning from kmalloc and result
    in unexpected errors. E.g. a global OOM killer when there is a lot of
    free memory because ENOMEM is translated into VM_FAULT_OOM in #PF path and
    therefore pagefault_out_of_memory would result in OOM killer.

    Please note that the kernel memory is still accounted to the overall limit
    along with the user memory so removing the kmem specific limit should
    still allow to contain kernel memory consumption. Unlike the kmem one,
    though, it invokes memory reclaim and targeted memcg oom killing if
    necessary.

    Start the deprecation process by crying to the kernel log. Let's see
    whether there are relevant usecases and simply return EINVAL in the
    second stage if nobody complains in a few releases.
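
    The two-stage plan described above can be modelled in a few lines. This
    is a toy user-space sketch, not the kernel code: the class name, the
    "stage" field and the warning text are all hypothetical, and warning only
    once per knob is an assumption.

```python
import errno


class KmemLimitKnob:
    """Toy model of a kmem limit file that is being deprecated (hypothetical)."""

    def __init__(self, stage):
        self.stage = stage   # 1 = warn but still accept, 2 = reject writes
        self.limit = None
        self.log = []        # stands in for the kernel log

    def write(self, value):
        if self.stage >= 2:
            # Second stage: the knob no longer accepts a limit at all.
            return -errno.EINVAL
        if not self.log:
            # First stage: complain once (assumed), then keep old behaviour.
            self.log.append("kmem limit is deprecated and will be removed")
        self.limit = value
        return 0
```

    In stage 1 existing users keep working and only see the log message; in
    stage 2 the same write fails, which is why the message gives them a few
    releases to object.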

    [akpm@linux-foundation.org: tweak documentation text]
    Link: http://lkml.kernel.org/r/20190911151612.GI4023@dhcp22.suse.cz
    Signed-off-by: Michal Hocko
    Reviewed-by: Shakeel Butt
    Cc: Johannes Weiner
    Cc: Vladimir Davydov
    Cc: Andrey Ryabinin
    Cc: Thomas Lindroth
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • The debug_pagealloc functionality is useful to catch buggy page allocator
    users that cause e.g. use after free or double free. When page
    inconsistency is detected, debugging is often simpler by knowing the call
    stack of the process that last allocated and freed the page. When page_owner
    is also enabled, we record the allocation stack trace, but not the freeing one.

    This patch therefore adds recording of freeing process stack trace to page
    owner info, if both page_owner and debug_pagealloc are configured and
    enabled. With only page_owner enabled, this info is not useful for the
    memory leak debugging use case. dump_page() is adjusted to print the
    info. An example result of calling __free_pages() twice may look like
    this (note the page last free stack trace):

    BUG: Bad page state in process bash pfn:13d8f8
    page:ffffc31984f63e00 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0
    flags: 0x1affff800000000()
    raw: 01affff800000000 dead000000000100 dead000000000122 0000000000000000
    raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
    page dumped because: nonzero _refcount
    page_owner tracks the page as freed
    page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL)
    prep_new_page+0x143/0x150
    get_page_from_freelist+0x289/0x380
    __alloc_pages_nodemask+0x13c/0x2d0
    khugepaged+0x6e/0xc10
    kthread+0xf9/0x130
    ret_from_fork+0x3a/0x50
    page last free stack trace:
    free_pcp_prepare+0x134/0x1e0
    free_unref_page+0x18/0x90
    khugepaged+0x7b/0xc10
    kthread+0xf9/0x130
    ret_from_fork+0x3a/0x50
    Modules linked in:
    CPU: 3 PID: 271 Comm: bash Not tainted 5.3.0-rc4-2.g07a1a73-default+ #57
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
    Call Trace:
    dump_stack+0x85/0xc0
    bad_page.cold+0xba/0xbf
    rmqueue_pcplist.isra.0+0x6c5/0x6d0
    rmqueue+0x2d/0x810
    get_page_from_freelist+0x191/0x380
    __alloc_pages_nodemask+0x13c/0x2d0
    __get_free_pages+0xd/0x30
    __pud_alloc+0x2c/0x110
    copy_page_range+0x4f9/0x630
    dup_mmap+0x362/0x480
    dup_mm+0x68/0x110
    copy_process+0x19e1/0x1b40
    _do_fork+0x73/0x310
    __x64_sys_clone+0x75/0x80
    do_syscall_64+0x6e/0x1e0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f10af854a10
    ...
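
    The idea of keeping the last free stack next to the last allocation stack
    can be illustrated with a small model. This is only a sketch: PageOwner,
    alloc_page() and free_page() are hypothetical names, stacks are plain
    lists of strings, and the double free is reported at free time here,
    whereas the dump above shows it surfacing later on the allocation path.

```python
class PageOwner:
    """Per-page debug record, loosely modelled on the page owner info."""

    def __init__(self):
        self.allocated = False
        self.alloc_stack = None   # stack that last allocated the page
        self.free_stack = None    # stack that last freed the page


def alloc_page(owner, stack):
    """Mark the page allocated and remember who allocated it."""
    owner.allocated = True
    owner.alloc_stack = list(stack)


def free_page(owner, stack):
    """Free the page; return a report string if it was already free."""
    if not owner.allocated:
        lines = ["page_owner tracks the page as freed",
                 "page last free stack trace:"]
        lines += owner.free_stack or []
        return "\n".join(lines)
    owner.allocated = False
    owner.free_stack = list(stack)
    return None
```

    With only the allocation stack recorded, the second free could be blamed
    on nobody; the stored free stack points at the first, legitimate free.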

    Link: http://lkml.kernel.org/r/20190820131828.22684-5-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: Kirill A. Shutemov
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

24 Sep, 2019

1 commit

  • Pull PCI updates from Bjorn Helgaas:
    "Enumeration:

    - Consolidate _HPP/_HPX stuff in pci-acpi.c and simplify it
    (Krzysztof Wilczynski)

    - Fix incorrect PCIe device types and remove dev->has_secondary_link
    to simplify code that deals with upstream/downstream ports (Mika
    Westerberg)

    - After suspend, restore Resizable BAR size bits correctly for 1MB
    BARs (Sumit Saxena)

    - Enable PCI_MSI_IRQ_DOMAIN support for RISC-V (Wesley Terpstra)

    Virtualization:

    - Add ACS quirks for iProc PAXB (Abhinav Ratna), Amazon Annapurna
    Labs (Ali Saidi)

    - Move sysfs SR-IOV functions to iov.c (Kelsey Skunberg)

    - Remove group write permissions from sysfs sriov_numvfs,
    sriov_drivers_autoprobe (Kelsey Skunberg)

    Hotplug:

    - Simplify pciehp indicator control (Denis Efremov)

    Peer-to-peer DMA:

    - Allow P2P DMA between root ports for whitelisted bridges (Logan
    Gunthorpe)

    - Whitelist some Intel host bridges for P2P DMA (Logan Gunthorpe)

    - DMA map P2P DMA requests that traverse host bridge (Logan
    Gunthorpe)

    Amazon Annapurna Labs host bridge driver:

    - Add DT binding and controller driver (Jonathan Chocron)

    Hyper-V host bridge driver:

    - Fix hv_pci_dev->pci_slot use-after-free (Dexuan Cui)

    - Fix PCI domain number collisions (Haiyang Zhang)

    - Use instance ID bytes 4 & 5 as PCI domain numbers (Haiyang Zhang)

    - Fix build errors on non-SYSFS config (Randy Dunlap)

    i.MX6 host bridge driver:

    - Limit DBI register length (Stefan Agner)

    Intel VMD host bridge driver:

    - Fix config addressing issues (Jon Derrick)

    Layerscape host bridge driver:

    - Add bar_fixed_64bit property to endpoint driver (Xiaowei Bao)

    - Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC drivers separately
    (Xiaowei Bao)

    Mediatek host bridge driver:

    - Add MT7629 controller support (Jianjun Wang)

    Mobiveil host bridge driver:

    - Fix CPU base address setup (Hou Zhiqiang)

    - Make "num-lanes" property optional (Hou Zhiqiang)

    Tegra host bridge driver:

    - Fix OF node reference leak (Nishka Dasgupta)

    - Disable MSI for root ports to work around design problem (Vidya
    Sagar)

    - Add Tegra194 DT binding and controller support (Vidya Sagar)

    - Add support for sideband pins and slot regulators (Vidya Sagar)

    - Add PIPE2UPHY support (Vidya Sagar)

    Misc:

    - Remove unused pci_block_cfg_access() et al (Kelsey Skunberg)

    - Unexport pci_bus_get(), etc (Kelsey Skunberg)

    - Hide PM, VC, link speed, ATS, ECRC, PTM constants and interfaces in
    the PCI core (Kelsey Skunberg)

    - Clean up sysfs DEVICE_ATTR() usage (Kelsey Skunberg)

    - Mark expected switch fall-through (Gustavo A. R. Silva)

    - Propagate errors for optional regulators and PHYs (Thierry Reding)

    - Fix kernel command line resource_alignment parameter issues (Logan
    Gunthorpe)"
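
    For the Hyper-V item above ("Use instance ID bytes 4 & 5 as PCI domain
    numbers"), the derivation can be sketched as follows. The function name
    is hypothetical, and treating byte 4 as the low byte and byte 5 as the
    high byte is an assumption, not something the pull message states.

```python
def domain_from_instance_id(instance_id: bytes) -> int:
    """Build a 16-bit PCI domain number from a 16-byte instance ID.

    Assumed layout: byte 4 is the low byte, byte 5 the high byte.
    """
    if len(instance_id) != 16:
        raise ValueError("instance ID must be 16 bytes")
    return instance_id[4] | (instance_id[5] << 8)
```

    For an ID whose bytes are 0x00..0x0f this gives 0x0504. Two devices
    collide only if both bytes match, which is the kind of collision the
    companion fix in the list deals with.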

    * tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (112 commits)
    PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI
    arm64: tegra: Add PCIe slot supply information in p2972-0000 platform
    arm64: tegra: Add configuration for PCIe C5 sideband signals
    PCI: tegra: Add support to enable slot regulators
    PCI: tegra: Add support to configure sideband pins
    PCI: vmd: Fix shadow offsets to reflect spec changes
    PCI: vmd: Fix config addressing when using bus offsets
    PCI: dwc: Add validation that PCIe core is set to correct mode
    PCI: dwc: al: Add Amazon Annapurna Labs PCIe controller driver
    dt-bindings: PCI: Add Amazon's Annapurna Labs PCIe host bridge binding
    PCI: Add quirk to disable MSI-X support for Amazon's Annapurna Labs Root Port
    PCI/VPD: Prevent VPD access for Amazon's Annapurna Labs Root Port
    PCI: Add ACS quirk for Amazon Annapurna Labs root ports
    PCI: Add Amazon's Annapurna Labs vendor ID
    MAINTAINERS: Add PCI native host/endpoint controllers designated reviewer
    PCI: hv: Use bytes 4 and 5 from instance ID as the PCI domain numbers
    dt-bindings: PCI: tegra: Add PCIe slot supplies regulator entries
    dt-bindings: PCI: tegra: Add sideband pins configuration entries
    PCI: tegra: Add Tegra194 PCIe support
    PCI: Get rid of dev->has_secondary_link flag
    ...

    Linus Torvalds
     

22 Sep, 2019

1 commit

  • …device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - crypto and DM crypt advances that allow the crypto API to reclaim
    implementation details that do not belong in DM crypt. The wrapper
    template for ESSIV generation that was factored out will also be used
    by fscrypt in the future.

    - Add root hash pkcs#7 signature verification to the DM verity target.

    - Add a new "clone" DM target that allows for efficient remote
    replication of a device.

    - Enhance DM bufio's cache to be tailored to each client based on use.
    Clients that make heavy use of the cache get more of it, and those
    that use less have reduced cache usage.

    - Add a new DM_GET_TARGET_VERSION ioctl to allow userspace to query the
    version number of a DM target (even if the associated module isn't
    yet loaded).

    - Fix invalid memory access in DM zoned target.

    - Fix the max_discard_sectors limit advertised by the DM raid target;
    it was mistakenly storing the limit in bytes rather than sectors.

    - Small optimizations and cleanups in DM writecache target.

    - Various fixes and cleanups in DM core, DM raid1 and space map portion
    of DM persistent data library.
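
    The max_discard_sectors item above is a unit mix-up, which a short
    example makes concrete. This sketch only illustrates the bytes versus
    sectors mismatch with 512-byte sectors; the function name is hypothetical
    and it does not reproduce the actual DM raid code.

```python
SECTOR_SHIFT = 9  # 512-byte sectors: 1 << 9 == 512


def bytes_to_sectors(nbytes: int) -> int:
    """Convert a byte count to a count of 512-byte sectors."""
    return nbytes >> SECTOR_SHIFT


limit_bytes = 64 * 1024                        # a limit expressed in bytes
limit_sectors = bytes_to_sectors(limit_bytes)  # the same limit in sectors
```

    Storing limit_bytes in a field that is interpreted as sectors would
    advertise a limit 512 times larger than intended.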

    * tag 'for-5.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
    dm: introduce DM_GET_TARGET_VERSION
    dm bufio: introduce a global cache replacement
    dm bufio: remove old-style buffer cleanup
    dm bufio: introduce a global queue
    dm bufio: refactor adjust_total_allocated
    dm bufio: call adjust_total_allocated from __link_buffer and __unlink_buffer
    dm: add clone target
    dm raid: fix updating of max_discard_sectors limit
    dm writecache: skip writecache_wait for pmem mode
    dm stats: use struct_size() helper
    dm crypt: omit parsing of the encapsulated cipher
    dm crypt: switch to ESSIV crypto API template
    crypto: essiv - create wrapper template for ESSIV generation
    dm space map common: remove check for impossible sm_find_free() return value
    dm raid1: use struct_size() with kzalloc()
    dm writecache: optimize performance by sorting the blocks for writeback_all
    dm writecache: add unlikely for getting two block with same LBA
    dm writecache: remove unused member pointer in writeback_struct
    dm zoned: fix invalid memory access
    dm verity: add root hash pkcs#7 signature verification
    ...

    Linus Torvalds
     

21 Sep, 2019

1 commit

  • Pull powerpc updates from Michael Ellerman:
    "This is a bit late, partly due to me travelling, and partly due to a
    power outage knocking out some of my test systems *while* I was
    travelling.

    - Initial support for running on a system with an Ultravisor, which
    is software that runs below the hypervisor and protects guests
    against some attacks by the hypervisor.

    - Support for building the kernel to run as a "Secure Virtual
    Machine", ie. as a guest capable of running on a system with an
    Ultravisor.

    - Some changes to our DMA code on bare metal, to allow devices with
    medium sized DMA masks (> 32 && < 59 bits) to use more than 2GB of
    DMA space.

    - Support for firmware assisted crash dumps on bare metal (powernv).

    - Two series fixing bugs in and refactoring our PCI EEH code.

    - A large series refactoring our exception entry code to use gas
    macros, both to make it more readable and also enable some future
    optimisations.

    As well as many cleanups and other minor features & fixups.

    Thanks to: Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew
    Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual,
    Balbir Singh, Benjamin Herrenschmidt, Cédric Le Goater, Christophe
    JAILLET, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig,
    Claudio Carvalho, Daniel Axtens, David Gibson, David Hildenbrand,
    Desnes A. Nunes do Rosario, Ganesh Goudar, Gautham R. Shenoy, Greg
    Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari Bathini, Joakim
    Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras, Lianbo
    Jiang, Madhavan Srinivasan, Mahesh Salgaonkar, Mahesh Salgaonkar,
    Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan
    Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
    O'Halloran, Qian Cai, Ram Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm,
    Sam Bobroff, Santosh Sivaraj, Segher Boessenkool, Sukadev Bhattiprolu,
    Thiago Bauermann, Thiago Jung Bauermann, Thomas Gleixner, Tom
    Lendacky, Vasant Hegde"
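
    The "medium sized DMA masks (> 32 && < 59 bits)" condition in the list
    above reads more easily with the masks written out. A minimal sketch:
    dma_bit_mask() mirrors the usual "n low bits set" definition, while
    is_medium_mask() is a hypothetical helper that just restates the range
    from the message.

```python
def dma_bit_mask(bits: int) -> int:
    """Return a mask with the low `bits` bits set (capped at 64 bits)."""
    return (1 << min(bits, 64)) - 1


def is_medium_mask(bits: int) -> bool:
    """True for mask widths strictly between 32 and 59 bits."""
    return 32 < bits < 59


TWO_GB = 1 << 31  # the 2GB figure mentioned above, in bytes
```

    A 40-bit device, for example, counts as "medium": its mask covers far
    more than 2GB of address space, yet it is narrower than 59 bits.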

    * tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (264 commits)
    powerpc/mm/mce: Keep irqs disabled during lockless page table walk
    powerpc: Use ftrace_graph_ret_addr() when unwinding
    powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
    ftrace: Look up the address of return_to_handler() using helpers
    powerpc: dump kernel log before carrying out fadump or kdump
    docs: powerpc: Add missing documentation reference
    powerpc/xmon: Fix output of XIVE IPI
    powerpc/xmon: Improve output of XIVE interrupts
    powerpc/mm/radix: remove useless kernel messages
    powerpc/fadump: support holes in kernel boot memory area
    powerpc/fadump: remove RMA_START and RMA_END macros
    powerpc/fadump: update documentation about option to release opalcore
    powerpc/fadump: consider f/w load area
    powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file
    powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes
    powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP
    powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel
    powerpc/fadump: improve how crashed kernel's memory is reserved
    powerpc/fadump: consider reserved ranges while releasing memory
    powerpc/fadump: make crash memory ranges array allocation generic
    ...

    Linus Torvalds
     

19 Sep, 2019

1 commit

  • Pull tty/serial driver updates from Greg KH:
    "Even in this age, people are still making new serial port silicon,
    why...

    Anyway, here's the TTY and Serial driver update for 5.4-rc1. Lots of
    changes in here for a number of embedded serial port devices that are
    being worked on because people really like to see those console
    logs...

    Other than that, nothing major here, no core tty changes that anyone
    should care about.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'tty-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (125 commits)
    serial: tegra: Add PIO mode support
    serial: tegra: report clk rate errors
    serial: tegra: add support to adjust baud rate
    serial: tegra: DT for Adjusted baud rates
    serial: tegra: add support to use 8 bytes trigger
    serial: tegra: set maximum num of uart ports to 8
    serial: tegra: check for FIFO mode enabled status
    dt-binding: serial: tegra: add new chips
    serial: tegra: report error to upper tty layer
    serial: tegra: flush the RX fifo on frame error
    serial: tegra: avoid reg access when clk disabled
    serial: tegra: add support to ignore read
    serial: sprd: correct the wrong sequence of arguments
    dt-bindings: serial: Convert riscv,sifive-serial to json-schema
    serial: max310x: turn off transmitter before activating AutoCTS or auto transmitter flow control
    serial: max310x: Properly set flags in AutoCTS mode
    tty: serial: fix platform_no_drv_owner.cocci warnings
    dt-bindings: serial: Document Freescale LINFlexD UART
    serial: fsl_linflexuart: Update compatible string
    tty: n_gsm: avoid recursive locking with async port hangup
    ...

    Linus Torvalds
     

18 Sep, 2019

2 commits

  • Pull block updates from Jens Axboe:

    - Two NVMe pull requests:
    - ana log parse fix from Anton
    - nvme quirks support for Apple devices from Ben
    - fix missing bio completion tracing for multipath stack devices
    from Hannes and Mikhail
    - IP TOS settings for nvme rdma and tcp transports from Israel
    - rq_dma_dir cleanups from Israel
    - tracing for Get LBA Status command from Minwoo
    - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
    - Some consolidation between the fabrics transports for handling
    the CAP register
    - reset race with ns scanning fix for fabrics (move fabrics
    commands to a dedicated request queue with a different lifetime
    from the admin request queue)
    - controller reset and namespace scan races fixes
    - nvme discovery log change uevent support
    - naming improvements from Keith
    - multiple discovery controllers reject fix from James
    - some regular cleanups from various people

    - Series fixing (and re-fixing) null_blk debug printing and nr_devices
    checks (André)

    - A few pull requests from Song, with fixes from Andy, Guoqing,
    Guilherme, Neil, Nigel, and Yufen.

    - REQ_OP_ZONE_RESET_ALL support (Chaitanya)

    - Bio merge handling unification (Christoph)

    - Pick default elevator correctly for devices with special needs
    (Damien)

    - Block stats fixes (Hou)

    - Timeout and support devices nbd fixes (Mike)

    - Series fixing races around elevator switching and device add/remove
    (Ming)

    - sed-opal cleanups (Revanth)

    - Per device weight support for BFQ (Fam)

    - Support for blk-iocost, a new model that can properly account cost of
    IO workloads. (Tejun)

    - blk-cgroup writeback fixes (Tejun)

    - paride queue init fixes (zhengbin)

    - blk_set_runtime_active() cleanup (Stanley)

    - Block segment mapping optimizations (Bart)

    - lightnvm fixes (Hans/Minwoo/YueHaibing)

    - Various little fixes and cleanups

    * tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
    null_blk: format pr_* logs with pr_fmt
    null_blk: match the type of parameter nr_devices
    null_blk: do not fail the module load with zero devices
    block: also check RQF_STATS in blk_mq_need_time_stamp()
    block: make rq sector size accessible for block stats
    bfq: Fix bfq linkage error
    raid5: use bio_end_sector in r5_next_bio
    raid5: remove STRIPE_OPS_REQ_PENDING
    md: add feature flag MD_FEATURE_RAID0_LAYOUT
    md/raid0: avoid RAID0 data corruption due to layout confusion.
    raid5: don't set STRIPE_HANDLE to stripe which is in batch list
    raid5: don't increment read_errors on EILSEQ return
    nvmet: fix a wrong error status returned in error log page
    nvme: send discovery log page change events to userspace
    nvme: add uevent variables for controller devices
    nvme: enable aen regardless of the presence of I/O queues
    nvme-fabrics: allow discovery subsystems accept a kato
    nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
    nvme: Remove redundant assignment of cq vector
    nvme: Assign subsys instance from first ctrl
    ...

    Linus Torvalds
     
  • Pull documentation updates from Jonathan Corbet:
    "It's a somewhat calmer cycle for docs this time, as the churn of the
    mass RST conversion is happily mostly behind us.

    - A new document on reproducible builds.

    - We finally got around to zapping the documentation for hardware
    support that was removed in 2004; one doesn't want to rush these
    things.

    - The usual assortment of fixes, typo corrections, etc"

    * tag 'docs-5.4' of git://git.lwn.net/linux: (67 commits)
    Documentation: kbuild: Add document about reproducible builds
    docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]
    Documentation: Add "earlycon=sbi" to the admin guide
    doc:lock: remove reference to clever use of read-write lock
    devices.txt: improve entry for comedi (char major 98)
    docs: mtd: Update spi nor reference driver
    doc: arm64: fix grammar dtb placed in no attributes region
    Documentation: sysrq: don't recommend 'S' 'U' before 'B'
    mailmap: Update email address for Quentin Perret
    docs: ftrace: clarify when tracing is disabled by the trace file
    docs: process: fix broken link
    Documentation/arm/samsung-s3c24xx: Remove stray U+FEFF character to fix title
    Documentation/arm/sa1100/assabet: Fix 'make assabet_defconfig' command
    Documentation/arm/sa1100: Remove some obsolete documentation
    docs/zh_CN: update Chinese howto.rst for latexdocs making
    Documentation: virt: Fix broken reference to virt tree's index
    docs: Fix typo on pull requests guide
    kernel-doc: Allow anonymous enum
    Documentation: sphinx: Don't parse socket() as identifier reference
    Documentation: sphinx: Add missing comma to list of strings
    ...

    Linus Torvalds
     

17 Sep, 2019

1 commit

  • Pull x86 platform-drivers updates from Andy Shevchenko:

    - ASUS WMI driver got a couple of updates, i.e. support of FAN is fixed
    for recent products and the charge threshold support has been added

    - Two unknown key events for Dell laptops are being ignored now to avoid
    spamming users with harmless messages

    - HP ZBook 17 G5 and ASUS Zenbook UX430UNR got accelerometer support.

    - Intel CherryTrail platforms had a regression with wake up. Now it's
    fixed

    - Intel PMC driver got fixed in order to work nicely in Xen
    environment

    - Intel Speed Select driver provides bucket vs core count relationship.
    Besides that, the tool has been updated for better output

    - The PrivacyGuard is enabled on Lenovo ThinkPad laptops

    - Three tablets - Trekstor Primebook C11B 2-in-1, Irbis TW90 and Chuwi
    Surbook Mini - got touchscreen support

    * tag 'platform-drivers-x86-v5.4-1' of git://git.infradead.org/linux-platform-drivers-x86: (53 commits)
    MAINTAINERS: Switch PDx86 subsystem status to Odd Fixes
    platform/x86: asus-wmi: Refactor charge threshold to use the battery hooking API
    platform/x86: asus-wmi: Rename CHARGE_THRESHOLD to RSOC
    platform/x86: asus-wmi: Reorder ASUS_WMI_CHARGE_THRESHOLD
    tools/power/x86/intel-speed-select: Display core count for bucket
    platform/x86: ISST: Allow additional TRL MSRs
    tools/power/x86/intel-speed-select: Fix memory leak
    tools/power/x86/intel-speed-select: Output success/failed for command output
    tools/power/x86/intel-speed-select: Output human readable CPU list
    tools/power/x86/intel-speed-select: Change turbo ratio output to maximum turbo frequency
    tools/power/x86/intel-speed-select: Switch output to MHz
    tools/power/x86/intel-speed-select: Simplify output for turbo-freq and base-freq
    tools/power/x86/intel-speed-select: Fix cpu-count output
    tools/power/x86/intel-speed-select: Fix help option typo
    tools/power/x86/intel-speed-select: Fix package typo
    tools/power/x86/intel-speed-select: Fix a read overflow in isst_set_tdp_level_msr()
    platform/x86: intel_int0002_vgpio: Use device_init_wakeup
    platform/x86: intel_int0002_vgpio: Fix wakeups not working on Cherry Trail
    platform/x86: compal-laptop: Initialize "value" in ec_read_u8()
    platform/x86: touchscreen_dmi: Add info for the Trekstor Primebook C11B 2-in-1
    ...

    Linus Torvalds