06 Nov, 2019

1 commit

  • Add a description of the TLS TX counter for retransmitted handshake
    packets that trigger the resync procedure but then skip it and go
    into the regular transmit flow.

    Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow")
    Signed-off-by: Tariq Toukan
    Signed-off-by: Saeed Mahameed
    Acked-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Tariq Toukan
     

02 Nov, 2019

4 commits

  • Pull networking fixes from David Miller:

    1) Fix free/alloc races in batmanadv, from Sven Eckelmann.

    2) Several leaks and other fixes in kTLS support of mlx5 driver, from
    Tariq Toukan.

    3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
    Høiland-Jørgensen.

    4) Add an r8152 device ID, from Kazutoshi Noguchi.

    5) Missing include in ipv6's addrconf.c, from Ben Dooks.

    6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
    easily infer the 32-bit secret otherwise etc.

    7) Several netdevice nesting depth fixes from Taehee Yoo.

    8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
    when doing lockless skb_queue_empty() checks, and accessing
    sk_napi_id/sk_incoming_cpu lockless as well.

    9) Fix jumbo packet handling in RXRPC, from David Howells.

    10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.

    11) Fix DMA synchronization in gve driver, from Yangchun Fu.

    12) Several bpf offload fixes, from Jakub Kicinski.

    13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.

    14) Fix ping latency during high traffic rates in hisilicon driver, from
    Jiangfeng Xiao.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
    net: fix installing orphaned programs
    net: cls_bpf: fix NULL deref on offload filter removal
    selftests: bpf: Skip write only files in debugfs
    selftests: net: reuseport_dualstack: fix uninitalized parameter
    r8169: fix wrong PHY ID issue with RTL8168dp
    net: dsa: bcm_sf2: Fix IMP setup for port different than 8
    net: phylink: Fix phylink_dbg() macro
    gve: Fixes DMA synchronization.
    inet: stop leaking jiffies on the wire
    ixgbe: Remove duplicate clear_bit() call
    Documentation: networking: device drivers: Remove stray asterisks
    e1000: fix memory leaks
    i40e: Fix receive buffer starvation for AF_XDP
    igb: Fix constant media auto sense switching when no cable is connected
    net: ethernet: arc: add the missed clk_disable_unprepare
    igb: Enable media autosense for the i350.
    igb/igc: Don't warn on fatal read failures when the device is removed
    tcp: increase tcp_max_syn_backlog max value
    net: increase SOMAXCONN to 4096
    netdevsim: Fix use-after-free during device dismantle
    ...

    Linus Torvalds
     
  • Jeff Kirsher says:

    ====================
    Intel Wired LAN Driver Updates 2019-11-01

    This series contains updates to e1000, igb, igc, ixgbe, i40e and driver
    documentation.

    Lyude Paul fixes an issue where a fatal read error occurs when the
    device is unplugged from the machine. So change the read error into a
    warn while the device is still present.

    Manfred Rudigier found that the i350 device was not a part of the "Media
    Auto Sense" feature, yet the device supports it. So add the missing
    i350 device to the check and fix an issue where the media auto sense
    would flip-flop when no cable was connected to the port, causing spurious
    kernel log messages.

    I fixed an issue where the fix to resolve receive buffer starvation was
    applied in more than one place in the driver, one being the incorrect
    location in the i40e driver.

    Wenwen Wang fixes a potential memory leak in e1000 where allocated
    memory is not properly cleaned up in one of the error paths.

    Jonathan Neuschäfer cleans up the driver documentation to be consistent
    and remove the footnote reference, since the footnote no longer exists in
    the documentation.

    Igor Pylypiv cleans up a duplicate clearing of a bit, no need to clear
    it twice.

    v2: Fixed alignment issue in patch 3 of the series based on community
    feedback.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • These asterisks were once references to a line that said:
    "* Other names and brands may be claimed as the property of others."
    But now, they serve no purpose; they can only irritate the reader.

    Fixes: de3edab4276c ("e1000: update README for e1000")
    Fixes: a3fb65680f65 ("e100.txt: Cleanup license info in kernel doc")
    Fixes: da8c01c4502a ("e1000e.txt: Add e1000e documentation")
    Fixes: f12a84a9f650 ("Documentation: fm10k: Add kernel documentation")
    Fixes: b55c52b1938c ("igb.txt: Add igb documentation")
    Fixes: c4e9b56e2442 ("igbvf.txt: Add igbvf Documentation")
    Fixes: d7064f4c192c ("Documentation/networking/: Update Intel wired LAN driver documentation")
    Fixes: c4b8c01112a1 ("ixgbevf.txt: Update ixgbevf documentation")
    Fixes: 1e06edcc2f22 ("Documentation: i40e: Prepare documentation for RST conversion")
    Fixes: 105bf2fe6b32 ("i40evf: add driver to kernel build system")
    Fixes: 1fae869bcf3d ("Documentation: ice: Prepare documentation for RST conversion")
    Fixes: df69ba43217d ("ionic: Add basic framework for IONIC Network device driver")
    Signed-off-by: Jonathan Neuschäfer
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Jonathan Neuschäfer
     
  • Pull arm64 fixes from Will Deacon:
    "These are almost exclusively related to CPU errata in CPUs from
    Broadcom and Qualcomm where the workarounds were either not being
    enabled when they should have been or enabled when they shouldn't have
    been.

    The only "interesting" fix is ensuring that writeable, shared mappings
    are initially mapped as clean since we inadvertently broke the logic
    back in v4.14 and then noticed the problem via code inspection the
    other day.

    The only critical issue we have outstanding is a sporadic NULL
    dereference in the scheduler, which doesn't appear to be
    arm64-specific and PeterZ is tearing his hair out over it at the
    moment.

    Summary:

    - Enable CPU errata workarounds for Broadcom Brahma-B53

    - Enable CPU errata workarounds for Qualcomm Hydra/Kryo CPUs

    - Fix initial dirty status of writeable, shared mappings"

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: apply ARM64_ERRATUM_843419 workaround for Brahma-B53 core
    arm64: Brahma-B53 is SSB and spectre v2 safe
    arm64: apply ARM64_ERRATUM_845719 workaround for Brahma-B53 core
    arm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo
    arm64: cpufeature: Enable Qualcomm Falkor/Kryo errata 1003
    arm64: Ensure VM_WRITE|VM_SHARED ptes are clean by default

    Linus Torvalds
     

01 Nov, 2019

4 commits

  • The Broadcom Brahma-B53 core is susceptible to the issue described by
    ARM64_ERRATUM_843419 so this commit enables the workaround to be applied
    when executing on that core.

    Since there are now multiple entries to match, we must convert the
    existing ARM64_ERRATUM_843419 into an erratum list and use
    cpucap_multi_entry_cap_matches to match our entries.
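
The idea of turning a single erratum match into a list can be sketched in plain C. This is a simplified userspace model, not the kernel code: the types, the enum values, the revision ranges and the function names below are all hypothetical stand-ins chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins: not the kernel's real types. */
enum cpu_model { CPU_CORTEX_A53, CPU_BRAHMA_B53, CPU_OTHER };

struct erratum_match {
    enum cpu_model model;   /* which core this entry covers */
    unsigned int rev_min;   /* lowest affected revision, inclusive */
    unsigned int rev_max;   /* highest affected revision, inclusive */
};

/* Several cores share one workaround, so the single match becomes a list. */
const struct erratum_match erratum_843419_list[] = {
    { CPU_CORTEX_A53, 0, 4 },   /* placeholder revision range */
    { CPU_BRAHMA_B53, 0, 15 },  /* placeholder: treat all revisions as hit */
};

/* Return true if any entry in the list matches the given CPU. */
bool erratum_list_matches(const struct erratum_match *list, size_t n,
                          enum cpu_model model, unsigned int rev)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].model == model &&
            rev >= list[i].rev_min && rev <= list[i].rev_max)
            return true;
    }
    return false;
}

/* Convenience wrapper over the illustrative list above. */
bool erratum_843419_applies(enum cpu_model model, unsigned int rev)
{
    size_t n = sizeof(erratum_843419_list) / sizeof(erratum_843419_list[0]);

    return erratum_list_matches(erratum_843419_list, n, model, rev);
}
```

With a list, adding another affected core is a one-line table change rather than a new capability entry.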

    Signed-off-by: Florian Fainelli
    Signed-off-by: Will Deacon

    Florian Fainelli
     
  • The Broadcom Brahma-B53 core is susceptible to the issue described by
    ARM64_ERRATUM_845719 so this commit enables the workaround to be applied
    when executing on that core.

    Since there are now multiple entries to match, we must convert the
    existing ARM64_ERRATUM_845719 into an erratum list.

    Signed-off-by: Doug Berger
    Signed-off-by: Florian Fainelli
    Signed-off-by: Will Deacon

    Doug Berger
     
  • tcp_max_syn_backlog default value depends on memory size
    and TCP ehash size. Before this patch, the max value
    was 2048 [1], which is considered too small nowadays.

    Increase it to 4096 to match the recent SOMAXCONN change.

    [1] This is with TCP ehash size being capped to 524288 buckets.
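
The numbers above are consistent with a default that scales with the ehash size. A minimal sketch, assuming the default is the bucket count divided by a constant with a floor of 128; the divisors 256 and 128 are inferred from the figures in this message (524288 / 256 = 2048, 524288 / 128 = 4096), not quoted from the patch, and the function name is hypothetical.

```c
/*
 * Assumed rule (inferred from the commit message, not copied from the
 * kernel): default = max(128, ehash_buckets / divisor).
 */
unsigned int syn_backlog_default(unsigned int ehash_buckets,
                                 unsigned int divisor)
{
    unsigned int scaled = ehash_buckets / divisor;

    return scaled > 128 ? scaled : 128;
}
```

Under that assumption, `syn_backlog_default(524288, 256)` gives the old cap of 2048 and `syn_backlog_default(524288, 128)` gives the new cap of 4096.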

    Signed-off-by: Eric Dumazet
    Cc: Willy Tarreau
    Cc: Yue Cao
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • SOMAXCONN is the default value of /proc/sys/net/core/somaxconn.

    It was defined as 128 more than 20 years ago.

    Since it caps the listen() backlog values, the very small value has
    caused numerous problems over the years, and many people had
    to raise it on their hosts after being hit by problems.

    Google has been using 1024 for at least 15 years, and we increased
    this to 4096 after the TCP listener rework was completed, more than
    4 years ago. We got no complaints of this change breaking any
    legacy application.

    Many applications indeed setup a TCP listener with listen(fd, -1);
    meaning they let the system select the backlog.

    Raising SOMAXCONN lowers the chance of the port being unavailable under
    even a small SYN flood attack, and reduces the possibility of side
    channel vulnerabilities.
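
Why listen(fd, -1) ends up with the system limit can be modeled in a few lines of C. This is a sketch under the assumption that the requested backlog is compared against somaxconn as an unsigned value, so -1 wraps to a very large number and is clamped; `effective_backlog` is a hypothetical helper, not a kernel or libc function.

```c
/*
 * Model of the cap: the requested backlog is compared as unsigned,
 * so a negative request such as -1 becomes huge and is clamped.
 */
int effective_backlog(int requested, unsigned int somaxconn)
{
    if ((unsigned int)requested > somaxconn)
        return (int)somaxconn;
    return requested;
}
```

In this model an application calling listen(fd, -1) gets 128 with the old limit and 4096 with the new one, without any change to the application itself.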

    Signed-off-by: Eric Dumazet
    Cc: Willy Tarreau
    Cc: Yue Cao
    Signed-off-by: David S. Miller

    Eric Dumazet
     

31 Oct, 2019

1 commit


26 Oct, 2019

1 commit

  • Pull ARM SoC fixes from Olof Johansson:
    "A slightly larger set of fixes have accrued in the last two weeks.
    Mostly a collection of the usual smaller fixes:

    - Marvell Armada: USB phy setup issues on Turris Mox

    - Broadcom: GPIO/pinmux DT mapping corrections for Stingray, MMC bus
    width fix for RPi Zero W, GPIO LED removal for RPI CM3. Also some
    maintainer updates.

    - OMAP: Fixlets for display config, interrupt settings for wifi, some
    clock/PM pieces. Also IOMMU regression fix and a ti-sysc
    no-watchdog regression fix.

    - i.MX: A few fixes around PM/settings, some devicetree fixlets and
    catching up with config option changes in DRM

    - Rockchip: RockPro64 misc DT fixups, Hugsun X99 USB-C, Kevin display
    panel settings

    ... and some smaller fixes for Davinci (backlight, McBSP DMA),
    Allwinner (phy regulators, PMU removal on A64, etc)"

    * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits)
    ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157
    MAINTAINERS: Update the Spreadtrum SoC maintainer
    MAINTAINERS: Remove Gregory and Brian for ARCH_BRCMSTB
    ARM: dts: bcm2837-rpi-cm3: Avoid leds-gpio probing issue
    bus: ti-sysc: Fix watchdog quirk handling
    ARM: OMAP2+: Add pdata for OMAP3 ISP IOMMU
    ARM: OMAP2+: Plug in device_enable/idle ops for IOMMUs
    ARM: davinci_all_defconfig: enable GPIO backlight
    ARM: davinci: dm365: Fix McBSP dma_slave_map entry
    ARM: dts: bcm2835-rpi-zero-w: Fix bus-width of sdhci
    ARM: imx_v6_v7_defconfig: Enable CONFIG_DRM_MSM
    arm64: dts: imx8mn: Use correct clock for usdhc's ipg clk
    arm64: dts: imx8mm: Use correct clock for usdhc's ipg clk
    arm64: dts: imx8mq: Use correct clock for usdhc's ipg clk
    ARM: dts: imx7s: Correct GPT's ipg clock source
    ARM: dts: vf610-zii-scu4-aib: Specify 'i2c-mux-idle-disconnect'
    ARM: dts: imx6q-logicpd: Re-Enable SNVS power key
    arm64: dts: lx2160a: Correct CPU core idle state name
    mailmap: Add Simon Arlott (replacement for expired email address)
    arm64: dts: rockchip: Fix override mode for rk3399-kevin panel
    ...

    Linus Torvalds
     

25 Oct, 2019

1 commit


24 Oct, 2019

2 commits

  • Fix the errors in the RiscV CPU DT schema:

    Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@0: 'timebase-frequency' is a required property
    Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@1: 'timebase-frequency' is a required property
    Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@0: compatible:0: 'riscv' is not one of ['sifive,rocket0', 'sifive,e5', 'sifive,e51', 'sifive,u54-mc', 'sifive,u54', 'sifive,u5']
    Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@0: compatible: ['riscv'] is too short
    Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@0: 'timebase-frequency' is a required property

    The DT spec allows for 'timebase-frequency' to be in 'cpu' or 'cpus' node
    and RiscV requires it in /cpus node, so make it disallowed in cpu
    nodes.

    Fixes: 4fd669a8c487 ("dt-bindings: riscv: convert cpu binding to json-schema")
    Cc: Palmer Dabbelt
    Cc: Albert Ou
    Cc: linux-riscv@lists.infradead.org
    Acked-by: Paul Walmsley
    Signed-off-by: Rob Herring

    Rob Herring
     
  • …git/broonie/regulator

    Pull regulator fixes from Mark Brown:
    "There are a few core fixes here around error handling and handling of
    suspend mode configuration and some driver specific fixes here but the
    most important change is the fix to the fixed-regulator DT schema
    conversion introduced during the last merge window.

    That fixes one of the last two errors preventing successful execution
    of "make dt_binding_check" which will be enormously helpful for DT
    schema development"

    * tag 'regulator-fix-v5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
    regulator: qcom-rpmh: Fix PMIC5 BoB min voltage
    regulator: pfuze100-regulator: Variable "val" in pfuze100_regulator_probe() could be uninitialized
    regulator: lochnagar: Add on_off_delay for VDDCORE
    regulator: ti-abb: Fix timeout in ti_abb_wait_txdone/ti_abb_clear_all_txdone
    regulator: da9062: fix suspend_enable/disable preparation
    dt-bindings: fixed-regulator: fix compatible enum
    regulator: fixed: Prevent NULL pointer dereference when !CONFIG_OF
    regulator: core: make regulator_register() EPROBE_DEFER aware
    regulator: of: fix suspend-min/max-voltage parsing

    Linus Torvalds
     

23 Oct, 2019

1 commit

  • …/git/sunxi/linux into arm/fixes

    A number of fixes for this release, but mostly:
    - A fixup for the A10 CSI DT binding merged during the 5.4-rc1 window
    - A fix for a dt-binding error
    - Addition of phy regulator delays
    - The PMU on the A64 was found to be non-functional, so we've dropped it for now

    * tag 'sunxi-fixes-for-5.4-1' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
    ARM: dts: sun7i: Drop the module clock from the device tree
    dt-bindings: media: sun4i-csi: Drop the module clock
    media: dt-bindings: Fix building error for dt_binding_check
    arm64: dts: allwinner: a64: sopine-baseboard: Add PHY regulator delay
    arm64: dts: allwinner: a64: Drop PMU node
    arm64: dts: allwinner: a64: pine64-plus: Add PHY regulator delay

    Link: https://lore.kernel.org/r/80085a57-c40f-4bed-a9c3-19858d87564e.lettre@localhost
    Signed-off-by: Olof Johansson <olof@lixom.net>

    Olof Johansson
     

22 Oct, 2019

1 commit

  • Pull pin control fixes from Linus Walleij:
    "Here is a bunch of pin control fixes. I was lagging behind on this
    one, some fixes should have come in earlier, sorry about that.

    Anyways here it is, pretty straight-forward fixes, the Strago fix
    stands out as something serious affecting a lot of machines.

    Summary:
    - Handle multiple instances of Intel chips without complaining.
    - Restore the Intel Strago DMI workaround
    - Make the Armada 37xx handle pins over 32
    - Fix the polarity of the LED group on Armada 37xx
    - Fix an off-by-one bug in the NS2 driver
    - Fix error path for iproc's platform_get_irq()
    - Fix error path on the STMFX driver
    - Fix a typo in the Berlin AS370 driver
    - Fix up misc errors in the Aspeed 2600 BMC support
    - Fix a stray SPDX tag"

    * tag 'pinctrl-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
    pinctrl: aspeed-g6: Rename SD3 to EMMC and rework pin groups
    pinctrl: aspeed-g6: Fix UART13 group pinmux
    pinctrl: aspeed-g6: Make SIG_DESC_CLEAR() behave intuitively
    pinctrl: aspeed-g6: Fix I3C3/I3C4 pinmux configuration
    pinctrl: aspeed-g6: Fix I2C14 SDA description
    pinctrl: aspeed-g6: Sort pins for sanity
    dt-bindings: pinctrl: aspeed-g6: Rework SD3 function and groups
    pinctrl: berlin: as370: fix a typo s/spififib/spdifib
    pinctrl: armada-37xx: swap polarity on LED group
    pinctrl: stmfx: fix null pointer on remove
    pinctrl: iproc: allow for error from platform_get_irq()
    pinctrl: ns2: Fix off by one bugs in ns2_pinmux_enable()
    pinctrl: bcm-iproc: Use SPDX header
    pinctrl: armada-37xx: fix control of pins 32 and up
    pinctrl: cherryview: restore Strago DMI workaround for all versions
    pinctrl: intel: Allocate IRQ chip dynamic

    Linus Torvalds
     

20 Oct, 2019

2 commits

  • Pull irq fixes from Thomas Gleixner:
    "A small set of irq chip driver fixes and updates:

    - Update the SIFIVE PLIC interrupt driver to use the fasteoi handler
    to address the shortcomings of the existing flow handling which was
    prone to lose interrupts

    - Use the proper limit for GIC interrupt line numbers

    - Add retrigger support for the recently merged Annapurna Labs Fabric
    interrupt controller to make it complete

    - Enable the ATMEL AIC5 interrupt controller driver on the new
    SAM9X60 SoC"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/sifive-plic: Switch to fasteoi flow
    irqchip/gic-v3: Fix GIC_LINE_NR accessor
    irqchip/atmel-aic5: Add support for sam9x60 irqchip
    irqchip/al-fic: Add support for irq retrigger

    Linus Torvalds
     
  • Pull networking fixes from David Miller:
    "I was battling a cold after some recent trips, so quite a bit piled up
    meanwhile, sorry about that.

    Highlights:

    1) Fix fd leak in various bpf selftests, from Brian Vazquez.

    2) Fix crash in xsk when device doesn't support some methods, from
    Magnus Karlsson.

    3) Fix various leaks and use-after-free in rxrpc, from David Howells.

    4) Fix several SKB leaks due to confusion of who owns an SKB and who
    should release it in the llc code. From Eric Biggers.

    5) Kill a bunch of KCSAN warnings in TCP, from Eric Dumazet.

    6) Jumbo packets don't work after resume on r8169, as the BIOS resets
    the chip into non-jumbo mode during suspend. From Heiner Kallweit.

    7) Corrupt L2 header during MPLS push, from Davide Caratti.

    8) Prevent possible infinite loop in tc_ctl_action, from Eric
    Dumazet.

    9) Get register bits right in bcmgenet driver, based upon chip
    version. From Florian Fainelli.

    10) Fix mutex problems in microchip DSA driver, from Marek Vasut.

    11) Cure race between route lookup and invalidation in ipv4, from Wei
    Wang.

    12) Fix performance regression due to false sharing in 'net'
    structure, from Eric Dumazet"

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
    net: reorder 'struct net' fields to avoid false sharing
    net: dsa: fix switch tree list
    net: ethernet: dwmac-sun8i: show message only when switching to promisc
    net: aquantia: add an error handling in aq_nic_set_multicast_list
    net: netem: correct the parent's backlog when corrupted packet was dropped
    net: netem: fix error path for corrupted GSO frames
    macb: propagate errors when getting optional clocks
    xen/netback: fix error path of xenvif_connect_data()
    net: hns3: fix mis-counting IRQ vector numbers issue
    net: usb: lan78xx: Connect PHY before registering MAC
    vsock/virtio: discard packets if credit is not respected
    vsock/virtio: send a credit update when buffer size is changed
    mlxsw: spectrum_trap: Push Ethernet header before reporting trap
    net: ensure correct skb->tstamp in various fragmenters
    net: bcmgenet: reset 40nm EPHY on energy detect
    net: bcmgenet: soft reset 40nm EPHYs before MAC init
    net: phy: bcm7xxx: define soft_reset for 40nm EPHY
    net: bcmgenet: don't set phydev->link from MAC
    net: Update address for MediaTek ethernet driver in MAINTAINERS
    ipv4: fix race condition between route lookup and invalidation
    ...

    Linus Torvalds
     

18 Oct, 2019

2 commits

  • Pull arm64 fixes from Will Deacon:
    "The main thing here is a long-awaited workaround for a CPU erratum on
    ThunderX2 which we have developed in conjunction with engineers from
    Cavium/Marvell.

    At the moment, the workaround is unconditionally enabled for affected
    CPUs at runtime but we may add a command-line option to disable it in
    future if performance numbers show up indicating a significant cost
    for real workloads.

    Summary:

    - Work around Cavium/Marvell ThunderX2 erratum #219

    - Fix regression in mlock() ABI caused by sign-extension of TTBR1 addresses

    - More fixes to the spurious kernel fault detection logic

    - Fix pathological preemption race when enabling some CPU features at boot

    - Drop broken kcore macros in favour of generic implementations

    - Fix userspace view of ID_AA64ZFR0_EL1 when SVE is disabled

    - Avoid NULL dereference on allocation failure during hibernation"

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: tags: Preserve tags for addresses translated via TTBR1
    arm64: mm: fix inverted PAR_EL1.F check
    arm64: sysreg: fix incorrect definition of SYS_PAR_EL1_F
    arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled
    arm64: hibernate: check pgd table allocation
    arm64: cpufeature: Treat ID_AA64ZFR0_EL1 as RAZ when SVE is not enabled
    arm64: Fix kcore macros after 52-bit virtual addressing fallout
    arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
    arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
    arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
    arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set

    Linus Torvalds
     
  • Workaround for Cavium/Marvell ThunderX2 erratum #219.

    * errata/tx2-219:
    arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
    arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
    arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
    arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set

    Will Deacon
     

16 Oct, 2019

1 commit

  • Rename SD3 functions and groups to EMMC to better reflect their intended
    use before the binding escapes too far into the wild. Also clean up the
    SD3 pin groups to eliminate some silliness that slipped through the
    cracks (SD3DAT[4-7]) by unifying them into three new groups: EMMCG1,
    EMMCG4 and EMMCG8 for 1, 4 and 8-bit data buses respectively.

    Signed-off-by: Andrew Jeffery
    Link: https://lore.kernel.org/r/20191008044153.12734-2-andrew@aj.id.au
    Reviewed-by: Rob Herring
    Reviewed-by: Joel Stanley
    Signed-off-by: Linus Walleij

    Andrew Jeffery
     

15 Oct, 2019

2 commits

  • Commit 8974558f49a6 ("mm, page_owner, debug_pagealloc: save and dump
    freeing stack trace") enhanced page_owner to also store freeing stack
    trace, when debug_pagealloc is also enabled. KASAN would also like to
    do this [1] to improve error reports to debug e.g. UAF issues.

    Kirill has suggested that it should also be possible to enable saving of
    the freeing stack trace separately from KASAN or debug_pagealloc, i.e.
    with an extra boot option. Qian argued that we have enough options
    already, and avoiding the extra overhead is not worth the complications
    in the case of a debugging option. Kirill noted that the extra stack
    handle in struct page_owner requires 0.1% of memory.

    This patch therefore enables free stack saving whenever page_owner is
    enabled, regardless of whether debug_pagealloc or KASAN is also enabled.
    KASAN kernels booted with page_owner=on will thus benefit from the
    improved error reports.
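
The 0.1% figure can be checked with a small calculation. A sketch assuming a 4-byte stack handle per page and 4 KiB pages; both sizes are assumptions for illustration and are not stated in this message, and the function name is hypothetical.

```c
/* Overhead of one extra per-page handle, in parts per million of memory. */
unsigned long handle_overhead_ppm(unsigned long handle_bytes,
                                  unsigned long page_bytes)
{
    return handle_bytes * 1000000UL / page_bytes;
}
```

With those assumed sizes, `handle_overhead_ppm(4, 4096)` is 976 parts per million, which is roughly 0.1% of memory.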

    [1] https://bugzilla.kernel.org/show_bug.cgi?id=203967

    [vbabka@suse.cz: v3]
    Link: http://lkml.kernel.org/r/20191007091808.7096-3-vbabka@suse.cz
    Link: http://lkml.kernel.org/r/20190930122916.14969-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Reviewed-by: Qian Cai
    Suggested-by: Dmitry Vyukov
    Suggested-by: Walter Wu
    Suggested-by: Andrey Ryabinin
    Suggested-by: Kirill A. Shutemov
    Suggested-by: Qian Cai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • …/maz/arm-platforms into irq/urgent

    Pull irqchip fixes from Marc Zyngier:

    - Add retrigger support to Amazon's al-fic driver
    - Add SAM9X60 support to Atmel's AIC5 irqchip
    - Fix GICv3 maximum interrupt calculation
    - Convert SiFive's PLIC to the fasteoi IRQ flow

    Thomas Gleixner
     

13 Oct, 2019

4 commits

  • Pull hwmon fixes from Guenter Roeck:

    - Update/fix inspur-ipsps1 and k10temp Documentation

    - Fix nct7904 driver

    - Fix HWMON_P_MIN_ALARM mask in hwmon core

    * tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
    hwmon: docs: Extend inspur-ipsps1 title underline
    hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in nct7904_data struct.
    docs: hwmon: Include 'inspur-ipsps1.rst' into docs
    hwmon: Fix HWMON_P_MIN_ALARM mask
    hwmon: (k10temp) Update documentation and add temp2_input info
    hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data struct

    Linus Torvalds
     
  • Pull tty/serial driver fixes from Greg KH:
    "Here are some small tty and serial driver fixes for 5.4-rc3 that
    resolve a number of reported issues and regressions.

    None of these are huge, full details are in the shortlog. There's also
    a MAINTAINERS update that I think you might have already taken in your
    tree already, but git should handle that merge easily.

    All have been in linux-next with no reported issues"

    * tag 'tty-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
    MAINTAINERS: kgdb: Add myself as a reviewer for kgdb/kdb
    tty: serial: imx: Use platform_get_irq_optional() for optional IRQs
    serial: fix kernel-doc warning in comments
    serial: 8250_omap: Fix gpio check for auto RTS/CTS
    serial: mctrl_gpio: Check for NULL pointer
    tty: serial: fsl_lpuart: Fix lpuart_flush_buffer()
    tty: serial: Fix PORT_LINFLEXUART definition
    tty: n_hdlc: fix build on SPARC
    serial: uartps: Fix uartps_major handling
    serial: uartlite: fix exit path null pointer
    tty: serial: linflexuart: Fix magic SysRq handling
    serial: sh-sci: Use platform_get_irq_optional() for optional interrupts
    dt-bindings: serial: sh-sci: Document r8a774b1 bindings
    serial/sifive: select SERIAL_EARLYCON
    tty: serial: rda: Fix the link time qualifier of 'rda_uart_exit()'
    tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here are a lot of small USB driver fixes for 5.4-rc3.

    syzbot has stepped up its testing of the USB driver stack, now able to
    trigger fun race conditions between disconnect and probe functions.
    Because of that we have a lot of fixes in here from Johan and others
    fixing these reported issues that have been around since almost all
    time.

    We also are just deleting the rio500 driver, making all of the syzbot
    bugs found in it moot as it turns out no one has been using it for
    years as there is a userspace version that is being used instead.

    There are also a number of other small fixes in here, all resolving
    reported issues or regressions.

    All have been in linux-next without any reported issues"

    * tag 'usb-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (65 commits)
    USB: yurex: fix NULL-derefs on disconnect
    USB: iowarrior: use pr_err()
    USB: iowarrior: drop redundant iowarrior mutex
    USB: iowarrior: drop redundant disconnect mutex
    USB: iowarrior: fix use-after-free after driver unbind
    USB: iowarrior: fix use-after-free on release
    USB: iowarrior: fix use-after-free on disconnect
    USB: chaoskey: fix use-after-free on release
    USB: adutux: fix use-after-free on release
    USB: ldusb: fix NULL-derefs on driver unbind
    USB: legousbtower: fix use-after-free on release
    usb: cdns3: Fix for incorrect DMA mask.
    usb: cdns3: fix cdns3_core_init_role()
    usb: cdns3: gadget: Fix full-speed mode
    USB: usb-skeleton: drop redundant in-urb check
    USB: usb-skeleton: fix use-after-free after driver unbind
    USB: usb-skeleton: fix NULL-deref on disconnect
    usb:cdns3: Fix for CV CH9 running with g_zero driver.
    usb: dwc3: Remove dev_err() on platform_get_irq() failure
    usb: dwc3: Switch to platform_get_irq_byname_optional()
    ...

    Linus Torvalds
     
  • Pull xen fixes from Juergen Gross:

    - correct panic handling when running as a Xen guest

    - clean up the Xen grant driver to stop printing a pointer that is
    always NULL

    - remove a soon-to-be-wrong call of of_dma_configure()

    * tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen: Stop abusing DT of_dma_configure API
    xen/grant-table: remove unnecessary printing
    x86/xen: Return from panic notifier

    Linus Torvalds
     

12 Oct, 2019

2 commits

  • Pull module fixes from Jessica Yu:
    "Code cleanups and kbuild/namespace related fixups from Masahiro.

    Most importantly, it fixes a namespace-related modpost issue for
    external module builds

    - Fix broken external module builds due to a modpost bug in
    read_dump(), where the namespace was not being strdup'd and
    sym->namespace would be set to bogus data.

    - Various namespace-related kbuild fixes and cleanups thanks to
    Masahiro Yamada"
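
The class of bug described in the first item, keeping a pointer into a reused buffer instead of a private copy, can be reproduced in a few lines of C. This is an illustrative sketch only: `copy_string` and `namespace_survives` are hypothetical names, the strings are made up, and none of it is taken from modpost.

```c
#include <stdlib.h>
#include <string.h>

/* Copy a string onto the heap; equivalent in spirit to strdup(). */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/*
 * Hypothetical parser: two records pass through one reused buffer.
 * Returns 1 if the namespace remembered from the first record is still
 * intact afterwards. With own_copy == 0 only a pointer into the buffer
 * is kept, so the second record silently overwrites it.
 */
int namespace_survives(int own_copy)
{
    char line[32];
    const char *ns;
    char *heap = NULL;
    int ok;

    strcpy(line, "USB_STORAGE");            /* first record */
    if (own_copy) {
        heap = copy_string(line);
        if (!heap)
            return 0;
        ns = heap;
    } else {
        ns = line;                          /* borrowed, not owned */
    }
    strcpy(line, "garbage-from-next-line"); /* buffer is reused */

    ok = strcmp(ns, "USB_STORAGE") == 0;
    free(heap);
    return ok;
}
```

Only the variant that takes its own copy still sees the original namespace after the buffer is reused.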

    * tag 'modules-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
    doc: move namespaces.rst from kbuild/ to core-api/
    nsdeps: make generated patches independent of locale
    nsdeps: fix hashbang of scripts/nsdeps
    kbuild: fix build error of 'make nsdeps' in clean tree
    module: rename __kstrtab_ns_* to __kstrtabns_* to avoid symbol conflict
    modpost: fix broken sym->namespace for external module builds
    module: swap the order of symbol.namespace
    scripts: add_namespace: Fix coccicheck failed

    Linus Torvalds
     
  • Describe the fallthrough pseudo-keyword.

    Convert the coding-style.rst example to the keyword style.
    Add description and links to deprecated.rst.

    Miguel Ojeda comments on the eventual [[fallthrough]] syntax:
    "Note that C17/C18 does not have [[fallthrough]].

    C++17 introduced it, as it is mentioned above. I would keep the
    __attribute__((fallthrough)) -> [[fallthrough]] change you did,
    though, since that is indeed the standard syntax (given the paragraph
    references C++17).

    I was told by Aaron Ballman (who is proposing them for C) that it is
    more or less likely that it becomes standardized in C2x. However, it
    is still not added to the draft (other attributes are already,
    though). See N2268 and N2269:

    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2268.pdf (fallthrough)
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2269.pdf (attributes in general)"
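
A minimal sketch of what the keyword style looks like in a switch statement. The macro definition here is an assumption made so the example compiles on its own: it maps `fallthrough` to the compiler attribute where available and to an empty statement otherwise. `steps_from` is a hypothetical function.

```c
/* Assumed definition for illustration; falls back to a no-op statement. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Returns how many steps run when starting at the given stage. */
int steps_from(int stage)
{
    int steps = 0;

    switch (stage) {
    case 3:
        steps++;
        fallthrough;    /* deliberate: stage 3 also runs stage 2 */
    case 2:
        steps++;
        fallthrough;    /* deliberate: stage 2 also runs stage 1 */
    case 1:
        steps++;
        break;
    default:
        break;
    }
    return steps;
}
```

Unlike a comment, the keyword form is visible to the compiler, so an accidental missing break can still be diagnosed while intentional ones are marked explicitly.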

    Signed-off-by: Joe Perches
    Acked-by: Nick Desaulniers
    Signed-off-by: Linus Torvalds

    Joe Perches
     

11 Oct, 2019

1 commit

  • Commit 8960b38932be ("linux/dim: Rename externally used net_dim
    members") renamed the net_dim API, removing the "net_" prefix from the
    structures and functions. The patch didn't update the net_dim.txt
    documentation file.

    Fix the documentation so that its examples match the current code.

    Fixes: 8960b38932be ("linux/dim: Rename externally used net_dim members", 2019-06-25)
    Fixes: c002bd529d71 ("linux/dim: Rename externally exposed macros", 2019-06-25)
    Fixes: 4f75da3666c0 ("linux/dim: Move implementation to .c files")
    Cc: Tal Gilboa
    Signed-off-by: Jacob Keller
    Signed-off-by: Jakub Kicinski

    Jacob Keller
     

10 Oct, 2019

1 commit

  • Pull arm64 fixes from Will Deacon:
    "A larger-than-usual batch of arm64 fixes for -rc3.

    The bulk of the fixes are dealing with a bunch of issues with the
    build system from the compat vDSO, which unfortunately led to some
    significant Makefile rework to manage the horrible combinations of
    toolchains that we can end up needing to drive simultaneously.

    We came close to disabling the thing entirely, but Vincenzo was quick
    to spin up some patches and I ended up picking up most of the bits
    that were left [*]. Future work will look at disentangling the header
    files properly.

    Other than that, we have some important fixes all over, including one
    papering over the miscompilation fallout from forcing
    CONFIG_OPTIMIZE_INLINING=y, which I'm still unhappy about. Harumph.

    We've still got a couple of open issues, so I'm expecting to have some
    more fixes later this cycle.

    Summary:

    - Numerous fixes to the compat vDSO build system, especially when
    combining gcc and clang

    - Fix parsing of PAR_EL1 in spurious kernel fault detection

    - Partial workaround for Neoverse-N1 erratum #1542419

    - Fix IRQ priority masking on entry from compat syscalls

    - Fix advertisement of FRINT HWCAP to userspace

    - Attempt to work around inlining breakage with '__always_inline'

    - Fix accidental freeing of parent SVE state on fork() error path

    - Add some missing NULL pointer checks in instruction emulation init

    - Some formatting and comment fixes"

    [*] Will's final fixes were

    Reviewed-by: Vincenzo Frascino
    Tested-by: Vincenzo Frascino

    but they were already in linux-next by then and he didn't rebase
    just to add those.

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (21 commits)
    arm64: armv8_deprecated: Checking return value for memory allocation
    arm64: Kconfig: Make CONFIG_COMPAT_VDSO a proper Kconfig option
    arm64: vdso32: Rename COMPATCC to CC_COMPAT
    arm64: vdso32: Pass '--target' option to clang via VDSO_CAFLAGS
    arm64: vdso32: Don't use KBUILD_CPPFLAGS unconditionally
    arm64: vdso32: Move definition of COMPATCC into vdso32/Makefile
    arm64: Default to building compat vDSO with clang when CONFIG_CC_IS_CLANG
    lib: vdso: Remove CROSS_COMPILE_COMPAT_VDSO
    arm64: vdso32: Remove jump label config option in Makefile
    arm64: vdso32: Detect binutils support for dmb ishld
    arm64: vdso: Remove stale files from old assembly implementation
    arm64: vdso32: Fix broken compat vDSO build warnings
    arm64: mm: fix spurious fault detection
    arm64: ftrace: Ensure synchronisation in PLT setup for Neoverse-N1 #1542419
    arm64: Fix incorrect irqflag restore for priority masking for compat
    arm64: mm: avoid virt_to_phys(init_mm.pgd)
    arm64: cpufeature: Effectively expose FRINT capability to userspace
    arm64: Mark functions using explicit register variables as '__always_inline'
    docs: arm64: Fix indentation and doc formatting
    arm64/sve: Fix wrong free for task->thread.sve_state
    ...

    Linus Torvalds
     

09 Oct, 2019

2 commits

  • Fix documentation build warnings for Pensando ionic:

    Documentation/networking/device_drivers/pensando/ionic.rst:39: WARNING: Unexpected indentation.
    Documentation/networking/device_drivers/pensando/ionic.rst:43: WARNING: Unexpected indentation.

    Fixes: df69ba43217d ("ionic: Add basic framework for IONIC Network device driver")
    Signed-off-by: Randy Dunlap
    Acked-by: Shannon Nelson
    Signed-off-by: Jakub Kicinski

    Randy Dunlap
     
  • …/git/shuah/linux-kselftest

    Pull Kselftest fixes from Shuah Khan:
    "Fixes for existing tests and the framework.

    Cristian Marussi's patches add the ability to skip targets (tests) and
    exclude tests that didn't build from run-list. These patches improve
    the Kselftest results. Ability to skip targets helps avoid running
    tests that aren't supported in certain environments. As an example,
    bpf tests from mainline aren't supported on stable kernels and have
    dependency on bleeding edge llvm. Being able to skip bpf on systems
    that can't meet this llvm dependency will be helpful.

    Kselftest can be built and installed from the main Makefile. This
    change helps simplify Kselftest use-cases and addresses a request
    from users.

    Kees Cook added per-test timeout support to limit individual test
    run-time"

    * tag 'linux-kselftest-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
    selftests: watchdog: Add command line option to show watchdog_info
    selftests: watchdog: Validate optional file argument
    selftests/kselftest/runner.sh: Add 45 second timeout per test
    kselftest: exclude failed TARGETS from runlist
    kselftest: add capability to skip chosen TARGETS
    selftests: Add kselftest-all and kselftest-install targets

    Linus Torvalds
     

08 Oct, 2019

6 commits

  • We discussed a better location for this file, and agreed that
    core-api/ is a good fit. Rename it to symbol-namespaces.rst
    for disambiguation, and also add it to index.rst and MAINTAINERS.

    Signed-off-by: Masahiro Yamada
    Acked-by: Matthias Maennich
    Signed-off-by: Jessica Yu

    Masahiro Yamada
     
  • Allow the user to select the workaround for TX2-219, and update
    the silicon-errata.rst file to reflect this.

    Cc:
    Signed-off-by: Marc Zyngier
    Signed-off-by: Will Deacon

    Marc Zyngier
     
  • Merge misc fixes from Andrew Morton:
    "The usual shower of hotfixes.

    Chris's memcg patches aren't actually fixes - they're mature but a few
    niggling review issues were late to arrive.

    The ocfs2 fixes are quite old - those took some time to get reviewer
    attention.

    Subsystems affected by this patch series: ocfs2, hotfixes, mm/memcg,
    mm/slab-generic"

    * emailed patches from Andrew Morton:
    mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
    mm, sl[ou]b: improve memory accounting
    mm, memcg: make scan aggression always exclude protection
    mm, memcg: make memory.emin the baseline for utilisation determination
    mm, memcg: proportional memory.{low,min} reclaim
    mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()
    mm/page_alloc.c: fix a crash in free_pages_prepare()
    mm/z3fold.c: claim page in the beginning of free
    kernel/sysctl.c: do not override max_threads provided by userspace
    memcg: only record foreign writebacks with dirty pages when memcg is not disabled
    mm: fix -Wmissing-prototypes warnings
    writeback: fix use-after-free in finish_writeback_work()
    mm/memremap: drop unused SECTION_SIZE and SECTION_MASK
    panic: ensure preemption is disabled during panic()
    fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc()
    fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock()
    fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()
    ocfs2: clear zero in unaligned direct IO

    Linus Torvalds
     
  • In most configurations, kmalloc() happens to return naturally aligned
    (i.e. aligned to the block size itself) blocks for power of two sizes.

    That means some kmalloc() users might unknowingly rely on that
    alignment, until stuff breaks when the kernel is built with e.g.
    CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned. Then
    developers have to devise workarounds such as their own kmem caches
    with specified alignment [1], which is not always practical, as
    recently evidenced in [2].

    The topic has been discussed at LSF/MM 2019 [3]. Adding a
    'kmalloc_aligned()' variant would not help with code unknowingly relying
    on the implicit alignment. For slab implementations it would either
    require creating more kmalloc caches, or allocate a larger size and only
    give back part of it. That would be wasteful, especially with a generic
    alignment parameter (in contrast with a fixed alignment to size).

    Ideally we should provide to mm users what they need without difficult
    workarounds or own reimplementations, so let's make the kmalloc()
    alignment to size explicitly guaranteed for power-of-two sizes under all
    configurations. What does this mean for the three available allocators?

    * SLAB object layout happens to be mostly unchanged by the patch. The
    implicitly provided alignment could be compromised with
    CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for
    caches with alignment larger than unsigned long long. Practically on at
    least x86 this includes kmalloc caches as they use cache line alignment,
    which is larger than that. Still, this patch ensures alignment on all
    arches and cache sizes.

    * SLUB layout is also unchanged unless redzoning is enabled through
    CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache.
    With this patch, explicit alignment is guaranteed with redzoning as
    well. This will result in more memory being wasted, but that should be
    acceptable in a debugging scenario.

    * SLOB has no implicit alignment so this patch adds it explicitly for
    kmalloc(). The potential downside is increased fragmentation. While
    pathological allocation scenarios are certainly possible, in my testing,
    after booting an x86_64 kernel+userspace with virtme, around 16MB of memory
    was consumed by slab pages both before and after the patch, with
    difference in the noise.
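
    The guarantee discussed above can be illustrated with a small model.
    The following Python sketch is not kernel code: the slot layout, the
    helper names and the 8-byte redzone are assumptions chosen only to
    show how a debugging redzone can shift power-of-two objects off their
    natural alignment, and how enforcing alignment to size restores it at
    the cost of a larger stride.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def round_up(value: int, align: int) -> int:
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def is_naturally_aligned(addr: int, size: int) -> bool:
    """Natural alignment: the address is a multiple of the (power-of-two) size."""
    return is_power_of_two(size) and addr % size == 0


def object_addresses(base: int, size: int, count: int,
                     redzone: int = 0, align: int = 1) -> list[int]:
    """Toy slab layout (hypothetical): each slot is [redzone][object].

    The object offset and the slot stride are both padded up to `align`.
    """
    offset = round_up(redzone, align)
    stride = round_up(offset + size, align)
    return [base + i * stride + offset for i in range(count)]


PAGE = 0x1000  # page-aligned base address of the toy slab

plain = object_addresses(PAGE, 64, 4)                        # no debugging
debug = object_addresses(PAGE, 64, 4, redzone=8)             # redzone, no guarantee
fixed = object_addresses(PAGE, 64, 4, redzone=8, align=64)   # redzone + alignment to size

for name, addrs in (("plain", plain), ("debug", debug), ("fixed", fixed)):
    aligned = all(is_naturally_aligned(a, 64) for a in addrs)
    print(f"{name}: {[hex(a) for a in addrs]} all aligned: {aligned}")
```

    In this model the "fixed" layout uses a 128-byte stride for 64-byte
    objects, which mirrors the remark above that guaranteeing alignment
    with redzoning enabled wastes more memory.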

    [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
    [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
    [3] https://lwn.net/Articles/787740/

    [akpm@linux-foundation.org: documentation fixlet, per Matthew]
    Link: http://lkml.kernel.org/r/20190826111627.7505-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Reviewed-by: Matthew Wilcox (Oracle)
    Acked-by: Michal Hocko
    Acked-by: Kirill A. Shutemov
    Acked-by: Christoph Hellwig
    Cc: David Sterba
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Ming Lei
    Cc: Dave Chinner
    Cc: "Darrick J . Wong"
    Cc: Christoph Hellwig
    Cc: James Bottomley
    Cc: Vlastimil Babka
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • cgroup v2 introduces two memory protection thresholds: memory.low
    (best-effort) and memory.min (hard protection). While they generally do
    what they say on the tin, there is a limitation in their implementation
    that makes them difficult to use effectively: cliff behaviour often
    manifests when protected cgroups become eligible for reclaim. This
    patch implements more intuitive and usable behaviour, where we
    gradually mount more reclaim pressure as cgroups further and further
    exceed their protection thresholds.

    This cliff edge behaviour happens because we only choose whether or not
    to reclaim based on whether the memcg is within its protection limits
    (see the use of mem_cgroup_protected in shrink_node), but we don't vary
    our reclaim behaviour based on this information. Imagine the following
    timeline, where the numbers are the lruvec size in this zone:

    1. memory.low=1000000, memory.current=999999. 0 pages may be scanned.
    2. memory.low=1000000, memory.current=1000000. 0 pages may be scanned.
    3. memory.low=1000000, memory.current=1000001. 1000001* pages may be
    scanned. (?!)

    * Of course, we won't usually scan all available pages in the zone even
    without this patch because of scan control priority, over-reclaim
    protection, etc. However, as shown by the tests at the end, these
    techniques don't sufficiently throttle such an extreme change in input,
    so cliff-like behaviour isn't really averted by their existence alone.
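
    The timeline above can be reproduced with a toy model. The sketch
    below is an illustration only: the function names are hypothetical,
    scan control priority and over-reclaim protection are ignored, and
    the proportional rule shown (scale the scan target by the fraction of
    usage that exceeds memory.low) is one plausible formulation rather
    than a copy of the kernel's implementation.

```python
def binary_scan_target(lruvec_size: int, usage: int, low: int) -> int:
    """All-or-nothing model: protected cgroups are skipped, others fully eligible."""
    return 0 if usage <= low else lruvec_size


def proportional_scan_target(lruvec_size: int, usage: int, low: int) -> int:
    """Proportional model: eligibility grows with the excess over memory.low."""
    if usage <= low:
        return 0
    return lruvec_size * (usage - low) // usage


LOW = 1_000_000  # memory.low, in pages

# As in the timeline, memory.current is also used as the lruvec size.
for current in (999_999, 1_000_000, 1_000_001, 1_500_000, 4_000_000):
    before = binary_scan_target(current, current, LOW)
    after = proportional_scan_target(current, current, LOW)
    print(f"memory.current={current}: binary={before} proportional={after}")
```

    Under these assumptions, exceeding memory.low by a single page makes
    one page eligible in the proportional model instead of the whole
    lruvec, and the eligible amount approaches the full size only as
    usage grows far beyond the threshold.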

    Here's an example of how this plays out in practice. At Facebook, we are
    trying to protect various workloads from "system" software, like
    configuration management tools, metric collectors, etc (see this[0] case
    study). In order to find a suitable memory.low value, we start by
    determining the expected memory range within which the workload will be
    comfortable operating. This isn't an exact science -- memory usage deemed
    "comfortable" will vary over time due to user behaviour, differences in
    composition of work, etc, etc. As such we need to ballpark memory.low,
    but doing this is currently problematic:

    1. If we end up setting it too low for the workload, it won't have
    *any* effect (see discussion above). The group will receive the full
    weight of reclaim and won't have any priority while competing with the
    less important system software, as if we had no memory.low configured
    at all.

    2. Because of this behaviour, we end up erring on the side of setting
    it too high, such that the comfort range is reliably covered. However,
    protected memory is completely unavailable to the rest of the system,
    so we might cause undue memory and IO pressure there when we *know* we
    have some elasticity in the workload.

    3. Even if we get the value totally right, smack in the middle of the
    comfort zone, we get extreme jumps between no pressure and full
    pressure that cause unpredictable pressure spikes in the workload due
    to the current binary reclaim behaviour.

    With this patch, we can set it to our ballpark estimation without too much
    worry. Any undesirable behaviour, such as too much or too little reclaim
    pressure on the workload or system will be proportional to how far our
    estimation is off. This means we can set memory.low much more
    conservatively and thus waste fewer resources *without* the risk of the
    workload falling off a cliff if we overshoot.

    As a more abstract technical description, this unintuitive behaviour
    results in having to give high-priority workloads a large protection
    buffer on top of their expected usage to function reliably, as otherwise
    we have abrupt periods of dramatically increased memory pressure which
    hamper performance. Having to set these thresholds so high wastes
    resources and generally works against the principle of work conservation.
    In addition, having proportional memory reclaim behaviour has other
    benefits. Most notably, before this patch it's basically mandatory to set
    memory.low to a higher than desirable value because otherwise as soon as
    you exceed memory.low, all protection is lost, and all pages are eligible
    to scan again. By contrast, having a gradual ramp in reclaim pressure
    means that you now still get some protection when thresholds are exceeded,
    which means that one can now be more comfortable setting memory.low to
    lower values without worrying that all protection will be lost. This is
    important because workingset size is really hard to know exactly,
    especially with variable workloads, so at least getting *some* protection
    if your workingset size grows larger than you expect increases user
    confidence in setting memory.low without a huge buffer on top being
    needed.

    Thanks a lot to Johannes Weiner and Tejun Heo for their advice and
    assistance in thinking about how to make this work better.

    In testing these changes, I intended to verify that:

    1. Changes in page scanning become gradual and proportional instead of
    binary.

    To test this, I experimented with stepping further and further down
    memory.low protection on a workload that floats around 19G workingset
    when under memory.low protection, watching page scan rates for the
    workload cgroup:

    +------------+-----------------+--------------------+--------------+
    | memory.low | test (pgscan/s) | control (pgscan/s) | % of control |
    +------------+-----------------+--------------------+--------------+
    |        21G |               0 |                  0 |          N/A |
    |        17G |             867 |               3799 |          23% |
    |        12G |            1203 |               3543 |          34% |
    |         8G |            2534 |               3979 |          64% |
    |         4G |            3980 |               4147 |          96% |
    |          0 |            3799 |               3980 |          95% |
    +------------+-----------------+--------------------+--------------+

    As you can see, the test kernel (containing this patch) ramps up
    page scanning significantly more gradually than the control kernel
    (without this patch).

    2. More gradual ramp up in reclaim aggression doesn't result in
    premature OOMs.

    To test this, I wrote a script that slowly increments the number of
    pages held by stress(1)'s --vm-keep mode until a production system
    entered severe overall memory contention. This script runs in a highly
    protected slice taking up the majority of available system memory.
    Watching vmstat revealed that page scanning continued essentially
    nominally between test and control, without causing forward reclaim
    progress to become arrested.

    [0]: https://facebookmicrosites.github.io/cgroup2/docs/overview.html#case-study-the-fbtax2-project

    [akpm@linux-foundation.org: reflow block comments to fit in 80 cols]
    [chris@chrisdown.name: handle cgroup_disable=memory when getting memcg protection]
    Link: http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name
    Link: http://lkml.kernel.org/r/20190124014455.GA6396@chrisdown.name
    Signed-off-by: Chris Down
    Acked-by: Johannes Weiner
    Reviewed-by: Roman Gushchin
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Dennis Zhou
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Down
     
  • Currently execution of panic() continues until Xen's panic notifier
    (xen_panic_event()) is called at which point we make a hypercall that
    never returns.

    This means that any notifier that is supposed to be called later, as
    well as a significant part of the panic() code (such as pstore writes
    from kmsg_dump()), is never executed.

    There is no reason for xen_panic_event() to be this last point in
    execution since panic()'s emergency_restart() will call into
    xen_emergency_restart() from where we can perform our hypercall.

    Nevertheless, we will provide a xen_legacy_crash boot option that
    preserves the original behavior during a crash. This option could be
    used, for example, if running the kernel dumper (which happens after
    panic notifiers) is undesirable.

    Reported-by: James Dingwall
    Signed-off-by: Boris Ostrovsky
    Reviewed-by: Juergen Gross

    Boris Ostrovsky
     

07 Oct, 2019

1 commit