29 Aug, 2018

1 commit

  • Using a private template is problematic:

    1. We can't assign both a zone and a timeout policy
    (zone assigns a conntrack template, so we hit problem 1)
    2. Using a template needs to take care of ct refcount, else we'll
    eventually free the private template due to ->use underflow.

    This patch reworks template policy to instead work with existing conntrack.

    As long as such conntrack has not yet been placed into the hash table
    (unconfirmed) we can still add the timeout extension.

    The only caveat is that we now need to update/correct ct->timeout to
    reflect the initial/new state, otherwise the conntrack entry retains the
    default 'new' timeout.

    A side effect of this change is that setting the policy must
    now occur from chains that are evaluated *after* the conntrack lookup
    has taken place.

    No released kernel contains the timeout policy feature yet, so this change
    should be ok.
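
    The rule described above (the timeout extension may only be added
    while the conntrack is still unconfirmed, and the timeout must then be
    corrected) can be sketched roughly as follows. This is a userspace toy
    model, not the nft_ct code: struct toy_ct, its fields and
    toy_ct_set_timeout_policy() are hypothetical names chosen for
    illustration.

```c
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not kernel API. */
struct toy_ct {
	bool confirmed;        /* already placed into the hash table? */
	bool has_timeout_ext;  /* timeout extension attached? */
	unsigned int timeout;  /* seconds until the entry expires */
};

/*
 * Attach a timeout policy to an existing conntrack entry.
 * Returns 0 on success, -1 if the entry is already confirmed.
 */
int toy_ct_set_timeout_policy(struct toy_ct *ct, unsigned int new_state_timeout)
{
	if (ct->confirmed)
		return -1;  /* too late: extensions can no longer be added */

	ct->has_timeout_ext = true;
	/* Without this step the entry would keep the default 'new' timeout. */
	ct->timeout = new_state_timeout;
	return 0;
}
```

    In this model a caller that runs before the lookup has no entry to
    work on at all, which mirrors the ordering constraint on chains.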

    Changes since v2:
    - don't handle the 'ct is confirmed' case
    - after previous patch, no need to special-case tcp/dccp/sctp timeout
    anymore

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

23 Aug, 2018

1 commit

  • The new tcf_exts_for_each_action() macro doesn't reference its
    arguments when CONFIG_NET_CLS_ACT is disabled, which leads to
    a harmless warning in at least one driver:

    drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c: In function 'tc_fill_actions':
    drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c:64:6: error: unused variable 'i' [-Werror=unused-variable]

    Adding a cast to void lets us avoid this kind of warning.
    To be on the safe side, do it for all three arguments, not
    just the one that caused the warning.
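
    The technique can be illustrated with a small standalone sketch. The
    names below (toy_for_each_action, TOY_CLS_ACT, struct toy_exts) are
    hypothetical stand-ins, not the kernel definitions; the point is only
    that the disabled variant still mentions each argument in a void
    expression, so the compiler no longer considers the caller's
    variables unused.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types. */
struct toy_action { int id; };
struct toy_exts {
	int nr_actions;
	struct toy_action *actions[4];
};

#ifdef TOY_CLS_ACT  /* plays the role of CONFIG_NET_CLS_ACT */
#define toy_for_each_action(i, a, exts) \
	for ((i) = 0; \
	     (i) < (exts)->nr_actions && ((a) = (exts)->actions[(i)]); \
	     (i)++)
#else
/*
 * Disabled: the condition is constant 0, so neither the body nor the
 * iteration expression ever runs. The casts to void only exist to
 * reference all three arguments.
 */
#define toy_for_each_action(i, a, exts) \
	for (; 0; (void)(i), (void)(a), (void)(exts))
#endif

/* A caller like a driver's fill routine: 'i' and 'a' are only used by the macro. */
int toy_count_actions(struct toy_exts *exts)
{
	struct toy_action *a;
	int i, n = 0;

	toy_for_each_action(i, a, exts)
		n++;
	return n;
}
```

    Built without TOY_CLS_ACT, toy_count_actions() always returns 0 and
    compiles cleanly with -Wunused-variable.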

    Fixes: 244cd96adb5f ("net_sched: remove list_head from tc_action")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

22 Aug, 2018

4 commits

  • Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • After commit 90b73b77d08e, list_head is no longer needed.
    Now we just need to convert the list iteration to array
    iteration for drivers.

    Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions")
    Cc: Jiri Pirko
    Cc: Vlad Buslov
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • tcf_idr_check() is replaced by tcf_idr_check_alloc(),
    and __tcf_idr_check() now can be folded into tcf_idr_search().

    Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
    Cc: Jiri Pirko
    Cc: Vlad Buslov
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • All ops->delete() needs is tn->idrinfo, but we already
    have the tc_action before calling ops->delete(), and tc_action has
    a pointer ->idrinfo.

    More importantly, each type of action does the same thing, that is,
    it just calls tcf_idr_delete_index().

    So it can simply be removed.

    Fixes: b409074e6693 ("net: sched: add 'delete' function to action ops")
    Cc: Jiri Pirko
    Cc: Vlad Buslov
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

20 Aug, 2018

3 commits

  • Pull networking fixes from David Miller:

    1) Fix races in IPVS, from Tan Hu.

    2) Missing unbind in matchall classifier, from Hangbin Liu.

    3) Missing act_ife action release, from Vlad Buslov.

    4) Cure lockdep splats in ila, from Cong Wang.

    5) veth queue leak on link delete, from Toshiaki Makita.

    6) Disable isdn's IIOCDBGVAR ioctl, it exposes kernel addresses. From
    Kees Cook.

    7) RCU usage fixup in XDP, from Tariq Toukan.

    8) Two TCP ULP fixes from Daniel Borkmann.

    9) r8169 needs REALTEK_PHY as a Kconfig dependency, from Heiner
    Kallweit.

    10) Always take tcf_lock with BH disabled, otherwise we can deadlock
    with rate estimator code paths. From Vlad Buslov.

    11) Don't use MSI-X on RTL8106e r8169 chips, they don't resume properly.
    From Jian-Hong Pan.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
    ip6_vti: fix creating fallback tunnel device for vti6
    ip_vti: fix a null pointer deferrence when create vti fallback tunnel
    r8169: don't use MSI-X on RTL8106e
    net: lan743x_ptp: convert to ktime_get_clocktai_ts64
    net: sched: always disable bh when taking tcf_lock
    ip6_vti: simplify stats handling in vti6_xmit
    bpf: fix redirect to map under tail calls
    r8169: add missing Kconfig dependency
    tools/bpf: fix bpf selftest test_cgroup_storage failure
    bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist
    bpf, sockmap: fix map elem deletion race with smap_stop_sock
    bpf, sockmap: fix leakage of smap_psock_map_entry
    tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach
    tcp, ulp: add alias for all ulp modules
    bpf: fix a rcu usage warning in bpf_prog_array_copy_core()
    samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM
    net/xdp: Fix suspicious RCU usage warning
    net/mlx5e: Delete unneeded function argument
    Documentation: networking: ti-cpsw: correct cbs parameters for Eth1 100Mb
    isdn: Disable IIOCDBGVAR
    ...

    Linus Torvalds
     
  • Pull first set of KVM updates from Paolo Bonzini:
    "PPC:
    - minor code cleanups

    x86:
    - PCID emulation and CR3 caching for shadow page tables
    - nested VMX live migration
    - nested VMCS shadowing
    - optimized IPI hypercall
    - some optimizations

    ARM will come next week"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (85 commits)
    kvm: x86: Set highest physical address bits in non-present/reserved SPTEs
    KVM/x86: Use CC_SET()/CC_OUT in arch/x86/kvm/vmx.c
    KVM: X86: Implement PV IPIs in linux guest
    KVM: X86: Add kvm hypervisor init time platform setup callback
    KVM: X86: Implement "send IPI" hypercall
    KVM/x86: Move X86_CR4_OSXSAVE check into kvm_valid_sregs()
    KVM: x86: Skip pae_root shadow allocation if tdp enabled
    KVM/MMU: Combine flushing remote tlb in mmu_set_spte()
    KVM: vmx: skip VMWRITE of HOST_{FS,GS}_BASE when possible
    KVM: vmx: skip VMWRITE of HOST_{FS,GS}_SEL when possible
    KVM: vmx: always initialize HOST_{FS,GS}_BASE to zero during setup
    KVM: vmx: move struct host_state usage to struct loaded_vmcs
    KVM: vmx: compute need to reload FS/GS/LDT on demand
    KVM: nVMX: remove a misleading comment regarding vmcs02 fields
    KVM: vmx: rename __vmx_load_host_state() and vmx_save_host_state()
    KVM: vmx: add dedicated utility to access guest's kernel_gs_base
    KVM: vmx: track host_state.loaded using a loaded_vmcs pointer
    KVM: vmx: refactor segmentation code in vmx_save_host_state()
    kvm: nVMX: Fix fault priority for VMX operations
    kvm: nVMX: Fix fault vector for VMX operation at CPL > 0
    ...

    Linus Torvalds
     
  • …l/git/palmer/riscv-linux

    Pull RISC-V updates from Palmer Dabbelt:
    "This contains some major improvements to the RISC-V port, including
    the necessary interrupt controller and timer support to actually make
    it to userspace. Support for three devices has been added:

    - the ISA-mandated timers on RISC-V systems.

    - the ISA-mandated first-level interrupt controller on RISC-V
    systems, which is handled as part of our core arch code because
    it's very small and tightly tied to the ISA.

    - SiFive's platform-level interrupt controller, which talks to the
    actual devices.

    In addition to these new devices, there are a handful of cleanups all
    over the RISC-V tree:

    - build fixes for various configurations:
    * A fix to the vDSO build's makefile so it respects CFLAGS.
    * The addition of __lshrti3, a libgcc derived function necessary
    for some 32-bit configurations.
    * !SMP && PERF_EVENTS

    - Cleanups to the arch code to remove the remnants of old versions of
    the drivers that were just properly submitted.
    * Some dead code from the timer driver, most of which wasn't ever
    even compiled.
    * Cleanups of some interrupt #defines, which are now local to the
    interrupt handling code.

    - Fixes to ptrace(), which while not being sufficient to fully make
    GDB work are at least sufficient to get simple GDB tasks to work.

    - Early printk support via RISC-V's architecturally mandated SBI
    console device.

    - A fix to our early debug trap handler to ensure it's always
    aligned.

    These patches have all been through a fairly extensive review process,
    but as this enables a whole pile of functionality (ie, userspace) I'm
    confident we'll need to submit a few more patches. The only concrete
    issues I know about are the sys_riscv_flush_icache patches, but as I
    managed to screw those up on Friday I figured it'd be best to let them
    bake another week.

    This tag boots a Fedora root filesystem on QEMU's master branch for
    me, and before this morning's rebase (from 4.18-rc8 to 4.18) it booted
    on the HiFive Unleashed.

    Thanks to Christoph Hellwig and the other guys at WD for getting the
    new drivers in shape!"
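
    For readers unfamiliar with __lshrti3 mentioned above: it is the
    libgcc routine for a logical right shift of a 128-bit integer. A
    minimal sketch of that operation built from two 64-bit halves is
    shown below. The struct and function names are hypothetical, and this
    is not the code that was merged; it assumes the caller passes a shift
    count in the range 0..127.

```c
#include <stdint.h>

/* A 128-bit value split into two 64-bit halves (hypothetical layout). */
struct toy_u128 {
	uint64_t lo;
	uint64_t hi;
};

/* Logical shift right by 'shift' bits; caller guarantees shift < 128. */
struct toy_u128 toy_lshr128(struct toy_u128 v, unsigned int shift)
{
	struct toy_u128 r;

	if (shift == 0)
		return v;  /* avoids an undefined 64-bit shift by 64 below */

	if (shift >= 64) {
		/* Everything left comes from the high half. */
		r.hi = 0;
		r.lo = v.hi >> (shift - 64);
	} else {
		/* Bits shifted out of the high half move into the low half. */
		r.hi = v.hi >> shift;
		r.lo = (v.lo >> shift) | (v.hi << (64 - shift));
	}
	return r;
}
```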

    * tag 'riscv-for-linus-4.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
    dt-bindings: interrupt-controller: SiFive Plaform Level Interrupt Controller
    dt-bindings: interrupt-controller: RISC-V local interrupt controller
    RISC-V: Fix !CONFIG_SMP compilation error
    irqchip: add a SiFive PLIC driver
    RISC-V: Add the directive for alignment of stvec's value
    clocksource: new RISC-V SBI timer driver
    RISC-V: implement low-level interrupt handling
    RISC-V: add a definition for the SIE SEIE bit
    RISC-V: remove INTERRUPT_CAUSE_* defines from asm/irq.h
    RISC-V: simplify software interrupt / IPI code
    RISC-V: remove timer leftovers
    RISC-V: Add early printk support via the SBI console
    RISC-V: Don't increment sepc after breakpoint.
    RISC-V: implement __lshrti3.
    RISC-V: Use KBUILD_CFLAGS instead of KCFLAGS when building the vDSO

    Linus Torvalds
     

19 Aug, 2018

14 commits

  • Pull input updates from Dmitry Torokhov:

    - a new driver for Rohm BU21029 touch controller

    - new bitmap APIs: bitmap_alloc, bitmap_zalloc and bitmap_free

    - updates to Atmel, eeti, pxrc and iforce drivers

    - assorted driver cleanups and fixes.
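
    As a rough idea of what the new bitmap allocation helpers do, here is
    a userspace analogue. It is only a sketch: the toy_ names are
    hypothetical, calloc()/free() stand in for the kernel allocator, and
    the in-kernel helpers additionally take allocation flags.

```c
#include <limits.h>
#include <stdlib.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of longs needed to hold n bits, rounded up. */
#define TOY_BITS_TO_LONGS(n) \
	(((n) + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* Allocate a zeroed bitmap able to hold nbits bits. */
unsigned long *toy_bitmap_zalloc(unsigned int nbits)
{
	return calloc(TOY_BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

void toy_bitmap_free(unsigned long *bitmap)
{
	free(bitmap);
}

void toy_set_bit(unsigned long *bitmap, unsigned int nr)
{
	bitmap[nr / TOY_BITS_PER_LONG] |= 1UL << (nr % TOY_BITS_PER_LONG);
}

int toy_test_bit(const unsigned long *bitmap, unsigned int nr)
{
	return (bitmap[nr / TOY_BITS_PER_LONG] >> (nr % TOY_BITS_PER_LONG)) & 1UL;
}
```

    The benefit for callers is that the size computation lives in one
    place instead of being open-coded around each allocation.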

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (57 commits)
    MAINTAINERS: Add PhoenixRC Flight Controller Adapter
    Input: do not use WARN() in input_alloc_absinfo()
    Input: mark expected switch fall-throughs
    Input: raydium_i2c_ts - use true and false for boolean values
    Input: evdev - switch to bitmap API
    Input: gpio-keys - switch to bitmap_zalloc()
    Input: elan_i2c_smbus - cast sizeof to int for comparison
    bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free()
    md: Avoid namespace collision with bitmap API
    dm: Avoid namespace collision with bitmap API
    Input: pm8941-pwrkey - add resin entry
    Input: pm8941-pwrkey - abstract register offsets and event code
    Input: iforce - reorganize joystick configuration lists
    Input: atmel_mxt_ts - move completion to after config crc is updated
    Input: atmel_mxt_ts - don't report zero pressure from T9
    Input: atmel_mxt_ts - zero terminate config firmware file
    Input: atmel_mxt_ts - refactor config update code to add context struct
    Input: atmel_mxt_ts - config CRC may start at T71
    Input: atmel_mxt_ts - remove unnecessary debug on ENOMEM
    Input: atmel_mxt_ts - remove duplicate setup of ABS_MT_PRESSURE
    ...

    Linus Torvalds
     
  • Pull hwspinlock updates from Bjorn Andersson:
    "This introduces devres helpers and an API to request a lock by name,
    then migrates the sprd SPI driver to use these"

    * tag 'hwlock-v4.19' of git://github.com/andersson/remoteproc:
    hwspinlock: Fix incorrect return pointers
    spi: sprd: Change to use devm_hwspin_lock_request_specific()
    spi: sprd: Replace of_hwspin_lock_get_id() with of_hwspin_lock_get_id_byname()
    hwspinlock: Fix one comment mistake
    hwspinlock: Remove redundant config
    hwspinlock: Add devm_xxx() APIs to register/unregister one hwlock controller
    hwspinlock: Add devm_xxx() APIs to request/free hwlock
    hwspinlock: Add one new API to support getting a specific hwlock by the name

    Linus Torvalds
     
  • Pull remoteproc updates from Bjorn Andersson:
    "This adds support for pre-start and post-shutdown hooks for remoteproc
    subdevices, refactors the Qualcomm Hexagon support to allow reuse
    between several drivers, makes authentication in the MDT file loader
    optional, migrates a few format strings to use %pK and migrates the
    Davinci driver to use the reset framework"

    * tag 'rproc-v4.19' of git://github.com/andersson/remoteproc:
    remoteproc/davinci: use the reset framework
    remoteproc/davinci: Mark error recovery as disabled
    remoteproc: st_slim: replace "%p" with "%pK"
    remoteproc: replace "%p" with "%pK"
    remoteproc: qcom: fix Q6V5_WCSS dependencies
    remoteproc: Reset table_ptr in rproc_start() failure paths
    remoteproc: qcom: q6v5-pil: fix modem hang on SDM845 after axis2 clk unvote
    remoteproc: qcom q6v5: fix modular build
    remoteproc: Introduce prepare and unprepare for subdevices
    remoteproc: rename subdev probe and remove functions
    remoteproc: Make client initialize ops in rproc_subdev
    remoteproc: Make start and stop in subdev optional
    remoteproc: Rename subdev functions to start/stop
    remoteproc: qcom: Introduce Hexagon V5 based WCSS driver
    remoteproc: qcom: q6v5-pil: Use common q6v5 helpers
    remoteproc: qcom: adsp: Use common q6v5 helpers
    remoteproc: q6v5: Extract common resource handling
    remoteproc: qcom: mdt_loader: Make the firmware authentication optional

    Linus Torvalds
     
  • Pull DMAengine updates from Vinod Koul:
    "This round brings a couple of framework changes, a new driver and the
    usual driver updates:

    - new managed helper for dmaengine framework registration

    - split dmaengine pause capability to pause and resume and allow
    drivers to report that individually

    - update dma_request_chan_by_mask() to handle deferred probing

    - move imx-sdma to use virt-dma

    - new driver for Actions Semi Owl family S900 controller

    - minor updates to intel, renesas, mv_xor, pl330 etc"

    * tag 'dmaengine-4.19-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (46 commits)
    dmaengine: Add Actions Semi Owl family S900 DMA driver
    dt-bindings: dmaengine: Add binding for Actions Semi Owl SoCs
    dmaengine: sh: rcar-dmac: Should not stop the DMAC by rcar_dmac_sync_tcr()
    dmaengine: mic_x100_dma: use the new helper to simplify the code
    dmaengine: add a new helper dmaenginem_async_device_register
    dmaengine: imx-sdma: add memcpy interface
    dmaengine: imx-sdma: add SDMA_BD_MAX_CNT to replace '0xffff'
    dmaengine: dma_request_chan_by_mask() to handle deferred probing
    dmaengine: pl330: fix irq race with terminate_all
    dmaengine: Revert "dmaengine: mv_xor_v2: enable COMPILE_TEST"
    dmaengine: mv_xor_v2: use {lower,upper}_32_bits to configure HW descriptor address
    dmaengine: mv_xor_v2: enable COMPILE_TEST
    dmaengine: mv_xor_v2: move unmap to before callback
    dmaengine: mv_xor_v2: convert callback to helper function
    dmaengine: mv_xor_v2: kill the tasklets upon exit
    dmaengine: mv_xor_v2: explicitly freeup irq
    dmaengine: sh: rcar-dmac: Add dma_pause operation
    dmaengine: sh: rcar-dmac: add a new function to clear CHCR.DE with barrier
    dmaengine: idma64: Support dmaengine_terminate_sync()
    dmaengine: hsu: Support dmaengine_terminate_sync()
    ...

    Linus Torvalds
     
  • Pull MMC updates from Ulf Hansson:
    "Updates for MMC for v4.19.

    MMC core:
    - Add some fine-grained hooks to further support HS400 tuning
    - Improve error path for bus width setting for HS400es
    - Use a common method when checking R1 status

    MMC host:
    - renesas_sdhi: Add r8a77990 support
    - renesas_sdhi: Add eMMC HS400 mode support
    - tmio/renesas_sdhi: Improve tuning/clock management
    - tmio: Add eMMC HS400 mode support
    - sunxi: Add support for 3.3V eMMC DDR mode
    - mmci: Initial support to manage variant specific callbacks
    - sdhci: Don't try 3.3V I/O voltage if not supported
    - sdhci-pci-dwc-mshc: Add driver to support Synopsys dwc mshc SDHCI PCI
    - sdhci-of-dwcmshc: Add driver to support Synopsys DWC MSHC SDHCI
    - sdhci-msm: Add support for new version sdcc V5
    - sdhci-pci-o2micro: Add support for O2 eMMC HS200 mode
    - sdhci-pci-o2micro: Add support for O2 hardware tuning
    - sdhci-pci-o2micro: Add MSI interrupt support for O2 SD host
    - sdhci-pci: Add support for Intel ICP
    - sdhci-tegra: Prevent ACMD23 and HS200 mode on Tegra 3
    - sdhci-tegra: Fix eMMC DDR52 mode
    - sdhci-tegra: Improve clock management
    - dw_mmc-rockchip: Document compatible string for px30
    - sdhci-esdhc-imx: Add support for 3.3V eMMC DDR mode
    - sdhci-of-esdhc: Set proper DMA mask for ls104x chips
    - sdhci-of-esdhc: Improve clock management
    - sdhci-of-arasan: Add a quirk to manage unstable clocks
    - dw_mmc-exynos: Address potential external abort during system resume
    - pxamci: Add support for common MMC DT bindings
    - pxamci: Several cleanups and improvements
    - pxamci: Merge immutable branch for pxa to switch to DMA slave maps"

    * tag 'mmc-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (56 commits)
    mmc: core: improve reasonableness of bus width setting for HS400es
    mmc: tmio: remove unneeded variable in tmio_mmc_start_command()
    mmc: renesas_sdhi: Fix sampling clock position selecting
    mmc: tmio: Fix tuning flow
    mmc: sunxi: remove output of virtual base address
    dt-bindings: mmc: rockchip-dw-mshc: add description for px30
    mmc: renesas_sdhi: Add r8a77990 support
    mmc: sunxi: allow 3.3V DDR when DDR is available
    mmc: mmci: Add and implement a ->dma_setup() callback for qcom dml
    mmc: mmci: Initial support to manage variant specific callbacks
    mmc: tegra: Force correct divider calculation on DDR50/52
    mmc: sdhci: Add MSI interrupt support for O2 SD host
    mmc: sdhci: Add support for O2 hardware tuning
    mmc: sdhci: Export sdhci tuning function symbol
    mmc: sdhci: Change O2 Host HS200 mode clock frequency to 200MHz
    mmc: sdhci: Add support for O2 eMMC HS200 mode
    mmc: tegra: Add and use tegra_sdhci_get_max_clock()
    mmc: sdhci-esdhc-imx: fix indent
    mmc: sdhci-esdhc-imx: disable clocks before changing frequency
    mmc: tegra: prevent ACMD23 on Tegra 3
    ...

    Linus Torvalds
     
  • This function was created as a deprecated fallback case back in 2010 by
    commit eb14120f743d ("pcmcia: re-work pcmcia_request_irq()") for legacy
    cases.

    Actual in-kernel users haven't been around for a long while. The last
    in-kernel user was apparently removed four years ago by commit
    5f5316fcd08e ("am2150: Update nmclan_cs.c to use update PCMCIA API").

    Just remove it entirely.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • We haven't had lots of deprecation warnings lately, but the rdma use of
    it made them flare up again.

    They are not useful. They annoy everybody, and nobody ever does
    anything about them, because it's always "somebody else's problem". And
    when people start thinking that warnings are normal, they stop looking
    at them, and the real warnings that mean something go unnoticed.

    If you want to get rid of a function, just get rid of it. Convert every
    user to the new world order.

    And if you can't do that, then don't annoy everybody else with your
    marking that says "I couldn't be bothered to fix this, so I'll just spam
    everybody else's build logs with warnings about my laziness".

    Make a kernelnewbies wiki page about things that could be cleaned up,
    write a blog post about it, or talk to people on the mailing lists. But
    don't add warnings to the kernel build about cleanup that you think
    should happen but you aren't doing yourself.

    Don't. Just don't.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull driver core updates from Greg KH:
    "Here are all of the driver core and related patches for 4.19-rc1.

    Nothing huge here, just a number of small cleanups and the ability to
    now stop the deferred probing after init happens.

    All of these have been in linux-next for a while with only a merge
    issue reported"

    * tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (21 commits)
    base: core: Remove WARN_ON from link dependencies check
    drivers/base: stop new probing during shutdown
    drivers: core: Remove glue dirs from sysfs earlier
    driver core: remove unnecessary function extern declare
    sysfs.h: fix non-kernel-doc comment
    PM / Domains: Stop deferring probe at the end of initcall
    iommu: Remove IOMMU_OF_DECLARE
    iommu: Stop deferring probe at end of initcalls
    pinctrl: Support stopping deferred probe after initcalls
    dt-bindings: pinctrl: add a 'pinctrl-use-default' property
    driver core: allow stopping deferred probe after init
    driver core: add a debugfs entry to show deferred devices
    sysfs: Fix internal_create_group() for named group updates
    base: fix order of OF initialization
    linux/device.h: fix kernel-doc notation warning
    Documentation: update firmware loader fallback reference
    kobject: Replace strncpy with memcpy
    drivers: base: cacheinfo: use OF property_read_u32 instead of get_property,read_number
    kernfs: Replace strncpy with memcpy
    device: Add #define dev_fmt similar to #define pr_fmt
    ...

    Linus Torvalds
     
  • Pull char/misc driver updates from Greg KH:
    "Here is the big set of char/misc drivers for 4.19-rc1

    There is a lot here, much more than normal, seems like everyone is
    writing new driver subsystems these days... Anyway, major things here
    are:

    - new FSI driver subsystem, yet-another-powerpc low-level hardware
    bus

    - gnss, finally an in-kernel GPS subsystem to try to tame all of the
    crazy out-of-tree drivers that have been floating around for years,
    combined with some really hacky userspace implementations. This is
    only for GNSS receivers, but you have to start somewhere, and this
    is great to see.

    Other than that, there are new slimbus drivers, new coresight drivers,
    new fpga drivers, and loads of DT bindings for all of these and
    existing drivers.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'char-misc-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (255 commits)
    android: binder: Rate-limit debug and userspace triggered err msgs
    fsi: sbefifo: Bump max command length
    fsi: scom: Fix NULL dereference
    misc: mic: SCIF Fix scif_get_new_port() error handling
    misc: cxl: changed asterisk position
    genwqe: card_base: Use true and false for boolean values
    misc: eeprom: assignment outside the if statement
    uio: potential double frees if __uio_register_device() fails
    eeprom: idt_89hpesx: clean up an error pointer vs NULL inconsistency
    misc: ti-st: Fix memory leak in the error path of probe()
    android: binder: Show extra_buffers_size in trace
    firmware: vpd: Fix section enabled flag on vpd_section_destroy
    platform: goldfish: Retire pdev_bus
    goldfish: Use dedicated macros instead of manual bit shifting
    goldfish: Add missing includes to goldfish.h
    mux: adgs1408: new driver for Analog Devices ADGS1408/1409 mux
    dt-bindings: mux: add adi,adgs1408
    Drivers: hv: vmbus: Cleanup synic memory free path
    Drivers: hv: vmbus: Remove use of slow_virt_to_phys()
    Drivers: hv: vmbus: Reset the channel callback in vmbus_onoffer_rescind()
    ...

    Linus Torvalds
     
  • Pull staging and IIO updates from Greg KH:
    "Here are the big staging/iio patches for 4.19-rc1.

    Lots of churn here, with tons of cleanups happening in staging
    drivers, a removal of an old crypto driver that no one was using
    (skein), and the addition of some new IIO drivers. Also added was a
    "gasket" driver from Google that needs loads of work and the erofs
    filesystem.

    Even with adding all of the new drivers and a new filesystem, we are
    only adding about 1000 lines overall to the kernel linecount, which
    shows just how much cleanup happened, and how big the unused crypto
    driver was.

    All of these have been in the linux-next tree for a while now with no
    reported issues"

    * tag 'staging-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (903 commits)
    staging:rtl8192u: Remove unused macro definitions - Style
    staging:rtl8192u: Add spaces around '+' operator - Style
    staging:rtl8192u: Remove stale comment - Style
    staging: rtl8188eu: remove unused mp_custom_oid.h
    staging: fbtft: Add spaces around / - Style
    staging: fbtft: Erases some repetitive usage of function name - Style
    staging: fbtft: Adjust some empty-line problems - Style
    staging: fbtft: Removes one nesting level to help readability - Style
    staging: fbtft: Changes gamma table to define.
    staging: fbtft: A bit more information on dev_err.
    staging: fbtft: Fixes some alignment issues - Style
    staging: fbtft: Puts macro arguments in parenthesis to avoid precedence issues - Style
    staging: rtl8188eu: remove unused array dB_Invert_Table
    staging: rtl8188eu: remove whitespace, add missing blank line
    staging: rtl8188eu: use is_multicast_ether_addr in rtw_sta_mgt.c
    staging: rtl8188eu: remove whitespace - style
    staging: rtl8188eu: cleanup block comment - style
    staging: rtl8188eu: use is_multicast_ether_addr in rtl8188eu_xmit.c
    staging: rtl8188eu: use is_multicast_ether_addr in recv_linux.c
    staging: rtlwifi: refactor rtl_get_tcb_desc
    ...

    Linus Torvalds
     
  • Pull tty/serial driver updates from Greg KH:
    "Here is the big tty and serial driver pull request for 4.19-rc1.

    It's not all that big, just a number of small serial driver updates
    and fixes, along with some better vt handling for unicode characters
    for those using braille terminals.

    All of these patches have been in linux-next for a long time with no
    reported issues"

    * tag 'tty-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (73 commits)
    tty: serial: 8250: Revert NXP SC16C2552 workaround
    serial: 8250_exar: Read INT0 from slave device, too
    tty: rocket: Fix possible buffer overwrite on register_PCI
    serial: 8250_dw: Add ACPI support for uart on Broadcom SoC
    serial: 8250_dw: always set baud rate in dw8250_set_termios
    dt-bindings: serial: Add binding for uartlite
    tty: serial: uartlite: Add support for suspend and resume
    tty: serial: uartlite: Add clock adaptation
    tty: serial: uartlite: Add structure for private data
    serial: sh-sci: Improve support for separate TEI and DRI interrupts
    serial: sh-sci: Remove SCIx_RZ_SCIFA_REGTYPE
    serial: sh-sci: Allow for compressed SCIF address
    serial: sh-sci: Improve interrupts description
    serial: 8250: Use cached port name directly in messages
    serial: 8250_exar: Drop unused variable in pci_xr17v35x_setup()
    vt: drop unused struct vt_struct
    vt: avoid a VLA in the unicode screen scroll function
    vt: add /dev/vcsu* to devices.txt
    vt: coherence validation code for the unicode screen buffer
    vt: selection: take screen contents from uniscr if available
    ...

    Linus Torvalds
     
  • Pull USB/PHY updates from Greg KH:
    "Here is the big USB and phy driver patch set for 4.19-rc1.

    Nothing huge but there was a lot of work that happened this
    development cycle:

    - lots of type-c work, with drivers graduating out of staging, and
    displayport support being added.

    - new PHY drivers

    - the normal collection of gadget driver updates and fixes

    - code churn to work on the urb handling path, using irqsave()
    everywhere in anticipation of making this codepath a lot simpler in
    the future.

    - usbserial driver fixes and reworks

    - other misc changes

    All of these have been in linux-next with no reported issues for a
    while"

    * tag 'usb-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (159 commits)
    USB: serial: pl2303: add a new device id for ATEN
    usb: renesas_usbhs: Kconfig: convert to SPDX identifiers
    usb: dwc3: gadget: Check MaxPacketSize from descriptor
    usb: dwc2: Turn on uframe_sched on "stm32f4x9_fsotg" platforms
    usb: dwc2: Turn on uframe_sched on "amlogic" platforms
    usb: dwc2: Turn on uframe_sched on "his" platforms
    usb: dwc2: Turn on uframe_sched on "bcm" platforms
    usb: dwc2: gadget: ISOC's starting flow improvement
    usb: dwc2: Make dwc2_readl/writel functions endianness-agnostic.
    usb: dwc3: core: Enable AutoRetry feature in the controller
    usb: dwc3: Set default mode for dwc_usb31
    usb: gadget: udc: renesas_usb3: Add register of usb role switch
    usb: dwc2: replace ioread32/iowrite32_rep with dwc2_readl/writel_rep
    usb: dwc2: Modify dwc2_readl/writel functions prototype
    usb: dwc3: pci: Intel Merrifield can be host
    usb: dwc3: pci: Supply device properties via driver data
    arm64: dts: dwc3: description of incr burst type
    usb: dwc3: Enable undefined length INCR burst type
    usb: dwc3: add global soc bus configuration reg0
    usb: dwc3: Describe 'wakeup_work' field of struct dwc3_pci
    ...

    Linus Torvalds
     
  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2018-08-18

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Fix a BPF selftest failure in test_cgroup_storage due to rlimit
    restrictions, from Yonghong.

    2) Fix a suspicious RCU rcu_dereference_check() warning triggered
    from removing a device's XDP memory allocator by using the correct
    rhashtable lookup function, from Tariq.

    3) A batch of BPF sockmap and ULP fixes mainly fixing leaks and races
    as well as enforcing module aliases for ULPs. Another fix for BPF
    map redirect to make them work again with tail calls, from Daniel.

    4) Fix XDP BPF samples to unload their programs upon SIGTERM, from Jesper.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS fixes for net

    The following patchset contains Netfilter/IPVS fixes for your net tree:

    1) Infinite loop in IPVS when net namespace is released, from
    Tan Hu.

    2) Do not show negative timeouts in ip_vs_conn by using the new
    jiffies_delta_to_msecs(), patches from Matteo Croce.

    3) Set F_IFACE flag for linklocal addresses in ip6t_rpfilter,
    from Florian Westphal.

    4) Fix overflow in set size allocation, from Taehee Yoo.

    5) Use netlink_dump_start() from ctnetlink to fix memleak from
    the error path, again from Florian.

    6) Register nfnetlink_subsys in last place, otherwise the netns
    init path may lose the race and see net->nft uninitialized data.
    This also reverts the previous attempt to fix this by increasing
    the netns refcount, patches from Florian.

    7) Remove conntrack entries on layer 4 protocol tracker module
    removal, from Florian.

    8) Use GFP_KERNEL_ACCOUNT for xtables blob allocation, from
    Michal Hocko.

    9) Get tproxy documentation in sync with existing codebase,
    from Mate Eckl.

    10) Honor preset layer 3 protocol via ctx->family in the new nft_ct
    timeout infrastructure, from Harsha Sharma.

    11) Let uapi nfnetlink_osf.h compile standalone with no errors,
    from Dmitry V. Levin.

    12) Missing braces compilation warning in nft_tproxy, patch from
    Mate Eckl.

    13) Disregard bogus check to bail out on non-anonymous sets from
    the dynamic set update extension.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Aug, 2018

17 commits

  • Pull 9p updates from Dominique Martinet:
    "This contains mostly fixes (6 to be backported to stable) and a few
    changes, here is the breakdown:

    - rework how fids are attributed by replacing some custom tracking in
    a list by an idr

    - for packet-based transports (virtio/rdma) validate that the packet
    length matches what the header says

    - a few race condition fixes found by syzkaller

    - missing argument check when NULL device is passed in sys_mount

    - a few virtio fixes

    - some spelling and style fixes"

    * tag '9p-for-4.19-2' of git://github.com/martinetd/linux: (21 commits)
    net/9p/trans_virtio.c: add null terminal for mount tag
    9p/virtio: fix off-by-one error in sg list bounds check
    9p: fix whitespace issues
    9p: fix multiple NULL-pointer-dereferences
    fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed
    9p: validate PDU length
    net/9p/trans_fd.c: fix race by holding the lock
    net/9p/trans_fd.c: fix race-condition by flushing workqueue before the kfree()
    net/9p/virtio: Fix hard lockup in req_done
    net/9p/trans_virtio.c: fix some spell mistakes in comments
    9p/net: Fix zero-copy path in the 9p virtio transport
    9p: Embed wait_queue_head into p9_req_t
    9p: Replace the fidlist with an IDR
    9p: Change p9_fid_create calling convention
    9p: Fix comment on smp_wmb
    net/9p/client.c: version pointer uninitialized
    fs/9p/v9fs.c: fix spelling mistake "Uknown" -> "Unknown"
    net/9p: fix error path of p9_virtio_probe
    9p/net/protocol.c: return -ENOMEM when kmalloc() failed
    net/9p/client.c: add missing '\n' at the end of p9_debug()
    ...

    Linus Torvalds
     
  • Merge updates from Andrew Morton:

    - a few misc things

    - a few Y2038 fixes

    - ntfs fixes

    - arch/sh tweaks

    - ocfs2 updates

    - most of MM

    * emailed patches from Andrew Morton : (111 commits)
    mm/hmm.c: remove unused variables align_start and align_end
    fs/userfaultfd.c: remove redundant pointer uwq
    mm, vmacache: hash addresses based on pmd
    mm/list_lru: introduce list_lru_shrink_walk_irq()
    mm/list_lru.c: pass struct list_lru_node* as an argument to __list_lru_walk_one()
    mm/list_lru.c: move locking from __list_lru_walk_one() to its caller
    mm/list_lru.c: use list_lru_walk_one() in list_lru_walk_node()
    mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP
    mm/sparse: delete old sparse_init and enable new one
    mm/sparse: add new sparse_init_nid() and sparse_init()
    mm/sparse: move buffer init/fini to the common place
    mm/sparse: use the new sparse buffer functions in non-vmemmap
    mm/sparse: abstract sparse buffer allocations
    mm/hugetlb.c: don't zero 1GiB bootmem pages
    mm, page_alloc: double zone's batchsize
    mm/oom_kill.c: document oom_lock
    mm/hugetlb: remove gigantic page support for HIGHMEM
    mm, oom: remove sleep from under oom_lock
    kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous()
    mm/cma: remove unsupported gfp_mask parameter from cma_alloc()
    ...

    Linus Torvalds
     
  • When perf profiling a wide variety of different workloads, it was found
    that vmacache_find() had higher than expected cost: up to 0.08% of cpu
    utilization in some cases. This was found to rival other core VM
    functions such as alloc_pages_vma() with thp enabled and default
    mempolicy, and the conditionals in __get_vma_policy().

    VMACACHE_HASH() determines which of the four per-task_struct slots a vma
    is cached in for a particular address. This currently depends on the pfn,
    so pfn 5212 occupies a different vmacache slot than its neighboring pfn
    5213.

    vmacache_find() iterates through all four of current's vmacache slots
    when looking up an address. Hashing based on pfn, an address has
    ~1/VMACACHE_SIZE chance of being cached in the first vmacache slot, or
    about 25%, *if* the vma is cached.

    This patch hashes an address by its pmd instead of pte to optimize for
    workloads with good spatial locality. This results in a higher
    probability of vmas being cached in the first slot that is checked:
    normally ~70% on the same workloads instead of 25%.
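
    As a rough user-space illustration of the difference (not the kernel
    macro itself), the two hashing schemes can be compared as below. The
    shift values assume x86-64 with 4 KiB pages and 2 MiB pmd regions, and
    all sketch_/SKETCH_ names are hypothetical.

```c
#include <stdint.h>

/* Assumed values for x86-64 with 4 KiB pages; not taken from the patch. */
#define SKETCH_PAGE_SHIFT    12u  /* one pte maps 4 KiB */
#define SKETCH_PMD_SHIFT     21u  /* one pmd entry maps 2 MiB */
#define SKETCH_VMACACHE_SIZE 4u   /* four slots per task, as stated above */
#define SKETCH_VMACACHE_MASK (SKETCH_VMACACHE_SIZE - 1u)

/* Old scheme: the slot is chosen by the page frame number. */
unsigned int sketch_slot_by_pfn(uint64_t addr)
{
    return (unsigned int)((addr >> SKETCH_PAGE_SHIFT) & SKETCH_VMACACHE_MASK);
}

/* New scheme: the slot is chosen by the pmd-sized region of the address. */
unsigned int sketch_slot_by_pmd(uint64_t addr)
{
    return (unsigned int)((addr >> SKETCH_PMD_SHIFT) & SKETCH_VMACACHE_MASK);
}
```

    With these assumptions, the addresses of pfn 5212 and pfn 5213 land in
    different slots under the pfn scheme but in the same slot under the pmd
    scheme, which is what makes the first checked slot more likely to hit.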

    [rientjes@google.com: various updates]
    Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1807231532290.109445@chino.kir.corp.google.com
    Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1807091749150.114630@chino.kir.corp.google.com
    Signed-off-by: David Rientjes
    Reviewed-by: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Provide list_lru_shrink_walk_irq() and let it behave like
    list_lru_walk_one() except that it locks the spinlock with
    spin_lock_irq(). This is used by scan_shadow_nodes() because its lock
    nests within the i_pages lock, which is acquired with IRQs disabled. This
    change allows the use of proper locking primitives instead of a
    hand-crafted local_irq_disable() plus spin_lock().

    There is no EXPORT_SYMBOL provided because the current user is in-kernel
    only.

    Add list_lru_shrink_walk_irq() which acquires the spinlock with the
    proper locking primitives.

    Link: http://lkml.kernel.org/r/20180716111921.5365-5-bigeasy@linutronix.de
    Signed-off-by: Sebastian Andrzej Siewior
    Reviewed-by: Vladimir Davydov
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sebastian Andrzej Siewior
     
  • Rename new_sparse_init() to sparse_init(), which enables it. Delete the old
    sparse_init() and all the code that became obsolete with it.

    [pasha.tatashin@oracle.com: remove unused sparse_mem_maps_populate_node()]
    Link: http://lkml.kernel.org/r/20180716174447.14529-6-pasha.tatashin@oracle.com
    Link: http://lkml.kernel.org/r/20180712203730.8703-6-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Tested-by: Michael Ellerman [powerpc]
    Tested-by: Oscar Salvador
    Reviewed-by: Oscar Salvador
    Cc: Pasha Tatashin
    Cc: Abdul Haleem
    Cc: Baoquan He
    Cc: Daniel Jordan
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Souptick Joarder
    Cc: Steven Sistare
    Cc: Vlastimil Babka
    Cc: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • Now that both variants of sparse memory use the same buffers to populate
    memory map, we can move sparse_buffer_init()/sparse_buffer_fini() to the
    common place.

    Link: http://lkml.kernel.org/r/20180712203730.8703-4-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Tested-by: Michael Ellerman [powerpc]
    Tested-by: Oscar Salvador
    Reviewed-by: Andrew Morton
    Cc: Pasha Tatashin
    Cc: Abdul Haleem
    Cc: Baoquan He
    Cc: Daniel Jordan
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Souptick Joarder
    Cc: Steven Sistare
    Cc: Vlastimil Babka
    Cc: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • Patch series "sparse_init rewrite", v6.

    In sparse_init() we allocate two large buffers to temporarily hold the
    usemap and memmap for the whole machine. However, we can avoid doing that
    if we change sparse_init() to operate on a per-node basis instead of doing
    it for the whole machine beforehand.

    As shown by Baoquan
    http://lkml.kernel.org/r/20180628062857.29658-1-bhe@redhat.com

    The buffers are large enough to prevent the machine from booting on
    small-memory systems.

    Another benefit of these changes is that they also obsolete
    CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER.

    This patch (of 5):

    When struct pages are allocated for sparse-vmemmap VA layout, we first try
    to allocate one large buffer, and then, if that fails, allocate struct
    pages for each section as we go.

    The code that allocates the buffer uses global variables and is spread
    across several call sites.

    Clean up the code by introducing three functions to handle the global
    buffer:

    sparse_buffer_init() initialize the buffer
    sparse_buffer_fini() free the remaining part of the buffer
    sparse_buffer_alloc() alloc from the buffer, and if buffer is empty
    return NULL

    Define these functions in sparse.c instead of sparse-vmemmap.c because
    later we will use them for non-vmemmap sparse allocations as well.
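
    The three-function interface can be pictured with the following
    user-space model. It is only a sketch under stated assumptions: the
    sketch_ names are hypothetical, the backing memory is supplied by the
    caller and assumed to be suitably aligned, and the leftover is merely
    reported rather than handed back to an allocator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; sketch_ names are not kernel symbols. */
#define SKETCH_ALIGN 16u

static char *sketch_buf;      /* next free byte, NULL when no buffer is active */
static char *sketch_buf_end;  /* one past the last usable byte */

/* Start handing out memory from a caller-provided region. */
void sketch_buffer_init(void *mem, size_t size)
{
    sketch_buf = mem;
    sketch_buf_end = mem ? (char *)mem + size : NULL;
}

/* Carve size bytes (rounded up) off the front; NULL if the buffer is too small. */
void *sketch_buffer_alloc(size_t size)
{
    void *ptr;

    if (!sketch_buf || size > SIZE_MAX - (SKETCH_ALIGN - 1))
        return NULL;
    size = (size + SKETCH_ALIGN - 1) & ~(size_t)(SKETCH_ALIGN - 1);
    if ((size_t)(sketch_buf_end - sketch_buf) < size)
        return NULL;
    ptr = sketch_buf;
    sketch_buf += size;
    return ptr;
}

/* Stop using the buffer and report how many bytes were left unused. */
size_t sketch_buffer_fini(void)
{
    size_t unused = sketch_buf ? (size_t)(sketch_buf_end - sketch_buf) : 0;

    sketch_buf = NULL;
    sketch_buf_end = NULL;
    return unused;
}
```

    A caller would fall back to a per-section allocation whenever
    sketch_buffer_alloc() returns NULL, which mirrors the "try the large
    buffer first" behaviour described above.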

    [akpm@linux-foundation.org: use PTR_ALIGN()]
    [akpm@linux-foundation.org: s/BUG_ON/WARN_ON/]
    Link: http://lkml.kernel.org/r/20180712203730.8703-2-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Tested-by: Michael Ellerman [powerpc]
    Reviewed-by: Oscar Salvador
    Tested-by: Oscar Salvador
    Cc: Pasha Tatashin
    Cc: Steven Sistare
    Cc: Daniel Jordan
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: Souptick Joarder
    Cc: Baoquan He
    Cc: Greg Kroah-Hartman
    Cc: Vlastimil Babka
    Cc: Wei Yang
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Ingo Molnar
    Cc: Abdul Haleem
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • This reverts ee8f248d266e ("hugetlb: add phys addr to struct
    huge_bootmem_page").

    At one time powerpc used this field and supporting code. However that
    was removed with commit 79cc38ded1e1 ("powerpc/mm/hugetlb: Add support
    for reserving gigantic huge pages via kernel command line").

    There are no users of this field and supporting code, so remove it.

    Link: http://lkml.kernel.org/r/20180711195913.1294-1-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Reviewed-by: Andrew Morton
    Acked-by: Michal Hocko
    Cc: "Aneesh Kumar K . V"
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Cannon Matthews
    Cc: Becky Bruce
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • The CMA memory allocator doesn't support standard gfp flags for memory
    allocation, so there is no point in having gfp_mask as a parameter of the
    dma_alloc_from_contiguous() function. Replace it with a boolean no_warn
    argument, which covers everything the underlying cma_alloc() function
    supports.

    This helps avoid giving the false impression that this function supports
    standard gfp flags and that callers can pass __GFP_ZERO to get a zeroed
    buffer, which has already been an issue: see commit dd65a941f6ba ("arm64:
    dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").

    Link: http://lkml.kernel.org/r/20180709122020eucas1p21a71b092975cb4a3b9954ffc63f699d1~-sqUFoa-h2939329393eucas1p2Y@eucas1p2.samsung.com
    Signed-off-by: Marek Szyprowski
    Acked-by: Michał Nazarewicz
    Acked-by: Vlastimil Babka
    Reviewed-by: Christoph Hellwig
    Cc: Laura Abbott
    Cc: Michal Hocko
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marek Szyprowski
     
  • cma_alloc() doesn't really support gfp flags other than __GFP_NOWARN, so
    convert the gfp_mask parameter to a boolean no_warn parameter.

    This helps avoid giving the false impression that this function supports
    standard gfp flags and that callers can pass __GFP_ZERO to get a zeroed
    buffer, which has already been an issue: see commit dd65a941f6ba ("arm64:
    dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").

    Link: http://lkml.kernel.org/r/20180709122019eucas1p2340da484acfcc932537e6014f4fd2c29~-sqTPJKij2939229392eucas1p2j@eucas1p2.samsung.com
    Signed-off-by: Marek Szyprowski
    Acked-by: Michal Hocko
    Acked-by: Michał Nazarewicz
    Acked-by: Laura Abbott
    Acked-by: Vlastimil Babka
    Reviewed-by: Christoph Hellwig
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marek Szyprowski
     
  • We need to distinguish the situation when a shrinker has a very small
    number of objects (see vfs_pressure_ratio() called from
    super_cache_count()) from the situation when it has no objects at all.
    Currently, in both of these cases, shrinker::count_objects() returns 0.

    The patch introduces a new SHRINK_EMPTY return value, which will be used
    for the "no objects at all" case. It is mostly a refactoring, as
    SHRINK_EMPTY is replaced by 0 by all callers of do_shrink_slab() in this
    patch, and all the magic will happen in later patches.
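
    The distinction can be modelled in user space as follows. This is a
    sketch only: the sentinel value, the percentage-based scaling and all
    sketch_ names are assumptions made for illustration, not the code from
    the patch.

```c
/* Hypothetical sentinel meaning "this shrinker has no objects at all". */
#define SKETCH_SHRINK_EMPTY (~0UL - 1)

/*
 * Model of a count_objects() callback: scale the object count by a
 * pressure percentage. A small non-zero total may legitimately scale
 * down to 0, which is different from having nothing to count.
 */
unsigned long sketch_count_objects(unsigned long total, unsigned long pct)
{
    if (total == 0)
        return SKETCH_SHRINK_EMPTY;
    return (total / 100) * pct + ((total % 100) * pct) / 100;
}

/* Model of do_shrink_slab(): propagate both "0" and "empty" unchanged. */
unsigned long sketch_do_shrink_slab(unsigned long total, unsigned long pct)
{
    unsigned long freeable = sketch_count_objects(total, pct);

    if (freeable == 0 || freeable == SKETCH_SHRINK_EMPTY)
        return freeable;
    return freeable; /* sketch: pretend every counted object was freed */
}

/* Model of a caller: for now SHRINK_EMPTY is simply folded back into 0. */
unsigned long sketch_shrink_slab(unsigned long total, unsigned long pct)
{
    unsigned long ret = sketch_do_shrink_slab(total, pct);

    if (ret == SKETCH_SHRINK_EMPTY)
        ret = 0;
    return ret;
}
```

    In this model a shrinker with 50 objects at 1% pressure counts 0
    objects, while a shrinker with no objects reports the sentinel, yet
    both look the same to whoever calls sketch_shrink_slab().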

    Link: http://lkml.kernel.org/r/153063069574.1818.11037751256699341813.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Acked-by: Vladimir Davydov
    Tested-by: Shakeel Butt
    Cc: Al Viro
    Cc: Andrey Ryabinin
    Cc: Chris Wilson
    Cc: Greg Kroah-Hartman
    Cc: Guenter Roeck
    Cc: "Huang, Ying"
    Cc: Johannes Weiner
    Cc: Josef Bacik
    Cc: Li RongQing
    Cc: Matthew Wilcox
    Cc: Matthias Kaehlcke
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Philippe Ombredanne
    Cc: Roman Gushchin
    Cc: Sahitya Tummala
    Cc: Stephen Rothwell
    Cc: Tetsuo Handa
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     
  • Introduce the set_shrinker_bit() function to set a shrinker-related bit in
    the memcg shrinker bitmap, and set the bit after the first item is added
    and when reparenting a destroyed memcg's items.

    This will allow the next patch to call shrinkers only when they have
    charged objects at the moment, and to improve shrink_slab()
    performance.
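
    A minimal user-space sketch of the idea is shown below, assuming a
    fixed-size bitmap per memcg and one list per (memcg, shrinker) pair.
    The structures and sketch_ names are hypothetical and much simpler than
    the kernel ones.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_MAX_SHRINKERS 128u
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MAP_LONGS \
    ((SKETCH_MAX_SHRINKERS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical memcg holding one bit per memcg-aware shrinker id. */
struct sketch_memcg {
    unsigned long shrinker_map[SKETCH_MAP_LONGS];
};

/* Hypothetical per-memcg list belonging to one shrinker. */
struct sketch_list_lru {
    unsigned int shrinker_id;
    unsigned long nr_items;
};

/* Mark shrinker id as having charged objects in this memcg. */
void sketch_set_shrinker_bit(struct sketch_memcg *memcg, unsigned int id)
{
    if (id < SKETCH_MAX_SHRINKERS)
        memcg->shrinker_map[id / SKETCH_BITS_PER_LONG] |=
            1UL << (id % SKETCH_BITS_PER_LONG);
}

bool sketch_test_shrinker_bit(const struct sketch_memcg *memcg, unsigned int id)
{
    if (id >= SKETCH_MAX_SHRINKERS)
        return false;
    return (memcg->shrinker_map[id / SKETCH_BITS_PER_LONG] >>
            (id % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Adding the first item to an empty list is what sets the bit. */
void sketch_lru_add(struct sketch_memcg *memcg, struct sketch_list_lru *lru)
{
    if (lru->nr_items++ == 0)
        sketch_set_shrinker_bit(memcg, lru->shrinker_id);
}
```

    Setting the bit only on the empty-to-non-empty transition keeps the
    common add path cheap, since later additions do not touch the bitmap.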

    [ktkhai@virtuozzo.com: v9]
    Link: http://lkml.kernel.org/r/153112557572.4097.17315791419810749985.stgit@localhost.localdomain
    Link: http://lkml.kernel.org/r/153063065671.1818.15914674956134687268.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Acked-by: Vladimir Davydov
    Tested-by: Shakeel Butt
    Cc: Al Viro
    Cc: Andrey Ryabinin
    Cc: Chris Wilson
    Cc: Greg Kroah-Hartman
    Cc: Guenter Roeck
    Cc: "Huang, Ying"
    Cc: Johannes Weiner
    Cc: Josef Bacik
    Cc: Li RongQing
    Cc: Matthew Wilcox
    Cc: Matthias Kaehlcke
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Philippe Ombredanne
    Cc: Roman Gushchin
    Cc: Sahitya Tummala
    Cc: Stephen Rothwell
    Cc: Tetsuo Handa
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     
  • This will be used in the next patch.

    Link: http://lkml.kernel.org/r/153063064347.1818.1987011484100392706.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Acked-by: Vladimir Davydov
    Tested-by: Shakeel Butt
    Cc: Al Viro
    Cc: Andrey Ryabinin
    Cc: Chris Wilson
    Cc: Greg Kroah-Hartman
    Cc: Guenter Roeck
    Cc: "Huang, Ying"
    Cc: Johannes Weiner
    Cc: Josef Bacik
    Cc: Li RongQing
    Cc: Matthew Wilcox
    Cc: Matthias Kaehlcke
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Philippe Ombredanne
    Cc: Roman Gushchin
    Cc: Sahitya Tummala
    Cc: Stephen Rothwell
    Cc: Tetsuo Handa
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     
  • This is just a refactoring to allow the next patches to have a dst_memcg
    pointer in memcg_drain_list_lru_node().

    Link: http://lkml.kernel.org/r/153063062118.1818.2761273817739499749.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Acked-by: Vladimir Davydov
    Tested-by: Shakeel Butt
    Cc: Al Viro
    Cc: Andrey Ryabinin
    Cc: Chris Wilson
    Cc: Greg Kroah-Hartman
    Cc: Guenter Roeck
    Cc: "Huang, Ying"
    Cc: Johannes Weiner
    Cc: Josef Bacik
    Cc: Li RongQing
    Cc: Matthew Wilcox
    Cc: Matthias Kaehlcke
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Philippe Ombredanne
    Cc: Roman Gushchin
    Cc: Sahitya Tummala
    Cc: Stephen Rothwell
    Cc: Tetsuo Handa
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     
  • Add a list_lru::shrinker_id field and populate it with the registered
    shrinker id.

    This will be used by the lru code in the next patches to set the correct
    bit in the memcg shrinkers map once the first memcg-related element
    appears in the list_lru.

    Link: http://lkml.kernel.org/r/153063059758.1818.14866596416857717800.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Acked-by: Vladimir Davydov
    Tested-by: Shakeel Butt
    Cc: Al Viro
    Cc: Andrey Ryabinin
    Cc: Chris Wilson
    Cc: Greg Kroah-Hartman
    Cc: Guenter Roeck
    Cc: "Huang, Ying"
    Cc: Johannes Weiner
    Cc: Josef Bacik
    Cc: Li RongQing
    Cc: Matthew Wilcox
    Cc: Matthias Kaehlcke
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Philippe Ombredanne
    Cc: Roman Gushchin
    Cc: Sahitya Tummala
    Cc: Stephen Rothwell
    Cc: Tetsuo Handa
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     
  • Imagine a big node with many cpus, memory cgroups and containers. Say
    we have 200 containers, and every container has 10 mounts and 10 cgroups.
    No container task touches the mounts of foreign containers. If there is
    intensive page writing and global reclaim happens, a writing task has to
    iterate over all memcgs to shrink slab before it is able to go to
    shrink_page_list().

    Iteration over all the memcg slabs is very expensive: the task has to
    visit 200 * 10 = 2000 shrinkers for every memcg, and since there are
    2000 memcgs, the total calls are 2000 * 2000 = 4000000.

    So, the shrinker makes 4 million do_shrink_slab() calls just to try to
    isolate SWAP_CLUSTER_MAX pages in one of the actively writing memcgs via
    shrink_page_list(). I've observed a node spending almost 100% of its time
    in the kernel, uselessly iterating over already shrunk slabs.

    This patch adds a bitmap of memcg-aware shrinkers to memcg. The size of
    the bitmap depends on bitmap_nr_ids, and during the memcg's lifetime it is
    kept large enough to fit bitmap_nr_ids shrinkers. Every bit in
    the map corresponds to a shrinker id.

    The next patches will keep a bit set only for a memcg that really has
    charged objects. This will allow shrink_slab() to improve its performance
    significantly. See the last patch for the numbers.
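
    The arithmetic can be restated as a small user-space model. It is a
    sketch only: it assumes one unsigned long of bitmap per memcg for
    brevity, and the sketch_ function names are hypothetical.

```c
#include <stddef.h>

/* Without a bitmap, every shrinker is called once for every memcg. */
unsigned long sketch_calls_without_bitmap(unsigned long nr_memcgs,
                                          unsigned long nr_shrinkers)
{
    return nr_memcgs * nr_shrinkers;
}

/*
 * With a bitmap, a shrinker is called for a memcg only if its bit is set,
 * so the number of calls equals the number of set bits over all maps.
 */
unsigned long sketch_calls_with_bitmap(const unsigned long *maps,
                                       size_t nr_memcgs)
{
    unsigned long calls = 0;

    for (size_t i = 0; i < nr_memcgs; i++)
        for (unsigned long m = maps[i]; m; m &= m - 1)
            calls++; /* m &= m - 1 clears the lowest set bit */
    return calls;
}
```

    With 200 * 10 = 2000 shrinkers and 2000 memcgs the first function
    gives the 4000000 calls quoted above, while the second grows only with
    the number of (memcg, shrinker) pairs that actually hold objects.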

    [ktkhai@virtuozzo.com: v9]
    Link: http://lkml.kernel.org/r/153112549031.4097.3576147070498769979.stgit@localhost.localdomain
    [ktkhai@virtuozzo.com: add comment to mem_cgroup_css_online()]
    Link: http://lkml.kernel.org/r/521f9e5f-c436-b388-fe83-4dc870bfb489@virtuozzo.com
    Link: http://lkml.kernel.org/r/153063056619.1818.12550500883688681076.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Acked-by: Vladimir Davydov
    Tested-by: Shakeel Butt
    Cc: Al Viro
    Cc: Andrey Ryabinin
    Cc: Chris Wilson
    Cc: Greg Kroah-Hartman
    Cc: Guenter Roeck
    Cc: "Huang, Ying"
    Cc: Johannes Weiner
    Cc: Josef Bacik
    Cc: Li RongQing
    Cc: Matthew Wilcox
    Cc: Matthias Kaehlcke
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Philippe Ombredanne
    Cc: Roman Gushchin
    Cc: Sahitya Tummala
    Cc: Stephen Rothwell
    Cc: Tetsuo Handa
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     
  • Introduce a shrinker::id number, which is used to enumerate memcg-aware
    shrinkers. The numbers start from 0, and the code tries to keep them
    as small as possible.

    This will be used to represent memcg-aware shrinkers in the memcg
    shrinkers map.

    Since all memcg-aware shrinkers are based on list_lru, which is
    per-memcg only in the CONFIG_MEMCG_KMEM case, the new functionality will
    be under this config option.
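
    One way to picture "start from 0 and stay as small as possible" is a
    lowest-free-id allocator. The sketch below is a user-space model with a
    fixed table and hypothetical sketch_ names; it is not the mechanism
    used by the patch.

```c
#include <stdbool.h>

#define SKETCH_MAX_IDS 64

static bool sketch_id_used[SKETCH_MAX_IDS];

/* Return the lowest unused id, or -1 when the table is full. */
int sketch_alloc_shrinker_id(void)
{
    for (int id = 0; id < SKETCH_MAX_IDS; id++) {
        if (!sketch_id_used[id]) {
            sketch_id_used[id] = true;
            return id;
        }
    }
    return -1;
}

/* Release an id so that a later registration can reuse it. */
void sketch_free_shrinker_id(int id)
{
    if (id >= 0 && id < SKETCH_MAX_IDS)
        sketch_id_used[id] = false;
}
```

    Reusing freed ids keeps the highest id in use low, which in turn keeps
    a per-memcg bitmap indexed by these ids small.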

    [ktkhai@virtuozzo.com: v9]
    Link: http://lkml.kernel.org/r/153112546435.4097.10607140323811756557.stgit@localhost.localdomain
    Link: http://lkml.kernel.org/r/153063054586.1818.6041047871606697364.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Acked-by: Vladimir Davydov
    Tested-by: Shakeel Butt
    Cc: Al Viro
    Cc: Andrey Ryabinin
    Cc: Chris Wilson
    Cc: Greg Kroah-Hartman
    Cc: Guenter Roeck
    Cc: "Huang, Ying"
    Cc: Johannes Weiner
    Cc: Josef Bacik
    Cc: Li RongQing
    Cc: Matthew Wilcox
    Cc: Matthias Kaehlcke
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Philippe Ombredanne
    Cc: Roman Gushchin
    Cc: Sahitya Tummala
    Cc: Stephen Rothwell
    Cc: Tetsuo Handa
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai