09 Sep, 2017

8 commits

  • Allow interval trees to quickly check for overlaps to avoid unnecessary
    tree lookups in interval_tree_iter_first().

    As of this patch, all interval tree flavors will require using an
    'rb_root_cached' so that the leftmost node is easily available. While
    most users will make use of this feature, those with special functions
    (in addition to the generic insert, delete, and search calls) will avoid
    using the cached option, as they can do funky things with insertions,
    for example vma_interval_tree_insert_after().
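
    The fast path can be sketched in user-space C as follows. This is a
    simplified stand-in, not the kernel's interval tree code: the tree is an
    unbalanced binary search tree rather than an rbtree, and all names
    (struct itnode, struct itroot_cached, it_insert(), it_iter_first()) are
    hypothetical. Each node stores the largest 'last' in its subtree, so the
    root bounds the whole tree from above, while the cached leftmost node
    bounds it from below. A query that misses either bound returns without
    walking the tree.

```c
#include <stddef.h>

/* Hypothetical interval node; [start, last] is a closed interval. */
struct itnode {
	unsigned long start, last;
	unsigned long subtree_last;	/* largest 'last' in this subtree */
	struct itnode *left, *right;
};

/* Stand-in for rb_root_cached: the root plus the cached leftmost node. */
struct itroot_cached {
	struct itnode *root;
	struct itnode *leftmost;
};

/* Unbalanced BST insert keyed on 'start' (no rebalancing, unlike an rbtree). */
static void it_insert(struct itroot_cached *t, struct itnode *n)
{
	struct itnode **link = &t->root;
	int is_leftmost = 1;

	n->left = n->right = NULL;
	n->subtree_last = n->last;
	while (*link) {
		struct itnode *p = *link;

		/* Keep the augmented value up to date on the way down. */
		if (p->subtree_last < n->last)
			p->subtree_last = n->last;
		if (n->start < p->start) {
			link = &p->left;
		} else {
			link = &p->right;
			is_leftmost = 0;
		}
	}
	*link = n;
	if (is_leftmost)
		t->leftmost = n;
}

/* Slow path: find the leftmost node overlapping [start, last].
 * Precondition: start <= n->subtree_last. */
static struct itnode *it_subtree_search(struct itnode *n,
					unsigned long start, unsigned long last)
{
	for (;;) {
		if (n->left && start <= n->left->subtree_last) {
			n = n->left;	/* an overlap, if any, is further left */
			continue;
		}
		if (n->start <= last) {
			if (start <= n->last)
				return n;	/* leftmost overlapping node */
			if (n->right && start <= n->right->subtree_last) {
				n = n->right;
				continue;
			}
		}
		return NULL;
	}
}

static struct itnode *it_iter_first(struct itroot_cached *t,
				    unsigned long start, unsigned long last)
{
	if (!t->root)
		return NULL;
	/* O(1): no interval in the tree ends at or after 'start'. */
	if (t->root->subtree_last < start)
		return NULL;
	/* O(1): every interval in the tree starts after 'last'. */
	if (t->leftmost->start > last)
		return NULL;
	return it_subtree_search(t->root, start, last);
}
```

    Both early exits cost O(1) because the two bounds are maintained on
    insertion; only queries that pass them pay for the O(height) walk.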

    [jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
    Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
    Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Jérôme Glisse
    Acked-by: Christian König
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Doug Ledford
    Acked-by: Michael S. Tsirkin
    Cc: David Airlie
    Cc: Jason Wang
    Cc: Christian Benvenuti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • First, the number of CPUs can't be negative.

    Second, different signedness leads to suboptimal code in the following
    cases:

    1)
    kmalloc(nr_cpu_ids * sizeof(X));

    "int" has to be sign-extended to size_t.

    2)
    while (*pos < nr_cpu_ids)    /* pos is a loff_t * */

    MOVSXD is 1 byte longer than the same MOV.

    Other cases exist as well. Basically, the compiler is told that
    nr_cpu_ids can't be negative, which it can't deduce if it is "int".
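
    The first case can be illustrated with two helpers that compute an
    allocation size the way the kmalloc() call above does. The names
    alloc_size_signed() and alloc_size_unsigned() are hypothetical, and the
    instruction choices in the comments describe typical x86-64 compiler
    output rather than a guarantee.

```c
#include <stddef.h>

/* Signed count: the value must be sign-extended to 64 bits before the
 * multiplication (on x86-64, typically a MOVSXD instruction). */
static size_t alloc_size_signed(int nr, size_t elem_size)
{
	return (size_t)nr * elem_size;
}

/* Unsigned count: widening is a zero extension, which on x86-64 typically
 * comes for free with a plain 32-bit MOV. */
static size_t alloc_size_unsigned(unsigned int nr, size_t elem_size)
{
	return (size_t)nr * elem_size;
}
```

    Beyond code size, the unsigned form also rules out the case where a
    negative count silently turns into an enormous size_t.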

    Code savings on allyesconfig kernel: -3KB

    add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
    function                   old     new   delta
    coretemp_cpu_online        450     512     +62
    rcu_init_one              1234    1272     +38
    pci_device_probe           374     399     +25

    ...

    pgdat_reclaimable_pages    628     556     -72
    select_fallback_rq         446     369     -77
    task_numa_find_cpu        1923    1807    -116

    Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • memset32() can be used to initialise these three arrays. Minor code
    footprint reduction.
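
    A portable sketch of what such a fill does is shown below.
    memset32_sketch() is a hypothetical user-space stand-in that mirrors the
    semantics of a generic word-by-word fallback; architectures may provide
    faster versions. A call such as memset32_sketch(arr, v, n) replaces an
    open-coded "for (i = 0; i < n; i++) arr[i] = v;" loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Store 'v' into 'count' consecutive 32-bit slots starting at 's'.
 * Note that 'count' is a number of elements, not bytes. */
static void *memset32_sketch(uint32_t *s, uint32_t v, size_t count)
{
	uint32_t *xs = s;

	while (count--)
		*xs++ = v;
	return s;
}
```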

    Link: http://lkml.kernel.org/r/20170720184539.31609-8-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Cc: "James E.J. Bottomley"
    Cc: "Martin K. Petersen"
    Cc: "H. Peter Anvin"
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Minchan Kim
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Russell King
    Cc: Sam Ravnborg
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • zram was the motivation for creating memset_l(). Minchan Kim sees a 7%
    performance improvement on x86 with 100MB of non-zero deduplicatable
    data:

    perf stat -r 10 dd if=/dev/zram0 of=/dev/null

    vanilla: 0.232050465 seconds time elapsed ( +- 0.51% )
    memset_l: 0.217219387 seconds time elapsed ( +- 0.07% )
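
    How a compressed-block device can use such a fill for same-filled pages
    can be sketched as follows. This assumes 4 KiB pages, and
    page_is_same_filled(), fill_same_filled_page() and memset_l_sketch() are
    hypothetical helpers, not zram's code. The plain loop only shows the
    semantics; the speedup reported above presumably comes from
    architecture-optimized memset_l() implementations, which this sketch
    does not model.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Fill 'count' unsigned longs with 'v' (semantics of memset_l()). */
static unsigned long *memset_l_sketch(unsigned long *p, unsigned long v,
				      size_t count)
{
	for (size_t i = 0; i < count; i++)
		p[i] = v;
	return p;
}

/* On write: is the whole page one repeated word? If so, report it. */
static bool page_is_same_filled(const unsigned long *page, unsigned long *val)
{
	size_t n = SKETCH_PAGE_SIZE / sizeof(unsigned long);

	for (size_t i = 1; i < n; i++)
		if (page[i] != page[0])
			return false;
	*val = page[0];
	return true;
}

/* On read: rebuild the page from the stored word, one word per store. */
static void fill_same_filled_page(unsigned long *page, unsigned long val)
{
	memset_l_sketch(page, val, SKETCH_PAGE_SIZE / sizeof(unsigned long));
}
```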

    Link: http://lkml.kernel.org/r/20170720184539.31609-7-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Tested-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: "Martin K. Petersen"
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Russell King
    Cc: Sam Ravnborg
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This macro is useful to avoid link errors on 32-bit systems.

    We have the same definition in two drivers, so move it to
    include/linux/kernel.h.

    While we are here, refactor DIV_ROUND_UP_ULL() by using
    DIV_ROUND_DOWN_ULL().
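
    A user-space sketch of the two macros follows, with DIV_ROUND_UP_ULL()
    expressed in terms of DIV_ROUND_DOWN_ULL() as described. One assumption
    to flag: the kernel version performs the division with do_div(), which
    is what avoids the link error (on many 32-bit targets a plain 64-bit '/'
    makes the compiler emit a call to a libgcc helper that the kernel does
    not link against). do_div() is kernel-only, so plain division stands in
    for it here.

```c
/* Round-down division of a 64-bit value (plain '/' replaces do_div()). */
#define DIV_ROUND_DOWN_ULL(ll, d) ((unsigned long long)(ll) / (d))

/* Round up by biasing the dividend with d - 1, then round down. */
#define DIV_ROUND_UP_ULL(ll, d) DIV_ROUND_DOWN_ULL((ll) + (d) - 1, (d))
```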

    Link: http://lkml.kernel.org/r/1500945156-12907-1-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Acked-by: Mark Brown
    Cc: Cyrille Pitchen
    Cc: Jaroslav Kysela
    Cc: Takashi Iwai
    Cc: Liam Girdwood
    Cc: Boris Brezillon
    Cc: Marek Vasut
    Cc: Brian Norris
    Cc: Richard Weinberger
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     
  • Patch series "Separate NUMA statistics from zone statistics", v2.

    Each page allocation updates a set of per-zone statistics with a call to
    zone_statistics(). As discussed at the 2017 MM summit, these are a
    substantial source of overhead in the page allocator and are very rarely
    consumed. Much of this overhead is cache bouncing caused by parallel
    updates of the zone counters (the NUMA-associated counters) during
    multi-threaded page allocation (pointed out by Dave Hansen).

    A link to the MM summit slides:
    http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf

    To mitigate this overhead, this patchset separates NUMA statistics from
    the zone statistics framework and raises the NUMA counter threshold to a
    fixed size of MAX_U16 - 2, because a small threshold greatly increases
    how often the global counter is updated from the local per-CPU counter
    (suggested by Ying Huang). The rationale is that these statistics
    counters don't need to be read often, unlike other VM counters, so it's
    not a problem to use a large threshold and make readers more expensive.
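
    The trade-off between the threshold and reader cost can be sketched with
    a split counter. This is a simplified, single-threaded illustration:
    struct split_counter, split_counter_inc() and split_counter_read() are
    hypothetical, SKETCH_NR_CPUS is an arbitrary CPU count, and real per-CPU
    counters need per-CPU accessors and preemption handling that are omitted
    here. With a u16 per-CPU diff and a threshold of MAX_U16 - 2 (65533), a
    CPU writes the shared global counter only once every 65534 increments.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 4			/* arbitrary, for illustration */
#define SKETCH_NUMA_THRESHOLD (UINT16_MAX - 2)	/* MAX_U16 - 2 = 65533 */

/* One shared total plus a small per-CPU diff. */
struct split_counter {
	unsigned long global;
	uint16_t pcp_diff[SKETCH_NR_CPUS];
};

/* Writer: bump the local diff and fold it into the shared counter only when
 * it exceeds the threshold, so the shared cache line is rarely touched. The
 * diff never exceeds 65534, so the u16 cannot wrap. */
static void split_counter_inc(struct split_counter *c, int cpu)
{
	uint16_t v = ++c->pcp_diff[cpu];

	if (v > SKETCH_NUMA_THRESHOLD) {
		c->global += v;
		c->pcp_diff[cpu] = 0;
	}
}

/* Reader: an exact value needs the global count plus every pending diff,
 * which is why a large threshold makes reads more expensive. */
static unsigned long split_counter_read(const struct split_counter *c)
{
	unsigned long sum = c->global;

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += c->pcp_diff[cpu];
	return sum;
}
```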

    With this patchset, we see a 31.3% drop in CPU cycles (537 --> 369, see
    below) per single page allocation and reclaim on Jesper's page_bench03
    benchmark. Meanwhile, this patchset keeps the same style of virtual
    memory statistics with little end-user-visible effect (the NUMA stats
    are only moved to appear after the zone page stats; see the first patch
    for details).

    I ran an experiment of concurrent single page allocation and reclaim
    using Jesper's page_bench03 benchmark on a 2-socket Broadwell-based
    server (88 processors with 126G memory) with different pcp counter
    thresholds.

    Benchmark provided by Jesper D Brouer (loop count increased to 10000000):
    https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench

    Threshold   CPU cycles    Throughput (88 threads)
    32          799           241760478
    64          640           301628829
    125         537           358906028           system by default
    256         468           412397590
    512         428           450550704
    4096        399           482520943
    20000       394           489009617
    30000       395           488017817
    65533       369 (-31.3%)  521661345 (+45.3%)  with this patchset
    N/A         342 (-36.3%)  562900157 (+56.8%)  disable zone_statistics

    This patch (of 3):

    In this patch, NUMA statistics are separated from the zone statistics
    framework, and all NUMA stats call sites are changed to use
    NUMA-stats-specific functions. There is no functional change, except
    that the NUMA stats are shown after the zone page stats when users
    *read* the zone info.

    E.g. cat /proc/zoneinfo
    ***Base***                    ***With this patch***
    nr_free_pages 3976            nr_free_pages 3976
    nr_zone_inactive_anon 0       nr_zone_inactive_anon 0
    nr_zone_active_anon 0         nr_zone_active_anon 0
    nr_zone_inactive_file 0       nr_zone_inactive_file 0
    nr_zone_active_file 0         nr_zone_active_file 0
    nr_zone_unevictable 0         nr_zone_unevictable 0
    nr_zone_write_pending 0       nr_zone_write_pending 0
    nr_mlock 0                    nr_mlock 0
    nr_page_table_pages 0         nr_page_table_pages 0
    nr_kernel_stack 0             nr_kernel_stack 0
    nr_bounce 0                   nr_bounce 0
    nr_zspages 0                  nr_zspages 0
    numa_hit 0                    *nr_free_cma 0*
    numa_miss 0                   numa_hit 0
    numa_foreign 0                numa_miss 0
    numa_interleave 0             numa_foreign 0
    numa_local 0                  numa_interleave 0
    numa_other 0                  numa_local 0
    *nr_free_cma 0*               numa_other 0
    ...                           ...
    vm stats threshold: 10        vm stats threshold: 10
    ...                           ...

    The next patch updates the numa stats counter size and threshold.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1503568801-21305-2-git-send-email-kemi.wang@intel.com
    Signed-off-by: Kemi Wang
    Reported-by: Jesper Dangaard Brouer
    Acked-by: Mel Gorman
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Christopher Lameter
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ying Huang
    Cc: Aaron Lu
    Cc: Tim Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kemi Wang
     
  • The fix in the parent made me look at that function, and react to how
    illogical and illegible the array initializer was.

    Use named array indexes to make it clearer what is going on, and make
    the initializer not depend silently on the exact index numbers.
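
    The idea of named array indexes can be shown with a small table. The
    enum and array names below are hypothetical and illustrative, not the
    RDMA code. With designated initializers each value is bound to its enum
    constant rather than to its position, and any slot that is not
    mentioned, such as index 0 here, is zero-initialized.

```c
/* Hypothetical client IDs; index 0 is deliberately unused. */
enum nl_client {
	NL_CLIENT_UNUSED = 0,
	NL_CLIENT_CM,
	NL_CLIENT_IWCM,
	NL_CLIENT_LS,
	NL_NUM_CLIENTS
};

/* Positional form: correctness silently depends on the order of entries.
 *   static const unsigned int max_ops[NL_NUM_CLIENTS] = { 0, 2, 5, 3 };
 * Named-index form: each value is tied to its enum constant, and slots that
 * are not mentioned are zero-initialized. */
static const unsigned int max_ops[NL_NUM_CLIENTS] = {
	[NL_CLIENT_CM]   = 2,
	[NL_CLIENT_IWCM] = 5,
	[NL_CLIENT_LS]   = 3,
};
```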

    [ The initializer now also shows an odd inconsistency in the naming:
    note the IWCM vs IWPM.. - Linus ]

    Cc: Leon Romanovsky
    Cc: Doug Ledford
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • The netlink message sent with type == 0, which doesn't have any client
    behind it, caused an overflow in the max_num_ops array.

    Fix it by declaring zero ops for the first client.
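
    A bounds-checked lookup in the spirit of this fix can be sketched as
    follows; client_num_ops[] and nl_op_is_valid() are hypothetical names,
    and the op counts are made up. Because entry 0 declares zero ops, a
    message with type == 0 is rejected by the same "op < count" test as any
    other out-of-range op, and the type itself is checked against the table
    size before indexing.

```c
#include <stdbool.h>

#define SKETCH_NUM_CLIENTS 4

/* Hypothetical table indexed by netlink client type. */
static const unsigned int client_num_ops[SKETCH_NUM_CLIENTS] = {
	[0] = 0,	/* type == 0: no client behind it, so no valid ops */
	[1] = 2,
	[2] = 5,
	[3] = 3,
};

/* Accept a message only if both the type and the op are in range, so the
 * lookup never reads past the end of the table. */
static bool nl_op_is_valid(unsigned int type, unsigned int op)
{
	if (type >= SKETCH_NUM_CLIENTS)
		return false;
	return op < client_num_ops[type];
}
```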

    Fixes: c9901724a2f1 ("RDMA/netlink: Remove netlink clients infrastructure")
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Linus Torvalds

    Leon Romanovsky
     

08 Sep, 2017

19 commits

  • Pull SCSI updates from James Bottomley:
    "This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
    megaraid_sas, zfcp and a host of minor updates.

    The major driver change here is the elimination of the block based
    cciss driver in favour of the SCSI based hpsa driver (which now drives
    all the legacy cases cciss used to be required for). Plus a reset
    handler clean up and the redo of the SAS SMP handler to use bsg lib"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
    scsi: scsi-mq: Always unprepare before requeuing a request
    scsi: Show .retries and .jiffies_at_alloc in debugfs
    scsi: Improve requeuing behavior
    scsi: Call scsi_initialize_rq() for filesystem requests
    scsi: qla2xxx: Reset the logo flag, after target re-login.
    scsi: qla2xxx: Fix slow mem alloc behind lock
    scsi: qla2xxx: Clear fc4f_nvme flag
    scsi: qla2xxx: add missing includes for qla_isr
    scsi: qla2xxx: Fix an integer overflow in sysfs code
    scsi: aacraid: report -ENOMEM to upper layer from aac_convert_sgraw2()
    scsi: aacraid: get rid of one level of indentation
    scsi: aacraid: fix indentation errors
    scsi: storvsc: fix memory leak on ring buffer busy
    scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough
    scsi: smartpqi: remove the smp_handler stub
    scsi: hpsa: remove the smp_handler stub
    scsi: bsg-lib: pass the release callback through bsg_setup_queue
    scsi: Rework handling of scsi_device.vpd_pg8[03]
    scsi: Rework the code for caching Vital Product Data (VPD)
    scsi: rcu: Introduce rcu_swap_protected()
    ...

    Linus Torvalds
     
  • Pull gcc plugins update from Kees Cook:
    "This finishes the porting work on randstruct, and introduces a new
    option to structleak, both noted below:

    - For the randstruct plugin, enable automatic randomization of
    structures that are entirely function pointers (along with a couple
    designated initializer fixes).

    - For the structleak plugin, provide an option to perform zeroing
    initialization of all otherwise uninitialized stack variables that
    are passed by reference (Ard Biesheuvel)"
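
    The effect of the new structleak option can be approximated by hand.
    struct sample, fill_partially() and read_b() are hypothetical; the
    explicit "= { 0 }" stands in, roughly, for the zeroing the plugin is
    described as adding to otherwise uninitialized stack variables whose
    address is passed to another function. Without it, reading s.b after
    the call would read an indeterminate value.

```c
struct sample {
	int a;
	int b;
};

/* Hypothetical callee that only fills some fields through the pointer,
 * e.g. on an early-exit path. */
static void fill_partially(struct sample *s)
{
	s->a = 42;
}

static int read_b(void)
{
	struct sample s = { 0 };	/* what the option effectively adds */

	fill_partially(&s);
	return s.b;			/* 0, not leftover stack contents */
}
```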

    * tag 'gcc-plugins-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    gcc-plugins: structleak: add option to init all vars used as byref args
    randstruct: Enable function pointer struct detection
    drivers/net/wan/z85230.c: Use designated initializers
    drm/amd/powerplay: rv: Use designated initializers

    Linus Torvalds
     
  • Pull DeviceTree updates from Rob Herring:
    "There are a few orphans in the conversion to %pOF printf specifiers
    included here that no one else picked up.

    Summary:

    - Convert more DT code to use of_property_read_* API.

    - Improve DT overlay support when adding multiple overlays

    - Convert printk's to %pOF format specifiers. Most went via subsystem
    trees, but this picks up the remaining orphans

    - Correct unittests to use preferred "okay" for "status" property
    value

    - Add a KASLR seed property

    - Vendor prefixes for Mellanox, Theobroma Systems, Adaptrum, Moxa

    - Fix modalias buffer handling

    - Clean-up of include paths for building dtbs

    - Add bindings for amc6821, isl1208, tsl2x7x, srf02, and srf10
    devices

    - Add nvmem bindings for MediaTek MT7623 and MT7622 SoC

    - Add compatible string for Allwinner H5 Mali-450 GPU

    - Fix links to old OpenFirmware docs with new mirror on
    devicetree.org

    - Remove status property from binding doc examples"

    * tag 'devicetree-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (45 commits)
    devicetree: Adjust status "ok" -> "okay" under drivers/of/
    dt-bindings: Remove "status" from examples
    dt-bindings: pinctrl: sh-pfc: Use generic node name
    dt-bindings: Add vendor Mellanox
    dt-binding: net/phy: fix interrupts description
    virt: Convert to using %pOF instead of full_name
    macintosh: Convert to using %pOF instead of full_name
    ide: pmac: Convert to using %pOF instead of full_name
    microblaze: Convert to using %pOF instead of full_name
    dt-bindings: usb: musb: Grammar s/the/to/, s/is/are/
    of: Use PLATFORM_DEVID_NONE definition
    of/device: Fix of_device_get_modalias() buffer handling
    of/device: Prevent buffer overflow in of_device_modalias()
    dt-bindings: add amc6821, isl1208 trivial bindings
    dt-bindings: add vendor prefix for Theobroma Systems
    of: search scripts/dtc/include-prefixes path for both CPP and DTC
    of: remove arch/$(SRCARCH)/boot/dts from include search path for CPP
    of: remove drivers/of/testcase-data from include search path for CPP
    of: return of_get_cpu_node from of_cpu_device_node_get if CPUs are not registered
    iio: srf08: add device tree binding for srf02 and srf10
    ...

    Linus Torvalds
     
  • Pull i915 drm fixes from Rodrigo Vivi:
    "Since Dave is on paternity leave we are sending drm/i915 fixes for
    v4.14-rc1 directly to you as he had asked us to do.

    The most critical ones are the GPU reset fix for gen2-4 and the GVT fix
    for a regression that is preventing GVT init from working on your tree.

    The rest are general fixes for patches coming from drm-next"

    Acked-by: Dave Airlie

    * tag 'drm-intel-next-fixes-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel:
    drm/i915: Re-enable GTT following a device reset
    drm/i915: Annotate user relocs with __user
    drm/i915: Silence sparse by using gfp_t
    drm/i915: Add __rcu to radix tree slot pointer
    drm/i915: Fix the missing PPAT cache attributes on CNL
    drm/i915/gvt: Remove one duplicated MMIO
    drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
    drm/i915: Make i2c lock ops static
    drm/i915: Make i9xx_load_ycbcr_conversion_matrix() static
    drm/i915/edp: Increase T12 panel delay to 900 ms to fix DP AUX CH timeouts
    drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
    drm/i915: Skip fence alignemnt check for the CCS plane
    drm/i915: Treat fb->offsets[] as a raw byte offset instead of a linear offset
    drm/i915: Always wake the device to flush the GTT
    drm/i915: Recreate vmapping even when the object is pinned
    drm/i915: Quietly cancel FBC activation if CRTC is turned off before worker

    Linus Torvalds
     
  • Pull LED updates from Jacek Anaszewski:
    "LED class drivers improvements:

    leds-pca955x:
    - add Device Tree support and bindings
    - use devm_led_classdev_register()
    - add GPIO support
    - prevent crippled LED class device name
    - check for I2C errors

    leds-gpio:
    - add optional retain-state-shutdown DT property
    - allow LED to retain state at shutdown

    leds-tlc591xx:
    - merge conditional tests
    - add missing of_node_put

    leds-powernv:
    - delete an error message for a failed memory allocation in
    powernv_led_create()

    leds-is31fl32xx:
    - convert to using custom %pOF printf format specifier

    Constify attribute_group structures in:
    - leds-blinkm
    - leds-lm3533

    Make several arrays static const in:
    - leds-aat1290
    - leds-lp5521
    - leds-lp5562
    - leds-lp8501"

    * tag 'leds_for_4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
    leds: pca955x: check for I2C errors
    leds: gpio: Allow LED to retain state at shutdown
    dt-bindings: leds: gpio: Add optional retain-state-shutdown property
    leds: powernv: Delete an error message for a failed memory allocation in powernv_led_create()
    leds: lp8501: make several arrays static const
    leds: lp5562: make several arrays static const
    leds: lp5521: make several arrays static const
    leds: aat1290: make array max_mm_current_percent static const
    leds: pca955x: Prevent crippled LED device name
    leds: lm3533: constify attribute_group structure
    dt-bindings: leds: add pca955x
    leds: pca955x: add GPIO support
    leds: pca955x: use devm_led_classdev_register
    leds: pca955x: add device tree support
    leds: Convert to using %pOF instead of full_name
    leds: blinkm: constify attribute_group structures.
    leds: tlc591xx: add missing of_node_put
    leds: tlc591xx: merge conditional tests

    Linus Torvalds
     
  • Pull dmaengine updates from Vinod Koul:
    "This one features the usual updates to the drivers and one good part:
    removing DMA_SG from core, as it has no users.

    Summary:

    - Remove DMA_SG support as we have no users for this feature
    - New driver for Altera / Intel mSGDMA IP core
    - Support for memset in dmatest and qcom_hidma driver
    - Updates for non-cyclic mode in k3dma, and a bunch of updates in
    bam_dma and bcm sba-raid
    - Constify device ids across drivers"

    * tag 'dmaengine-4.14-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (52 commits)
    dmaengine: sun6i: support V3s SoC variant
    dmaengine: sun6i: make gate bit in sun8i's DMA engines a common quirk
    dmaengine: rcar-dmac: document R8A77970 bindings
    dmaengine: xilinx_dma: Fix error code format specifier
    dmaengine: altera: Use macros instead of structs to describe the registers
    dmaengine: ti-dma-crossbar: Fix dra7 reserve function
    dmaengine: pl330: constify amba_id
    dmaengine: pl08x: constify amba_id
    dmaengine: bcm-sba-raid: Remove redundant SBA_REQUEST_STATE_COMPLETED
    dmaengine: bcm-sba-raid: Explicitly ACK mailbox message after sending
    dmaengine: bcm-sba-raid: Add debugfs support
    dmaengine: bcm-sba-raid: Remove redundant SBA_REQUEST_STATE_RECEIVED
    dmaengine: bcm-sba-raid: Re-factor sba_process_deferred_requests()
    dmaengine: bcm-sba-raid: Pre-ack async tx descriptor
    dmaengine: bcm-sba-raid: Peek mbox when we have no free requests
    dmaengine: bcm-sba-raid: Alloc resources before registering DMA device
    dmaengine: bcm-sba-raid: Improve sba_issue_pending() run duration
    dmaengine: bcm-sba-raid: Increase number of free sba_request
    dmaengine: bcm-sba-raid: Allow arbitrary number free sba_request
    dmaengine: bcm-sba-raid: Remove reqs_free_count from sba_device
    ...

    Linus Torvalds
     
  • Pull backlight updates from Lee Jones:
    "Fix-ups:
    - Constification; pwm_bl
    - Use new GPIO API; gpio_backlight
    - Remove unused functionality; gpio_backlight

    Bug Fixes:
    - Fix artificial MAXREG limit; lm3630a_bl"

    * tag 'backlight-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
    backlight: gpio_backlight: Delete pdata inversion
    backlight: gpio_backlight: Convert to use GPIO descriptor
    backlight: pwm_bl: Make of_device_ids const
    backlight: lm3630a: Bump REG_MAX value to 0x50 instead of 0x1F

    Linus Torvalds
     
  • Pull MFD updates from Lee Jones:
    "New Drivers
    - RK805 Power Management IC (PMIC)
    - ROHM BD9571MWV-M MFD Power Management IC (PMIC)
    - Texas Instruments TPS68470 Power Management IC (PMIC) & LEDs

    New Device Support:
    - Add support for HiSilicon Hi6421v530 to hi6421-pmic-core
    - Add support for X-Powers AXP806 to axp20x
    - Add support for X-Powers AXP813 to axp20x
    - Add support for Intel Sunrise Point LPSS to intel-lpss-pci

    New Functionality:
    - Amend API to provide register layout; atmel-smc

    Fix-ups:
    - DT re-work; omap, nokia
    - Header file location change {I2C => MFD}; dm355evm_msp, tps65010
    - Fix chip ID formatting issue(s); rk808
    - Optionally register touchscreen devices; da9052-core
    - Documentation improvements; twl-core
    - Constification; rtsx_pcr, ab8500-core, da9055-i2c, da9052-spi
    - Drop unnecessary static declaration; max8925-i2c
    - Kconfig changes (missing deps and remove module support)
    - Slim down oversized licence statement; hi6421-pmic-core
    - Use managed resources (devm_*); lp87565
    - Supply proper error checking/handling; t7l66xb

    Bug Fixes:
    - Fix counter duplication issue; da9052-core
    - Fix potential NULL dereference issue; max8998
    - Leave SPI-NOR write-protection bit alone; lpc_ich
    - Ensure device is put into reset during suspend; intel-lpss
    - Correct register offset variable size; omap-usb-tll"

    * tag 'mfd-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (61 commits)
    mfd: intel_soc_pmic: Differentiate between Bay and Cherry Trail CRC variants
    mfd: intel_soc_pmic: Export separate mfd-cell configs for BYT and CHT
    dt-bindings: mfd: Add bindings for ZII RAVE devices
    mfd: omap-usb-tll: Fix register offsets
    mfd: da9052: Constify spi_device_id
    mfd: intel-lpss: Put I2C and SPI controllers into reset state on suspend
    mfd: da9055: Constify i2c_device_id
    mfd: intel-lpss: Add missing PCI ID for Intel Sunrise Point LPSS devices
    mfd: t7l66xb: Handle return value of clk_prepare_enable
    mfd: Add ROHM BD9571MWV-M PMIC DT bindings
    mfd: intel_soc_pmic_chtwc: Turn Kconfig option into a bool
    mfd: lp87565: Convert to use devm_mfd_add_devices()
    mfd: Add support for TPS68470 device
    mfd: lpc_ich: Do not touch SPI-NOR write protection bit on Haswell/Broadwell
    mfd: syscon: atmel-smc: Add helper to retrieve register layout
    mfd: axp20x: Use correct platform device ID for many PEK
    dt-bindings: mfd: axp20x: Introduce bindings for AXP813
    mfd: axp20x: Add support for AXP813 PMIC
    dt-bindings: mfd: axp20x: Add AXP806 to supported list of chips
    mfd: Add ROHM BD9571MWV-M MFD PMIC driver
    ...

    Linus Torvalds
     
  • Pull input updates from Dmitry Torokhov:

    - a new GPIO bit-banging driver implementing PS/2 protocol

    - a new power key driver for Rockchip RK805 PMIC

    - bunch of patches constifying various device ID structures

    - Elan I2C touchpad driver now supports devices with 2 buttons

    - other assorted fixes

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (76 commits)
    Input: byd - make array seq static, reduces object code size
    Input: xilinx_ps2 - fix multiline comment style
    Input: pxa27x_keypad - handle return value of clk_prepare_enable
    Input: tegra-kbc - handle return value of clk_prepare_enable
    Input: PS/2 gpio bit banging driver for serio bus
    Input: xen-kbdfront - enable auto repeat for xen keyboard frontend driver
    Input: ambakmi - constify amba_id
    Input: atmel_mxt_ts - add support for reset line
    Input: atmel_mxt_ts - use more managed resources
    Input: wacom_w8001 - constify serio_device_id
    Input: tsc40 - constify serio_device_id
    Input: touchwin - constify serio_device_id
    Input: touchright - constify serio_device_id
    Input: touchit213 - constify serio_device_id
    Input: penmount - constify serio_device_id
    Input: mtouch - constify serio_device_id
    Input: inexio - constify serio_device_id
    Input: hampshire - constify serio_device_id
    Input: gunze - constify serio_device_id
    Input: fujitsu_ts - constify serio_device_id
    ...

    Linus Torvalds
     
  • Pull mailbox updates from Jassi Brar:
    "Just behavioral changes to a controller driver: Broadcom's FlexRM
    mailbox driver has been modified to support debugfs and the TX-Done
    mechanism by ACK.

    Nothing for the core mailbox stack"

    * tag 'mailbox-v4.14' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
    mailbox: bcm-flexrm-mailbox: Use txdone_ack instead of txdone_poll
    mailbox: bcm-flexrm-mailbox: Use bitmap instead of IDA
    mailbox: bcm-flexrm-mailbox: Fix mask used in CMPL_START_ADDR_VALUE()
    mailbox: bcm-flexrm-mailbox: Add debugfs support
    mailbox: bcm-flexrm-mailbox: Set IRQ affinity hint for FlexRM ring IRQs

    Linus Torvalds
     
  • Pull media updates from Mauro Carvalho Chehab:
    "Brazil's Independence Day pull request :-)

    This is one of the biggest media pull requests, with 625 patches
    affecting almost all parts of media (RC, DVB, V4L2, CEC, docs).

    This contains:

    - A lot of new drivers:
    * DVB frontends: mxl5xx, stv0910, stv6111;
    * camera flash: as3645a led driver;
    * HDMI receiver: adv748X;
    * camera sensor: Omnivision 6650 5M driver (ov6650);
    * HDMI CEC: ao-cec meson driver;
    * V4L2: Qualcomm camss driver;
    * Remote controller: gpio-ir-tx, pwm-ir-tx and zx-irdec drivers.

    - The DDbridge DVB driver got a massive update, which brings it in sync
    with modern hardware from that vendor;

    - There's an important milestone in this series: the DVB
    documentation was written in 2003, but only started to be updated
    in 2007. It also used to contain several gaps from the time it was
    kept out of tree, mentioning error codes and device nodes that
    never existed upstream. In this series, it received a massive
    update: all non-deprecated digital TV APIs are now in sync with the
    current implementation;

    - Some DVB APIs that aren't used by any upstream driver got removed;

    - Other parts of the media documentation also got updated, fixing
    some bugs in its PDF output and making it compatible with Sphinx
    version 1.6.

    As the number of hacks required to build PDF output has been reduced,
    I hope we'll have fewer troubles as newer versions of our
    documentation toolchain are released (famous last words);

    - As usual, lots of driver cleanups and improvements"

    * tag 'media/v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (624 commits)
    media: leds: as3645a: add V4L2_FLASH_LED_CLASS dependency
    media: get rid of removed DMX_GET_CAPS and DMX_SET_SOURCE leftovers
    media: Revert "[media] v4l: async: make v4l2 coexist with devicetree nodes in a dt overlay"
    media: staging: atomisp: sh_css_calloc shall return a pointer to the allocated space
    media: Revert "[media] lirc_dev: remove superfluous get/put_device() calls"
    media: add qcom_camss.rst to v4l-drivers rst file
    media: dvb headers: make checkpatch happier
    media: dvb uapi: move frontend legacy API to another part of the book
    media: pixfmt-srggb12p.rst: better format the table for PDF output
    media: docs-rst: media: Don't use \small for V4L2_PIX_FMT_SRGGB10 documentation
    media: index.rst: don't write "Contents:" on PDF output
    media: pixfmt*.rst: replace a two dots by a comma
    media: vidioc-g-fmt.rst: adjust table format
    media: vivid.rst: add a blank line to correct ReST format
    media: v4l2 uapi book: get rid of driver programming's chapter
    media: format.rst: use the right markup for important notes
    media: docs-rst: cardlists: change their format to flat-tables
    media: em28xx-cardlist.rst: update to reflect last changes
    media: v4l2-event.rst: adjust table to fit on PDF output
    media: docs: don't show ToC for each part on PDF output
    ...

    Linus Torvalds
     
  • Pull MD updates from Shaohua Li:
    "This update mainly fixes bugs:

    - Make raid5 PPL support multiple PPLs, from Pawel

    - Several raid5-cache bug fixes from Song

    - Bitmap fixes from Neil and me

    - One raid1/10 regression fix since 4.12 from me

    - Other small fixes and cleanup"

    * tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    md/bitmap: disable bitmap_resize for file-backed bitmaps.
    raid5-ppl: Recovery support for multiple partial parity logs
    md: Runtime support for multiple ppls
    md/raid0: attach correct cgroup info in bio
    lib/raid6: align AVX512 constants to 512 bits, not bytes
    raid5: remove raid5_build_block
    md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_show
    md: replace seq_release_private with seq_release
    md: notify about new spare disk in the container
    md/raid1/10: reset bio allocated from mempool
    md/raid5: release/flush io in raid5_do_work()
    md/bitmap: copy correct data for bitmap super

    Linus Torvalds
     
  • Pull MMC updates from Ulf Hansson:
    "MMC core:
    - Continue to refactor the mmc block code to prepare for blk-mq
    - Move mmc block debugfs into block module
    - Next step for eMMC CMDQ by adding a new mmc host interface for it
    - Move Kconfig option MMC_DEBUG from core to host
    - Some additional minor improvements

    MMC host:
    - Declare structs as const when applicable
    - Explicitly request exclusive reset control when applicable
    - Improve some error paths and other various cleanups
    - sdhci: Preparations to support SDHCI OMAP
    - sdhci: Improve some PM related code
    - sdhci: Re-factoring and modernizations
    - sdhci-xenon: Add runtime PM and system sleep support
    - sdhci-xenon: Add support for eMMC HS400 Enhanced Strobe
    - sdhci-cadence: Add system sleep support
    - sdhci-of-at91: Improve system sleep support
    - dw_mmc: Add support for Hisilicon hi3660
    - sunxi: Add support for A83T eMMC
    - sunxi: Add support for DDR52 mode
    - meson-gx: Add support for UHS-I SD-cards
    - meson-gx: Cleanups and improvements
    - tmio: Fix CMD12 (STOP) handling
    - tmio: Cleanups and improvements
    - renesas_sdhi: Add r8a7743/5 support
    - renesas-sdhi: Add support for R-Car Gen3 SDHI DMAC
    - renesas_sdhi: Cleanups and improvements"

    * tag 'mmc-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (145 commits)
    mmc: renesas_sdhi: Add r8a7743/5 support
    mmc: meson-gx: fix __ffsdi2 undefined on arm32
    mmc: sdhci-xenon: add runtime pm support and reimplement standby
    mmc: core: Move mmc_start_areq() declaration
    mmc: mmci: stop building qcom dml as module
    mmc: sunxi: Reset the device at probe time
    clk: sunxi-ng: Provide a default reset hook
    mmc: meson-gx: rework tuning function
    mmc: meson-gx: change default tx phase
    mmc: meson-gx: implement voltage switch callback
    mmc: meson-gx: use CCF to handle the clock phases
    mmc: meson-gx: implement card_busy callback
    mmc: meson-gx: simplify interrupt handler
    mmc: meson-gx: work around clk-stop issue
    mmc: meson-gx: fix dual data rate mode frequencies
    mmc: meson-gx: rework clock init function
    mmc: meson-gx: rework clk_set function
    mmc: meson-gx: rework set_ios function
    mmc: meson-gx: cfg init overwrite values
    mmc: meson-gx: initialize sane clk default before clock register
    ...

    Linus Torvalds
     
  • James Bottomley
     
  • Pull block layer updates from Jens Axboe:
    "This is the first pull request for 4.14, containing most of the code
    changes. It's a quiet series this round, which I think we needed after
    the churn of the last few series. This contains:

    - Fix for a registration race in loop, from Anton Volkov.

    - Overflow complaint fix from Arnd for DAC960.

    - Series of drbd changes from the usual suspects.

    - Conversion of the stec/skd driver to blk-mq. From Bart.

    - A few BFQ improvements/fixes from Paolo.

    - CFQ improvement from Ritesh, allowing idling for group idle.

    - A few fixes found by Dan's smatch, courtesy of Dan.

    - A warning fixup for a race between changing the IO scheduler and
    device removal. From David Jeffery.

    - A few nbd fixes from Josef.

    - Support for cgroup info in blktrace, from Shaohua.

    - Also from Shaohua, new features in the null_blk driver to allow it
    to actually hold data, among other things.

    - Various corner cases and error handling fixes from Weiping Zhang.

    - Improvements to the IO stats tracking for blk-mq from me. Can
    drastically improve performance for fast devices and/or big
    machines.

    - Series from Christoph removing bi_bdev as being needed for IO
    submission, in preparation for nvme multipathing code.

    - Series from Bart, including various cleanups and fixes for switch
    fall through case complaints"

    * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
    kernfs: checking for IS_ERR() instead of NULL
    drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
    drbd: Fix allyesconfig build, fix recent commit
    drbd: switch from kmalloc() to kmalloc_array()
    drbd: abort drbd_start_resync if there is no connection
    drbd: move global variables to drbd namespace and make some static
    drbd: rename "usermode_helper" to "drbd_usermode_helper"
    drbd: fix race between handshake and admin disconnect/down
    drbd: fix potential deadlock when trying to detach during handshake
    drbd: A single dot should be put into a sequence.
    drbd: fix rmmod cleanup, remove _all_ debugfs entries
    drbd: Use setup_timer() instead of init_timer() to simplify the code.
    drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
    drbd: new disk-option disable-write-same
    drbd: Fix resource role for newly created resources in events2
    drbd: mark symbols static where possible
    drbd: Send P_NEG_ACK upon write error in protocol != C
    drbd: add explicit plugging when submitting batches
    drbd: change list_for_each_safe to while(list_first_entry_or_null)
    drbd: introduce drbd_recv_header_maybe_unplug
    ...

    Linus Torvalds
     
  • Pull xen updates from Juergen Gross:

    - the new pvcalls backend for routing socket calls from a guest to dom0

    - some cleanups of Xen code

    - a fix for wrong usage of {get,put}_cpu()

    * tag 'for-linus-4.14b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (27 commits)
    xen/mmu: set MMU_NORMAL_PT_UPDATE in remap_area_mfn_pte_fn
    xen: Don't try to call xen_alloc_p2m_entry() on autotranslating guests
    xen/events: events_fifo: Don't use {get,put}_cpu() in xen_evtchn_fifo_init()
    xen/pvcalls: use WARN_ON(1) instead of __WARN()
    xen: remove not used trace functions
    xen: remove unused function xen_set_domain_pte()
    xen: remove tests for pvh mode in pure pv paths
    xen-platform: constify pci_device_id.
    xen: cleanup xen.h
    xen: introduce a Kconfig option to enable the pvcalls backend
    xen/pvcalls: implement write
    xen/pvcalls: implement read
    xen/pvcalls: implement the ioworker functions
    xen/pvcalls: disconnect and module_exit
    xen/pvcalls: implement release command
    xen/pvcalls: implement poll command
    xen/pvcalls: implement accept command
    xen/pvcalls: implement listen command
    xen/pvcalls: implement bind command
    xen/pvcalls: implement connect command
    ...

    Linus Torvalds
     
  • Pull powerpc updates from Michael Ellerman:
    "Nothing really major this release, despite quite a lot of activity.
    Just lots of things all over the place.

    Some things of note include:

    - Access via perf to a new type of PMU (IMC) on Power9, which can
    count both core events as well as nest unit events (Memory
    controller etc).

    - Optimisations to the radix MMU TLB flushing, mostly to avoid
    unnecessary Page Walk Cache (PWC) flushes when the structure of the
    tree is not changing.

    - Reworks/cleanups of do_page_fault() to modernise it and bring it
    closer to other architectures where possible.

    - Rework of our page table walking so that THP updates only need to
    send IPIs to CPUs where the affected mm has run, rather than all
    CPUs.

    - The size of our vmalloc area is increased to 56T on 64-bit hash MMU
    systems. This avoids problems with the percpu allocator on systems
    with very sparse NUMA layouts.

    - STRICT_KERNEL_RWX support on PPC32.

    - A new sched domain topology for Power9, to capture the fact that
    pairs of cores may share an L2 cache.

    - Power9 support for VAS, which is a new mechanism for accessing
    coprocessors, and initial support for using it with the NX
    compression accelerator.

    - Major work on the instruction emulation support, adding support for
    many new instructions, and reworking it so it can be used to
    implement the emulation needed to fixup alignment faults.

    - Support for guests under PowerVM to use the Power9 XIVE interrupt
    controller.

    And probably that many things again that are almost as interesting,
    but I had to keep the list short. Plus the usual fixes and cleanups as
    always.

    Thanks to: Alexey Kardashevskiy, Alistair Popple, Andreas Schwab,
    Aneesh Kumar K.V, Anju T Sudhakar, Arvind Yadav, Balbir Singh,
    Benjamin Herrenschmidt, Bhumika Goyal, Breno Leitao, Bryant G. Ly,
    Christophe Leroy, Cédric Le Goater, Dan Carpenter, Dou Liyang,
    Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Hannes
    Reinecke, Haren Myneni, Ivan Mikhaylov, John Allen, Julia Lawall,
    LABBE Corentin, Laurentiu Tudor, Madhavan Srinivasan, Markus Elfring,
    Masahiro Yamada, Matt Brown, Michael Neuling, Murilo Opsfelder Araujo,
    Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran,
    Paul Mackerras, Rashmica Gupta, Rob Herring, Rui Teng, Sam Bobroff,
    Santosh Sivaraj, Scott Wood, Shilpasri G Bhat, Sukadev Bhattiprolu,
    Suraj Jitindar Singh, Tobin C. Harding, Victor Aoqui"

    * tag 'powerpc-4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (321 commits)
    powerpc/xive: Fix section __init warning
    powerpc: Fix kernel crash in emulation of vector loads and stores
    powerpc/xive: improve debugging macros
    powerpc/xive: add XIVE Exploitation Mode to CAS
    powerpc/xive: introduce H_INT_ESB hcall
    powerpc/xive: add the HW IRQ number under xive_irq_data
    powerpc/xive: introduce xive_esb_write()
    powerpc/xive: rename xive_poke_esb() in xive_esb_read()
    powerpc/xive: guest exploitation of the XIVE interrupt controller
    powerpc/xive: introduce a common routine xive_queue_page_alloc()
    powerpc/sstep: Avoid used uninitialized error
    axonram: Return directly after a failed kzalloc() in axon_ram_probe()
    axonram: Improve a size determination in axon_ram_probe()
    axonram: Delete an error message for a failed memory allocation in axon_ram_probe()
    powerpc/powernv/npu: Move tlb flush before launching ATSD
    powerpc/macintosh: constify wf_sensor_ops structures
    powerpc/iommu: Use permission-specific DEVICE_ATTR variants
    powerpc/eeh: Delete an error out of memory message at init time
    powerpc/mm: Use seq_putc() in two functions
    macintosh: Convert to using %pOF instead of full_name
    ...

    Linus Torvalds
     
  • Pull EFI updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Transparently fall back to other poweroff method(s) if EFI poweroff
    fails (and returns)

    - Use separate PE/COFF section headers for the RX and RW parts of the
    ARM stub loader so that the firmware can use strict mapping
    permissions

    - Add support for requesting the firmware to wipe RAM at warm reboot

    - Increase the size of the random seed obtained from UEFI so CRNG
    fast init can complete earlier

    - Update the EFI framebuffer address if it points to a BAR that gets
    moved by the PCI resource allocation code

    - Enable "reset attack mitigation" of TPM environments: this is
    enabled if the kernel is configured with
    CONFIG_RESET_ATTACK_MITIGATION=y.

    - Clang related fixes

    - Misc cleanups, constification, refactoring, etc"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    efi/bgrt: Use efi_mem_type()
    efi: Move efi_mem_type() to common code
    efi/reboot: Make function pointer orig_pm_power_off static
    efi/random: Increase size of firmware supplied randomness
    efi/libstub: Enable reset attack mitigation
    firmware/efi/esrt: Constify attribute_group structures
    firmware/efi: Constify attribute_group structures
    firmware/dcdbas: Constify attribute_group structures
    arm/efi: Split zImage code and data into separate PE/COFF sections
    arm/efi: Replace open coded constants with symbolic ones
    arm/efi: Remove pointless dummy .reloc section
    arm/efi: Remove forbidden values from the PE/COFF header
    drivers/fbdev/efifb: Allow BAR to be moved instead of claiming it
    efi/reboot: Fall back to original power-off method if EFI_RESET_SHUTDOWN returns
    efi/arm/arm64: Add missing assignment of efi.config_table
    efi/libstub/arm64: Set -fpie when building the EFI stub
    efi/libstub/arm64: Force 'hidden' visibility for section markers
    efi/libstub/arm64: Use hidden attribute for struct screen_info reference
    efi/arm: Don't mark ACPI reclaim memory as MEMBLOCK_NOMAP

    Linus Torvalds
     
  • Pull x86 platform updates from Ingo Molnar:
    "The main changes include various Hyper-V optimizations such as faster
    hypercalls and faster/better TLB flushes - and there's also some
    Intel-MID cleanups"

    * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tracing/hyper-v: Trace hyperv_mmu_flush_tlb_others()
    x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls
    x86/platform/intel-mid: Make several arrays static, to make code smaller
    MAINTAINERS: Add missed file for Hyper-V
    x86/hyper-v: Use hypercall for remote TLB flush
    hyper-v: Globalize vp_index
    x86/hyper-v: Implement rep hypercalls
    hyper-v: Use fast hypercall for HVCALL_SIGNAL_EVENT
    x86/hyper-v: Introduce fast hypercall implementation
    x86/hyper-v: Make hv_do_hypercall() inline
    x86/hyper-v: Include hyperv/ only when CONFIG_HYPERV is set
    x86/platform/intel-mid: Make 'bt_sfi_data' const
    x86/platform/intel-mid: Make IRQ allocation a bit more flexible
    x86/platform/intel-mid: Group timers callbacks together

    Linus Torvalds
     

07 Sep, 2017

13 commits

  • Pull libata updates from Tejun Heo:
    "Except for the ahci fix that fixes a boot issue, nothing major in this
    pull request. Some new platform controller support and device specific
    changes"

    * 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
    libata: zpodd: make arrays cdb static, reduces object code size
    ahci: don't use MSI for devices with the silly Intel NVMe remapping scheme
    dt-bindings: ata: add DT bindings for MediaTek SATA controller
    ata: mediatek: add support for MediaTek SATA controller
    pata_octeon_cf: use of_property_read_{bool|u32}()
    cs5536: add support for IDE controller variant
    ata: sata_gemini: Introduce explicit IDE pin control
    ata: sata_gemini: Retire custom pin control
    ata: ahci_platform: Add shutdown handler
    ata: sata_gemini: explicitly request exclusive reset control
    ata: Drop unnecessary static
    ata: Convert to using %pOF instead of full_name

    Linus Torvalds
     
  • Merge updates from Andrew Morton:

    - various misc bits

    - DAX updates

    - OCFS2

    - most of MM

    * emailed patches from Andrew Morton : (119 commits)
    mm,fork: introduce MADV_WIPEONFORK
    x86,mpx: make mpx depend on x86-64 to free up VMA flag
    mm: add /proc/pid/smaps_rollup
    mm: hugetlb: clear target sub-page last when clearing huge page
    mm: oom: let oom_reap_task and exit_mmap run concurrently
    swap: choose swap device according to numa node
    mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
    mm, oom: do not rely on TIF_MEMDIE for memory reserves access
    z3fold: use per-cpu unbuddied lists
    mm, swap: don't use VMA based swap readahead if HDD is used as swap
    mm, swap: add sysfs interface for VMA based swap readahead
    mm, swap: VMA based swap readahead
    mm, swap: fix swap readahead marking
    mm, swap: add swap readahead hit statistics
    mm/vmalloc.c: don't reinvent the wheel but use existing llist API
    mm/vmstat.c: fix wrong comment
    selftests/memfd: add memfd_create hugetlbfs selftest
    mm/shmem: add hugetlbfs support to memfd_create()
    mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
    mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
    ...

    Linus Torvalds
     
  • The .rw_page in struct block_device_operations is used by the swap
    subsystem to read/write the page contents from/into the corresponding
    swap slot in the swap device. To support the THP (Transparent Huge
    Page) swap optimization, .rw_page is enhanced to read/write a THP
    if possible.

    Link: http://lkml.kernel.org/r/20170724051840.2309-6-ying.huang@intel.com
    Signed-off-by: "Huang, Ying"
    Reviewed-by: Ross Zwisler [for brd.c, zram_drv.c, pmem.c]
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dan Williams
    Cc: Vishal L Verma
    Cc: Jens Axboe
    Cc: "Kirill A . Shutemov"
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Michal Hocko
    Cc: Rik van Riel
    Cc: Shaohua Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • This patch adds documentation and a Kconfig option for the writeback feature.

    Link: http://lkml.kernel.org/r/1498459987-24562-10-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch enables read IO from the backing device. For this feature, it
    implements two IO read functions to transfer data from backing storage.

    One is an asynchronous IO function and the other is a synchronous one.

    I need synchronous IO because of partial writes, which need the read
    IO to complete before the partial data is overwritten.

    We could make the partial-IO case asynchronous, too, but at the
    moment I don't want to add more complexity to support such rare use
    cases, so I want to keep it simple.
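    Why a partial write needs a completed read can be shown with a minimal
    userspace sketch. It assumes a single page-sized slot on the backing
    device; backing_slot, sketch_read_sync() and sketch_partial_write() are
    hypothetical names for illustration, not zram functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for one page slot on the backing device. */
static unsigned char backing_slot[SKETCH_PAGE_SIZE];

/* Synchronous read: the data is in 'dst' when this returns. */
static void sketch_read_sync(unsigned char *dst)
{
	memcpy(dst, backing_slot, SKETCH_PAGE_SIZE);
}

/*
 * Partial write of 'len' bytes at offset 'off' into a page that lives on
 * the backing device. The old contents must be fully read first; otherwise
 * the bytes outside [off, off + len) would be lost. The merged page is
 * returned through 'page'.
 */
static void sketch_partial_write(unsigned char *page, size_t off,
				 const unsigned char *src, size_t len)
{
	assert(off + len <= SKETCH_PAGE_SIZE);
	sketch_read_sync(page);       /* 1. read the whole old page */
	memcpy(page + off, src, len); /* 2. overwrite only the new bytes */
	/* 3. the merged page would then be stored again (not modelled) */
}
```

    With an asynchronous read, step 2 could not start until a completion
    callback fired, which is the extra complexity the patch avoids.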

    [xieyisheng1@huawei.com: read_from_bdev_async(): return 1 to avoid calling page_endio() in zram_rw_page()]
    Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com
    Link: http://lkml.kernel.org/r/1498459987-24562-9-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Signed-off-by: Yisheng Xie
    Cc: Juneho Choi
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch enables write IO to transfer data to the backing device. For
    that, it implements the write_to_bdev function, which creates a new bio
    and chains it to the parent bio to make the parent bio asynchronous.

    For rw_page, which doesn't have a parent bio, it submits its own bio and
    handles IO completion in zram_page_end_io.

    Also, this patch defines a new flag, ZRAM_WB, to mark written pages for
    later read IO.
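    The effect of chaining a child bio to a parent can be modelled with a
    completion counter. This is a simplified userspace sketch, assuming a
    plain 'remaining' count; struct sketch_bio and its helpers are
    hypothetical and are not the kernel's struct bio. In the kernel,
    bio_chain() provides this behaviour: the parent's bi_end_io is not
    called until both the parent and the chained bio have completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a request that may have chained children.
 * 'remaining' starts at 1 for the request's own I/O; each chained
 * child adds one more reference to its parent.
 */
struct sketch_bio {
	int remaining;
	bool completed;            /* set when end_io would run */
	struct sketch_bio *parent;
};

static void sketch_bio_init(struct sketch_bio *b)
{
	b->remaining = 1;
	b->completed = false;
	b->parent = NULL;
}

/* Make 'parent' wait for 'child' in addition to its own I/O. */
static void sketch_bio_chain(struct sketch_bio *child, struct sketch_bio *parent)
{
	child->parent = parent;
	parent->remaining++;
}

/* Called when one unit of I/O on 'b' finishes. */
static void sketch_bio_endio(struct sketch_bio *b)
{
	assert(b->remaining > 0);
	if (--b->remaining > 0)
		return;                        /* still waiting on chained I/O */
	b->completed = true;
	if (b->parent != NULL)
		sketch_bio_endio(b->parent);   /* child done: drop parent's ref */
}
```

    The parent therefore finishes only after the backing-device write does,
    regardless of which completes first.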

    [xieyisheng1@huawei.com: fix typo in comment]
    Link: http://lkml.kernel.org/r/1502707447-6944-2-git-send-email-xieyisheng1@huawei.com
    Link: http://lkml.kernel.org/r/1498459987-24562-8-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Signed-off-by: Yisheng Xie
    Cc: Juneho Choi
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • For upcoming asynchronous IO like writeback, zram_rw_page should be
    aware of whether the requested IO was completed or submitted
    successfully, or failed.

    For this purpose, zram_bvec_rw has three return values:

    -errno: returns error number
    0: IO request is done synchronously
    1: IO request is issued successfully.
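    A caller consuming this three-way convention might look like the
    following sketch. The enum and both functions are hypothetical names
    for illustration; only the -errno/0/1 meaning comes from the text above,
    and the point where page_endio() would be called is marked in a comment
    rather than invoked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Outcomes of a zram_bvec_rw-style return value (hypothetical names). */
enum sketch_io_state { SKETCH_IO_ERROR, SKETCH_IO_DONE, SKETCH_IO_SUBMITTED };

static enum sketch_io_state sketch_classify(int ret)
{
	if (ret < 0)
		return SKETCH_IO_ERROR;     /* -errno: the request failed */
	if (ret == 0)
		return SKETCH_IO_DONE;      /* finished synchronously */
	return SKETCH_IO_SUBMITTED;     /* 1: in flight, completed later */
}

/*
 * Hypothetical rw_page-style caller: returns 0 or -errno and reports
 * whether it completed the page itself.
 */
static int sketch_rw_page(int bvec_ret, int *completed_here)
{
	assert(completed_here != NULL);
	*completed_here = 0;
	switch (sketch_classify(bvec_ret)) {
	case SKETCH_IO_ERROR:
		return bvec_ret;
	case SKETCH_IO_DONE:
		*completed_here = 1;    /* page_endio() would be called here */
		return 0;
	case SKETCH_IO_SUBMITTED:
		return 0;               /* the bio's end_io completes the page */
	}
	return -EINVAL;
}
```

    Keeping "submitted" distinct from "done" is what stops the caller from
    completing a page whose asynchronous IO is still in flight.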

    Link: http://lkml.kernel.org/r/1498459987-24562-7-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • With a backing device, zram needs to manage the free space of the
    backing device.

    This patch adds bitmap logic to manage free space, which is very naive.
    However, it should be simple enough, given the frequency of
    incompressible pages in zram.
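    A naive bitmap allocator of the kind described can be sketched in a few
    lines. This assumes a hypothetical device of SKETCH_NR_SLOTS page-sized
    slots and a plain linear scan; the names are illustrative, and kernel
    code would more likely use the existing bitmap helpers such as
    find_next_zero_bit(), set_bit() and clear_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical backing device with SKETCH_NR_SLOTS page-sized slots. */
#define SKETCH_NR_SLOTS 128
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS ((SKETCH_NR_SLOTS + BITS_PER_WORD - 1) / BITS_PER_WORD)

static unsigned long slot_bitmap[NR_WORDS];   /* bit set = slot in use */

static bool slot_used(unsigned long idx)
{
	return slot_bitmap[idx / BITS_PER_WORD] & (1UL << (idx % BITS_PER_WORD));
}

/* Linear scan for the first free slot; returns -1 if the device is full. */
static long sketch_alloc_slot(void)
{
	for (unsigned long idx = 0; idx < SKETCH_NR_SLOTS; idx++) {
		if (!slot_used(idx)) {
			slot_bitmap[idx / BITS_PER_WORD] |= 1UL << (idx % BITS_PER_WORD);
			return (long)idx;
		}
	}
	return -1;
}

/* Return a slot to the free pool. */
static void sketch_free_slot(unsigned long idx)
{
	assert(idx < SKETCH_NR_SLOTS && slot_used(idx));
	slot_bitmap[idx / BITS_PER_WORD] &= ~(1UL << (idx % BITS_PER_WORD));
}
```

    The scan is O(number of slots) per allocation, which is acceptable only
    while writebacks stay relatively rare.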

    Link: http://lkml.kernel.org/r/1498459987-24562-6-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • For the writeback feature, the user should set up a backing device
    before zram starts working.

    This patch enables the interface via /sys/block/zramX/backing_dev.

    Currently, it supports block devices only, but it could be extended to
    files as well.

    Link: http://lkml.kernel.org/r/1498459987-24562-5-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • The name zram_decompress_page is not appropriate because the function
    doesn't decompress if the page was a dedup hit or was stored without
    compression.

    Use a more abstract name, consistent with the write-path function
    __zram_bvec_write.

    Link: http://lkml.kernel.org/r/1498459987-24562-4-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • zram_compress does several things: compression, entry allocation and
    limit checking. I did it just for readability, but it hurts
    modularization. :(

    So this patch removes the zram_compress function and inlines it into
    __zram_bvec_write for upcoming patches.

    Link: http://lkml.kernel.org/r/1498459987-24562-3-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Patch series "writeback incompressible pages to storage", v1.

    zRam is useful for saving memory with compressible pages, but sometimes
    the workload changes and the system has lots of incompressible pages,
    which is very harmful for zram.

    This patch adds a writeback feature to zram, so an admin can set up a
    block device and, with it, zram can save memory by writing out
    incompressible pages once it finds them (1/4 comp ratio) instead of
    keeping those pages in memory.

    [1-3] are just cleanups and [4-8] are step-by-step feature enablement.
    [4-8] are logically not bisectable (i.e., logical unit separation);
    although I tried to keep each one compiling without breaking, I think
    this split is better for review.

    This patch (of 9):

    __zram_bvec_write has some duplicated logic for zram metadata
    handling of same_page|compressed_page. This patch aims to clean it up
    without behavior change.

    [xieyisheng1@huawei.com: fix compr_data_size stat]
    Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com
    Link: http://lkml.kernel.org/r/1496019048-27016-1-git-send-email-minchan@kernel.org
    Link: http://lkml.kernel.org/r/1498459987-24562-2-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Signed-off-by: Yisheng Xie
    Reviewed-by: Sergey Senozhatsky
    Cc: Juneho Choi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Historically we have enforced that any kernel zone (e.g. ZONE_NORMAL) has
    to precede the Movable zone in the physical memory range. The purpose
    of the movable zone is, however, not bound to any physical memory
    restriction. It merely defines a class of migratable and reclaimable
    memory.

    There are users (e.g. CMA) who might want to reserve specific physical
    memory ranges for their own purpose. Moreover our pfn walkers have to
    be prepared for zones overlapping in the physical range already because
    we do support interleaving NUMA nodes and therefore zones can interleave
    as well. This means we can allow each memory block to be associated
    with a different zone.

    Loosen the current onlining semantic and allow explicit onlining type on
    any memblock. That means that online_{kernel,movable} will be allowed
    regardless of the physical address of the memblock as long as it is
    offline, of course. This might result in the movable zone overlapping with
    other kernel zones. Default onlining then becomes a bit tricky but
    still sensible. echo online > memoryXY/state will online the given
    block to

    1) the default zone if the given range is outside of any zone
    2) the enclosing zone if such a zone doesn't interleave with
    any other zone
    3) the default zone if more zones interleave for this range

    where the default zone is the Movable zone only if movable_node is
    enabled; otherwise it is a kernel zone.
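    The three default-onlining rules can be condensed into a small decision
    function. This is a sketch under the assumption that the caller already
    knows which zones span the block being onlined; struct sketch_span and
    the function names are hypothetical and are not the kernel's
    default_zone_for_pfn().

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_zone { SKETCH_ZONE_NORMAL, SKETCH_ZONE_MOVABLE };

/* Hypothetical summary of which zones already span the block's range. */
struct sketch_span {
	bool normal_spans;    /* a kernel (Normal) zone spans this range */
	bool movable_spans;   /* the Movable zone spans this range */
};

/* Default zone: Movable only when movable_node is enabled. */
static enum sketch_zone sketch_default_zone(bool movable_node)
{
	return movable_node ? SKETCH_ZONE_MOVABLE : SKETCH_ZONE_NORMAL;
}

/* Zone picked by "echo online > memoryXY/state". */
static enum sketch_zone sketch_online_default(struct sketch_span s,
					      bool movable_node)
{
	if (s.normal_spans && !s.movable_spans)
		return SKETCH_ZONE_NORMAL;     /* rule 2: one enclosing zone */
	if (s.movable_spans && !s.normal_spans)
		return SKETCH_ZONE_MOVABLE;    /* rule 2: one enclosing zone */
	/* rule 1 (no zone spans the range) or rule 3 (zones interleave) */
	return sketch_default_zone(movable_node);
}
```

    Applied to the example below without movable_node: block 35 lies outside
    any zone (Normal), block 40 lies only inside Movable (Movable), and block
    38, once Normal spans up to 39, is covered by both zones (Normal).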

    Here is an example of the semantic (with movable_node not present, but
    it works in an analogous way). We start with the following memblocks,
    all of them offline:

    memory34/valid_zones:Normal Movable
    memory35/valid_zones:Normal Movable
    memory36/valid_zones:Normal Movable
    memory37/valid_zones:Normal Movable
    memory38/valid_zones:Normal Movable
    memory39/valid_zones:Normal Movable
    memory40/valid_zones:Normal Movable
    memory41/valid_zones:Normal Movable

    Now, we online block 34 in default mode and block 37 as movable:

    root@test1:/sys/devices/system/node/node1# echo online > memory34/state
    root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
    memory34/valid_zones:Normal
    memory35/valid_zones:Normal Movable
    memory36/valid_zones:Normal Movable
    memory37/valid_zones:Movable
    memory38/valid_zones:Normal Movable
    memory39/valid_zones:Normal Movable
    memory40/valid_zones:Normal Movable
    memory41/valid_zones:Normal Movable

    As we can see all other blocks can still be onlined both into Normal and
    Movable zones and the Normal is default because the Movable zone spans
    only block37 now.

    root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
    memory34/valid_zones:Normal
    memory35/valid_zones:Normal Movable
    memory36/valid_zones:Normal Movable
    memory37/valid_zones:Movable
    memory38/valid_zones:Movable Normal
    memory39/valid_zones:Movable Normal
    memory40/valid_zones:Movable Normal
    memory41/valid_zones:Movable

    Now the default zone for blocks 37-41 has changed because the Movable
    zone spans that range.

    root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
    memory34/valid_zones:Normal
    memory35/valid_zones:Normal Movable
    memory36/valid_zones:Normal Movable
    memory37/valid_zones:Movable
    memory38/valid_zones:Normal Movable
    memory39/valid_zones:Normal
    memory40/valid_zones:Movable Normal
    memory41/valid_zones:Movable

    Note that block 39 now belongs to the Normal zone, so block 38
    falls into Normal by default as well.

    For completeness:

    root@test1:/sys/devices/system/node/node1# for i in memory[34]?
    do
    echo online > $i/state 2>/dev/null
    done

    memory34/valid_zones:Normal
    memory35/valid_zones:Normal
    memory36/valid_zones:Normal
    memory37/valid_zones:Movable
    memory38/valid_zones:Normal
    memory39/valid_zones:Normal
    memory40/valid_zones:Movable
    memory41/valid_zones:Movable

    Implementation-wise, the change is quite straightforward. We can get rid
    of allow_online_pfn_range altogether. online_pages already allows only
    offline nodes. The original default_zone_for_pfn will become
    default_kernel_zone_for_pfn. New default_zone_for_pfn implements the
    above semantic. zone_for_pfn_range is slightly reorganized to implement
    kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes a
    catch all default behavior.

    Link: http://lkml.kernel.org/r/20170714121233.16861-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Acked-by: Reza Arbab
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Yasuaki Ishimatsu
    Cc: Xishi Qiu
    Cc: Kani Toshimitsu
    Cc: Daniel Kiper
    Cc: Igor Mammedov
    Cc: Vitaly Kuznetsov
    Cc: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko