09 Sep, 2017
8 commits
-
Allow interval trees to quickly check for overlaps to avoid unnecesary
tree lookups in interval_tree_iter_first().As of this patch, all interval tree flavors will require using a
'rb_root_cached' such that we can have the leftmost node easily
available. While most users will make use of this feature, those with
special functions (in addition to the generic insert, delete, search
calls) will avoid using the cached option as they can do funky things
with insertions -- for example, vma_interval_tree_insert_after().[jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso
Signed-off-by: Jérôme Glisse
Acked-by: Christian König
Acked-by: Peter Zijlstra (Intel)
Acked-by: Doug Ledford
Acked-by: Michael S. Tsirkin
Cc: David Airlie
Cc: Jason Wang
Cc: Christian Benvenuti
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
First, number of CPUs can't be negative number.
Second, different signnnedness leads to suboptimal code in the following
cases:1)
kmalloc(nr_cpu_ids * sizeof(X));"int" has to be sign extended to size_t.
2)
while (loff_t *pos < nr_cpu_ids)MOVSXD is 1 byte longed than the same MOV.
Other cases exist as well. Basically compiler is told that nr_cpu_ids
can't be negative which can't be deduced if it is "int".Code savings on allyesconfig kernel: -3KB
add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
function old new delta
coretemp_cpu_online 450 512 +62
rcu_init_one 1234 1272 +38
pci_device_probe 374 399 +25...
pgdat_reclaimable_pages 628 556 -72
select_fallback_rq 446 369 -77
task_numa_find_cpu 1923 1807 -116Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
memset32() can be used to initialise these three arrays. Minor code
footprint reduction.Link: http://lkml.kernel.org/r/20170720184539.31609-8-willy@infradead.org
Signed-off-by: Matthew Wilcox
Cc: "James E.J. Bottomley"
Cc: "Martin K. Petersen"
Cc: "H. Peter Anvin"
Cc: David Miller
Cc: Ingo Molnar
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: Michael Ellerman
Cc: Minchan Kim
Cc: Ralf Baechle
Cc: Richard Henderson
Cc: Russell King
Cc: Sam Ravnborg
Cc: Sergey Senozhatsky
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
zram was the motivation for creating memset_l(). Minchan Kim sees a 7%
performance improvement on x86 with 100MB of non-zero deduplicatable
data:perf stat -r 10 dd if=/dev/zram0 of=/dev/null
vanilla: 0.232050465 seconds time elapsed ( +- 0.51% )
memset_l: 0.217219387 seconds time elapsed ( +- 0.07% )Link: http://lkml.kernel.org/r/20170720184539.31609-7-willy@infradead.org
Signed-off-by: Matthew Wilcox
Tested-by: Minchan Kim
Cc: Sergey Senozhatsky
Cc: "H. Peter Anvin"
Cc: "James E.J. Bottomley"
Cc: "Martin K. Petersen"
Cc: David Miller
Cc: Ingo Molnar
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: Michael Ellerman
Cc: Ralf Baechle
Cc: Richard Henderson
Cc: Russell King
Cc: Sam Ravnborg
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This macro is useful to avoid link error on 32-bit systems.
We have the same definition in two drivers, so move it to
include/linux/kernel.hWhile we are here, refactor DIV_ROUND_UP_ULL() by using
DIV_ROUND_DOWN_ULL().Link: http://lkml.kernel.org/r/1500945156-12907-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada
Acked-by: Mark Brown
Cc: Cyrille Pitchen
Cc: Jaroslav Kysela
Cc: Takashi Iwai
Cc: Liam Girdwood
Cc: Boris Brezillon
Cc: Marek Vasut
Cc: Brian Norris
Cc: Richard Weinberger
Cc: David Woodhouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Patch series "Separate NUMA statistics from zone statistics", v2.
Each page allocation updates a set of per-zone statistics with a call to
zone_statistics(). As discussed in 2017 MM summit, these are a
substantial source of overhead in the page allocator and are very rarely
consumed. This significant overhead in cache bouncing caused by zone
counters (NUMA associated counters) update in parallel in multi-threaded
page allocation (pointed out by Dave Hansen).A link to the MM summit slides:
http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdfTo mitigate this overhead, this patchset separates NUMA statistics from
zone statistics framework, and update NUMA counter threshold to a fixed
size of MAX_U16 - 2, as a small threshold greatly increases the update
frequency of the global counter from local per cpu counter (suggested by
Ying Huang). The rationality is that these statistics counters don't
need to be read often, unlike other VM counters, so it's not a problem
to use a large threshold and make readers more expensive.With this patchset, we see 31.3% drop of CPU cycles(537-->369, see
below) for per single page allocation and reclaim on Jesper's
page_bench03 benchmark. Meanwhile, this patchset keeps the same style
of virtual memory statistics with little end-user-visible effects (only
move the numa stats to show behind zone page stats, see the first patch
for details).I did an experiment of single page allocation and reclaim concurrently
using Jesper's page_bench03 benchmark on a 2-Socket Broadwell-based
server (88 processors with 126G memory) with different size of threshold
of pcp counter.Benchmark provided by Jesper D Brouer(increase loop times to 10000000):
https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/benchThreshold CPU cycles Throughput(88 threads)
32 799 241760478
64 640 301628829
125 537 358906028 system by default
256 468 412397590
512 428 450550704
4096 399 482520943
20000 394 489009617
30000 395 488017817
65533 369(-31.3%) 521661345(+45.3%) with this patchset
N/A 342(-36.3%) 562900157(+56.8%) disable zone_statisticsThis patch (of 3):
In this patch, NUMA statistics is separated from zone statistics
framework, all the call sites of NUMA stats are changed to use
numa-stats-specific functions, it does not have any functionality change
except that the number of NUMA stats is shown behind zone page stats
when users *read* the zone info.E.g. cat /proc/zoneinfo
***Base*** ***With this patch***
nr_free_pages 3976 nr_free_pages 3976
nr_zone_inactive_anon 0 nr_zone_inactive_anon 0
nr_zone_active_anon 0 nr_zone_active_anon 0
nr_zone_inactive_file 0 nr_zone_inactive_file 0
nr_zone_active_file 0 nr_zone_active_file 0
nr_zone_unevictable 0 nr_zone_unevictable 0
nr_zone_write_pending 0 nr_zone_write_pending 0
nr_mlock 0 nr_mlock 0
nr_page_table_pages 0 nr_page_table_pages 0
nr_kernel_stack 0 nr_kernel_stack 0
nr_bounce 0 nr_bounce 0
nr_zspages 0 nr_zspages 0
numa_hit 0 *nr_free_cma 0*
numa_miss 0 numa_hit 0
numa_foreign 0 numa_miss 0
numa_interleave 0 numa_foreign 0
numa_local 0 numa_interleave 0
numa_other 0 numa_local 0
*nr_free_cma 0* numa_other 0
... ...
vm stats threshold: 10 vm stats threshold: 10
... ...The next patch updates the numa stats counter size and threshold.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1503568801-21305-2-git-send-email-kemi.wang@intel.com
Signed-off-by: Kemi Wang
Reported-by: Jesper Dangaard Brouer
Acked-by: Mel Gorman
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Christopher Lameter
Cc: Dave Hansen
Cc: Andi Kleen
Cc: Ying Huang
Cc: Aaron Lu
Cc: Tim Chen
Cc: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The fix in the parent made me look at that function, and react to how
illogical and illegible the array initializer was.Use named array indexes to make it clearer what is going on, and make
the initializer not depend silently on the exact index numbers.[ The initializer now also shows an odd inconsistency in the naming:
note the IWCM vs IWPM.. - Linus ]Cc: Leon Romanovsky
Cc: Doug Ledford
Signed-off-by: Linus Torvalds -
The netlink message sent with type == 0, which doesn't have any client
behind it, caused to the overflow in max_num_ops array.Fix it by declaring zero number of ops for the first client.
Fixes: c9901724a2f1 ("RDMA/netlink: Remove netlink clients infrastructure")
Signed-off-by: Leon Romanovsky
Signed-off-by: Linus Torvalds
08 Sep, 2017
19 commits
-
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, zfcp and a host of minor updates.The major driver change here is the elimination of the block based
cciss driver in favour of the SCSI based hpsa driver (which now drives
all the legacy cases cciss used to be required for). Plus a reset
handler clean up and the redo of the SAS SMP handler to use bsg lib"* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
scsi: scsi-mq: Always unprepare before requeuing a request
scsi: Show .retries and .jiffies_at_alloc in debugfs
scsi: Improve requeuing behavior
scsi: Call scsi_initialize_rq() for filesystem requests
scsi: qla2xxx: Reset the logo flag, after target re-login.
scsi: qla2xxx: Fix slow mem alloc behind lock
scsi: qla2xxx: Clear fc4f_nvme flag
scsi: qla2xxx: add missing includes for qla_isr
scsi: qla2xxx: Fix an integer overflow in sysfs code
scsi: aacraid: report -ENOMEM to upper layer from aac_convert_sgraw2()
scsi: aacraid: get rid of one level of indentation
scsi: aacraid: fix indentation errors
scsi: storvsc: fix memory leak on ring buffer busy
scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough
scsi: smartpqi: remove the smp_handler stub
scsi: hpsa: remove the smp_handler stub
scsi: bsg-lib: pass the release callback through bsg_setup_queue
scsi: Rework handling of scsi_device.vpd_pg8[03]
scsi: Rework the code for caching Vital Product Data (VPD)
scsi: rcu: Introduce rcu_swap_protected()
... -
Pull gcc plugins update from Kees Cook:
"This finishes the porting work on randstruct, and introduces a new
option to structleak, both noted below:- For the randstruct plugin, enable automatic randomization of
structures that are entirely function pointers (along with a couple
designated initializer fixes).- For the structleak plugin, provide an option to perform zeroing
initialization of all otherwise uninitialized stack variables that
are passed by reference (Ard Biesheuvel)"* tag 'gcc-plugins-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
gcc-plugins: structleak: add option to init all vars used as byref args
randstruct: Enable function pointer struct detection
drivers/net/wan/z85230.c: Use designated initializers
drm/amd/powerplay: rv: Use designated initializers -
Pull DeviceTree updates from Rob Herring:
"There's a few orphans in the conversion to %pOF printf specifiers
included here that no one else picked up.Summary:
- Convert more DT code to use of_property_read_* API.
- Improve DT overlay support when adding multiple overlays
- Convert printk's to %pOF format specifiers. Most went via subsystem
trees, but picked up the remaining orphans- Correct unittests to use preferred "okay" for "status" property
value- Add a KASLR seed property
- Vendor prefixes for Mellanox, Theobroma System, Adaptrum, Moxa
- Fix modalias buffer handling
- Clean-up of include paths for building dtbs
- Add bindings for amc6821, isl1208, tsl2x7x, srf02, and srf10
devices- Add nvmem bindings for MediaTek MT7623 and MT7622 SoC
- Add compatible string for Allwinner H5 Mali-450 GPU
- Fix links to old OpenFirmware docs with new mirror on
devicetree.org- Remove status property from binding doc examples"
* tag 'devicetree-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (45 commits)
devicetree: Adjust status "ok" -> "okay" under drivers/of/
dt-bindings: Remove "status" from examples
dt-bindings: pinctrl: sh-pfc: Use generic node name
dt-bindings: Add vendor Mellanox
dt-binding: net/phy: fix interrupts description
virt: Convert to using %pOF instead of full_name
macintosh: Convert to using %pOF instead of full_name
ide: pmac: Convert to using %pOF instead of full_name
microblaze: Convert to using %pOF instead of full_name
dt-bindings: usb: musb: Grammar s/the/to/, s/is/are/
of: Use PLATFORM_DEVID_NONE definition
of/device: Fix of_device_get_modalias() buffer handling
of/device: Prevent buffer overflow in of_device_modalias()
dt-bindings: add amc6821, isl1208 trivial bindings
dt-bindings: add vendor prefix for Theobroma Systems
of: search scripts/dtc/include-prefixes path for both CPP and DTC
of: remove arch/$(SRCARCH)/boot/dts from include search path for CPP
of: remove drivers/of/testcase-data from include search path for CPP
of: return of_get_cpu_node from of_cpu_device_node_get if CPUs are not registered
iio: srf08: add device tree binding for srf02 and srf10
... -
Pull i916 drm fixes from Rodrigo Vivi:
"Since Dave is on paternity leave we are sending drm/i915 fixes for
v4.14-rc1 directly to you as he had asked us to do.The most critical ones are the GPU reset fix for gen2-4 and GVT fix
for a regression that is blocking gvt init to work on your tree.The rest is general fixes for patches coming from drm-next"
Acked-by: Dave Airlie
* tag 'drm-intel-next-fixes-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Re-enable GTT following a device reset
drm/i915: Annotate user relocs with __user
drm/i915: Silence sparse by using gfp_t
drm/i915: Add __rcu to radix tree slot pointer
drm/i915: Fix the missing PPAT cache attributes on CNL
drm/i915/gvt: Remove one duplicated MMIO
drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
drm/i915: Make i2c lock ops static
drm/i915: Make i9xx_load_ycbcr_conversion_matrix() static
drm/i915/edp: Increase T12 panel delay to 900 ms to fix DP AUX CH timeouts
drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
drm/i915: Skip fence alignemnt check for the CCS plane
drm/i915: Treat fb->offsets[] as a raw byte offset instead of a linear offset
drm/i915: Always wake the device to flush the GTT
drm/i915: Recreate vmapping even when the object is pinned
drm/i915: Quietly cancel FBC activation if CRTC is turned off before worker -
Pull LED updates from Jacek Anaszewski:
"LED class drivers improvements:leds-pca955x:
- add Device Tree support and bindings
- use devm_led_classdev_register()
- add GPIO support
- prevent crippled LED class device name
- check for I2C errorsleds-gpio:
- add optional retain-state-shutdown DT property
- allow LED to retain state at shutdownleds-tlc591xx:
- merge conditional tests
- add missing of_node_putleds-powernv:
- delete an error message for a failed memory allocation in
powernv_led_create()leds-is31fl32xx.c
- convert to using custom %pOF printf format specifierConstify attribute_group structures in:
- leds-blinkm
- leds-lm3533Make several arrays static const in:
- leds-aat1290
- leds-lp5521
- leds-lp5562
- leds-lp8501"* tag 'leds_for_4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: pca955x: check for I2C errors
leds: gpio: Allow LED to retain state at shutdown
dt-bindings: leds: gpio: Add optional retain-state-shutdown property
leds: powernv: Delete an error message for a failed memory allocation in powernv_led_create()
leds: lp8501: make several arrays static const
leds: lp5562: make several arrays static const
leds: lp5521: make several arrays static const
leds: aat1290: make array max_mm_current_percent static const
leds: pca955x: Prevent crippled LED device name
leds: lm3533: constify attribute_group structure
dt-bindings: leds: add pca955x
leds: pca955x: add GPIO support
leds: pca955x: use devm_led_classdev_register
leds: pca955x: add device tree support
leds: Convert to using %pOF instead of full_name
leds: blinkm: constify attribute_group structures.
leds: tlc591xx: add missing of_node_put
leds: tlc591xx: merge conditional tests -
Pull dmaengine updates from Vinod Koul:
"This one features the usual updates to the drivers and one good part
of removing DA_SG from core as it has no users.Summary:
- Remove DMA_SG support as we have no users for this feature
- New driver for Altera / Intel mSGDMA IP core
- Support for memset in dmatest and qcom_hidma driver
- Update for non cyclic mode in k3dma, bunch of update in bam_dma,
bcm sba-raid
- Constify device ids across drivers"* tag 'dmaengine-4.14-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (52 commits)
dmaengine: sun6i: support V3s SoC variant
dmaengine: sun6i: make gate bit in sun8i's DMA engines a common quirk
dmaengine: rcar-dmac: document R8A77970 bindings
dmaengine: xilinx_dma: Fix error code format specifier
dmaengine: altera: Use macros instead of structs to describe the registers
dmaengine: ti-dma-crossbar: Fix dra7 reserve function
dmaengine: pl330: constify amba_id
dmaengine: pl08x: constify amba_id
dmaengine: bcm-sba-raid: Remove redundant SBA_REQUEST_STATE_COMPLETED
dmaengine: bcm-sba-raid: Explicitly ACK mailbox message after sending
dmaengine: bcm-sba-raid: Add debugfs support
dmaengine: bcm-sba-raid: Remove redundant SBA_REQUEST_STATE_RECEIVED
dmaengine: bcm-sba-raid: Re-factor sba_process_deferred_requests()
dmaengine: bcm-sba-raid: Pre-ack async tx descriptor
dmaengine: bcm-sba-raid: Peek mbox when we have no free requests
dmaengine: bcm-sba-raid: Alloc resources before registering DMA device
dmaengine: bcm-sba-raid: Improve sba_issue_pending() run duration
dmaengine: bcm-sba-raid: Increase number of free sba_request
dmaengine: bcm-sba-raid: Allow arbitrary number free sba_request
dmaengine: bcm-sba-raid: Remove reqs_free_count from sba_device
... -
Pull backlight updates from Lee Jones:
"Fix-ups:
- Constification; pwm_bl
- Use new GPIO API; gpio_backlight
- Remove unused functionality; gpio_backlightBug Fixes:
- Fix artificial MAXREG limit; lm3630a_bl"* tag 'backlight-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
backlight: gpio_backlight: Delete pdata inversion
backlight: gpio_backlight: Convert to use GPIO descriptor
backlight: pwm_bl: Make of_device_ids const
backlight: lm3630a: Bump REG_MAX value to 0x50 instead of 0x1F -
Pull MFD updates from Lee Jones:
"New Drivers
- RK805 Power Management IC (PMIC)
- ROHM BD9571MWV-M MFD Power Management IC (PMIC)
- Texas Instruments TPS68470 Power Management IC (PMIC) & LEDsNew Device Support:
- Add support for HiSilicon Hi6421v530 to hi6421-pmic-core
- Add support for X-Powers AXP806 to axp20x
- Add support for X-Powers AXP813 to axp20x
- Add support for Intel Sunrise Point LPSS to intel-lpss-pciNew Functionality:
- Amend API to provide register layout; atmel-smcFix-ups:
- DT re-work; omap, nokia
- Header file location change {I2C => MFD}; dm355evm_msp, tps65010
- Fix chip ID formatting issue(s); rk808
- Optionally register touchscreen devices; da9052-core
- Documentation improvements; twl-core
- Constification; rtsx_pcr, ab8500-core, da9055-i2c, da9052-spi
- Drop unnecessary static declaration; max8925-i2c
- Kconfig changes (missing deps and remove module support)
- Slim down oversized licence statement; hi6421-pmic-core
- Use managed resources (devm_*); lp87565
- Supply proper error checking/handling; t7l66xbBug Fixes:
- Fix counter duplication issue; da9052-core
- Fix potential NULL deference issue; max8998
- Leave SPI-NOR write-protection bit alone; lpc_ich
- Ensure device is put into reset during suspend; intel-lpss
- Correct register offset variable size; omap-usb-tll"* tag 'mfd-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (61 commits)
mfd: intel_soc_pmic: Differentiate between Bay and Cherry Trail CRC variants
mfd: intel_soc_pmic: Export separate mfd-cell configs for BYT and CHT
dt-bindings: mfd: Add bindings for ZII RAVE devices
mfd: omap-usb-tll: Fix register offsets
mfd: da9052: Constify spi_device_id
mfd: intel-lpss: Put I2C and SPI controllers into reset state on suspend
mfd: da9055: Constify i2c_device_id
mfd: intel-lpss: Add missing PCI ID for Intel Sunrise Point LPSS devices
mfd: t7l66xb: Handle return value of clk_prepare_enable
mfd: Add ROHM BD9571MWV-M PMIC DT bindings
mfd: intel_soc_pmic_chtwc: Turn Kconfig option into a bool
mfd: lp87565: Convert to use devm_mfd_add_devices()
mfd: Add support for TPS68470 device
mfd: lpc_ich: Do not touch SPI-NOR write protection bit on Haswell/Broadwell
mfd: syscon: atmel-smc: Add helper to retrieve register layout
mfd: axp20x: Use correct platform device ID for many PEK
dt-bindings: mfd: axp20x: Introduce bindings for AXP813
mfd: axp20x: Add support for AXP813 PMIC
dt-bindings: mfd: axp20x: Add AXP806 to supported list of chips
mfd: Add ROHM BD9571MWV-M MFD PMIC driver
... -
Pull input updates from Dmitry Torokhov:
- a new GPIO bit-banging driver implementing PS/2 protocol
- a new power key driver for Rockchip RK805 PMIC
- bunch of patches constifying various device ID structures
- Elan I2C touchpad driver now supports devices with 2 buttons
- other assorted fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (76 commits)
Input: byd - make array seq static, reduces object code size
Input: xilinx_ps2 - fix multiline comment style
Input: pxa27x_keypad - handle return value of clk_prepare_enable
Input: tegra-kbc - handle return value of clk_prepare_enable
Input: PS/2 gpio bit banging driver for serio bus
Input: xen-kbdfront - enable auto repeat for xen keyboard frontend driver
Input: ambakmi - constify amba_id
Input: atmel_mxt_ts - add support for reset line
Input: atmel_mxt_ts - use more managed resources
Input: wacom_w8001 - constify serio_device_id
Input: tsc40 - constify serio_device_id
Input: touchwin - constify serio_device_id
Input: touchright - constify serio_device_id
Input: touchit213 - constify serio_device_id
Input: penmount - constify serio_device_id
Input: mtouch - constify serio_device_id
Input: inexio - constify serio_device_id
Input: hampshire - constify serio_device_id
Input: gunze - constify serio_device_id
Input: fujitsu_ts - constify serio_device_id
... -
Pull mailbox updates from Jassi Brar:
"Just behavorial changes to a controller driver: the Broadcom's Flexrm
mailbox driver has been modifified to support debugfs and TX-Done
mechanism by ACK.Nothing for the core mailbox stack"
* tag 'mailbox-v4.14' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: bcm-flexrm-mailbox: Use txdone_ack instead of txdone_poll
mailbox: bcm-flexrm-mailbox: Use bitmap instead of IDA
mailbox: bcm-flexrm-mailbox: Fix mask used in CMPL_START_ADDR_VALUE()
mailbox: bcm-flexrm-mailbox: Add debugfs support
mailbox: bcm-flexrm-mailbox: Set IRQ affinity hint for FlexRM ring IRQs -
Pull media updates from Mauro Carvalho Chehab:
"Brazil's Independence Day pull request :-)This is one of the biggest media pull requests, with 625 patches
affecting almost all parts of media (RC, DVB, V4L2, CEC, docs).This contains:
- A lot of new drivers:
* DVB frontends: mxl5xx, stv0910, stv6111;
* camera flash: as3645a led driver;
* HDMI receiver: adv748X;
* camera sensor: Omnivision 6650 5M driver (ov6650);
* HDMI CEC: ao-cec meson driver;
* V4L2: Qualcom camss driver;
* Remote controller: gpio-ir-tx, pwm-ir-tx and zx-irdec drivers.- The DDbridge DVB driver got a massive update, with makes it in sync
with modern hardware from that vendor;- There's an important milestone on this series: the DVB
documentation was written in 2003, but only started to be updated
in 2007. It also used to contain several gaps from the time it was
kept out of tree, mentioning error codes and device nodes that
never existed upstream. On this series, it received a massive
update: all non-deprecated digital TV APIs are now in sync with the
current implementation;- Some DVB APIs that aren't used by any upstream driver got removed;
- Other parts of the media documentation algo got updated, fixing
some bugs on its PDF output and making it compatible with Sphinx
version 1.6.As the number of hacks required to build PDF output reduced, I hope
we'll have less troubles as newer versions of our documentation
toolchain are released (famous last words);- As usual, lots of driver cleanups and improvements"
* tag 'media/v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (624 commits)
media: leds: as3645a: add V4L2_FLASH_LED_CLASS dependency
media: get rid of removed DMX_GET_CAPS and DMX_SET_SOURCE leftovers
media: Revert "[media] v4l: async: make v4l2 coexist with devicetree nodes in a dt overlay"
media: staging: atomisp: sh_css_calloc shall return a pointer to the allocated space
media: Revert "[media] lirc_dev: remove superfluous get/put_device() calls"
media: add qcom_camss.rst to v4l-drivers rst file
media: dvb headers: make checkpatch happier
media: dvb uapi: move frontend legacy API to another part of the book
media: pixfmt-srggb12p.rst: better format the table for PDF output
media: docs-rst: media: Don't use \small for V4L2_PIX_FMT_SRGGB10 documentation
media: index.rst: don't write "Contents:" on PDF output
media: pixfmt*.rst: replace a two dots by a comma
media: vidioc-g-fmt.rst: adjust table format
media: vivid.rst: add a blank line to correct ReST format
media: v4l2 uapi book: get rid of driver programming's chapter
media: format.rst: use the right markup for important notes
media: docs-rst: cardlists: change their format to flat-tables
media: em28xx-cardlist.rst: update to reflect last changes
media: v4l2-event.rst: adjust table to fit on PDF output
media: docs: don't show ToC for each part on PDF output
... -
Pull MD updates from Shaohua Li:
"This update mainly fixes bugs:- Make raid5 ppl support several ppl from Pawel
- Several raid5-cache bug fixes from Song
- Bitmap fixes from Neil and Me
- One raid1/10 regression fix since 4.12 from Me
- Other small fixes and cleanup"
* tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/bitmap: disable bitmap_resize for file-backed bitmaps.
raid5-ppl: Recovery support for multiple partial parity logs
md: Runtime support for multiple ppls
md/raid0: attach correct cgroup info in bio
lib/raid6: align AVX512 constants to 512 bits, not bytes
raid5: remove raid5_build_block
md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_show
md: replace seq_release_private with seq_release
md: notify about new spare disk in the container
md/raid1/10: reset bio allocated from mempool
md/raid5: release/flush io in raid5_do_work()
md/bitmap: copy correct data for bitmap super -
Pull MMC updates from Ulf Hansson:
"MMC core:
- Continue to refactor the mmc block code to prepare for blkmq
- Move mmc block debugfs into block module
- Next step for eMMC CMDQ by adding a new mmc host interface for it
- Move Kconfig option MMC_DEBUG from core to host
- Some additional minor improvementsMMC host:
- Declare structs as const when applicable
- Explicitly request exclusive reset control when applicable
- Improve some error paths and other various cleanups
- sdhci: Preparations to support SDHCI OMAP
- sdhci: Improve some PM related code
- sdhci: Re-factoring and modernizations
- sdhci-xenon: Add runtime PM and system sleep support
- sdhci-xenon: Add support for eMMC HS400 Enhanced Strobe
- sdhci-cadence: Add system sleep support
- sdhci-of-at91: Improve system sleep support
- dw_mmc: Add support for Hisilicon hi3660
- sunxi: Add support for A83T eMMC
- sunxi: Add support for DDR52 mode
- meson-gx: Add support for UHS-I SD-cards
- meson-gx: Cleanups and improvements
- tmio: Fix CMD12 (STOP) handling
- tmio: Cleanups and improvements
- renesas_sdhi: Add r8a7743/5 support
- renesas-sdhi: Add support for R-Car Gen3 SDHI DMAC
- renesas_sdhi: Cleanups and improvements"* tag 'mmc-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (145 commits)
mmc: renesas_sdhi: Add r8a7743/5 support
mmc: meson-gx: fix __ffsdi2 undefined on arm32
mmc: sdhci-xenon: add runtime pm support and reimplement standby
mmc: core: Move mmc_start_areq() declaration
mmc: mmci: stop building qcom dml as module
mmc: sunxi: Reset the device at probe time
clk: sunxi-ng: Provide a default reset hook
mmc: meson-gx: rework tuning function
mmc: meson-gx: change default tx phase
mmc: meson-gx: implement voltage switch callback
mmc: meson-gx: use CCF to handle the clock phases
mmc: meson-gx: implement card_busy callback
mmc: meson-gx: simplify interrupt handler
mmc: meson-gx: work around clk-stop issue
mmc: meson-gx: fix dual data rate mode frequencies
mmc: meson-gx: rework clock init function
mmc: meson-gx: rework clk_set function
mmc: meson-gx: rework set_ios function
mmc: meson-gx: cfg init overwrite values
mmc: meson-gx: initialize sane clk default before clock register
... -
Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:- Fix for a registration race in loop, from Anton Volkov.
- Overflow complaint fix from Arnd for DAC960.
- Series of drbd changes from the usual suspects.
- Conversion of the stec/skd driver to blk-mq. From Bart.
- A few BFQ improvements/fixes from Paolo.
- CFQ improvement from Ritesh, allowing idling for group idle.
- A few fixes found by Dan's smatch, courtesy of Dan.
- A warning fixup for a race between changing the IO scheduler and
device remova. From David Jeffery.- A few nbd fixes from Josef.
- Support for cgroup info in blktrace, from Shaohua.
- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.- Various corner cases and error handling fixes from Weiping Zhang.
- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
... -
Pull xen updates from Juergen Gross:
- the new pvcalls backend for routing socket calls from a guest to dom0
- some cleanups of Xen code
- a fix for wrong usage of {get,put}_cpu()
* tag 'for-linus-4.14b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (27 commits)
xen/mmu: set MMU_NORMAL_PT_UPDATE in remap_area_mfn_pte_fn
xen: Don't try to call xen_alloc_p2m_entry() on autotranslating guests
xen/events: events_fifo: Don't use {get,put}_cpu() in xen_evtchn_fifo_init()
xen/pvcalls: use WARN_ON(1) instead of __WARN()
xen: remove not used trace functions
xen: remove unused function xen_set_domain_pte()
xen: remove tests for pvh mode in pure pv paths
xen-platform: constify pci_device_id.
xen: cleanup xen.h
xen: introduce a Kconfig option to enable the pvcalls backend
xen/pvcalls: implement write
xen/pvcalls: implement read
xen/pvcalls: implement the ioworker functions
xen/pvcalls: disconnect and module_exit
xen/pvcalls: implement release command
xen/pvcalls: implement poll command
xen/pvcalls: implement accept command
xen/pvcalls: implement listen command
xen/pvcalls: implement bind command
xen/pvcalls: implement connect command
... -
Pull powerpc updates from Michael Ellerman:
"Nothing really major this release, despite quite a lot of activity.
Just lots of things all over the place.Some things of note include:
- Access via perf to a new type of PMU (IMC) on Power9, which can
count both core events as well as nest unit events (Memory
controller etc).- Optimisations to the radix MMU TLB flushing, mostly to avoid
unnecessary Page Walk Cache (PWC) flushes when the structure of the
tree is not changing.- Reworks/cleanups of do_page_fault() to modernise it and bring it
closer to other architectures where possible.- Rework of our page table walking so that THP updates only need to
send IPIs to CPUs where the affected mm has run, rather than all
CPUs.- The size of our vmalloc area is increased to 56T on 64-bit hash MMU
systems. This avoids problems with the percpu allocator on systems
with very sparse NUMA layouts.- STRICT_KERNEL_RWX support on PPC32.
- A new sched domain topology for Power9, to capture the fact that
pairs of cores may share an L2 cache.- Power9 support for VAS, which is a new mechanism for accessing
coprocessors, and initial support for using it with the NX
compression accelerator.- Major work on the instruction emulation support, adding support for
many new instructions, and reworking it so it can be used to
implement the emulation needed to fixup alignment faults.- Support for guests under PowerVM to use the Power9 XIVE interrupt
controller.And probably that many things again that are almost as interesting,
but I had to keep the list short. Plus the usual fixes and cleanups as
always.Thanks to: Alexey Kardashevskiy, Alistair Popple, Andreas Schwab,
Aneesh Kumar K.V, Anju T Sudhakar, Arvind Yadav, Balbir Singh,
Benjamin Herrenschmidt, Bhumika Goyal, Breno Leitao, Bryant G. Ly,
Christophe Leroy, Cédric Le Goater, Dan Carpenter, Dou Liyang,
Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Hannes
Reinecke, Haren Myneni, Ivan Mikhaylov, John Allen, Julia Lawall,
LABBE Corentin, Laurentiu Tudor, Madhavan Srinivasan, Markus Elfring,
Masahiro Yamada, Matt Brown, Michael Neuling, Murilo Opsfelder Araujo,
Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran,
Paul Mackerras, Rashmica Gupta, Rob Herring, Rui Teng, Sam Bobroff,
Santosh Sivaraj, Scott Wood, Shilpasri G Bhat, Sukadev Bhattiprolu,
Suraj Jitindar Singh, Tobin C. Harding, Victor Aoqui"* tag 'powerpc-4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (321 commits)
powerpc/xive: Fix section __init warning
powerpc: Fix kernel crash in emulation of vector loads and stores
powerpc/xive: improve debugging macros
powerpc/xive: add XIVE Exploitation Mode to CAS
powerpc/xive: introduce H_INT_ESB hcall
powerpc/xive: add the HW IRQ number under xive_irq_data
powerpc/xive: introduce xive_esb_write()
powerpc/xive: rename xive_poke_esb() in xive_esb_read()
powerpc/xive: guest exploitation of the XIVE interrupt controller
powerpc/xive: introduce a common routine xive_queue_page_alloc()
powerpc/sstep: Avoid used uninitialized error
axonram: Return directly after a failed kzalloc() in axon_ram_probe()
axonram: Improve a size determination in axon_ram_probe()
axonram: Delete an error message for a failed memory allocation in axon_ram_probe()
powerpc/powernv/npu: Move tlb flush before launching ATSD
powerpc/macintosh: constify wf_sensor_ops structures
powerpc/iommu: Use permission-specific DEVICE_ATTR variants
powerpc/eeh: Delete an error out of memory message at init time
powerpc/mm: Use seq_putc() in two functions
macintosh: Convert to using %pOF instead of full_name
... -
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:- Transparently fall back to other poweroff method(s) if EFI poweroff
fails (and returns)- Use separate PE/COFF section headers for the RX and RW parts of the
ARM stub loader so that the firmware can use strict mapping
permissions- Add support for requesting the firmware to wipe RAM at warm reboot
- Increase the size of the random seed obtained from UEFI so CRNG
fast init can complete earlier- Update the EFI framebuffer address if it points to a BAR that gets
moved by the PCI resource allocation code- Enable "reset attack mitigation" of TPM environments: this is
enabled if the kernel is configured with
CONFIG_RESET_ATTACK_MITIGATION=y.- Clang related fixes
- Misc cleanups, constification, refactoring, etc"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/bgrt: Use efi_mem_type()
efi: Move efi_mem_type() to common code
efi/reboot: Make function pointer orig_pm_power_off static
efi/random: Increase size of firmware supplied randomness
efi/libstub: Enable reset attack mitigation
firmware/efi/esrt: Constify attribute_group structures
firmware/efi: Constify attribute_group structures
firmware/dcdbas: Constify attribute_group structures
arm/efi: Split zImage code and data into separate PE/COFF sections
arm/efi: Replace open coded constants with symbolic ones
arm/efi: Remove pointless dummy .reloc section
arm/efi: Remove forbidden values from the PE/COFF header
drivers/fbdev/efifb: Allow BAR to be moved instead of claiming it
efi/reboot: Fall back to original power-off method if EFI_RESET_SHUTDOWN returns
efi/arm/arm64: Add missing assignment of efi.config_table
efi/libstub/arm64: Set -fpie when building the EFI stub
efi/libstub/arm64: Force 'hidden' visibility for section markers
efi/libstub/arm64: Use hidden attribute for struct screen_info reference
efi/arm: Don't mark ACPI reclaim memory as MEMBLOCK_NOMAP -
Pull x86 platform updates from Ingo Molnar:
"The main changes include various Hyper-V optimizations such as faster
hypercalls and faster/better TLB flushes - and there's also some
Intel-MID cleanups"* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracing/hyper-v: Trace hyperv_mmu_flush_tlb_others()
x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls
x86/platform/intel-mid: Make several arrays static, to make code smaller
MAINTAINERS: Add missed file for Hyper-V
x86/hyper-v: Use hypercall for remote TLB flush
hyper-v: Globalize vp_index
x86/hyper-v: Implement rep hypercalls
hyper-v: Use fast hypercall for HVCALL_SIGNAL_EVENT
x86/hyper-v: Introduce fast hypercall implementation
x86/hyper-v: Make hv_do_hypercall() inline
x86/hyper-v: Include hyperv/ only when CONFIG_HYPERV is set
x86/platform/intel-mid: Make 'bt_sfi_data' const
x86/platform/intel-mid: Make IRQ allocation a bit more flexible
x86/platform/intel-mid: Group timers callbacks together
07 Sep, 2017
13 commits
-
Pull libata updates from Tejun Heo:
"Except for the ahci fix that fixes a boot issue, nothing major in this
pull request. Some new platform controller support and device specific
changes"* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: zpodd: make arrays cdb static, reduces object code size
ahci: don't use MSI for devices with the silly Intel NVMe remapping scheme
dt-bindings: ata: add DT bindings for MediaTek SATA controller
ata: mediatek: add support for MediaTek SATA controller
pata_octeon_cf: use of_property_read_{bool|u32}()
cs5536: add support for IDE controller variant
ata: sata_gemini: Introduce explicit IDE pin control
ata: sata_gemini: Retire custom pin control
ata: ahci_platform: Add shutdown handler
ata: sata_gemini: explicitly request exclusive reset control
ata: Drop unnecessary static
ata: Convert to using %pOF instead of full_name -
Merge updates from Andrew Morton:
- various misc bits
- DAX updates
- OCFS2
- most of MM
* emailed patches from Andrew Morton : (119 commits)
mm,fork: introduce MADV_WIPEONFORK
x86,mpx: make mpx depend on x86-64 to free up VMA flag
mm: add /proc/pid/smaps_rollup
mm: hugetlb: clear target sub-page last when clearing huge page
mm: oom: let oom_reap_task and exit_mmap run concurrently
swap: choose swap device according to numa node
mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
mm, oom: do not rely on TIF_MEMDIE for memory reserves access
z3fold: use per-cpu unbuddied lists
mm, swap: don't use VMA based swap readahead if HDD is used as swap
mm, swap: add sysfs interface for VMA based swap readahead
mm, swap: VMA based swap readahead
mm, swap: fix swap readahead marking
mm, swap: add swap readahead hit statistics
mm/vmalloc.c: don't reinvent the wheel but use existing llist API
mm/vmstat.c: fix wrong comment
selftests/memfd: add memfd_create hugetlbfs selftest
mm/shmem: add hugetlbfs support to memfd_create()
mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
... -
The .rw_page in struct block_device_operations is used by the swap
subsystem to read/write the page contents from/into the corresponding
swap slot in the swap device. To support the THP (Transparent Huge
Page) swap optimization, the .rw_page is enhanced to support to
read/write THP if possible.Link: http://lkml.kernel.org/r/20170724051840.2309-6-ying.huang@intel.com
Signed-off-by: "Huang, Ying"
Reviewed-by: Ross Zwisler [for brd.c, zram_drv.c, pmem.c]
Cc: Johannes Weiner
Cc: Minchan Kim
Cc: Dan Williams
Cc: Vishal L Verma
Cc: Jens Axboe
Cc: "Kirill A . Shutemov"
Cc: Andrea Arcangeli
Cc: Hugh Dickins
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Shaohua Li
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds document and kconfig for using of writeback feature.
Link: http://lkml.kernel.org/r/1498459987-24562-10-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Cc: Juneho Choi
Cc: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch enables read IO from backing device. For the feature, it
implements two IO read functions to transfer data from backing storage.One is asynchronous IO function and other is synchronous one.
A reason I need synchrnous IO is due to partial write which need to
complete read IO before the overwriting partial data.We can make the partial IO's case asynchronous, too but at the moment, I
don't feel adding more complexity to support such rare use cases so want
to go with simple.[xieyisheng1@huawei.com: read_from_bdev_async(): return 1 to avoid call page_endio() in zram_rw_page()]
Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1498459987-24562-9-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Signed-off-by: Yisheng Xie
Cc: Juneho Choi
Cc: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch enables write IO to transfer data to backing device. For
that, it implements write_to_bdev function which creates new bio and
chaining with parent bio to make the parent bio asynchrnous.For rw_page which don't have parent bio, it submit owned bio and handle
IO completion by zram_page_end_io.Also, this patch defines new flag ZRAM_WB to mark written page for later
read IO.[xieyisheng1@huawei.com: fix typo in comment]
Link: http://lkml.kernel.org/r/1502707447-6944-2-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1498459987-24562-8-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Signed-off-by: Yisheng Xie
Cc: Juneho Choi
Cc: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For upcoming asynchronous IO like writeback, zram_rw_page should be
aware of that whether requested IO was completed or submitted
successfully, otherwise error.For the goal, zram_bvec_rw has three return values.
-errno: returns error number
0: IO request is done synchronously
1: IO request is issued successfully.Link: http://lkml.kernel.org/r/1498459987-24562-7-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Cc: Juneho Choi
Cc: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With backing device, zram needs management of free space of backing
device.This patch adds bitmap logic to manage free space which is very naive.
However, it would be simple enough as considering uncompressible pages's
frequenty in zram.Link: http://lkml.kernel.org/r/1498459987-24562-6-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Cc: Juneho Choi
Cc: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For writeback feature, user should set up backing device before the zram
working.This patch enables the interface via /sys/block/zramX/backing_dev.
Currently, it supports block device only but it could be enhanced for
file as well.Link: http://lkml.kernel.org/r/1498459987-24562-5-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Cc: Juneho Choi
Cc: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
zram_decompress_page naming is not proper because it doesn't decompress
if page was dedup hit or stored with compression.Use more abstract term and consistent with write path function
__zram_bvec_write.Link: http://lkml.kernel.org/r/1498459987-24562-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Cc: Juneho Choi
Cc: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
zram_compress does several things, compress, entry alloc and check
limitation. I did for just readbility but it hurts modulization.:(So this patch removes zram_compress functions and inline it in
__zram_bvec_write for upcoming patches.Link: http://lkml.kernel.org/r/1498459987-24562-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Cc: Juneho Choi
Cc: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Patch series "writeback incompressible pages to storage", v1.
zRam is useful for memory saving with compressible pages but sometime,
workload can be changed and system has lots of incompressible pages
which is very harmful for zram.This patch supports writeback feature of zram so admin can set up a
block device and with it, zram can save the memory via writing out the
incompressile pages once it found it's incompressible pages (1/4 comp
ratio) instead of keeping the page in memory.[1-3] is just clean up and [4-8] is step by step feature enablement.
[4-8] is logically not bisectable(ie, logical unit separation)
although I tried to compiled out without breaking but I think it would
be better to review.This patch (of 9):
__zram_bvec_write has some of duplicated logic for zram meta data
handling of same_page|compressed_page. This patch aims to clean it up
without behavior change.[xieyisheng1@huawei.com: fix compr_data_size stat]
Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1496019048-27016-1-git-send-email-minchan@kernel.org
Link: http://lkml.kernel.org/r/1498459987-24562-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Signed-off-by: Yisheng Xie
Reviewed-by: Sergey Senozhatsky
Cc: Juneho Choi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Historically we have enforced that any kernel zone (e.g ZONE_NORMAL) has
to precede the Movable zone in the physical memory range. The purpose
of the movable zone is, however, not bound to any physical memory
restriction. It merely defines a class of migrateable and reclaimable
memory.There are users (e.g. CMA) who might want to reserve specific physical
memory ranges for their own purpose. Moreover our pfn walkers have to
be prepared for zones overlapping in the physical range already because
we do support interleaving NUMA nodes and therefore zones can interleave
as well. This means we can allow each memory block to be associated
with a different zone.Loosen the current onlining semantic and allow explicit onlining type on
any memblock. That means that online_{kernel,movable} will be allowed
regardless of the physical address of the memblock as long as it is
offline of course. This might result in moveble zone overlapping with
other kernel zones. Default onlining then becomes a bit tricky but
still sensible. echo online > memoryXY/state will online the given
block to1) the default zone if the given range is outside of any zone
2) the enclosing zone if such a zone doesn't interleave with
any other zone
3) the default zone if more zones interleave for this rangewhere default zone is movable zone only if movable_node is enabled
otherwise it is a kernel zone.Here is an example of the semantic with (movable_node is not present but
it work in an analogous way). We start with following memblocks, all of
them offline:memory34/valid_zones:Normal Movable
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Normal Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal Movable
memory40/valid_zones:Normal Movable
memory41/valid_zones:Normal MovableNow, we online block 34 in default mode and block 37 as movable
root@test1:/sys/devices/system/node/node1# echo online > memory34/state
root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal Movable
memory40/valid_zones:Normal Movable
memory41/valid_zones:Normal MovableAs we can see all other blocks can still be onlined both into Normal and
Movable zones and the Normal is default because the Movable zone spans
only block37 now.root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Movable Normal
memory39/valid_zones:Movable Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:MovableNow the default zone for blocks 37-41 has changed because movable zone
spans that range.root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:MovableNote that the block 39 now belongs to the zone Normal and so block38
falls into Normal by default as well.For completness
root@test1:/sys/devices/system/node/node1# for i in memory[34]?
do
echo online > $i/state 2>/dev/null
donememory34/valid_zones:Normal
memory35/valid_zones:Normal
memory36/valid_zones:Normal
memory37/valid_zones:Movable
memory38/valid_zones:Normal
memory39/valid_zones:Normal
memory40/valid_zones:Movable
memory41/valid_zones:MovableImplementation wise the change is quite straightforward. We can get rid
of allow_online_pfn_range altogether. online_pages allows only offline
nodes already. The original default_zone_for_pfn will become
default_kernel_zone_for_pfn. New default_zone_for_pfn implements the
above semantic. zone_for_pfn_range is slightly reorganized to implement
kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes a
catch all default behavior.Link: http://lkml.kernel.org/r/20170714121233.16861-3-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Joonsoo Kim
Acked-by: Vlastimil Babka
Acked-by: Reza Arbab
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Yasuaki Ishimatsu
Cc: Xishi Qiu
Cc: Kani Toshimitsu
Cc:
Cc: Daniel Kiper
Cc: Igor Mammedov
Cc: Vitaly Kuznetsov
Cc: Wei Yang
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds