Eric Lee / smarc-fsl-linux-kernel

06 May, 2017

1 commit

84b114b98 tcp: randomize timestamps on syncookies ... Browse Code »

Whole point of randomization was to hide server uptime, but an attacker
can simply start a syn flood and TCP generates 'old style' timestamps,
directly revealing server jiffies value.

Also, TSval sent by the server to a particular remote address vary
depending on syncookies being sent or not, potentially triggering PAWS
drops for innocent clients.

Lets implement proper randomization, including for SYNcookies.

Also we do not need to export sysctl_tcp_timestamps, since it is not
used from a module.

In v2, I added Florian feedback and contribution, adding tsoff to
tcp_get_cookie_sock().

v3 removed one unused variable in tcp_v4_connect() as Florian spotted.

Fixes: 95a22caee396c ("tcp: randomize tcp timestamp offsets for each connection")
Signed-off-by: Eric Dumazet
Reviewed-by: Florian Westphal
Tested-by: Florian Westphal
Cc: Yuchung Cheng
Signed-off-by: David S. Miller

Eric Dumazet
2017-05-06 00:00:11 +0800

05 May, 2017

11 commits

af82455f7 Merge tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc ... Browse Code »

Pull char/misc driver updates from Greg KH:
"Here is the big set of new char/misc driver drivers and features for
4.12-rc1.

There's lots of new drivers added this time around, new firmware
drivers from Google, more auxdisplay drivers, extcon drivers, fpga
drivers, and a bunch of other driver updates. Nothing major, except if
you happen to have the hardware for these drivers, and then you will
be happy :)

All of these have been in linux-next for a while with no reported
issues"

* tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits)
firmware: google memconsole: Fix return value check in platform_memconsole_init()
firmware: Google VPD: Fix return value check in vpd_platform_init()
goldfish_pipe: fix build warning about using too much stack.
goldfish_pipe: An implementation of more parallel pipe
fpga fr br: update supported version numbers
fpga: region: release FPGA region reference in error path
fpga altera-hps2fpga: disable/unprepare clock on error in alt_fpga_bridge_probe()
mei: drop the TODO from samples
firmware: Google VPD sysfs driver
firmware: Google VPD: import lib_vpd source files
misc: lkdtm: Add volatile to intentional NULL pointer reference
eeprom: idt_89hpesx: Add OF device ID table
misc: ds1682: Add OF device ID table
misc: tsl2550: Add OF device ID table
w1: Remove unneeded use of assert() and remove w1_log.h
w1: Use kernel common min() implementation
uio_mf624: Align memory regions to page size and set correct offsets
uio_mf624: Refactor memory info initialization
uio: Allow handling of non page-aligned memory regions
hangcheck-timer: Fix typo in comment
...

Linus Torvalds
2017-05-05 10:15:35 +0800
0be75179d Merge tag 'driver-core-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core ... Browse Code »

Pull driver core updates from Greg KH:
"Very tiny pull request for 4.12-rc1 for the driver core this time
around.

There are some documentation fixes, an eventpoll.h fixup to make it
easier for the libc developers to take our header files directly, and
some very minor driver core fixes and changes.

All have been in linux-next for a very long time with no reported
issues"

* tag 'driver-core-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Revert "kref: double kref_put() in my_data_handler()"
driver core: don't initialize 'parent' in device_add()
drivers: base: dma-mapping: use nth_page helper
Documentation/ABI: add information about cpu_capacity
debugfs: set no_llseek in DEFINE_DEBUGFS_ATTRIBUTE
eventpoll.h: add missing epoll event masks
eventpoll.h: fix epoll event masks

Linus Torvalds
2017-05-05 09:27:46 +0800
8f28472a7 Merge tag 'usb-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb ... Browse Code »

Pull USB updates from Greg KH:
"Here is the big USB patchset for 4.12-rc1.

Lots of good stuff here, after many many many attempts, the kernel
finally has a working typeC interface, many thanks to Heikki and
Guenter and others who have taken the time to get this merged. It
wasn't an easy path for them at all.

There's also a staging driver that uses this new api, which is why
it's coming in through this tree.

Along with that, there's the usual huge number of changes for gadget
drivers, xhci, and other stuff. Johan also finally refactored pretty
much every driver that was looking at USB endpoints to do it in a
common way, which will help prevent any "badly-formed" devices from
causing problems in drivers. That too wasn't a simple task.

All of these have been in linux-next for a while with no reported
issues"

* tag 'usb-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (263 commits)
staging: typec: Fairchild FUSB302 Type-c chip driver
staging: typec: Type-C Port Controller Interface driver (tcpci)
staging: typec: USB Type-C Port Manager (tcpm)
usb: host: xhci: remove #ifdef around PM functions
usb: musb: don't mark of_dev_auxdata as initdata
usb: misc: legousbtower: Fix buffers on stack
USB: Revert "cdc-wdm: fix "out-of-sync" due to missing notifications"
usb: Make sure usb/phy/of gets built-in
USB: storage: e-mail update in drivers/usb/storage/unusual_devs.h
usb: host: xhci: print correct command ring address
usb: host: xhci: delete sp_dma_buffers for scratchpad
usb: host: xhci: using correct specification chapter reference for DCBAAP
xhci: switch to pci_alloc_irq_vectors
usb: host: xhci-plat: set resume_quirk() for R-Car controllers
usb: host: xhci-plat: add resume_quirk()
usb: host: xhci-plat: enable clk in resume timing
usb: host: plat: Enable xHCI plat runtime PM
USB: serial: ftdi_sio: add device ID for Microsemi/Arrow SF2PLUS Dev Kit
USB: serial: constify static arrays
usb: fix some references for /proc/bus/usb
...

Linus Torvalds
2017-05-05 09:03:51 +0800
4ac4d5848 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) The wireless rate info fix from Johannes Berg.

2) When a RAW socket is in hdrincl mode, we need to make sure that the
user provided at least a minimally sized ipv4/ipv6 header. Fix from
Alexander Potapenko.

3) We must emit IFLA_PHYS_PORT_NAME netlink attributes using
nla_put_string() so that it is NULL terminated.

4) Fix a bug in TCP fastopen handling, wherein child sockets
erroneously inherit the fastopen_req from the parent, and later can
end up derefencing freed memory or doing a double free. From Eric
Dumazet.

5) Don't clear out netdev stats at close time in tg3 driver, from
YueHaibing.

6) Fix refcount leak in xt_CT, from Gao Feng.

7) In nft_set_bitmap() don't leak dummy elements, from Liping Zhang.

8) Fix deadlock due to taking the expectation lock twice, also from
Liping Zhang.

9) Make xt_socket work again with ipv6, from Peter Tirsek.

10) Don't allow IPV6 to be used with IPVS if ipv6.disable=1, from Paolo
Abeni.

11) Make the BPF loader more flexible wrt. changes to the bpf MAP entry
layout. From Jesper Dangaard Brouer.

12) Fix ethtool reported device name in aquantia driver, from Pavel
Belous.

13) Fix build failures due to the compile time size test not working in
netfilter conntrack. From Geert Uytterhoeven.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
cfg80211: make RATE_INFO_BW_20 the default
ipv6: initialize route null entry in addrconf_init()
qede: Fix possible misconfiguration of advertised autoneg value.
qed: Fix overriding of supported autoneg value.
qed*: Fix possible overflow for status block id field.
rtnetlink: NUL-terminate IFLA_PHYS_PORT_NAME string
netvsc: make sure napi enabled before vmbus_open
aquantia: Fix driver name reported by ethtool
ipv4, ipv6: ensure raw socket message is big enough to hold an IP header
net/sched: remove redundant null check on head
tcp: do not inherit fastopen_req from parent
forcedeth: remove unnecessary carrier status check
ibmvnic: Move queue restarting in ibmvnic_tx_complete
ibmvnic: Record SKB RX queue during poll
ibmvnic: Continue skb processing after skb completion error
ibmvnic: Check for driver reset first in ibmvnic_xmit
ibmvnic: Wait for any pending scrqs entries at driver close
ibmvnic: Clean up tx pools when closing
ibmvnic: Whitespace correction in release_rx_pools
ibmvnic: Delete napi's when releasing driver resources
...

Linus Torvalds
2017-05-05 03:26:43 +0800
8d5e72dfd Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi ... Browse Code »

Pull SCSI updates from James Bottomley:
"This update includes the usual round of major driver updates
(hisi_sas, ufs, fnic, cxlflash, be2iscsi, ipr, stex). There's also the
usual amount of cosmetic and spelling stuff"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (155 commits)
scsi: qla4xxx: fix spelling mistake: "Tempalate" -> "Template"
scsi: stex: make S6flag static
scsi: mac_esp: fix to pass correct device identity to free_irq()
scsi: aacraid: pci_alloc_consistent() failures on ARM64
scsi: ufs: make ufshcd_get_lists_status() register operation obvious
scsi: ufs: use MASK_EE_STATUS
scsi: mac_esp: Replace bogus memory barrier with spinlock
scsi: fcoe: make fcoe_e_d_tov and fcoe_r_a_tov static
scsi: sd_zbc: Do not write lock zones for reset
scsi: sd_zbc: Remove superfluous assignments
scsi: sd: sd_zbc: Rename sd_zbc_setup_write_cmnd
scsi: Improve scsi_get_sense_info_fld
scsi: sd: Cleanup sd_done sense data handling
scsi: sd: Improve sd_completed_bytes
scsi: sd: Fix function descriptions
scsi: mpt3sas: remove redundant wmb
scsi: mpt: Move scsi_remove_host() out of mptscsih_remove_host()
scsi: sg: reset 'res_in_use' after unlinking reserved array
scsi: mvumi: remove code handling zero scsi_sg_count(scmd) case
scsi: fusion: fix spelling mistake: "Persistancy" -> "Persistency"
...

Linus Torvalds
2017-05-05 03:19:44 +0800
2bd804017 Merge tag 'gpio-v4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio ... Browse Code »

Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for the v4.12 kernel cycle.

Core changes:

- Return NULL from gpiod_get_optional() when GPIOLIB is disabled.
This was a much discussed change. It affects use cases where people
write drivers that might or might not be using GPIO resources. I
have decided that this is the lesser evil right now.

- Make gpiod_count() behave consistently across different hardware
descriptions.

- Fix the syntax around open drain/open source to not infer active
high/low semantics.

New drivers:

- A new single-register fixed-direction framework driver for hardware
that have lines controlled by a single register that just work in
one direction (out or in), including IRQ support.

- Support the Fintek F71889A GPIO SuperIO controller.

- Support the National NI 169445 MMIO GPIO.

- Support for the X-Gene derivative of the DWC GPIO controller

- Support for the Rohm BD9571MWV-M PMIC GPIO controller.

- Refactor the Gemini GPIO driver to a generic Faraday FTGPIO driver
and replace both the Gemini and the Moxa ART custom drivers with
this driver.

Driver improvements:

- A whole slew of drivers have their spinlocks chaned to raw
spinlocks as they provide irqchips, and thus we are progressing on
realtime compliance.

- Use devm_irq_alloc_descs() in a slew of drivers, getting managed
resources.

- Support for the embedded PWM controller inside the MVEBU driver.

- Debounce, open source and open drain support for the Aspeed driver.

- Misc smaller fixes like spelling and syntax and whatnot"

* tag 'gpio-v4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (77 commits)
gpio: f7188x: Add a missing break
gpio: omap: return error if requested debounce time is not possible
gpio: Add ROHM BD9571MWV-M PMIC GPIO driver
gpio: gpio-wcove: fix GPIO IRQ status mask
gpio: DT bindings, move tca9554 from pcf857x to pca953x
gpio: move tca9554 from pcf857x to pca953x
gpio: arizona: Correct check whether the pin is an input
gpio: Add XRA1403 DTS binding documentation
dt-bindings: add exar to vendor prefixes list
gpio: gpio-wcove: fix irq pending status bit width
gpio: dwapb: use dwapb_read instead of readl_relaxed
gpio: aspeed: Add open-source and open-drain support
gpio: aspeed: Add debounce support
gpio: aspeed: dt: Add optional clocks property
gpio: aspeed: dt: Fix description alignment in bindings document
gpio: mvebu: Add limited PWM support
gpio: Use unsigned int for interrupt numbers
gpio: f7188x: Add F71889A GPIO support.
gpio: core: Decouple open drain/source flag with active low/high
gpio: arizona: Correct handling for reading input GPIOs
...

Linus Torvalds
2017-05-05 03:05:32 +0800
99a7583de Merge tag 'platform-drivers-x86-v4.12-1' of git://git.infradead.org/linux-platform-drivers-x86 ... Browse Code »

Pull x86 platform-drivers update from Darren Hart:
"This represents a significantly larger and more complex set of changes
than those of prior merge windows.

In particular, we had several changes with dependencies on other
subsystems which we felt were best managed through merges of immutable
branches, including one each from input, i2c, and leds. Two patches
for the watchdog subsystem are included after discussion with Wim and
Guenter following a collision in linux-next (this should be resolved
and you should only see these two appear in this pull request). These
are called out in the "External" section below.

Summary of changes:
- significant further cleanup of fujitsu-laptop and hp-wmi
- new model support for ideapad, asus, silead, and xiaomi
- new hotkeys for thinkpad and models using intel-vbtn
- dell keyboard backlight improvements
- build and dependency improvements
- intel * ipc fixes, cleanups, and api updates
- single isolated fixes noted below

External:
- watchdog: iTCO_wdt: Add PMC specific noreboot update api
- watchdog: iTCO_wdt: cleanup set/unset no_reboot_bit functions
- Merge branch 'ib/4.10-sparse-keymap-managed'
- Merge branch 'i2c/for-INT33FE'
- Merge branch 'linux-leds/dell-laptop-changes-for-4.12'

platform/x86:
- Add Intel Cherry Trail ACPI INT33FE device driver
- remove sparse_keymap_free() calls
- Make SILEAD_DMI depend on TOUCHSCREEN_SILEAD

asus-wmi:
- try to set als by default
- fix cpufv sysfs file permission

acer-wmi:
- setup accelerometer when ACPI device was found

ideapad-laptop:
- Add IdeaPad V310-15ISK to no_hw_rfkill
- Add IdeaPad 310-15IKB to no_hw_rfkill

intel_pmc_ipc:
- use gcr mem base for S0ix counter read
- Fix iTCO_wdt GCS memory mapping failure
- Add pmc gcr read/write/update api's
- fix gcr offset

dell-laptop:
- Add keyboard backlight timeout AC settings
- Handle return error form dell_get_intensity.
- Protect kbd_state against races
- Refactor kbd_led_triggers_store()

hp-wireless:
- reuse module_acpi_driver
- add Xiaomi's hardware id to the supported list

intel-vbtn:
- add volume up and down

INT33FE:
- add i2c dependency

hp-wmi:
- Cleanup exit paths
- Do not shadow errors in sysfs show functions
- Use DEVICE_ATTR_(RO|RW) helper macros
- Refactor dock and tablet state fetchers
- Cleanup wireless get_(hw|sw)state functions
- Refactor redundant HPWMI_READ functions
- Standardize enum usage for constants
- Cleanup local variable declarations
- Do not shadow error values
- Fix detection for dock and tablet mode
- Fix error value for hp_wmi_tablet_state

fujitsu-laptop:
- simplify error handling in acpi_fujitsu_laptop_add()
- do not log LED registration failures
- switch to managed LED class devices
- reorganize LED-related code
- refactor LED registration
- select LEDS_CLASS
- remove redundant fields from struct fujitsu_bl
- account for backlight power when determining brightness
- do not log set_lcd_level() failures in bl_update_status()
- ignore errors when setting backlight power
- make disable_brightness_adjust a boolean
- clean up use_alt_lcd_levels handling
- sync brightness in set_lcd_level()
- simplify set_lcd_level()
- merge set_lcd_level_alt() into set_lcd_level()
- switch to a managed backlight device
- only handle backlight when appropriate
- update debug message logged by call_fext_func()
- rename call_fext_func() arguments
- simplify call_fext_func()
- clean up local variables in call_fext_func()
- remove keycode fields from struct fujitsu_bl
- model-dependent sparse keymap overrides
- use a sparse keymap for hotkey event generation
- switch to a managed hotkey input device
- refactor hotkey input device setup
- use a sparse keymap for brightness key events
- switch to a managed backlight input device
- refactor backlight input device setup
- remove pf_device field from struct fujitsu_bl
- only register platform device if FUJ02E3 is present
- add and remove platform device in separate functions
- simplify platform device attribute definitions
- remove backlight-related attributes from the platform device
- cleanup error labels in fujitsu_init()
- only register backlight device if FUJ02B1 is present
- sync backlight power status in acpi_fujitsu_laptop_add()
- register backlight device in a separate function
- simplify brightness key event generation logic
- decrease indentation in acpi_fujitsu_bl_notify()

intel-hid:
- Add missing ->thaw callback
- do not set parents of input devices explicitly
- remove redundant set_bit() call
- use devm_input_allocate_device() for HID events input device
- make intel_hid_set_enable() take a boolean argument
- simplify enabling/disabling HID events

silead_dmi:
- Add touchscreen info for Surftab Wintron 7.0
- Abort early if DMI does not match
- Do not treat all devices as i2c_clients
- Add entry for Insyde 7W tablets
- Constify properties arrays

intel_scu_ipc:
- Introduce intel_scu_ipc_raw_command()
- Introduce SCU_DEVICE() macro
- Remove redundant subarch check
- Rearrange init sequence
- Platform data is mandatory

asus-nb-wmi:
- Add wapf4 quirk for the X302UA

dell-*:
- Call new led hw_changed API on kbd brightness change
- Add a generic dell-laptop notifier chain

eeepc-laptop:
- Skip unknown key messages 0x50 0x51

thinkpad_acpi:
- add mapping for new hotkeys
- guard generic hotkey case"

* tag 'platform-drivers-x86-v4.12-1' of git://git.infradead.org/linux-platform-drivers-x86: (108 commits)
platform/x86: Make SILEAD_DMI depend on TOUCHSCREEN_SILEAD
platform/x86: asus-wmi: try to set als by default
platform/x86: asus-wmi: fix cpufv sysfs file permission
platform/x86: acer-wmi: setup accelerometer when ACPI device was found
platform/x86: ideapad-laptop: Add IdeaPad V310-15ISK to no_hw_rfkill
platform/x86: intel_pmc_ipc: use gcr mem base for S0ix counter read
platform/x86: intel_pmc_ipc: Fix iTCO_wdt GCS memory mapping failure
watchdog: iTCO_wdt: Add PMC specific noreboot update api
watchdog: iTCO_wdt: cleanup set/unset no_reboot_bit functions
platform/x86: intel_pmc_ipc: Add pmc gcr read/write/update api's
platform/x86: intel_pmc_ipc: fix gcr offset
platform/x86: dell-laptop: Add keyboard backlight timeout AC settings
platform/x86: dell-laptop: Handle return error form dell_get_intensity.
platform/x86: hp-wireless: reuse module_acpi_driver
platform/x86: intel-vbtn: add volume up and down
platform/x86: INT33FE: add i2c dependency
platform/x86: hp-wmi: Cleanup exit paths
platform/x86: hp-wmi: Do not shadow errors in sysfs show functions
platform/x86: hp-wmi: Use DEVICE_ATTR_(RO|RW) helper macros
platform/x86: hp-wmi: Refactor dock and tablet state fetchers
...

Linus Torvalds
2017-05-05 02:56:59 +0800
a96480723 Merge tag 'for-linus-4.12b-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip ... Browse Code »

Pull xen updates from Juergen Gross:
"Xen fixes and featrues for 4.12. The main changes are:

- enable building the kernel with Xen support but without enabling
paravirtualized mode (Vitaly Kuznetsov)

- add a new 9pfs xen frontend driver (Stefano Stabellini)

- simplify Xen's cpuid handling by making use of cpu capabilities
(Juergen Gross)

- add/modify some headers for new Xen paravirtualized devices
(Oleksandr Andrushchenko)

- EFI reset_system support under Xen (Julien Grall)

- and the usual cleanups and corrections"

* tag 'for-linus-4.12b-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (57 commits)
xen: Move xen_have_vector_callback definition to enlighten.c
xen: Implement EFI reset_system callback
arm/xen: Consolidate calls to shutdown hypercall in a single helper
xen: Export xen_reboot
xen/x86: Call xen_smp_intr_init_pv() on BSP
xen: Revert commits da72ff5bfcb0 and 72a9b186292d
xen/pvh: Do not fill kernel's e820 map in init_pvh_bootparams()
xen/scsifront: use offset_in_page() macro
xen/arm,arm64: rename __generic_dma_ops to xen_get_dma_ops
xen/arm,arm64: fix xen_dma_ops after 815dd18 "Consolidate get_dma_ops..."
xen/9pfs: select CONFIG_XEN_XENBUS_FRONTEND
x86/cpu: remove hypervisor specific set_cpu_features
vmware: set cpu capabilities during platform initialization
x86/xen: use capabilities instead of fake cpuid values for xsave
x86/xen: use capabilities instead of fake cpuid values for x2apic
x86/xen: use capabilities instead of fake cpuid values for mwait
x86/xen: use capabilities instead of fake cpuid values for acpi
x86/xen: use capabilities instead of fake cpuid values for acc
x86/xen: use capabilities instead of fake cpuid values for mtrr
x86/xen: use capabilities instead of fake cpuid values for aperf
...

Linus Torvalds
2017-05-05 02:37:09 +0800
842be75c7 cfg80211: make RATE_INFO_BW_20 the default ... Browse Code »

Due to the way I did the RX bitrate conversions in mac80211 with
spatch, going setting flags to setting the value, many drivers now
don't set the bandwidth value for 20 MHz, since with the flags it
wasn't necessary to (there was no 20 MHz flag, only the others.)

Rather than go through and try to fix up all the drivers, instead
renumber the enum so that 20 MHz, which is the typical bandwidth,
actually has the value 0, making those drivers all work again.

If VHT was hit used with a driver not reporting it, e.g. iwlmvm,
this manifested in hitting the bandwidth warning in
cfg80211_calculate_bitrate_vht().

Reported-by: Linus Torvalds
Tested-by: Jens Axboe
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-05-05 01:15:28 +0800
2f460933f ipv6: initialize route null entry in addrconf_init() ... Browse Code »

Andrey reported a crash on init_net.ipv6.ip6_null_entry->rt6i_idev
since it is always NULL.

This is clearly wrong, we have code to initialize it to loopback_dev,
unfortunately the order is still not correct.

loopback_dev is registered very early during boot, we lose a chance
to re-initialize it in notifier. addrconf_init() is called after
ip6_route_init(), which means we have no chance to correct it.

Fix it by moving this initialization explicitly after
ipv6_add_dev(init_net.loopback_dev) in addrconf_init().

Reported-by: Andrey Konovalov
Signed-off-by: Cong Wang
Tested-by: Andrey Konovalov
Signed-off-by: David S. Miller

WANG Cong
2017-05-05 00:51:24 +0800
f870a3c67 qed*: Fix possible overflow for status block id field. ... Browse Code »

Value for status block id could be more than 256 in 100G mode, need to
update its data type from u8 to u16.

Signed-off-by: Sudarsana Reddy Kalluru
Signed-off-by: Yuval Mintz
Signed-off-by: David S. Miller

sudarsana.kalluru@cavium.com
2017-05-05 00:31:02 +0800

04 May, 2017

28 commits

a1be8edda Merge tag 'modules-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux ... Browse Code »

Pull modules updates from Jessica Yu:

- Minor code cleanups

- Fix section alignment for .init_array

* tag 'modules-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
kallsyms: Use bounded strnchr() when parsing string
module: Unify the return value type of try_module_get
module: set .init_array alignment to 8

Linus Torvalds
2017-05-04 10:12:27 +0800
4c174688e Merge tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull tracing updates from Steven Rostedt:
"New features for this release:

- Pretty much a full rewrite of the processing of function plugins.
i.e. echo do_IRQ:stacktrace > set_ftrace_filter

- The rewrite was needed to add plugins to be unique to tracing
instances. i.e. mkdir instance/foo; cd instances/foo; echo
do_IRQ:stacktrace > set_ftrace_filter The old way was written very
hacky. This removes a lot of those hacks.

- New "function-fork" tracing option. When set, pids in the
set_ftrace_pid will have their children added when the processes
with their pids listed in the set_ftrace_pid file forks.

- Exposure of "maxactive" for kretprobe in kprobe_events

- Allow for builtin init functions to be traced by the function
tracer (via the kernel command line). Module init function tracing
will come in the next release.

- Added more selftests, and have selftests also test in an instance"

* tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (60 commits)
ring-buffer: Return reader page back into existing ring buffer
selftests: ftrace: Allow some event trigger tests to run in an instance
selftests: ftrace: Have some basic tests run in a tracing instance too
selftests: ftrace: Have event tests also run in an tracing instance
selftests: ftrace: Make func_event_triggers and func_traceonoff_triggers tests do instances
selftests: ftrace: Allow some tests to be run in a tracing instance
tracing/ftrace: Allow for instances to trigger their own stacktrace probes
tracing/ftrace: Allow for the traceonoff probe be unique to instances
tracing/ftrace: Enable snapshot function trigger to work with instances
tracing/ftrace: Allow instances to have their own function probes
tracing/ftrace: Add a better way to pass data via the probe functions
ftrace: Dynamically create the probe ftrace_ops for the trace_array
tracing: Pass the trace_array into ftrace_probe_ops functions
tracing: Have the trace_array hold the list of registered func probes
ftrace: If the hash for a probe fails to update then free what was initialized
ftrace: Have the function probes call their own function
ftrace: Have each function probe use its own ftrace_ops
ftrace: Have unregister_ftrace_function_probe_func() return a value
ftrace: Add helper function ftrace_hash_move_and_update_ops()
ftrace: Remove data field from ftrace_func_probe structure
...

Linus Torvalds
2017-05-04 09:41:21 +0800
dd23f273d Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge misc updates from Andrew Morton:

- a few misc things

- most of MM

- KASAN updates

* emailed patches from Andrew Morton : (102 commits)
kasan: separate report parts by empty lines
kasan: improve double-free report format
kasan: print page description after stacks
kasan: improve slab object description
kasan: change report header
kasan: simplify address description logic
kasan: change allocation and freeing stack traces headers
kasan: unify report headers
kasan: introduce helper functions for determining bug type
mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page
mm: hwpoison: call shake_page() unconditionally
mm/swapfile.c: fix swap space leak in error path of swap_free_entries()
mm/gup.c: fix access_ok() argument type
mm/truncate: avoid pointless cleancache_invalidate_inode() calls.
mm/truncate: bail out early from invalidate_inode_pages2_range() if mapping is empty
fs/block_dev: always invalidate cleancache in invalidate_bdev()
fs: fix data invalidation in the cleancache during direct IO
zram: reduce load operation in page_same_filled
zram: use zram_free_page instead of open-coded
zram: introduce zram data accessor
...

Linus Torvalds
2017-05-04 08:55:59 +0800
df6b74998 mm, swap: remove unused function prototype ... Browse Code »

This is a code cleanup patch, no functionality changes. There are 2
unused function prototype in swap.h, they are removed.

Link: http://lkml.kernel.org/r/20170405071017.23677-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying"
Cc: Tim Chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2017-05-04 06:52:11 +0800
ccda7f436 mm: memcontrol: use node page state naming scheme for memcg ... Browse Code »

The memory controllers stat function names are awkwardly long and
arbitrarily different from the zone and node stat functions.

The current interface is named:

mem_cgroup_read_stat()
mem_cgroup_update_stat()
mem_cgroup_inc_stat()
mem_cgroup_dec_stat()
mem_cgroup_update_page_stat()
mem_cgroup_inc_page_stat()
mem_cgroup_dec_page_stat()

This patch renames it to match the corresponding node stat functions:

memcg_page_state() [node_page_state()]
mod_memcg_state() [mod_node_state()]
inc_memcg_state() [inc_node_state()]
dec_memcg_state() [dec_node_state()]
mod_memcg_page_state() [mod_node_page_state()]
inc_memcg_page_state() [inc_node_page_state()]
dec_memcg_page_state() [dec_node_page_state()]

Link: http://lkml.kernel.org/r/20170404220148.28338-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner
Acked-by: Vladimir Davydov
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2017-05-04 06:52:11 +0800
71cd31135 mm: memcontrol: re-use node VM page state enum ... Browse Code »

The current duplication is a high-maintenance mess, and it's painful to
add new items or query memcg state from the rest of the VM.

This increases the size of the stat array marginally, but we should aim
to track all these stats on a per-cgroup level anyway.

Link: http://lkml.kernel.org/r/20170404220148.28338-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner
Acked-by: Vladimir Davydov
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2017-05-04 06:52:11 +0800
df0e53d06 mm: memcontrol: re-use global VM event enum ... Browse Code »

The current duplication is a high-maintenance mess, and it's painful to
add new items.

This increases the size of the event array, but we'll eventually want
most of the VM events tracked on a per-cgroup basis anyway.

Link: http://lkml.kernel.org/r/20170404220148.28338-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner
Acked-by: Vladimir Davydov
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2017-05-04 06:52:11 +0800
31176c781 mm: memcontrol: clean up memory.events counting function ... Browse Code »

We only ever count single events, drop the @nr parameter. Rename the
function accordingly. Remove low-information kerneldoc.

Link: http://lkml.kernel.org/r/20170404220148.28338-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner
Acked-by: Vladimir Davydov
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2017-05-04 06:52:11 +0800
2a2e48854 mm: vmscan: fix IO/refault regression in cache workingset transition ... Browse Code »

Since commit 59dc76b0d4df ("mm: vmscan: reduce size of inactive file
list") we noticed bigger IO spikes during changes in cache access
patterns.

The patch in question shrunk the inactive list size to leave more room
for the current workingset in the presence of streaming IO. However,
workingset transitions that previously happened on the inactive list are
now pushed out of memory and incur more refaults to complete.

This patch disables active list protection when refaults are being
observed. This accelerates workingset transitions, and allows more of
the new set to establish itself from memory, without eating into the
ability to protect the established workingset during stable periods.

The workloads that were measurably affected for us were hit pretty bad
by it, with refault/majfault rates doubling and tripling during cache
transitions, and the machines sustaining half-hour periods of 100% IO
utilization, where they'd previously have sub-minute peaks at 60-90%.

Stateful services that handle user data tend to be more conservative
with kernel upgrades. As a result we hit most page cache issues with
some delay, as was the case here.

The severity seemed to warrant a stable tag.

Fixes: 59dc76b0d4df ("mm: vmscan: reduce size of inactive file list")
Link: http://lkml.kernel.org/r/20170404220052.27593-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: [4.7+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2017-05-04 06:52:11 +0800
ac2e8e40a mm: fix spelling error ... Browse Code »

Fix variable name error in comments. No code changes.

Link: http://lkml.kernel.org/r/20170403161655.5081-1-haolee.swjtu@gmail.com
Signed-off-by: Hao Lee
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hao Lee
2017-05-04 06:52:10 +0800
9927e3887 include/linux/migrate.h: add arg names to prototype ... Browse Code »

It is preferred, and the rest of migrate.h gets it right.

Link: http://lkml.kernel.org/r/1490336009-8024-1-git-send-email-pushkar.iit@gmail.com
Signed-off-by: Pushkar Jambhlekar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pushkar Jambhlekar
2017-05-04 06:52:10 +0800
bd33ef368 mm: enable page poisoning early at boot ... Browse Code »

On SPARSEMEM systems page poisoning is enabled after buddy is up,
because of the dependency on page extension init. This causes the pages
released by free_all_bootmem not to be poisoned. This either delays or
misses the identification of some issues because the pages have to
undergo another cycle of alloc-free-alloc for any corruption to be
detected.

Enable page poisoning early by getting rid of the PAGE_EXT_DEBUG_POISON
flag. Since all the free pages will now be poisoned, the flag need not
be verified before checking the poison during an alloc.

[vinmenon@codeaurora.org: fix Kconfig]
Link: http://lkml.kernel.org/r/1490878002-14423-1-git-send-email-vinmenon@codeaurora.org
Link: http://lkml.kernel.org/r/1490358246-11001-1-git-send-email-vinmenon@codeaurora.org
Signed-off-by: Vinayak Menon
Acked-by: Laura Abbott
Tested-by: Laura Abbott
Cc: Joonsoo Kim
Cc: Michal Hocko
Cc: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vinayak Menon
2017-05-04 06:52:10 +0800
83612a948 mm: remove SWAP_[SUCCESS|AGAIN|FAIL] ... Browse Code »

There is no user for it. Remove it.

[minchan@kernel.org: use false instead of SWAP_FAIL]
Link: http://lkml.kernel.org/r/20170316053313.GA19241@bbox
Link: http://lkml.kernel.org/r/1489555493-14659-11-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Cc: Anshuman Khandual
Cc: Hillf Danton
Cc: Johannes Weiner
Cc: Kirill A. Shutemov
Cc: Michal Hocko
Cc: Naoya Horiguchi
Cc: Vlastimil Babka
Cc: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2017-05-04 06:52:10 +0800
e4b822227 mm: make rmap_one boolean function ... Browse Code »

rmap_one's return value controls whether rmap_work should contine to
scan other ptes or not so it's target for changing to boolean. Return
true if the scan should be continued. Otherwise, return false to stop
the scanning.

This patch makes rmap_one's return value to boolean.

Link: http://lkml.kernel.org/r/1489555493-14659-10-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Cc: Anshuman Khandual
Cc: Hillf Danton
Cc: Johannes Weiner
Cc: Kirill A. Shutemov
Cc: Michal Hocko
Cc: Naoya Horiguchi
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2017-05-04 06:52:10 +0800
1df631ae1 mm: make rmap_walk() return void ... Browse Code »

There is no user of the return value from rmap_walk() and friends so
this patch makes them void-returning functions.

Link: http://lkml.kernel.org/r/1489555493-14659-9-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Cc: Anshuman Khandual
Cc: Hillf Danton
Cc: Johannes Weiner
Cc: Kirill A. Shutemov
Cc: Michal Hocko
Cc: Naoya Horiguchi
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2017-05-04 06:52:10 +0800
666e5a406 mm: make ttu's return boolean ... Browse Code »

try_to_unmap() returns SWAP_SUCCESS or SWAP_FAIL so it's suitable for
boolean return. This patch changes it.

Link: http://lkml.kernel.org/r/1489555493-14659-8-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Cc: Naoya Horiguchi
Cc: Anshuman Khandual
Cc: Hillf Danton
Cc: Johannes Weiner
Cc: Kirill A. Shutemov
Cc: Michal Hocko
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2017-05-04 06:52:10 +0800
ad6b67041 mm: remove SWAP_MLOCK in ttu ... Browse Code »

ttu doesn't need to return SWAP_MLOCK. Instead, just return SWAP_FAIL
because it means the page is not-swappable so it should move to another
LRU list(active or unevictable). putback friends will move it to right
list depending on the page's LRU flag.

Link: http://lkml.kernel.org/r/1489555493-14659-6-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Cc: Anshuman Khandual
Cc: Hillf Danton
Cc: Johannes Weiner
Cc: Kirill A. Shutemov
Cc: Michal Hocko
Cc: Naoya Horiguchi
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2017-05-04 06:52:10 +0800
192d72325 mm: make try_to_munlock() return void ... Browse Code »

try_to_munlock returns SWAP_MLOCK if the one of VMAs mapped the page has
VM_LOCKED flag. In that time, VM set PG_mlocked to the page if the page
is not pte-mapped THP which cannot be mlocked, either.

With that, __munlock_isolated_page can use PageMlocked to check whether
try_to_munlock is successful or not without relying on try_to_munlock's
retval. It helps to make try_to_unmap/try_to_unmap_one simple with
upcoming patches.

[minchan@kernel.org: remove PG_Mlocked VM_BUG_ON check]
Link: http://lkml.kernel.org/r/20170411025615.GA6545@bbox
Link: http://lkml.kernel.org/r/1489555493-14659-5-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Acked-by: Kirill A. Shutemov
Acked-by: Vlastimil Babka
Cc: Anshuman Khandual
Cc: Hillf Danton
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Naoya Horiguchi
Cc: Sasha Levin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2017-05-04 06:52:10 +0800
18863d3a3 mm: remove SWAP_DIRTY in ttu ... Browse Code »

If we found lazyfree page is dirty, try_to_unmap_one can just
SetPageSwapBakced in there like PG_mlocked page and just return with
SWAP_FAIL which is very natural because the page is not swappable right
now so that vmscan can activate it. There is no point to introduce new
return value SWAP_DIRTY in try_to_unmap at the moment.

Link: http://lkml.kernel.org/r/1489555493-14659-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim
Acked-by: Hillf Danton
Acked-by: Kirill A. Shutemov
Cc: Anshuman Khandual
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Naoya Horiguchi
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2017-05-04 06:52:10 +0800
056b9d8a7 mm: remove rodata_test_data export, add pr_fmt ... Browse Code »

Since commit 3ad38ceb2769 ("x86/mm: Remove CONFIG_DEBUG_NX_TEST"),
nothing is using the exported rodata_test_data variable, so drop the
export.

This additionally updates the pr_fmt to avoid redundant strings and
adjusts some whitespace.

Link: http://lkml.kernel.org/r/20170307005313.GA85809@beast
Signed-off-by: Kees Cook
Cc: Jinbum Park
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2017-05-04 06:52:09 +0800
81378da64 jbd2: mark the transaction context with the scope GFP_NOFS context ... Browse Code »

now that we have memalloc_nofs_{save,restore} api we can mark the whole
transaction context as implicitly GFP_NOFS. All allocations will
automatically inherit GFP_NOFS this way. This means that we do not have
to mark any of those requests with GFP_NOFS and moreover all the
ext4_kv[mz]alloc(GFP_NOFS) are also safe now because even the hardcoded
GFP_KERNEL allocations deep inside the vmalloc will be NOFS now.

[akpm@linux-foundation.org: tweak comments]
Link: http://lkml.kernel.org/r/20170306131408.9828-7-mhocko@kernel.org
Signed-off-by: Michal Hocko
Reviewed-by: Jan Kara
Cc: Dave Chinner
Cc: Theodore Ts'o
Cc: Chris Mason
Cc: David Sterba
Cc: Brian Foster
Cc: Darrick J. Wong
Cc: Nikolay Borisov
Cc: Peter Zijlstra
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2017-05-04 06:52:09 +0800
7dea19f9e mm: introduce memalloc_nofs_{save,restore} API ... Browse Code »

GFP_NOFS context is used for the following 5 reasons currently:

- to prevent from deadlocks when the lock held by the allocation
context would be needed during the memory reclaim

- to prevent from stack overflows during the reclaim because the
allocation is performed from a deep context already

- to prevent lockups when the allocation context depends on other
reclaimers to make a forward progress indirectly

- just in case because this would be safe from the fs POV

- silence lockdep false positives

Unfortunately overuse of this allocation context brings some problems to
the MM. Memory reclaim is much weaker (especially during heavy FS
metadata workloads), OOM killer cannot be invoked because the MM layer
doesn't have enough information about how much memory is freeable by the
FS layer.

In many cases it is far from clear why the weaker context is even used
and so it might be used unnecessarily. We would like to get rid of
those as much as possible. One way to do that is to use the flag in
scopes rather than isolated cases. Such a scope is declared when really
necessary, tracked per task and all the allocation requests from within
the context will simply inherit the GFP_NOFS semantic.

Not only this is easier to understand and maintain because there are
much less problematic contexts than specific allocation requests, this
also helps code paths where FS layer interacts with other layers (e.g.
crypto, security modules, MM etc...) and there is no easy way to convey
the allocation context between the layers.

Introduce memalloc_nofs_{save,restore} API to control the scope of
GFP_NOFS allocation context. This is basically copying
memalloc_noio_{save,restore} API we have for other restricted allocation
context GFP_NOIO. The PF_MEMALLOC_NOFS flag already exists and it is
just an alias for PF_FSTRANS which has been xfs specific until recently.
There are no more PF_FSTRANS users anymore so let's just drop it.

PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO. memalloc_noio_flags
is renamed to current_gfp_context because it now cares about both
PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts. Xfs code paths preserve
their semantic. kmem_flags_convert() doesn't need to evaluate the flag
anymore.

This patch shouldn't introduce any functional changes.

Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS)
usage as much as possible and only use a properly documented
memalloc_nofs_{save,restore} checkpoints where they are appropriate.

[akpm@linux-foundation.org: fix comment typo, reflow comment]
Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Vlastimil Babka
Cc: Dave Chinner
Cc: Theodore Ts'o
Cc: Chris Mason
Cc: David Sterba
Cc: Jan Kara
Cc: Brian Foster
Cc: Darrick J. Wong
Cc: Nikolay Borisov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2017-05-04 06:52:09 +0800
9070733b4 xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS ... Browse Code »

xfs has defined PF_FSTRANS to declare a scope GFP_NOFS semantic quite
some time ago. We would like to make this concept more generic and use
it for other filesystems as well. Let's start by giving the flag a more
generic name PF_MEMALLOC_NOFS which is in line with an exiting
PF_MEMALLOC_NOIO already used for the same purpose for GFP_NOIO
contexts. Replace all PF_FSTRANS usage from the xfs code in the first
step before we introduce a full API for it as xfs uses the flag directly
anyway.

This patch doesn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170306131408.9828-4-mhocko@kernel.org
Signed-off-by: Michal Hocko
Reviewed-by: Darrick J. Wong
Reviewed-by: Brian Foster
Acked-by: Vlastimil Babka
Cc: Dave Chinner
Cc: Theodore Ts'o
Cc: Chris Mason
Cc: David Sterba
Cc: Jan Kara
Cc: Nikolay Borisov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2017-05-04 06:52:09 +0800
7e7844226 lockdep: allow to disable reclaim lockup detection ... Browse Code »

The current implementation of the reclaim lockup detection can lead to
false positives and those even happen and usually lead to tweak the code
to silence the lockdep by using GFP_NOFS even though the context can use
__GFP_FS just fine.

See

http://lkml.kernel.org/r/20160512080321.GA18496@dastard

as an example.

=================================
[ INFO: inconsistent lock state ]
4.5.0-rc2+ #4 Tainted: G O
---------------------------------
inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes:

(&xfs_nondir_ilock_class){++++-+}, at: xfs_ilock+0x177/0x200 [xfs]

{RECLAIM_FS-ON-R} state was registered at:
mark_held_locks+0x79/0xa0
lockdep_trace_alloc+0xb3/0x100
kmem_cache_alloc+0x33/0x230
kmem_zone_alloc+0x81/0x120 [xfs]
xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs]
__xfs_refcount_find_shared+0x75/0x580 [xfs]
xfs_refcount_find_shared+0x84/0xb0 [xfs]
xfs_getbmap+0x608/0x8c0 [xfs]
xfs_vn_fiemap+0xab/0xc0 [xfs]
do_vfs_ioctl+0x498/0x670
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x12/0x6f

CPU0
----
lock(&xfs_nondir_ilock_class);

lock(&xfs_nondir_ilock_class);

*** DEADLOCK ***

3 locks held by kswapd0/543:

stack backtrace:
CPU: 0 PID: 543 Comm: kswapd0 Tainted: G O 4.5.0-rc2+ #4
Call Trace:
lock_acquire+0xd8/0x1e0
down_write_nested+0x5e/0xc0
xfs_ilock+0x177/0x200 [xfs]
xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
xfs_fs_evict_inode+0xdc/0x1e0 [xfs]
evict+0xc5/0x190
dispose_list+0x39/0x60
prune_icache_sb+0x4b/0x60
super_cache_scan+0x14f/0x1a0
shrink_slab.part.63.constprop.79+0x1e9/0x4e0
shrink_zone+0x15e/0x170
kswapd+0x4f1/0xa80
kthread+0xf2/0x110
ret_from_fork+0x3f/0x70

To quote Dave:
"Ignoring whether reflink should be doing anything or not, that's a
"xfs_refcountbt_init_cursor() gets called both outside and inside
transactions" lockdep false positive case. The problem here is lockdep
has seen this allocation from within a transaction, hence a GFP_NOFS
allocation, and now it's seeing it in a GFP_KERNEL context. Also note
that we have an active reference to this inode.

So, because the reclaim annotations overload the interrupt level
detections and it's seen the inode ilock been taken in reclaim
("interrupt") context, this triggers a reclaim context warning where
it thinks it is unsafe to do this allocation in GFP_KERNEL context
holding the inode ilock..."

This sounds like a fundamental problem of the reclaim lock detection.
It is really impossible to annotate such a special usecase IMHO unless
the reclaim lockup detection is reworked completely. Until then it is
much better to provide a way to add "I know what I am doing flag" and
mark problematic places. This would prevent from abusing GFP_NOFS flag
which has a runtime effect even on configurations which have lockdep
disabled.

Introduce __GFP_NOLOCKDEP flag which tells the lockdep gfp tracking to
skip the current allocation request.

While we are at it also make sure that the radix tree doesn't
accidentaly override tags stored in the upper part of the gfp_mask.

Link: http://lkml.kernel.org/r/20170306131408.9828-3-mhocko@kernel.org
Signed-off-by: Michal Hocko
Suggested-by: Peter Zijlstra
Acked-by: Peter Zijlstra (Intel)
Acked-by: Vlastimil Babka
Cc: Dave Chinner
Cc: Theodore Ts'o
Cc: Chris Mason
Cc: David Sterba
Cc: Jan Kara
Cc: Brian Foster
Cc: Darrick J. Wong
Cc: Nikolay Borisov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2017-05-04 06:52:09 +0800
a6ffdc078 mm: use is_migrate_highatomic() to simplify the code ... Browse Code »

Introduce two helpers, is_migrate_highatomic() and is_migrate_highatomic_page().

Simplify the code, no functional changes.

[akpm@linux-foundation.org: use static inlines rather than macros, per mhocko]
Link: http://lkml.kernel.org/r/58B94F15.6060606@huawei.com
Signed-off-by: Xishi Qiu
Acked-by: Michal Hocko
Cc: Vlastimil Babka
Cc: Mel Gorman
Cc: Minchan Kim
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xishi Qiu
2017-05-04 06:52:08 +0800
9a4caf1e9 mm: memcontrol: provide shmem statistics ... Browse Code »

Cgroups currently don't report how much shmem they use, which can be
useful data to have, in particular since shmem is included in the
cache/file item while being reclaimed like anonymous memory.

Add a counter to track shmem pages during charging and uncharging.

Link: http://lkml.kernel.org/r/20170221164343.32252-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner
Reported-by: Chris Down
Cc: Michal Hocko
Cc: Vladimir Davydov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2017-05-04 06:52:08 +0800
802a3a92a mm: reclaim MADV_FREE pages ... Browse Code »

When memory pressure is high, we free MADV_FREE pages. If the pages are
not dirty in pte, the pages could be freed immediately. Otherwise we
can't reclaim them. We put the pages back to anonumous LRU list (by
setting SwapBacked flag) and the pages will be reclaimed in normal
swapout way.

We use normal page reclaim policy. Since MADV_FREE pages are put into
inactive file list, such pages and inactive file pages are reclaimed
according to their age. This is expected, because we don't want to
reclaim too many MADV_FREE pages before used once pages.

Based on Minchan's original patch

[minchan@kernel.org: clean up lazyfree page handling]
Link: http://lkml.kernel.org/r/20170303025237.GB3503@bbox
Link: http://lkml.kernel.org/r/14b8eb1d3f6bf6cc492833f183ac8c304e560484.1487965799.git.shli@fb.com
Signed-off-by: Shaohua Li
Signed-off-by: Minchan Kim
Acked-by: Minchan Kim
Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Acked-by: Hillf Danton
Cc: Hugh Dickins
Cc: Rik van Riel
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shaohua Li
2017-05-04 06:52:08 +0800
f7ad2a6cb mm: move MADV_FREE pages into LRU_INACTIVE_FILE list ... Browse Code »

madv()'s MADV_FREE indicate pages are 'lazyfree'. They are still
anonymous pages, but they can be freed without pageout. To distinguish
these from normal anonymous pages, we clear their SwapBacked flag.

MADV_FREE pages could be freed without pageout, so they pretty much like
used once file pages. For such pages, we'd like to reclaim them once
there is memory pressure. Also it might be unfair reclaiming MADV_FREE
pages always before used once file pages and we definitively want to
reclaim the pages before other anonymous and file pages.

To speed up MADV_FREE pages reclaim, we put the pages into
LRU_INACTIVE_FILE list. The rationale is LRU_INACTIVE_FILE list is tiny
nowadays and should be full of used once file pages. Reclaiming
MADV_FREE pages will not have much interfere of anonymous and active
file pages. And the inactive file pages and MADV_FREE pages will be
reclaimed according to their age, so we don't reclaim too many MADV_FREE
pages too. Putting the MADV_FREE pages into LRU_INACTIVE_FILE_LIST also
means we can reclaim the pages without swap support. This idea is
suggested by Johannes.

This patch doesn't move MADV_FREE pages to LRU_INACTIVE_FILE list yet to
avoid bisect failure, next patch will do it.

The patch is based on Minchan's original patch.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/2f87063c1e9354677b7618c647abde77b07561e5.1487965799.git.shli@fb.com
Signed-off-by: Shaohua Li
Suggested-by: Johannes Weiner
Acked-by: Johannes Weiner
Acked-by: Minchan Kim
Acked-by: Michal Hocko
Acked-by: Hillf Danton
Cc: Hugh Dickins
Cc: Rik van Riel
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shaohua Li
2017-05-04 06:52:08 +0800