07 May, 2016
16 commits
-
There are no callers except through the file_operations struct below
this, so it should be static like everything else here.Signed-off-by: Peter Jones
Signed-off-by: Matt Fleming
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462570771-13324-6-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar -
The parameters atomic and duplicates of efivar_init always have opposite
values. Drop the parameter atomic, replace the uses of !atomic with
duplicates, and update the call sites accordingly.The code using duplicates is slightly reorganized with an 'else', to avoid
duplicating the lock code.Signed-off-by: Julia Lawall
Signed-off-by: Matt Fleming
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Jeremy Kerr
Cc: Linus Torvalds
Cc: Matthew Garrett
Cc: Peter Zijlstra
Cc: Saurabh Sengar
Cc: Thomas Gleixner
Cc: Vaishali Thakkar
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462570771-13324-5-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar -
Dan Carpenter reports that passing the address of the pointer to the
kmalloc()'d memory for 'capsule' is dangerous:"drivers/firmware/efi/capsule.c:109 efi_capsule_supported()
warn: did you mean to pass the address of 'capsule'108
109 status = efi.query_capsule_caps(&capsule, 1, &max_size, reset);
^^^^^^^^
If we modify capsule inside this function call then at the end of the
function we aren't freeing the original pointer that we allocated."Ard Biesheuvel noted that we don't even need to call kmalloc() since the
object we allocate isn't very big and doesn't need to persist after the
function returns.Place 'capsule' on the stack instead.
Suggested-by: Ard Biesheuvel
Reported-by: Dan Carpenter
Signed-off-by: Matt Fleming
Acked-by: Ard Biesheuvel
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Bryan O'Donoghue
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Kweh Hock Leong
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: joeyli
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462570771-13324-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar -
GCC complains about a newly added file for the EFI Bootloader Control:
drivers/firmware/efi/efibc.c: In function 'efibc_set_variable':
drivers/firmware/efi/efibc.c:53:1: error: the frame size of 2272 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]The problem is the declaration of a local variable of type struct
efivar_entry, which is by itself larger than the warning limit of 1024
bytes.Use dynamic memory allocation instead of stack memory for the entry
object.This patch also fixes a potential buffer overflow.
Reported-by: Ingo Molnar
Reported-by: Arnd Bergmann
Signed-off-by: Jeremy Compostella
[ Updated changelog to include GCC error ]
Signed-off-by: Matt Fleming
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462570771-13324-3-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar -
Taking a mutex in the reboot path is bogus because we cannot sleep
with interrupts disabled, such as when rebooting due to panic(),BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97
in_atomic(): 0, irqs_disabled(): 1, pid: 7, name: rcu_sched
Call Trace:
dump_stack+0x63/0x89
___might_sleep+0xd8/0x120
__might_sleep+0x49/0x80
mutex_lock+0x20/0x50
efi_capsule_pending+0x1d/0x60
native_machine_emergency_restart+0x59/0x280
machine_emergency_restart+0x19/0x20
emergency_restart+0x18/0x20
panic+0x1ba/0x217In this case all other CPUs will have been stopped by the time we
execute the platform reboot code, so 'capsule_pending' cannot change
under our feet. We wouldn't care even if it could since we cannot wait
for it complete.Also, instead of relying on the external 'system_state' variable just
use a reboot notifier, so we can set 'stop_capsules' while holding
'capsule_mutex', thereby avoiding a race where system_state is updated
while we're in the middle of efi_capsule_update_locked() (since CPUs
won't have been stopped at that point).Signed-off-by: Matt Fleming
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Bryan O'Donoghue
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Kweh Hock Leong
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: joeyli
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462570771-13324-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar -
Signed-off-by: Ingo Molnar
-
Pull writeback fix from Jens Axboe:
"Just a single fix for domain aware writeback, fixing a regression that
can cause balance_dirty_pages() to keep looping while not getting any
work done"* 'for-linus' of git://git.kernel.dk/linux-block:
writeback: Fix performance regression in wb_over_bg_thresh() -
Pull x86 fixes from Ingo Molnar:
"This contains two fixes: a boot fix for older SGI/UV systems, and an
APIC calibration fix"* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init -
Pull power management and ACPI fixes from Rafael Wysocki:
"Fixes for problems introduced or discovered recently (intel_pstate,
sti-cpufreq, ARM64 cpuidle, Operating Performance Points framework,
generic device properties framework) and one fix for a hotplug-related
deadlock in ACPICA that's been there forever, but is nasty enough.Specifics:
- Fix for a recent regression in the intel_pstate driver causing it
to fail to restore the HWP (HW-managed P-states) configuration of
the boot CPU after suspend-to-RAM (Rafael Wysocki).- Fix for two recent regressions in the intel_pstate driver, one that
can trigger a divide by zero if the driver is accessed via sysfs
before it manages to take the first sample and one causing it to
fail to update a structure field used in a trace point, so the
information coming from it is less useful (Rafael Wysocki).- Fix for a problem in the sti-cpufreq driver introduced during the
4.5 cycle that causes it to break CPU PM in multi-platform kernels
by registering cpufreq-dt (which subsequently doesn't work)
unconditionally and preventing the driver that would actually work
from registering (Sudeep Holla).- Stable-candidate fix for an ARM64 cpuidle issue causing idle state
usage counters to be incorrectly updated for idle states that were
not entered due to errors (James Morse).- Fix for a recently introduced issue in the OPP (Operating
Performance Points) framework causing it to print bogus error
messages for missing optional regulators (Viresh Kumar).- Fix for a recently introduced issue in the generic device
properties framework that may cause it to attempt to dereferece and
invalid pointer in some cases (Heikki Krogerus).- Fix for a deadlock in the ACPICA core that may be triggered by
device (eg Thunderbolt) hotplug (Prarit Bhargava)"* tag 'pm+acpi-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / OPP: Remove useless check
ACPICA: Dispatcher: Update thread ID for recursive method calls
intel_pstate: Fix intel_pstate_get()
cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
cpufreq: st: enable selective initialization based on the platform
ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
device property: Avoid potential dereferences of invalid pointers -
Pull scheduler fix from Ingo Molnar:
"This contains a single fix that fixes a nohz tick stopping bug when
mixed-poliocy SCHED_FIFO and SCHED_RR tasks are present on a runqueue"* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick() -
Pull perf fixes from Ingo Molnar:
"This tree contains two fixes: new Intel CPU model numbers and an
AMD/iommu uncore PMU driver fix"* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/iommu: Do not register a task ctx for uncore like PMUs
perf/x86: Add model numbers for Kabylake CPUs -
Pull EFI fixes from Ingo Molnar:
"This tree contains three fixes: a console spam fix, a file pattern fix
and a sysfb_efi fix for a bug that triggered on older ThinkPads"* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sysfb_efi: Fix valid BAR address range check
x86/efi-bgrt: Switch all pr_err() to pr_notice() for invalid BGRT
MAINTAINERS: Remove asterisk from EFI directory names -
Pull parisc fix from Helge Deller:
"Patch from Dmitry V Levin to fix a kernel crash when a straced process
calls the (invalid) syscall which is equal to value of __NR_Linux_syscalls"* 'parisc-4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls -
Pull ARC fixes from Vineet Gupta:
"Late in the cycle, but this has fixes for couple of issues: a PAE40
boot crash and Arnd spotting lack of barriers in BE io-accessors.The 3rd patch for enabling highmem in low physical mem ;-) honestly is
more than a "fix" but its been in works for some time, seems to be
stable in testing and enables 2 of our customers to go forward with
4.6 kernel.- Fix for PTE truncation in PAE40 builds
- Fix for big endian IO accessors lacking IO barrier
- Allow HIGHMEM to work with low physical addresses"* tag 'arc-4.6-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: support HIGHMEM even without PAE40
ARC: Fix PAE40 boot failures due to PTE truncation
ARC: Add missing io barriers to io{read,write}{16,32}be() -
Pull powerpc fix from Michael Ellerman:
"Fix bad inline asm constraint in create_zero_mask() from Anton
Blanchard"* tag 'powerpc-4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Fix bad inline asm constraint in create_zero_mask() -
Pull drm fixes from Dave Airlie:
"Fixes for i915, amdgpu/radeon and imx.The IMX fix is for an autoloading regression found in Fedora. The
radeon fixes, are the same fix to amdgpu/radeon to avoid a hardware
lockup in some circumstances with a bad mode, and a double free bug I
took a few hours chasing down the other morning.The i915 fixes are across the board, all stable material, and fixing
some hangs and suspend/resume issues, along with a live status
regressions"* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading
drm/amdgpu: make sure vertical front porch is at least 1
drm/radeon: make sure vertical front porch is at least 1
drm/amdgpu: set metadata pointer to NULL after freeing.
drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW
drm/i915: Fake HDMI live status
drm/i915: Fix eDP low vswing for Broadwell
drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume
drm/i915: Fix system resume if PCI device remained enabled
drm/i915: Avoid stalling on pending flips for legacy cursor updates
06 May, 2016
24 commits
-
Do not load one entry beyond the end of the syscall table when the
syscall number of a traced process equals to __NR_Linux_syscalls.
Similar bug with regular processes was fixed by commit 3bb457af4fa8
("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls").This bug was found by strace test suite.
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry V. Levin
Acked-by: Helge Deller
Signed-off-by: Helge Deller -
* pm-opp-fixes:
PM / OPP: Remove useless check* pm-cpufreq-fixes:
intel_pstate: Fix intel_pstate_get()
cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
cpufreq: st: enable selective initialization based on the platform* pm-cpuidle-fixes:
ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value -
* acpica-fixes:
ACPICA: Dispatcher: Update thread ID for recursive method calls* device-properties-fixes:
device property: Avoid potential dereferences of invalid pointers -
Currently we read the tsc radio: ratio = (MSR_PLATFORM_INFO >> 8) & 0x1f;
Thus we get bit 8-12 of MSR_PLATFORM_INFO, however according to the SDM
(35.5), the ratio bits are bit 8-15.Ignoring the upper bits can result in an incorrect tsc ratio, which causes the
TSC calibration and the Local APIC timer frequency to be incorrect.Fix this problem by masking 0xff instead.
[ tglx: Massaged changelog ]
Fixes: 7da7c1561366 "x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs"
Signed-off-by: Chen Yu
Cc: "Rafael J. Wysocki"
Cc: stable@vger.kernel.org
Cc: Bin Gao
Cc: Len Brown
Link: http://lkml.kernel.org/r/1462505619-5516-1-git-send-email-yu.c.chen@intel.com
Signed-off-by: Thomas Gleixner -
Merge fixes from Andrew Morton:
"14 fixes"* emailed patches from Andrew Morton :
byteswap: try to avoid __builtin_constant_p gcc bug
lib/stackdepot: avoid to return 0 handle
mm: fix kcompactd hang during memory offlining
modpost: fix module autoloading for OF devices with generic compatible property
proc: prevent accessing /proc//environ until it's ready
mm/zswap: provide unique zpool name
mm: thp: kvm: fix memory corruption in KVM with THP enabled
MAINTAINERS: fix Rajendra Nayak's address
mm, cma: prevent nr_isolated_* counters from going negative
mm: update min_free_kbytes from khugepaged after core initialization
huge pagecache: mmap_sem is unlocked when truncation splits pmd
rapidio/mport_cdev: fix uapi type definitions
mm: memcontrol: let v2 cgroups follow changes in system swappiness
mm: thp: correct split_huge_pages file permission -
Apparently patchwork ended up truncating the full name.
Signed-off-by: Linus Torvalds
-
Pull libnvdimm fixes from Dan Williams:
- a fix for the persistent memory 'struct page' driver. The
implementation overlooked the fact that pages are allocated in 2MB
units leading to -ENOMEM when establishing some configurations.It's tagged for -stable as the problem was introduced with the
initial implementation in 4.5.- The new "error status translation" routine, introduced with the 4.6
updates to the nfit driver, missed a necessary path in
acpi_nfit_ctl().The end result is that we are falsely assuming commands complete
successfully when the embedded status says otherwise.* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nfit: fix translation of command status results
libnvdimm, pfn: fix memmap reservation sizing -
This is another attempt to avoid a regression in wwn_to_u64() after that
started using get_unaligned_be64(), which in turn ran into a bug on
gcc-4.9 through 6.1.The regression got introduced due to the combination of two separate
workarounds (commits e3bde9568d99: "include/linux/unaligned: force
inlining of byteswap operations" and ef3fb2422ffe: "scsi: fc: use
get/put_unaligned64 for wwn access") that each try to sidestep distinct
problems with gcc behavior (code growth and increased stack usage).Unfortunately after both have been applied, a more serious gcc bug has
been uncovered, leading to incorrect object code that discards part of a
function and causes undefined behavior.As part of this problem is how __builtin_constant_p gets evaluated on an
argument passed by reference into an inline function, this avoids the
use of __builtin_constant_p() for all architectures that set
CONFIG_ARCH_USE_BUILTIN_BSWAP. Most architectures do not set
ARCH_SUPPORTS_OPTIMIZED_INLINING, which means they probably do not
suffer from the problem in the qla2xxx driver, but they might still run
into it elsewhere.Both of the original workarounds were only merged in the 4.6 kernel, and
the bug that is fixed by this patch should only appear if both are
there, so we probably don't need to backport the fix. On the other
hand, it works by simplifying the code path and should not have any
negative effects.[arnd@arndb.de: fix older gcc warnings]
(http://lkml.kernel.org/r/12243652.bxSxEgjgfk@wuerfel)
Link: https://lkml.org/lkml/headers/2016/4/12/1103
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70232
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
Fixes: e3bde9568d99 ("include/linux/unaligned: force inlining of byteswap operations")
Fixes: ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
Link: http://lkml.kernel.org/r/1780465.XdtPJpi8Tt@wuerfel
Signed-off-by: Arnd Bergmann
Reviewed-by: Josh Poimboeuf
Tested-by: Josh Poimboeuf # on gcc-5.3
Tested-by: Quinn Tran
Cc: Martin Jambor
Cc: "Martin K. Petersen"
Cc: James Bottomley
Cc: Denys Vlasenko
Cc: Thomas Graf
Cc: Peter Zijlstra
Cc: David Rientjes
Cc: Ingo Molnar
Cc: Himanshu Madhani
Cc: Jan Hubicka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Recently, we allow to save the stacktrace whose hashed value is 0. It
causes the problem that stackdepot could return 0 even if in success.
User of stackdepot cannot distinguish whether it is success or not so we
need to solve this problem. In this patch, 1 bit are added to handle
and make valid handle none 0 by setting this bit. After that, valid
handle will not be 0 and 0 handle will represent failure correctly.Fixes: 33334e25769c ("lib/stackdepot.c: allow the stack trace hash to be zero")
Link: http://lkml.kernel.org/r/1462252403-1106-1-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim
Cc: Alexander Potapenko
Cc: Andrey Ryabinin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Assume memory47 is the last online block left in node1. This will hang:
# echo offline > /sys/devices/system/node/node1/memory47/state
After a couple of minutes, the following pops up in dmesg:
INFO: task bash:957 blocked for more than 120 seconds.
Not tainted 4.6.0-rc6+ #6
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bash D ffff8800b7adbaf8 0 957 951 0x00000000
Call Trace:
schedule+0x35/0x80
schedule_timeout+0x1ac/0x270
wait_for_completion+0xe1/0x120
kthread_stop+0x4f/0x110
kcompactd_stop+0x26/0x40
__offline_pages.constprop.28+0x7e6/0x840
offline_pages+0x11/0x20
memory_block_action+0x73/0x1d0
memory_subsys_offline+0x47/0x60
device_offline+0x86/0xb0
store_mem_state+0xda/0xf0
dev_attr_store+0x18/0x30
sysfs_kf_write+0x37/0x40
kernfs_fop_write+0x11d/0x170
__vfs_write+0x37/0x120
vfs_write+0xa9/0x1a0
SyS_write+0x55/0xc0
entry_SYSCALL_64_fastpath+0x1a/0xa4kcompactd is waiting for kcompactd_max_order > 0 when it's woken up to
actually exit. Check kthread_should_stop() to break out of the wait.Fixes: 698b1b306 ("mm, compaction: introduce kcompactd").
Reported-by: Reza Arbab
Tested-by: Reza Arbab
Cc: Andrea Arcangeli
Cc: "Kirill A. Shutemov"
Cc: Rik van Riel
Cc: Joonsoo Kim
Cc: Mel Gorman
Cc: David Rientjes
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since the wildcard at the end of OF module aliases is gone, autoloading
of modules that don't match a device's last (most generic) compatible
value fails.For example the CODA960 VPU on i.MX6Q has the SoC specific compatible
"fsl,imx6q-vpu" and the generic compatible "cnm,coda960". Since the
driver currently only works with knowledge about the SoC specific
integration, it doesn't list "cnm,cod960" in the module device table.This results in the device compatible
"of:NvpuTCfsl,imx6q-vpuCcnm,coda960" not matching the module alias
"of:N*T*Cfsl,imx6q-vpu" anymore, whereas before commit 2f632369ab79
("modpost: don't add a trailing wildcard for OF module aliases") it
matched the module alias "of:N*T*Cfsl,imx6q-vpu*".This patch adds two module aliases for each compatible, one without the
wildcard and one with "C*" appended.$ modinfo coda | grep imx6q
alias: of:N*T*Cfsl,imx6q-vpuC*
alias: of:N*T*Cfsl,imx6q-vpuFixes: 2f632369ab79 ("modpost: don't add a trailing wildcard for OF module aliases")
Link: http://lkml.kernel.org/r/1462203339-15340-1-git-send-email-p.zabel@pengutronix.de
Signed-off-by: Philipp Zabel
Cc: Javier Martinez Canillas
Cc: Brian Norris
Cc: Sjoerd Simons
Cc: Rusty Russell
Cc: Greg Kroah-Hartman
Cc: [4.5+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If /proc//environ gets read before the envp[] array is fully set up
in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to
read more bytes than are actually written, as env_start will already be
set but env_end will still be zero, making the range calculation
underflow, allowing to read beyond the end of what has been written.Fix this as it is done for /proc//cmdline by testing env_end for
zero. It is, apparently, intentionally set last in create_*_tables().This bug was found by the PaX size_overflow plugin that detected the
arithmetic underflow of 'this_len = env_end - (env_start + src)' when
env_end is still zero.The expected consequence is that userland trying to access
/proc//environ of a not yet fully set up process may get
inconsistent data as we're in the middle of copying in the environment
variables.Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461
Signed-off-by: Mathias Krause
Cc: Emese Revfy
Cc: Pax Team
Cc: Al Viro
Cc: Mateusz Guzik
Cc: Alexey Dobriyan
Cc: Cyrill Gorcunov
Cc: Jarod Wilson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Instead of using "zswap" as the name for all zpools created, add an
atomic counter and use "zswap%x" with the counter number for each zpool
created, to provide a unique name for each new zpool.As zsmalloc, one of the zpool implementations, requires/expects a unique
name for each pool created, zswap should provide a unique name. The
zsmalloc pool creation does not fail if a new pool with a conflicting
name is created, unless CONFIG_ZSMALLOC_STAT is enabled; in that case,
zsmalloc pool creation fails with -ENOMEM. Then zswap will be unable to
change its compressor parameter if its zpool is zsmalloc; it also will
be unable to change its zpool parameter back to zsmalloc, if it has any
existing old zpool using zsmalloc with page(s) in it. Attempts to
change the parameters will result in failure to create the zpool. This
changes zswap to provide a unique name for each zpool creation.Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
Signed-off-by: Dan Streetman
Reported-by: Sergey Senozhatsky
Reviewed-by: Sergey Senozhatsky
Cc: Dan Streetman
Cc: Minchan Kim
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
After the THP refcounting change, obtaining a compound pages from
get_user_pages() no longer allows us to assume the entire compound page
is immediately mappable from a secondary MMU.A secondary MMU doesn't want to call get_user_pages() more than once for
each compound page, in order to know if it can map the whole compound
page. So a secondary MMU needs to know from a single get_user_pages()
invocation when it can map immediately the entire compound page to avoid
a flood of unnecessary secondary MMU faults and spurious
atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
users).Ideally instead of the page->_mapcount < 1 check, get_user_pages()
should return the granularity of the "page" mapping in the "mm" passed
to get_user_pages(). However it's non trivial change to pass the "pmd"
status belonging to the "mm" walked by get_user_pages up the stack (up
to the caller of get_user_pages). So the fix just checks if there is
not a single pte mapping on the page returned by get_user_pages, and in
turn if the caller can assume that the whole compound page is mapped in
the current "mm" (in a pmd_trans_huge()). In such case the entire
compound page is safe to map into the secondary MMU without additional
get_user_pages() calls on the surrounding tail/head pages. In addition
of being faster, not having to run other get_user_pages() calls also
reduces the memory footprint of the secondary MMU fault in case the pmd
split happened as result of memory pressure.Without this fix after a MADV_DONTNEED (like invoked by QEMU during
postcopy live migration or balloning) or after generic swapping (with a
failure in split_huge_page() that would only result in pmd splitting and
not a physical page split), KVM would map the whole compound page into
the shadow pagetables, despite regular faults or userfaults (like
UFFDIO_COPY) may map regular pages into the primary MMU as result of the
pte faults, leading to the guest mode and userland mode going out of
sync and not working on the same memory at all times.Any other secondary MMU notifier manager (KVM is just one of the many
MMU notifier users) will need the same information if it doesn't want to
run a flood of get_user_pages_fast and it can support multiple
granularity in the secondary MMU mappings, so I think it is justified to
be exposed not just to KVM.The other option would be to move transparent_hugepage_adjust to
mm/huge_memory.c but that currently has all kind of KVM data structures
in it, so it's definitely not a cut-and-paste work, so I couldn't do a
fix as cleaner as this one for 4.6.Signed-off-by: Andrea Arcangeli
Cc: "Dr. David Alan Gilbert"
Cc: "Kirill A. Shutemov"
Cc: "Li, Liang Z"
Cc: Amit Shah
Cc: Paolo Bonzini
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Eric Engestrom
Cc: Rajendra Nayak
Cc: Afzal Mohammed
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
/proc/sys/vm/stat_refresh warns nr_isolated_anon and nr_isolated_file go
increasingly negative under compaction: which would add delay when
should be none, or no delay when should delay. The bug in compaction
was due to a recent mmotm patch, but much older instance of the bug was
also noticed in isolate_migratepages_range() which is used for CMA and
gigantic hugepage allocations.The bug is caused by putback_movable_pages() in an error path
decrementing the isolated counters without them being previously
incremented by acct_isolated(). Fix isolate_migratepages_range() by
removing the error-path putback, thus reaching acct_isolated() with
migratepages still isolated, and leaving putback to caller like most
other places do.Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
[vbabka@suse.cz: expanded the changelog]
Signed-off-by: Hugh Dickins
Signed-off-by: Vlastimil Babka
Acked-by: Joonsoo Kim
Cc: Michal Hocko
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Khugepaged attempts to raise min_free_kbytes if its set too low.
However, on boot khugepaged sets min_free_kbytes first from
subsys_initcall(), and then the mm 'core' over-rides min_free_kbytes
after from init_per_zone_wmark_min(), via a module_init() call.Khugepaged used to use a late_initcall() to set min_free_kbytes (such
that it occurred after the core initialization), however this was
removed when the initialization of min_free_kbytes was integrated into
the starting of the khugepaged thread.The fix here is simply to invoke the core initialization using a
core_initcall() instead of module_init(), such that the previous
initialization ordering is restored. I didn't restore the
late_initcall() since start_stop_khugepaged() already sets
min_free_kbytes via set_recommended_min_free_kbytes().This was noticed when we had a number of page allocation failures when
moving a workload to a kernel with this new initialization ordering. On
an 8GB system this restores min_free_kbytes back to 67584 from 11365
when CONFIG_TRANSPARENT_HUGEPAGE=y is set and either
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y or
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.Fixes: 79553da293d3 ("thp: cleanup khugepaged startup")
Signed-off-by: Jason Baron
Acked-by: Kirill A. Shutemov
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
zap_pmd_range()'s CONFIG_DEBUG_VM !rwsem_is_locked(&mmap_sem) BUG() will
be invalid with huge pagecache, in whatever way it is implemented:
truncation of a hugely-mapped file to an unhugely-aligned size would
easily hit it.(Although anon THP could in principle apply khugepaged to private file
mappings, which are not excluded by the MADV_HUGEPAGE restrictions, in
practice there's a vm_ops check which excludes them, so it never hits
this BUG() - there's no interface to "truncate" an anonymous mapping.)We could complicate the test, to check i_mmap_rwsem also when there's a
vm_file; but my inclination was to make zap_pmd_range() more readable by
simply deleting this check. A search has shown no report of the issue
in the years since commit e0897d75f0b2 ("mm, thp: print useful
information when mmap_sem is unlocked in zap_pmd_range") expanded it
from VM_BUG_ON() - though I cannot point to what commit I would say then
fixed the issue.But there are a couple of other patches now floating around, neither yet
in the tree: let's agree to retain the check as a VM_BUG_ON_VMA(), as
Matthew Wilcox has done; but subject to a vma_is_anonymous() check, as
Kirill Shutemov has done. And let's get this in, without waiting for
any particular huge pagecache implementation to reach the tree.Matthew said "We can reproduce this BUG() in the current Linus tree with
DAX PMDs".Signed-off-by: Hugh Dickins
Tested-by: Matthew Wilcox
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Andres Lagar-Cavilla
Cc: Yang Shi
Cc: Ning Qu
Cc: Mel Gorman
Cc: Andres Lagar-Cavilla
Cc: Konstantin Khlebnikov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix problems in uapi definitions reported by Gabriel Laskar: (see
https://lkml.org/lkml/2016/4/5/205 for details)- move public header file rio_mport_cdev.h to include/uapi/linux directory
- change types in data structures passed as IOCTL parameters
- improve parameter checking in some IOCTL service routinesSigned-off-by: Alexandre Bounine
Reported-by: Gabriel Laskar
Tested-by: Barry Wood
Cc: Gabriel Laskar
Cc: Matt Porter
Cc: Aurelien Jacquiot
Cc: Andre van Herk
Cc: Barry Wood
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Cgroup2 currently doesn't have a per-cgroup swappiness setting. We
might want to add one later - that's a different discussion - but until
we do, the cgroups should always follow the system setting. Otherwise
it will be unchangeably set to whatever the ancestor inherited from the
system setting at the time of cgroup creation.Signed-off-by: Johannes Weiner
Acked-by: Michal Hocko
Acked-by: Vladimir Davydov
Cc: [4.5]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
split_huge_pages doesn't support get method at all, so the read
permission sounds confusing, change the permission to write only.And, add "\n" to the output of set method to make it more readable.
Signed-off-by: Yang Shi
Acked-by: Kirill A. Shutemov
Cc: Andrea Arcangeli
Cc: Mel Gorman
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Pull asm-generic syscall fix from Arnd Bergmann:
"My last pull request for asm-generic had just one patch that added two
new system calls to asm/unistd.h, but unfortunately it turned out to
be wrong, pointing arch/tile compat mode at the native handlers rather
than the compat ones.This was spotted by Yury Norov, who is working on ILP32 mode for
arch/arm64, which would have the same problem when merged. This fixes
the table to use the correct compat syscalls, like the other 64-bit
architectures do.I'll try to find the time to come up with a solution that prevents
this problem from happening again, by allowing all future system calls
to just get added in a single file for use by all architectures"* tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: use compat version for preadv2 and pwritev2 -
Pull ARM SoC fixes from Arnd Bergmann:
"Here are a couple last-minute fixes for ARM SoCs. Most of them are
for the OMAP platforms, the rest are all for different platforms.OMAP:
All dts fixes, mostly affecting voltages and pinctrl for various
device drivers:- Regulator minimum voltage fixes for omap5
- ISP syscon register offset fix for omap3
- Fix regulator initial modes for n900
- Fix omap5 pinctrl wkup instance sizeAllwinner:
Remove incorrect constraints from a dcdc1 regulatorAlltera SoCFPGA:
Fix compilation in thumb2 modeSamsung exynos:
Fix a potential oops in the pm-domain error handlingDavinci:
Avoid a link error if NVMEM is disabledRenesas:
Do not mark an external uart clock as disabled, to allow probing
the uarts"* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: davinci: only use NVMEM when available
ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
ARM: dts: omap5: fix range of permitted wakeup pinmux registers
ARM: dts: omap3-n900: Specify peripherals LDO regulators initial mode
ARM: dts: omap3: Fix ISP syscon register offset
ARM: dts: omap5-cm-t54: fix ldo1_reg and ldo4_reg ranges
ARM: dts: omap5-board-common: fix ldo1_reg and ldo4_reg ranges
arm64: dts: r8a7795: Don't disable referenced optional scif clock
ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
ARM: dts: sun8i-q8-common: Do not set constraints on dc1sw regulator -
Update my email and web addresses in the kernel maintainers file.
Signed-off-by: Russell King
Signed-off-by: Linus Torvalds