10 Jul, 2016
3 commits
-
Many comments in kernel/power/snapshot.c do not follow the general
comment formatting rules. They look odd, some of them are outdated
too, some are hard to parse and generally difficult to understand.Clean them up to make them easier to comprehend.
No functional changes.
Signed-off-by: Rafael J. Wysocki
-
The formatting of some function headers in kernel/power/snapshot.c
is not consistent with the general kernel coding style and with the
formatting of some other function headers in the same file.Make all of them follow the same formatting convention.
No functional changes.
Signed-off-by: Rafael J. Wysocki
-
Make hibernate_setup() follow the coding style more closely by adding
some missing braces to the if () statement in it.Signed-off-by: Rafael J. Wysocki
09 Jul, 2016
1 commit
02 Jul, 2016
4 commits
-
One of the memory bitmaps used by the hibernation image restoration
code is freed after the image has been loaded.That is not quite efficient, though, because the memory pages used
for building that bitmap are known to be safe (ie. they were not
used by the image kernel before hibernation) and the arch-specific
code finalizing the image restoration may need them. In that case
it needs to allocate those pages again via the memory management
subsystem, check if they are really safe again by consulting the
other bitmaps and so on.To avoid that, recycle those pages by putting them into the global
list of known safe pages so that they can be given to the arch code
right away when necessary.Signed-off-by: Rafael J. Wysocki
-
Rework mark_unsafe_pages() to use a simpler method of clearing
all bits in free_pages_map and to set the bits for the "unsafe"
pages (ie. pages that were used by the image kernel before
hibernation) with the help of duplicate_memory_bitmap().For this purpose, move the pfn_valid() check from mark_unsafe_pages()
to unpack_orig_pfns() where the "unsafe" pages are discovered.Signed-off-by: Rafael J. Wysocki
-
The core image restoration code preallocates some safe pages
(ie. pages that weren't used by the image kernel before hibernation)
for future use before allocating the bulk of memory for loading the
image data. Those safe pages are then freed so they can be allocated
again (with the memory management subsystem's help). That's done to
ensure that there will be enough safe pages for temporary data
structures needed during image restoration.However, it is not really necessary to free those pages after they
have been allocated. They can be added to the (global) list of
safe pages right away and then picked up from there when needed
without freeing.That reduces the overhead related to using safe pages, especially
in the arch-specific code, so modify the code accordingly.Signed-off-by: Rafael J. Wysocki
-
If freezable workqueue aborts suspend flow, show
workqueue state for debug purpose.Signed-off-by: Roger Lu
Acked-by: Tejun Heo
Signed-off-by: Rafael J. Wysocki
01 Jul, 2016
1 commit
-
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe
Reported-and-tested-by: Borislav Petkov
Tested-by: Kees Cook
Signed-off-by: Rafael J. Wysocki
28 Jun, 2016
1 commit
-
This makes pm notifier PREPARE/POST symmetrical: if PREPARE
fails, we will only undo what ever happened on PREPARE.It fixes the unbalanced CPU hotplug enable in CPU PM notifier.
Signed-off-by: Lianwei Wang
Signed-off-by: Rafael J. Wysocki
27 Jun, 2016
2 commits
-
Pull SCSI fixes from James Bottomley:
"Two straightforward fixes.One is a concurrency issue only affecting SAS connected SATA drives,
but which could hang the storage subsystem if it triggers (because the
outstanding command count on error never goes back to zero) and the
other is a NO_TAG fallout from the switch to hostwide tags which
causes the system to crash on module insertion (we've checked
carefully and only the 53c700 family of drivers is vulnerable to this
issue)"* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
53c700: fix BUG on untagged commands
scsi: fix race between simultaneous decrements of ->host_failed
25 Jun, 2016
28 commits
-
…git/mason/linux-btrfs
Pull btrfs fixes part 2 from Chris Mason:
"This has one patch from Omar to bring iterate_shared back to btrfs.We have a tree of work we queue up for directory items and it doesn't
lend itself well to shared access. While we're cleaning it up, Omar
has changed things to use an exclusive lock when there are delayed
items"* 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes -
Pull btrfs fixes from Chris Mason:
"I have a two part pull this time because one of the patches Dave
Sterba collected needed to be against v4.7-rc2 or higher (we used
rc4). I try to make my for-linus-xx branch testable on top of the
last major so we can hand fixes to people on the list more easily, so
I've split this pull in two.This first part has some fixes and two performance improvements that
we've been testing for some time.Josef's two performance fixes are most notable. The transid tracking
patch makes a big improvement on pretty much every workload"* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: Force stripesize to the value of sectorsize
btrfs: fix disk_i_size update bug when fallocate() fails
Btrfs: fix error handling in map_private_extent_buffer
Btrfs: fix error return code in btrfs_init_test_fs()
Btrfs: don't do nocow check unless we have to
btrfs: fix deadlock in delayed_ref_async_start
Btrfs: track transid for delayed ref flushing -
Pull sound fixes from Takashi Iwai:
"Again pretty calm weeks: we've had only a few trivial / stable
HD-audio fixes in addition to a possible race fix for snd-dummy driver
spotted by syzkaller"* tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: dummy: Fix a use-after-free at closing
ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
ALSA: hda - Fix the headset mic jack detection on Dell machine
ALSA: hda/tegra: iomem fixups for sparse warnings
ALSA: hdac_regmap - fix the register access for runtime PM -
Pull x86 kprobe fix from Thomas Gleixner:
"A single fix clearing the TF bit when a fault is single stepped"* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes/x86: Clear TF bit in fault on single-stepping -
Pull scheduler fixes from Thomas Gleixner:
"A couple of scheduler fixes:- force watchdog reset while processing sysrq-w
- fix a deadlock when enabling trace events in the scheduler
- fixes to the throttled next buddy logic
- fixes for the average accounting (missing serialization and
underflow handling)- allow kernel threads for fallback to online but not active cpus"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Allow kthreads to fall back to online && !active cpus
sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
sched/fair: Initialize throttle_count for new task-groups lazily
sched/fair: Fix cfs_rq avg tracking underflow
kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
sched/debug: Fix deadlock when enabling sched events
sched/fair: Fix post_init_entity_util_avg() serialization -
Commit fe742fd4f90f ("Revert "btrfs: switch to ->iterate_shared()"")
backed out the conversion to ->iterate_shared() for Btrfs because the
delayed inode handling in btrfs_real_readdir() is racy. However, we can
still do readdir in parallel if there are no delayed nodes.This is a temporary fix which upgrades the shared inode lock to an
exclusive lock only when we have delayed items until we come up with a
more complete solution. While we're here, rename the
btrfs_{get,put}_delayed_items functions to make it very clear that
they're just for readdir.Tested with xfstests and by doing a parallel kernel build:
while make tinyconfig && make -j4 && git clean dqfx; do
:
donealong with a bunch of parallel finds in another shell:
while true; do
for ((i=0; i/dev/null &
done
wait
doneSigned-off-by: Omar Sandoval
Signed-off-by: David Sterba
Signed-off-by: Chris Mason -
Pull locking fix from Thomas Gleixner:
"A single fix to address a race in the static key logic"* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/static_key: Fix concurrent static_key_slow_inc() -
Pull irq fix from Thomas Gleixner:
"A single fix for the fallout from the conversion of MIPS GIC to irq
domains"* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mips-gic: Fix IRQs in gic_dev_domain -
Pull powerpc fixes from Michael Ellerman:
"mm/radix (Aneesh Kumar K.V):
- Update to tlb functions ric argument
- Flush page walk cache when freeing page table
- Update Radix tree size as per ISA 3.0mm/hash (Aneesh Kumar K.V):
- Use the correct PPP mask when updating HPTE
- Don't add memory coherence if cache inhibited is seteeh (Gavin Shan):
- Fix invalid cached PE primary busbpf/jit (Naveen N. Rao):
- Disable classic BPF JIT on ppc64le.. and fix faults caused by radix patching of SLB miss handler
(Michael Ellerman)"* tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
powerpc: Fix faults caused by radix patching of SLB miss handler
powerpc/eeh: Fix invalid cached PE primary bus
powerpc/mm/radix: Update Radix tree size as per ISA 3.0
powerpc/mm/hash: Don't add memory coherence if cache inhibited is set
powerpc/mm/hash: Use the correct PPP mask when updating HPTE
powerpc/mm/radix: Flush page walk cache when freeing page table
powerpc/mm/radix: Update to tlb functions ric argument -
Commit b235beea9e99 ("Clarify naming of thread info/stack allocators")
breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE:kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack'
kernel/fork.c:355:8: error: assignment from incompatible pointer type
stack = alloc_thread_stack_node(tsk, node);
^Fix it by renaming free_stack() to free_thread_stack(), and updating the
return type of alloc_thread_stack_node().Fixes: b235beea9e99 ("Clarify naming of thread info/stack allocators")
Signed-off-by: Michael Ellerman
Signed-off-by: Linus Torvalds -
Merge misc fixes from Andrew Morton:
"Two weeks worth of fixes here"* emailed patches from Andrew Morton : (41 commits)
init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
autofs: don't get stuck in a loop if vfs_write() returns an error
mm/page_owner: avoid null pointer dereference
tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
fs/nilfs2: fix potential underflow in call to crc32_le
oom, suspend: fix oom_reaper vs. oom_killer_disable race
ocfs2: disable BUG assertions in reading blocks
mm, compaction: abort free scanner if split fails
mm: prevent KASAN false positives in kmemleak
mm/hugetlb: clear compound_mapcount when freeing gigantic pages
mm/swap.c: flush lru pvecs on compound page arrival
memcg: css_alloc should return an ERR_PTR value on error
memcg: mem_cgroup_migrate() may be called with irq disabled
hugetlb: fix nr_pmds accounting with shared page tables
Revert "mm: disable fault around on emulated access bit architecture"
Revert "mm: make faultaround produce old ptes"
mailmap: add Boris Brezillon's email
mailmap: add Antoine Tenart's email
mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
mm: mempool: kasan: don't poot mempool objects in quarantine
... -
Pull rdma fixes from Doug Ledford:
"This is the second batch of queued up rdma patches for this rc cycle.There isn't anything really major in here. It's passed 0day,
linux-next, and local testing across a wide variety of hardware.
There are still a few known issues to be tracked down, but this should
amount to the vast majority of the rdma RC fixes.Round two of 4.7 rc fixes:
- A couple minor fixes to the rdma core
- Multiple minor fixes to hfi1
- Multiple minor fixes to mlx4/mlx4
- A few minor fixes to i40iw"* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (31 commits)
IB/srpt: Reduce QP buffer size
i40iw: Enable level-1 PBL for fast memory registration
i40iw: Return correct max_fast_reg_page_list_len
i40iw: Correct status check on i40iw_get_pble
i40iw: Correct CQ arming
IB/rdmavt: Correct qp_priv_alloc() return value test
IB/hfi1: Don't zero out qp->s_ack_queue in rvt_reset_qp
IB/hfi1: Fix deadlock with txreq allocation slow path
IB/mlx4: Prevent cross page boundary allocation
IB/mlx4: Fix memory leak if QP creation failed
IB/mlx4: Verify port number in flow steering create flow
IB/mlx4: Fix error flow when sending mads under SRIOV
IB/mlx4: Fix the SQ size of an RC QP
IB/mlx5: Fix wrong naming of port_rcv_data counter
IB/mlx5: Fix post send fence logic
IB/uverbs: Initialize ib_qp_init_attr with zeros
IB/core: Fix false search of the IB_SA_WELL_KNOWN_GUID
IB/core: Fix RoCE v1 multicast join logic issue
IB/core: Fix no default GIDs when netdevice reregisters
IB/hfi1: Send a pkey change event on driver pkey update
... -
Pull HID fix from Jiri Kosina:
"hiddev ioctl() validation fix from Scott Bauer"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: hiddev: validate num_values for HIDIOCGUSAGES, HIDIOCSUSAGES commands -
…l/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Improve fan type detection for dell-smm to prevent kernel hang"* tag 'hwmon-for-linus-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (dell-smm) Cache fan_type() calls and change fan detection -
Pull ACPI fix from Rafael Wysocki:
"Stable-candidate fix for a deadlock in ACPICA introduced during the
4.5 development cycle by a commit attempting to improve the handling
of AML code that doesn't belong to any namespace objects in a given
definition block (Lv Zheng)"* tag 'acpi-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Namespace: Fix deadlock triggered by MLC support in dynamic table loading -
Pull power management fixes from Rafael Wysocki:
"Fix for a latent cpufreq driver bug uncovered by a recent ACPICA
change and several fixes for the devfreq framework, including one fix
for an issue introduced recently.Specifics:
- Fix a latent initialization issue in the pcc-cpufreq driver
(incorrect initial value of a structure field) that has been
uncovered by a recent ACPICA commit (Mike Galbraith).- Add a missing notification in an update_devfreq() error code path
forgotten by a recent devfreq commit (Chanwoo Choi).- Fix devfreq device frequency initialization (Lukasz Luba).
- Fix an incorrect IS_ERR() check in the devfreq framework discovered
by the Smatch checker (Dan Carpenter).- Drop two excessive put_device() calls from the devfreq framework
(MyungJoo Ham, Cai Zhiyong).- Fix a possible memory leak in the devfreq framework and drop an
unnecessary kfree() invocation from it (MyungJoo Ham)"* tag 'pm-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / devfreq: Send the DEVFREQ_POSTCHANGE notification when target() is failed
cpufreq: pcc-cpufreq: Fix doorbell.access_width
PM / devfreq: fix initialization of current frequency in last status
PM / devfreq: exynos-nocp: Remove incorrect IS_ERR() check
PM / devfreq: remove double put_device
PM / devfreq: fix double call put_device
PM / devfreq: fix duplicated kfree on devfreq pointer
PM / devfreq: devm_kzalloc to have dev pointer more precisely -
Pull xen bug fixes from David Vrabel:
- fix x86 PV dom0 crash during early boot on some hardware
- fix two pciback bugs affects certain devices
- fix potential overflow when clearing page tables in x86 PV
* tag 'for-linus-4.7b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-pciback: return proper values during BAR sizing
x86/xen: avoid m2p lookup when setting early page table entries
xen/pciback: Fix conf_space read/write overlap check.
x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
xen/balloon: Fix declared-but-not-defined warning -
Pull arm64 fixes from Will Deacon:
"Here are a few more arm64 fixes, but things do finally appear to be
slowing down. The main fix is avoiding hibernation in a previously
unanticipated situation where we have CPUs parked in the kernel, but
it's all good stuff.- Fix icache/dcache sync for anonymous pages under migration
- Correct the ASID limit check
- Fix parallel builds of Image and Image.gz
- Refuse to hibernate when we have CPUs that we can't offline"* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: hibernate: Don't hibernate on systems with stuck CPUs
arm64: smp: Add function to determine if cpus are stuck in the kernel
arm64: mm: remove page_mapping check in __sync_icache_dcache
arm64: fix boot image dependencies to not generate invalid images
arm64: update ASID limit -
When I replaced kasprintf("%pf") with a direct call to
sprint_symbol_no_offset I must have broken the initcall blacklisting
feature on the arches where dereference_function_descriptor() is
non-trivial.Fixes: c8cdd2be213f (init/main.c: simplify initcall_blacklisted())
Link: http://lkml.kernel.org/r/1466027283-4065-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes
Cc: Yang Shi
Cc: Prarit Bhargava
Cc: Petr Mladek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
__vfs_write() returns a negative value in a error case.
Link: http://lkml.kernel.org/r/20160616083108.6278.65815.stgit@pluto.themaw.net
Signed-off-by: Andrey Vagin
Signed-off-by: Ian Kent
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We have dereferenced page_ext before checking it. Lets check it first
and then used it.Fixes: f86e4271978b ("mm: check the return value of lookup_page_ext for all call sites")
Link: http://lkml.kernel.org/r/1465249059-7883-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee
Acked-by: Vlastimil Babka
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
trivial fix to spelling mistake
Link: http://lkml.kernel.org/r/1466672144-831-1-git-send-email-colin.king@canonical.com
Signed-off-by: Colin Ian King
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The value `bytes' comes from the filesystem which is about to be
mounted. We cannot trust that the value is always in the range we
expect it to be.Check its value before using it to calculate the length for the crc32_le
call. It value must be larger (or equal) sumoff + 4.This fixes a kernel bug when accidentially mounting an image file which
had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance.
The bytes 0x01 0x00 were stored at 0x408 and were interpreted as a
s_bytes value of 1. This caused an underflow when substracting sumoff +
4 (20) in the call to crc32_le.BUG: unable to handle kernel paging request at ffff88021e600000
IP: crc32_le+0x36/0x100
...
Call Trace:
nilfs_valid_sb.part.5+0x52/0x60 [nilfs2]
nilfs_load_super_block+0x142/0x300 [nilfs2]
init_nilfs+0x60/0x390 [nilfs2]
nilfs_mount+0x302/0x520 [nilfs2]
mount_fs+0x38/0x160
vfs_kern_mount+0x67/0x110
do_mount+0x269/0xe00
SyS_mount+0x9f/0x100
entry_SYSCALL_64_fastpath+0x16/0x71Link: http://lkml.kernel.org/r/1466778587-5184-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Torsten Hilbrich
Tested-by: Torsten Hilbrich
Signed-off-by: Ryusuke Konishi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Tetsuo has reported the following potential oom_killer_disable vs.
oom_reaper race:(1) freeze_processes() starts freezing user space threads.
(2) Somebody (maybe a kenrel thread) calls out_of_memory().
(3) The OOM killer calls mark_oom_victim() on a user space thread
P1 which is already in __refrigerator().
(4) oom_killer_disable() sets oom_killer_disabled = true.
(5) P1 leaves __refrigerator() and enters do_exit().
(6) The OOM reaper calls exit_oom_victim(P1) before P1 can call
exit_oom_victim(P1).
(7) oom_killer_disable() returns while P1 not yet finished
(8) P1 perform IO/interfere with the freezer.This situation is unfortunate. We cannot move oom_killer_disable after
all the freezable kernel threads are frozen because the oom victim might
depend on some of those kthreads to make a forward progress to exit so
we could deadlock. It is also far from trivial to teach the oom_reaper
to not call exit_oom_victim() because then we would lose a guarantee of
the OOM killer and oom_killer_disable forward progress because
exit_mm->mmput might block and never call exit_oom_victim.It seems the easiest way forward is to workaround this race by calling
try_to_freeze_tasks again after oom_killer_disable. This will make sure
that all the tasks are frozen or it bails out.Fixes: 449d777d7ad6 ("mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper")
Link: http://lkml.kernel.org/r/1466597634-16199-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko
Reported-by: Tetsuo Handa
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
According to some high-load testing, these two BUG assertions were
encountered, this led system panic. Actually, there were some
discussions about removing these two BUG() assertions, it would not
bring any side effect.Then, I did the the following changes,
1) use the existing macro CATCH_BH_JBD_RACES to wrap BUG() in the
ocfs2_read_blocks_sync function like before.2) disable the macro CATCH_BH_JBD_RACES in Makefile by default.
Link: http://lkml.kernel.org/r/1466574294-26863-1-git-send-email-ghe@suse.com
Signed-off-by: Gang He
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If the memory compaction free scanner cannot successfully split a free
page (only possible due to per-zone low watermark), terminate the free
scanner rather than continuing to scan memory needlessly. If the
watermark is insufficient for a free page of order order, then
terminate the scanner since all future splits will also likely fail.This prevents the compaction freeing scanner from scanning all memory on
very large zones (very noticeable for zones > 128GB, for instance) when
all splits will likely fail while holding zone->lock.compaction_alloc() iterating a 128GB zone has been benchmarked to take
over 400ms on some systems whereas any free page isolated and ready to
be split ends up failing in split_free_page() because of the low
watermark check and thus the iteration continues.The next time compaction occurs, the freeing scanner will likely start
at the end of the zone again since no success was made previously and we
get the same lengthy iteration until the zone is brought above the low
watermark. All thp page faults can take >400ms in such a state without
this fix.Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606211820350.97086@chino.kir.corp.google.com
Signed-off-by: David Rientjes
Acked-by: Vlastimil Babka
Cc: Minchan Kim
Cc: Joonsoo Kim
Cc: Mel Gorman
Cc: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When kmemleak dumps contents of leaked objects it reads whole objects
regardless of user-requested size. This upsets KASAN. Disable KASAN
checks around object dump.Link: http://lkml.kernel.org/r/1466617631-68387-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov
Acked-by: Catalin Marinas
Cc: Andrey Ryabinin
Cc: Alexander Potapenko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
While working on s390 support for gigantic hugepages I ran into the
following "Bad page state" warning when freeing gigantic pages:BUG: Bad page state in process bash pfn:580001
page:000003d116000040 count:0 mapcount:0 mapping:ffffffff00000000 index:0x0
flags: 0x7fffc0000000000()
page dumped because: non-NULL mappingThis is because page->compound_mapcount, which is part of a union with
page->mapping, is initialized with -1 in prep_compound_gigantic_page(),
and not cleared again during destroy_compound_gigantic_page(). Fix this
by clearing the compound_mapcount in destroy_compound_gigantic_page()
before clearing compound_head.Interestingly enough, the warning will not show up on x86_64, although
this should not be architecture specific. Apparently there is an
endianness issue, combined with the fact that the union contains both a
64 bit ->mapping pointer and a 32 bit atomic_t ->compound_mapcount as
members. The resulting bogus page->mapping on x86_64 therefore contains
00000000ffffffff instead of ffffffff00000000 on s390, which will falsely
trigger the PageAnon() check in free_pages_prepare() because
page->mapping & PAGE_MAPPING_ANON is true on little-endian architectures
like x86_64 in this case (the page is not compound anymore,
->compound_head was already cleared before). As a result, page->mapping
will be cleared before doing the checks in free_pages_check().Not sure if the bogus "PageAnon() returning true" on x86_64 for the
first tail page of a gigantic page (at this stage) has other theoretical
implications, but they would also be fixed with this patch.Link: http://lkml.kernel.org/r/1466612719-5642-1-git-send-email-gerald.schaefer@de.ibm.com
Signed-off-by: Gerald Schaefer
Reviewed-by: Mike Kravetz
Cc: Luiz Capitulino
Cc: Naoya Horiguchi
Cc: Hillf Danton
Cc: "Kirill A . Shutemov"
Cc: Dave Hansen
Cc: Paul Gortmaker
Cc: "Aneesh Kumar K . V"
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds