26 Mar, 2018
16 commits
-
stat(1) is not standardized and different implementations have their own
(conflicting) flags for querying the size of a file.

ls(1) provides the same information (value of st.st_size) in the 5th
column, except when the file is a character or block device. This output
is standardized[0]. The -n option turns on -l, which writes lines
formatted like

"%s %u %s %s %u %s %s\n", <file mode>, <number of links>,
<owner name>, <group name>, <size>, <date and time>, <pathname>

but instead of writing the <owner name> and <group name>, it writes the
numeric owner and group IDs (this avoids /etc/passwd and /etc/group
lookups as well as potential field splitting issues).

The <size> field is specified as "the value that would be returned for
the file in the st_size field of struct stat".

To avoid duplicating logic in several locations in the tree, create
scripts/file-size.sh and update callers to use that instead of stat(1).

[0] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html#tag_20_73_10
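
The idea can be sketched as follows. This is an illustration only, not necessarily the contents of scripts/file-size.sh: the function name file_size and the temporary file are assumptions made for the example. It relies on field splitting of the "ls -dn" output, so the size ends up in the fifth positional parameter regardless of what the pathname looks like.

```shell
#!/bin/sh
# file_size PATH
# Print the size in bytes of PATH using only ls(1).
# (Hypothetical helper name, for illustration.)
file_size() {
	# -d: list the entry itself, not a directory's contents
	# -n: long format with numeric owner and group IDs
	# The command substitution is unquoted on purpose so that the
	# output is split into fields; the size is the 5th field.
	set -- $(ls -dn "$1")
	printf '%s\n' "$5"
}

# Example: a file holding the five bytes "hello".
tmp=$(mktemp)
printf 'hello' > "$tmp"
file_size "$tmp"
rm -f "$tmp"
```

As the commit text notes, the 5th column is not the size for character or block devices, so a helper like this is only meaningful for regular files.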
Signed-off-by: Michael Forney
Signed-off-by: Masahiro Yamada
-
If CONFIG_TRIM_UNUSED_KSYMS is enabled and the kernel is built from
a pristine state, the vmlinux is linked twice.

[1] A user runs 'make'

[2] First build with empty autoksyms.h

[3] adjust_autoksyms.sh updates autoksyms.h and recurses 'make vmlinux'

--------(begin sub-make)--------

[4] Second build with new autoksyms.h

[5] link-vmlinux.sh is invoked because vmlinux is missing

---------(end sub-make)---------

[6] link-vmlinux.sh is invoked again even though vmlinux is up-to-date.

The reason for [6] is probably that Make already decided to update
vmlinux at the time of [2], because vmlinux was missing when Make
built up the dependency graph.

Because if_changed is implemented based on $?, this issue can be
narrowed down to how Make handles $?.

You can test it with the following simple code:

[Test Makefile]

A: B
	@echo newer prerequisite: $?
	cp B A

B: C
	cp C B
	touch A

[Result]

$ rm -f A B
$ touch C
$ make
cp C B
touch A
newer prerequisite: B
cp B A

Here, 'A' has been touched in the recipe of 'B'. So, the dependency
'A: B' has already been met before the recipe of 'A' is executed.
However, Make does not notice the fact that the recipe of 'B' also
updates 'A' as a side-effect.

The situation is similar in this case; the vmlinux has actually been
updated in the vmlinux_prereq target. Make cannot predict this, so
it judges that the vmlinux is old.

link-vmlinux.sh is costly, so it is better to not run it when unneeded.
Split CONFIG_TRIM_UNUSED_KSYMS recursion to a dedicated target.

The reason for commit 2441e78b1919 ("kbuild: better abstract vmlinux
sequential prerequisites") was to cater to CONFIG_BUILD_DOCSRC, but
it was later removed by commit 184892925118 ("samples: move blackfin
gptimers-example from Documentation").

Signed-off-by: Masahiro Yamada
Acked-by: Nicolas Pitre
-
The idea of using fixdep was inspired by Kconfig, but autoksyms
belongs to a different group. So, I want to move those touched
files under include/config/ksym/ to include/ksym/.

The directory include/ksym/ can be removed by 'make clean' because
it is meaningless for the external module building.

Signed-off-by: Masahiro Yamada
Acked-by: Nicolas Pitre
-
The external module building does not need to parse this code because
KBUILD_MODULES is always set anyway.

Move this code inside the "ifeq ($(KBUILD_EXTMOD),) ... endif" block.

Signed-off-by: Masahiro Yamada
Acked-by: Nicolas Pitre
-
Commit d3fc425e819b ("kbuild: make sure autoksyms.h exists early")
moved the code that touches autoksyms.h to scripts/kconfig/Makefile
for an obscure reason.

From Nicolas' comment [1], he did not seem to be sure about the root
cause.

I guess I figured it out, so here is a fix-up I think is more correct.
According to the error log in the original post [2], the build failed
in scripts/mod/devicetable-offsets.c

scripts/mod/Makefile is descended from scripts/Makefile, which is
invoked from the top-level Makefile by the 'scripts' target.

To build vmlinux and/or modules, Kbuild descends into $(vmlinux-dirs).
This depends on 'prepare' and 'scripts' as follows:

$(vmlinux-dirs): prepare scripts

Because there is no dependency between 'prepare' and 'scripts', the
parallel building can execute them simultaneously.

'prepare' depends on 'prepare1', which touched autoksyms.h, while
'scripts' descends into scripts/, then scripts/mod/, which needs
<generated/autoksyms.h> if CONFIG_TRIM_UNUSED_KSYMS. It was the
reason for the race.

I am not happy to have unrelated code in the Kconfig Makefile, so
getting it back to the top Makefile.

I removed the standalone test target because I want to use it to
create an empty autoksyms.h file. Here is a little improvement;
unnecessary autoksyms.h is not created when CONFIG_TRIM_UNUSED_KSYMS
is disabled.

[1] https://lkml.org/lkml/2016/11/30/734
[2] https://lkml.org/lkml/2016/11/30/531

Signed-off-by: Masahiro Yamada
Acked-by: Nicolas Pitre
-
Just a trivial change to prepare for the next commit.
This target is still invisible from external module building.

Signed-off-by: Masahiro Yamada
-
The comment mentions it creates autoksyms.h in case it is missing,
but the actual code touches it when it does exist.

The build system creates it anyway because and
need it.

The code would not have worked as intended, and people have not
noticed it. This is a proof that we can simply remove it.

Signed-off-by: Masahiro Yamada
Acked-by: Nicolas Pitre
-
Currently LDFLAGS is not cleared, so the same flags are accumulated in
LDFLAGS when the top Makefile is recursively invoked.

I found an unneeded rebuild for ARCH=arm64 when CONFIG_TRIM_UNUSED_KSYMS
is enabled. If include/generated/autoksyms.h is updated, the top
Makefile is recursively invoked, then arch/arm64/Makefile adds one
more '-maarch64linux'. Due to the command line change, modules are
rebuilt needlessly.

Signed-off-by: Masahiro Yamada
Acked-by: Nicolas Pitre
-
Documentation/kbuild/makefiles.txt lists variables used in Makefile
whereas Documentation/kbuild/kbuild.txt describes user assignable
parameters given via environments or the command line.

The top Makefile and arch/*/Makefile accumulate proper linker flags to
LDFLAGS_vmlinux. So, users cannot override it from the command line.
Generally, per-file options are not supposed to be user-assignable.
Remove the misleading entry from kbuild.txt.

If we need a way to append user-specific flags for linking the kernel,
LDFLAGS_KERNEL would be a consistent choice because we already expose
its LDFLAGS_MODULE counterpart to users.

Signed-off-by: Masahiro Yamada
-
Documentation/kbuild/makefiles.txt lists variables used in Makefile
whereas Documentation/kbuild/kbuild.txt describes user assignable
parameters given via environments or the command line.

LDFLAGS_MODULE is a command line interface, so it should be dropped
from makefiles.txt.

Some lines below in this file, it is clearly explained that
KBUILD_LDFLAGS_MODULE is the right one for the internal use:

KBUILD_LDFLAGS_MODULE Options for $(LD) when linking modules
$(KBUILD_LDFLAGS_MODULE) is used to add arch-specific options
used when linking modules. This is often a linker script.
From commandline LDFLAGS_MODULE shall be used (see kbuild.txt).

Then, kbuild.txt explains LDFLAGS_MODULE as follows:

LDFLAGS_MODULE
--------------------------------------------------
Additional options used for $(LD) when linking modules.

Signed-off-by: Masahiro Yamada
-
Currently, linker options are tested by the coordination of $(CC) and
$(LD) because $(LD) needs some object to link.

As commit 86a9df597cdd ("kbuild: fix linker feature test macros when
cross compiling with Clang") addressed, we need to make sure $(CC)
and $(LD) agree on the underlying architecture of the passed object.

This could be a bit complex when we combine tools from different groups.
For example, we can use clang for $(CC), but we still need to rely on
GCC toolchain for $(LD).

So, I was searching for a way of standalone testing of linker options.
A trick I found is to use '-v'; this not only prints the version string,
but also tests if the given option is recognized.

If a given option is supported,

$ aarch64-linux-gnu-ld -v --fix-cortex-a53-843419
GNU ld (Linaro_Binutils-2017.11) 2.28.2.20170706
$ echo $?
0

If unsupported,

$ aarch64-linux-gnu-ld -v --fix-cortex-a53-843419
GNU ld (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 2.23.1
aarch64-linux-gnu-ld: unrecognized option '--fix-cortex-a53-843419'
aarch64-linux-gnu-ld: use the --help option for usage information
$ echo $?
1

Gold works likewise.

$ aarch64-linux-gnu-ld.gold -v --fix-cortex-a53-843419
GNU gold (Linaro_Binutils-2017.11 2.28.2.20170706) 1.14
$ echo $?
0
$ aarch64-linux-gnu-ld.gold -v --fix-cortex-a53-999999
GNU gold (Linaro_Binutils-2017.11 2.28.2.20170706) 1.14
aarch64-linux-gnu-ld.gold: --fix-cortex-a53-999999: unknown option
aarch64-linux-gnu-ld.gold: use the --help option for usage information
$ echo $?
1

LLD too.

$ ld.lld -v --gc-sections
LLD 7.0.0 (http://llvm.org/git/lld.git 4a0e4190e74cea19f8a8dc625ccaebdf8b5d1585) (compatible with GNU linkers)
$ echo $?
0
$ ld.lld -v --fix-cortex-a53-843419
LLD 7.0.0 (http://llvm.org/git/lld.git 4a0e4190e74cea19f8a8dc625ccaebdf8b5d1585) (compatible with GNU linkers)
$ echo $?
0
$ ld.lld -v --fix-cortex-a53-999999
ld.lld: error: unknown argument: --fix-cortex-a53-999999
LLD 7.0.0 (http://llvm.org/git/lld.git 4a0e4190e74cea19f8a8dc625ccaebdf8b5d1585) (compatible with GNU linkers)
$ echo $?
1

Signed-off-by: Masahiro Yamada
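
The '-v' trick from this commit message can be wrapped in a small shell helper. This is only a sketch under assumptions: the name ld_option_supported is hypothetical and is not the macro that Kbuild uses; the helper merely checks the exit status shown in the transcripts, without needing any object file.

```shell
#!/bin/sh
# ld_option_supported LINKER OPTION
# Succeeds when "LINKER -v OPTION" exits with status 0, that is, when
# the linker prints its version without rejecting OPTION.
# (Hypothetical helper name, for illustration.)
ld_option_supported() {
	"$1" -v "$2" >/dev/null 2>&1
}

# Example: probe whatever "ld" is found on PATH, if there is one.
if command -v ld >/dev/null 2>&1; then
	if ld_option_supported ld --gc-sections; then
		echo "--gc-sections: supported"
	else
		echo "--gc-sections: not supported"
	fi
fi
```

The result depends entirely on the linker that is probed, so no particular output is implied here.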
Tested-by: Nick Desaulniers
-
Support parallel building of clean, config, and build targets in a
single command.

For example,

make -j clean all

or

make -j mrproper defconfig all

They should be handled one by one.
Signed-off-by: Masahiro Yamada
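
The entry does not show the Makefile change itself, so the following is only an illustration of what "one by one" means: each goal gets its own invocation, in command-line order, and a failure stops the sequence. The function name run_goals_in_order is hypothetical, and "echo" stands in for the real make invocation.

```shell
#!/bin/sh
# run_goals_in_order CMD GOAL...
# Run CMD once per goal, in the order given, stopping at the first
# failure. Parallelism would live inside each individual invocation,
# never across goals. (Hypothetical helper, for illustration.)
run_goals_in_order() {
	cmd=$1
	shift
	for goal in "$@"; do
		"$cmd" "$goal" || return 1
	done
}

# Illustration only: prints the goals in the order they would be built.
run_goals_in_order echo mrproper defconfig all
```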
-
Incremental linking is gone, so rename built-in.o to built-in.a, which
is the usual extension for archive files.

This patch does two things, first is a simple search/replace:

git grep -l 'built-in\.o' | xargs sed -i 's/built-in\.o/built-in\.a/g'

The second is to invert nesting of nested text manipulations to avoid
filtering built-in.a out from libs-y2:

-libs-y2 := $(filter-out %.a, $(patsubst %/, %/built-in.a, $(libs-y)))
+libs-y2 := $(patsubst %/, %/built-in.a, $(filter-out %.a, $(libs-y)))

Signed-off-by: Nicholas Piggin
Signed-off-by: Masahiro Yamada
-
This removes the old `ld -r` incremental link option, which has not
been selected by any architecture since June 2017.

Signed-off-by: Nicholas Piggin
Signed-off-by: Masahiro Yamada
-
* Use BREs where EREs aren't necessary.
* Pass -E instead of -r to use EREs. This will be standardized in the
  next POSIX revision[0]. GNU sed supports this since 4.2 (May 2009),
  and busybox since 1.22.0 (Jan 2014).
* Use the [:space:] character class instead of ` \t` in bracket
  expressions. In bracket expressions, POSIX says that <backslash> loses
  its special meaning, so a conforming implementation cannot expand \t
  to <tab>[1].
* In BREs, use interval expressions (\{n,m\}) instead of non-standard
  features like \+ and \?.
* Use a loop instead of -s flag.

There are still plenty of other cases of non-standard sed invocations
(use of ERE features in BREs, in-place editing), but this fixes some
core ones.

[0] http://austingroupbugs.net/view.php?id=528
[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

Signed-off-by: Michael Forney
Signed-off-by: Masahiro Yamada
-
Based on gcc-version.sh, clang-version.sh prints out the correct
version of clang.

Signed-off-by: Sami Tolvanen
Tested-by: Nick Desaulniers
Signed-off-by: Masahiro Yamada
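
A rough sketch of how such a version script can work, assuming it follows the gcc-version.sh approach of asking the preprocessor to expand the compiler's predefined version macros. The function name clang_version and the exact output format (major, then two-digit minor and patchlevel) are assumptions for this example; __clang_major__, __clang_minor__ and __clang_patchlevel__ are macros that clang predefines.

```shell
#!/bin/sh
# clang_version COMPILER
# Print the version as MAJOR followed by two-digit MINOR and two-digit
# PATCHLEVEL, or print 0 and fail if COMPILER does not identify as clang.
# (Hypothetical helper name and output format, for illustration.)
clang_version() {
	if ! "$1" --version 2>/dev/null | grep -q clang; then
		echo 0
		return 1
	fi
	# Let the preprocessor expand the predefined macros; the expansion
	# is the last line of the preprocessed output.
	major=$(echo __clang_major__ | "$1" -E -x c - | tail -n 1)
	minor=$(echo __clang_minor__ | "$1" -E -x c - | tail -n 1)
	patch=$(echo __clang_patchlevel__ | "$1" -E -x c - | tail -n 1)
	printf '%d%02d%02d\n' "$major" "$minor" "$patch"
}

# Example: only meaningful when a clang binary is installed.
if command -v clang >/dev/null 2>&1; then
	clang_version clang
fi
```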
12 Mar, 2018
8 commits
-
Pull x86/pti updates from Thomas Gleixner:
"Yet another pile of melted spectrum related updates:

- Drop native vsyscall support finally as it causes more trouble than
  benefit.

- Make microcode loading more robust. There were a few issues
  especially related to late loading which are now surfacing because
  late loading of the IB* microcodes addressing spectre issues has
  become more widely used.

- Simplify and robustify the syscall handling in the entry code

- Prevent kprobes on the entry trampoline code which leads to kernel
  crashes when the probe hits before CR3 is updated

- Don't check microcode versions when running on hypervisors as they
  are considered as lying anyway.

- Fix the 32bit objtool build and a comment typo"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kprobes: Fix kernel crash when probing .entry_trampoline code
x86/pti: Fix a comment typo
x86/microcode: Synchronize late microcode loading
x86/microcode: Request microcode on the BSP
x86/microcode/intel: Look into the patch cache first
x86/microcode: Do not upload microcode if CPUs are offline
x86/microcode/intel: Writeback and invalidate caches before updating microcode
x86/microcode/intel: Check microcode revision before updating sibling threads
x86/microcode: Get rid of struct apply_microcode_ctx
x86/spectre_v2: Don't check microcode versions when running under hypervisors
x86/vsyscall/64: Drop "native" vsyscalls
x86/entry/64/compat: Save one instruction in entry_INT80_compat()
x86/entry: Do not special-case clone(2) in compat entry
x86/syscalls: Use COMPAT_SYSCALL_DEFINEx() macros for x86-only compat syscalls
x86/syscalls: Use proper syscall definition for sys_ioperm()
x86/entry: Remove stale syscall prototype
x86/syscalls/32: Simplify $entry == $compat entries
objtool: Fix 32-bit build
-
Pull timer fix from Thomas Gleixner:
"Just a single fix which adds a missing Kconfig dependency to avoid
unmet dependency warnings"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/atmel-st: Add 'depends on HAS_IOMEM' to fix unmet dependency
-
Pull RAS fixes from Thomas Gleixner:
"Two small fixes for RAS/MCE:

- Serialize sysfs changes to avoid concurrent modification of
  underlying data

- Add microcode revision to Machine Check records. This should have
  been there forever, but now with the broken microcode versions in
  the wild it has become important"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE: Serialize sysfs changes
x86/MCE: Save microcode revision in machine check records
-
Pull perf updates from Thomas Gleixner:
"Another set of perf updates:

- Fix a Skylake Uncore event format declaration

- Prevent perf pipe mode from crashing which was caused by a missing
  buffer allocation

- Make the perf top popup message which tells the user that it uses
  fallback mode on older kernels a debug message.

- Make perf context rescheduling work correctly

- Robustify the jump arrow drawing in perf browser mode so it does
  not try to create references to NULL initialized offset entries

- Make trigger_on() robust so it does not enable the trigger before
  everything is set up correctly to handle it

- Make perf auxtrace respect the --no-itrace option so it does not
  try to queue AUX data for decoding.

- Prevent having different number of field separators in CSV output
  lines when a counter is not supported.

- Make the perf kallsyms man page usage behave like it does for all
  other perf commands.

- Synchronize the kernel headers"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix ctx_event_type in ctx_resched()
perf tools: Fix trigger class trigger_on()
perf auxtrace: Prevent decoding when --no-itrace
perf stat: Fix CVS output format for non-supported counters
tools headers: Sync x86's cpufeatures.h
tools headers: Sync copy of kvm UAPI headers
perf record: Fix crash in pipe mode
perf annotate browser: Be more robust when drawing jump arrows
perf top: Fix annoying fallback message on older kernels
perf kallsyms: Fix the usage on the man page
perf/x86/intel/uncore: Fix Skylake UPI event format
-
Pull locking fix from Thomas Gleixner:
"rt_mutex_futex_unlock() grew a new irq-off call site, but the function
assumes that it's always called from irq enabled context.

Use (un)lock_irqsafe() to handle the new call site correctly"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites
-
Pull dmaengine fixes from Vinod Koul:
"Two small fixes for this cycle:

- fix max_chunk_size for rcar-dmac for R-Car Gen3
- fix clock resource of mv_xor_v2"

* tag 'dmaengine-fix-4.16-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: mv_xor_v2: Fix clock resource by adding a register clock
dmaengine: rcar-dmac: fix max_chunk_size for R-Car Gen3
-
Pull GPIO fix from Linus Walleij:
"This is a single GPIO fix for the v4.16 series affecting the Renesas
driver, and fixes wakeup from external stuff"

* tag 'gpio-v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: rcar: Use wakeup_path i.s.o. explicit clock handling
11 Mar, 2018
7 commits
-
On the CP110 components which are present on the Armada 7K/8K SoC we need
to explicitly enable the clock for the registers. However it is not
needed for the AP8xx component, that's why this clock is optional.

With this patch both clocks now have a name, but in order to be backward
compatible, the name of the first clock is not used. This still allows
using this clock with a device tree using the old binding.

Signed-off-by: Gregory CLEMENT
Reviewed-by: Rob Herring
Signed-off-by: Vinod Koul
-
…t/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- make fixdep parse kconfig.h to fix missing rebuild
- replace hyphens with underscores in builtin DTB label names
- fix typos
* tag 'kbuild-fixes-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: Handle builtin dtb file names containing hyphens
scripts/bloat-o-meter: fix typos in help
fixdep: do not ignore kconfig.h
fixdep: remove some false CONFIG_ matches
fixdep: remove stale references to uml-config.h
-
Pull watchdog fixes from Wim Van Sebroeck:
- f71808e_wdt: Fix magic close handling
- sbsa: 32-bit read fix for WCV
- hpwdt: Remove legacy NMI sourcing
* tag 'linux-watchdog-4.16-fixes-2' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: hpwdt: Remove legacy NMI sourcing.
watchdog: sbsa: use 32-bit read for WCV
watchdog: f71808e_wdt: Fix magic close handling
-
Pull block fixes from Jens Axboe:
- a xen-blkfront fix from Bhavesh with a multiqueue fix when
  detaching/re-attaching

- a few important NVMe fixes, including a revert for a sysfs fix that
  caused some user space confusion

- two bcache fixes by way of Michael Lyle

- a loop regression fix, fixing an issue with lost writes on DAX.

* tag 'for-linus-20180309' of git://git.kernel.dk/linux-block:
loop: Fix lost writes caused by missing flag
nvme_fc: rework sqsize handling
nvme-fabrics: Ignore nr_io_queues option for discovery controllers
xen-blkfront: move negotiate_mq to cover all cases of new VBDs
Revert "nvme: create 'slaves' and 'holders' entries for hidden controllers"
bcache: don't attach backing with duplicate UUID
bcache: fix crashes in duplicate cache device register
nvme: pci: pass max vectors as num_possible_cpus() to pci_alloc_irq_vectors
nvme-pci: Fix EEH failure on ppc
-
…/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix an uninitialized variable false warning in dm bufio

- Fix DM's passthrough ioctl support to be race free against an
  underlying device being removed.

- Fix corner-case of DM raid resync reporting if/when the raid becomes
  degraded during resync; otherwise automated raid repair will fail.

- A few DM multipath fixes to make non-SCSI optimizations, that were
  introduced during the 4.16 merge, useful for all non-SCSI devices,
  rather than narrowly define this non-SCSI mode in terms of "nvme".

  This allows the removal of "queue_mode nvme" that really didn't need
  to be introduced. Instead DM core will internalize whether
  nvme-specific IO submission optimizations are doable and DM multipath
  will only do SCSI-specific device handler operations if SCSI is in
  use.

* tag 'for-4.16/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm table: allow upgrade from bio-based to specialized bio-based variant
dm mpath: remove unnecessary NVMe branching in favor of scsi_dh checks
dm table: fix "nvme" test
dm raid: fix incorrect sync_ratio when degraded
dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl
dm bufio: avoid false-positive Wmaybe-uninitialized warning
-
Pull rdma fixes from Doug Ledford:
- Various driver bug fixes in mlx5, mlx4, bnxt_re and qedr, ranging
  from bugs under load to bad error case handling

- There is one largish patch fixing the locking in bnxt_re to avoid a
  machine hard lock situation

- A few core bugs on error paths

- A patch to reduce stack usage in the new CQ API

- One mlx5 regression introduced in this merge window

- There were new syzkaller scripts written for the RDMA subsystem and
  we are fixing issues found by the bot

- One of the commits (aa0de36a40f4 “RDMA/mlx5: Fix integer overflow
  while resizing CQ”) is missing part of the commit log message and one
  of the SOB lines. The original patch was from Leon Romanovsky, and a
  cut-n-paste separator in the commit message confused patchworks which
  then put the end of message separator in the wrong place in the
  downloaded patch, and I didn’t notice in time. The patch made it into
  the official branch, and the only way to fix it in-place was to
  rebase. Given the pain that a rebase causes, and the fact that the
  patch has relevant tags for stable and syzkaller, a revert of the
  munged patch and a reapplication of the original patch with the log
  message intact was done.

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (25 commits)
RDMA/mlx5: Fix integer overflow while resizing CQ
Revert "RDMA/mlx5: Fix integer overflow while resizing CQ"
RDMA/ucma: Check that user doesn't overflow QP state
RDMA/mlx5: Fix integer overflow while resizing CQ
RDMA/ucma: Limit possible option size
IB/core: Fix possible crash to access NULL netdev
RDMA/bnxt_re: Avoid Hard lockup during error CQE processing
RDMA/core: Reduce poll batch for direct cq polling
IB/mlx5: Fix an error code in __mlx5_ib_modify_qp()
IB/mlx5: When not in dual port RoCE mode, use provided port as native
IB/mlx4: Include GID type when deleting GIDs from HW table under RoCE
IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDs
RDMA/qedr: Fix iWARP write and send with immediate
RDMA/qedr: Fix kernel panic when running fio over NFSoRDMA
RDMA/qedr: Fix iWARP connect with port mapper
RDMA/qedr: Fix ipv6 destination address resolution
IB/core : Add null pointer check in addr_resolve
RDMA/bnxt_re: Fix the ib_reg failure cleanup
RDMA/bnxt_re: Fix incorrect DB offset calculation
RDMA/bnxt_re: Unconditionly fence non wire memory operations
...
-
Pull x86 platform driver fixes from Darren Hart:
"Correct a module loading race condition between the DELL_SMBIOS
backend modules and the first user by converting them to bool features
of the DELL_SMBIOS driver. Fixup the resulting Kconfig dependency
issue with DCDBAS"

* tag 'platform-drivers-x86-v4.16-6' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: dell-smbios: Resolve dependency error on DCDBAS
platform/x86: Allow for SMBIOS backend defaults
platform/x86: dell-smbios: Link all dell-smbios-* modules together
platform/x86: dell-smbios: Rename dell-smbios source to dell-smbios-base
platform/x86: dell-smbios: Correct some style warnings
10 Mar, 2018
9 commits
-
Pull KVM fixes from Radim Krčmář:
"PPC:

- Fix guest time accounting in the host
- Fix large-page backing for radix guests on POWER9
- Fix HPT guests on POWER9 backed by 2M or 1G pages
- Compile fixes for some configs and gcc versions

s390:
- Fix random memory corruption when running as guest2 (e.g. KVM in
  LPAR) and starting guest3 (e.g. nested KVM) with many CPUs

- Export forgotten io interrupt delivery statistics counter"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: fix memory overwrites when not using SCA entries
KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN
KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler
KVM: s390: provide io interrupt kvm_stat
KVM: PPC: Book3S: Fix compile error that occurs with some gcc versions
KVM: PPC: Fix compile error that occurs when CONFIG_ALTIVEC=n
-
Pull xen fix from Juergen Gross:
"Just one fix for the correct error handling after a failed
device_register()"

* tag 'for-linus-4.16a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: xenbus: use put_device() instead of kfree()
-
Pull arm64 fixes from Catalin Marinas:
- The SMCCC firmware interface for the spectre variant 2 mitigation has
  been updated to allow the discovery of whether the CPU needs the
  workaround. This pull request relaxes the kernel check on the return
  value from firmware.

- Fix the commit allowing changing from global to non-global page table
  entries which inadvertently disallowed other safe attribute changes.

- Fix sleeping in atomic during the arm_perf_teardown_cpu() code.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery
arm_pmu: Use disable_irq_nosync when disabling SPI in CPU teardown hook
arm64: mm: fix thinko in non-global page table attribute check
-
Pull Documentation build fix from Jonathan Corbet:
"The Sphinx 1.7 release broke the build process for reasons that are
mostly our fault.

This is a single fix cherry-picked from docs-next that restores docs
buildability for all supported Sphinx versions"

* tag 'docs-4.16-fix' of git://git.lwn.net/linux:
Documentation/sphinx: Fix Directive import error
-
Merge misc fixes from Andrew Morton:
"8 fixes"

* emailed patches from Andrew Morton:
lib/test_kmod.c: fix limit check on number of test devices created
selftests/vm/run_vmtests: adjust hugetlb size according to nr_cpus
mm/page_alloc: fix memmap_init_zone pageblock alignment
mm/memblock.c: hardcode the end_pfn being -1
mm/gup.c: teach get_user_pages_unlocked to handle FOLL_NOWAIT
lib/bug.c: exclude non-BUG/WARN exceptions from report_bug()
bug: use %pB in BUG and stack protector failure
hugetlb: fix surplus pages accounting
-
As reported by Dan, the parenthesis is in the wrong place, and since
the unlikely() call returns either 0 or 1, it's never less than zero. The
second issue is that signed integer overflows like "INT_MAX + 1" are
undefined behavior.

Since num_test_devs represents the number of devices, we want to stop
prior to hitting the max, and not rely on the wrap around at all. So
just cap at num_test_devs + 1, prior to assigning a new device.

Link: http://lkml.kernel.org/r/20180224030046.24238-1-mcgrof@kernel.org
Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader")
Reported-by: Dan Carpenter
Signed-off-by: Luis R. Rodriguez
Acked-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Fix userfaultfd_hugetlb on hosts which have more than 64 cpus.
---------------------------
running userfaultfd_hugetlb
---------------------------
invalid MiB
Usage:
[FAIL]

From userfaultfd.c we know that hugetlb_size needs to satisfy
hugetlb_size >= nr_cpus * hugepage_size. hugepage_size is often 2M, so
when host cpus > 64, it requires more than 128M.

[zhijianx.li@intel.com: update changelog/comments and variable name]
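
The size requirement can be worked through with a few lines of shell arithmetic. This is a sketch, not the code added to run_vmtests: the function name min_hugetlb_mib is hypothetical, and the 128 MiB floor is an assumption inferred from the "more than 128M" remark above.

```shell
#!/bin/sh
# min_hugetlb_mib NR_CPUS HUGEPAGE_MIB
# Smallest hugetlb_size in MiB that satisfies
#   hugetlb_size >= nr_cpus * hugepage_size
# while never going below an assumed floor of 128 MiB.
# (Hypothetical helper, for illustration.)
min_hugetlb_mib() {
	need=$(( $1 * $2 ))
	if [ "$need" -lt 128 ]; then
		need=128
	fi
	echo "$need"
}

min_hugetlb_mib 64 2    # 64 CPUs * 2 MiB = 128 MiB, the floor is enough
min_hugetlb_mib 96 2    # 96 CPUs * 2 MiB = 192 MiB, more than 128 MiB
```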
Link: http://lkml.kernel.org/r/20180302024356.83359-1-zhijianx.li@intel.com
Link: http://lkml.kernel.org/r/20180303125027.81638-1-zhijianx.li@intel.com
Link: http://lkml.kernel.org/r/20180302024356.83359-1-zhijianx.li@intel.com
Signed-off-by: Li Zhijian
Cc: Shuah Khan
Cc: SeongJae Park
Cc: Philippe Ombredanne
Cc: Aneesh Kumar K.V
Cc: Mike Kravetz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") introduced a bug where move_freepages() triggers a
VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
To fix this, simply align the skipped pfns in memmap_init_zone() the
same way as in move_freepages_block().

Seen in one of the RHEL reports:

crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
kernel BUG at mm/page_alloc.c:1389!
invalid opcode: 0000 [#1] SMP
--
RIP: 0010:[] [] move_freepages+0x15e/0x160
RSP: 0018:ffff88054d727688 EFLAGS: 00010087
--
Call Trace:
[] move_freepages_block+0x73/0x80
[] __rmqueue+0x263/0x460
[] get_page_from_freelist+0x7e1/0x9e0
[] __alloc_pages_nodemask+0x176/0x420
--
RIP [] move_freepages+0x15e/0x160
RSP

crash> page_init_bug -v | grep RAM
1000 - 9bfff System RAM (620.00 KiB)
100000 - 430bffff System RAM ( 1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
4b0c8000 - 4bf9cfff System RAM ( 14.83 MiB = 15188.00 KiB)
4bfac000 - 646b1fff System RAM (391.02 MiB = 400408.00 KiB)
7b788000 - 7b7fffff System RAM (480.00 KiB)
100000000 - 67fffffff System RAM ( 22.00 GiB)

crash> page_init_bug | head -6
7b788000 - 7b7fffff System RAM (480.00 KiB)
1fffff00000000 0 1 DMA32 4096 1048575
505736 505344 505855
0 0 0 DMA 1 4095
1fffff00000400 0 1 DMA32 4096 1048575
BUG, zones differ!

Note that this range follows two unpopulated sections
68000000-77ffffff in this zone. 7b788000-7b7fffff is the first one
after a gap. This makes memmap_init_zone() skip all the pfns up to the
beginning of this range. But this range is not pageblock (2M) aligned.
In fact no range has to be.

crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001ed7fc0 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 0 0 <<<<
ffffea0001ede1c0 7b787000 0 0 0 0
ffffea0001ede200 7b788000 0 0 1 1fffff00000000

Top part of page flags should contain nodeid and zonenr, which is not
the case for page ffffea0001ed8000 here (<<<<).

crash> log | grep -o fffea0001ed[^\ ]* | sort -u
fffea0001ed8000
fffea0001eded20
fffea0001edffc0

crash> bt -r | grep -o fffea0001ed[^\ ]* | sort -u
fffea0001ed8000
fffea0001eded00
fffea0001eded20
fffea0001edffc0

Initialization of the whole beginning of the section is skipped up to
the start of the range due to the commit b92df1de5d28. Now any code
calling move_freepages_block() (like reusing the page from a freelist as
in this example) with a page from the beginning of the range will get
the page rounded down to start_page ffffea0001ed8000 and passed to
move_freepages() which crashes on assertion getting wrong zonenr.

> VM_BUG_ON(page_zone(start_page) != page_zone(end_page));

Note, page_zone() derives the zone from page flags here.
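
The rounding described above can be checked by hand with shell arithmetic. A small sketch, assuming 4 KiB pages so that a 2M pageblock spans 512 pfns; the function name pageblock_start_pfn is hypothetical. It takes the first populated pfn of the range at physical address 7b788000 and rounds it down to the start of its pageblock, which lands on the uninitialized page at 7b600000 marked in the kmem output.

```shell
#!/bin/sh
# pageblock_start_pfn PFN [PAGES_PER_PAGEBLOCK]
# Round PFN down to the first pfn of its pageblock and print it in hex.
# The default of 512 assumes 2 MiB pageblocks made of 4 KiB pages.
# (Hypothetical helper, for illustration.)
pageblock_start_pfn() {
	pages_per_block=${2:-512}
	printf '%x\n' $(( $1 & ~($pages_per_block - 1) ))
}

# pfn 0x7b788 is physical address 7b788000 with 4 KiB pages.
pageblock_start_pfn 0x7b788    # prints 7b600, i.e. physical 7b600000
```

The decimal values 505736 and 505344 in the page_init_bug output are the same two pfns, 0x7b788 and 0x7b600.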

From a similar machine before commit b92df1de5d28:

crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff73941e00000 78000000 0 0 1 1fffff00000000
fffff73941ed7fc0 7b5ff000 0 0 1 1fffff00000000
fffff73941ed8000 7b600000 0 0 1 1fffff00000000
fffff73941edff80 7b7fe000 0 0 1 1fffff00000000
fffff73941edffc0 7b7ff000 ffff8e67e04d3ae0 ad84 1 1fffff00020068 uptodate,lru,active,mappedtodisk

All the pages since the beginning of the section are initialized.
move_freepages() is not going to blow up.

The same machine with this fix applied:

crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001e00000 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 1 1fffff00000000
ffffea0001edff80 7b7fe000 0 0 1 1fffff00000000
ffffea0001edffc0 7b7ff000 ffff88017fb13720 8 2 1fffff00020068 uptodate,lru,active,mappedtodisk

At least the bare minimum of pages is initialized, preventing the crash
as well.

Customers started to report this as soon as 7.4 (where b92df1de5d28 was
merged in RHEL) was released. I remember reports from
September/October-ish times. It's not easily reproduced and happens on
a handful of machines only. I guess that's why. But that does not make
it less serious, I think.

Though there actually is a report here:
https://bugzilla.kernel.org/show_bug.cgi?id=196443

And there are reports for Fedora from July:
https://bugzilla.redhat.com/show_bug.cgi?id=1473242
and CentOS:
https://bugs.centos.org/view.php?id=13964
and we internally track several dozen reports for RHEL bug
https://bugzilla.redhat.com/show_bug.cgi?id=1525121

Link: http://lkml.kernel.org/r/0485727b2e82da7efbce5f6ba42524b429d0391a.1520011945.git.neelx@redhat.com
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Paul Burton
Cc: Pavel Tatashin
Cc: Vlastimil Babka
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This is just a cleanup. It aids handling the special end case in the
next commit.

[akpm@linux-foundation.org: make it work against current -linus, not against -mm]
[akpm@linux-foundation.org: make it work against current -linus, not against -mm some more]
Link: http://lkml.kernel.org/r/1ca478d4269125a99bcfb1ca04d7b88ac1aee924.1520011944.git.neelx@redhat.com
Signed-off-by: Daniel Vacek
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Mel Gorman
Cc: Pavel Tatashin
Cc: Paul Burton
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds