Eric Lee / smarc-fsl-linux-kernel

21 May, 2019

2 commits

ec8f24b7f treewide: Add SPDX license identifier - Makefile/Kconfig

Add SPDX license identifiers to all Make/Kconfig files which:

- Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only
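
As a rough illustration of the conversion described above, here is a minimal sketch that prepends the identifier to the text of a Makefile or Kconfig file that carries no license information. The helper name add_spdx_identifier is hypothetical and is not the tooling used for the actual treewide change; the '#' comment style is the one used for Makefile/Kconfig files.

```python
# Hypothetical helper, not the script used for the treewide conversion.
SPDX_LINE = "# SPDX-License-Identifier: GPL-2.0-only"


def add_spdx_identifier(text: str) -> str:
    """Prepend the SPDX line unless the file already carries an identifier."""
    if "SPDX-License-Identifier:" in text:
        return text  # already tagged, leave untouched
    return SPDX_LINE + "\n" + text


if __name__ == "__main__":
    print(add_spdx_identifier("obj-y += main.o version.o\n"), end="")
```

The check for an existing identifier keeps the operation idempotent, so running it twice over the same file does not stack duplicate lines.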

Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-21 16:50:46 +0800
457c89965 treewide: Add SPDX license identifier for missed files

Add SPDX license identifiers to all files which:

- Have no license information of any form

- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only

Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-21 16:50:45 +0800

19 May, 2019

1 commit

5d59aa8f9 initramfs: don't free a non-existent initrd

Since commit 54c7a8916a88 ("initramfs: free initrd memory if opening
/initrd.image fails"), the kernel has unconditionally attempted to free
the initrd even if it doesn't exist.

In the non-existent case this causes a boot-time splat if
CONFIG_DEBUG_VIRTUAL is enabled due to a call to virt_to_phys() with a
NULL address.

Instead we should check that the initrd actually exists and only attempt
to free it if it does.

Link: http://lkml.kernel.org/r/20190516143125.48948-1-steven.price@arm.com
Fixes: 54c7a8916a88 ("initramfs: free initrd memory if opening /initrd.image fails")
Signed-off-by: Steven Price
Reported-by: Mark Rutland
Tested-by: Mark Rutland
Reviewed-by: Mike Rapoport
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Steven Price
2019-05-19 06:52:26 +0800

15 May, 2019

10 commits

e900a918b mm: shuffle initial free memory to improve memory-side-cache utilization

Patch series "mm: Randomize free memory", v10.

This patch (of 3):

Randomization of the page allocator improves the average utilization of
a direct-mapped memory-side-cache. Memory side caching is a platform
capability that Linux has been previously exposed to in HPC
(high-performance computing) environments on specialty platforms. In
that instance it was a smaller pool of high-bandwidth-memory relative to
higher-capacity / lower-bandwidth DRAM. Now, this capability is going
to be found on general purpose server platforms where DRAM is a cache in
front of higher latency persistent memory [1].

Robert offered an explanation of the state of the art of Linux
interactions with memory-side-caches [2], and I copy it here:

It's been a problem in the HPC space:
http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/

A kernel module called zonesort is available to try to help:
https://software.intel.com/en-us/articles/xeon-phi-software

and this abandoned patch series proposed that for the kernel:
https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com

Dan's patch series doesn't attempt to ensure buffers won't conflict, but
also reduces the chance that the buffers will. This will make performance
more consistent, albeit slower than "optimal" (which is near impossible
to attain in a general-purpose kernel). That's better than forcing
users to deploy remedies like:
"To eliminate this gradual degradation, we have added a Stream
measurement to the Node Health Check that follows each job;
nodes are rebooted whenever their measured memory bandwidth
falls below 300 GB/s."

A replacement for zonesort was merged upstream in commit cc9aec03e58f
("x86/numa_emulation: Introduce uniform split capability"). With this
numa_emulation capability, memory can be split into cache sized
("near-memory" sized) numa nodes. A bind operation to such a node, and
disabling workloads on other nodes, enables full cache performance.
However, once the workload exceeds the cache size then cache conflicts
are unavoidable. While HPC environments might be able to tolerate
time-scheduling of cache sized workloads, for general purpose server
platforms, the oversubscribed cache case will be the common case.

The worst case scenario is that a server system owner benchmarks a
workload at boot with an un-contended cache only to see that performance
degrade over time, even below the average cache performance due to
excessive conflicts. Randomization clips the peaks and fills in the
valleys of cache utilization to yield steady average performance.

Here are some performance impact details of the patches:

1/ An Intel internal synthetic memory bandwidth measurement tool, saw a
3X speedup in a contrived case that tries to force cache conflicts.
The contrived case used the numa_emulation capability to force an
instance of the benchmark to be run in two of the near-memory sized
numa nodes. If both instances were placed on the same emulated node they
would fit and cause zero conflicts. While on separate emulated nodes
without randomization they underutilized the cache and conflicted
unnecessarily due to the in-order allocation per node.

2/ A well known Java server application benchmark was run with a heap
size that exceeded cache size by 3X. The cache conflict rate was 8%
for the first run and degraded to 21% after page allocator aging. With
randomization enabled the rate levelled out at 11%.

3/ A MongoDB workload did not show a measurable difference in
cache-conflict rates, but the overall throughput dropped by 7% with
randomization in one case.

4/ Mel Gorman ran his suite of performance workloads with randomization
enabled on platforms without a memory-side-cache and saw a mix of some
improvements and some losses [3].
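
The contrived case in item 1/ can be pictured with a toy model of a direct-mapped cache, where each page maps to exactly one cache slot (page number modulo cache size). This is only a sketch: the sizes, the page numbering, and all function names are assumptions chosen for illustration, not values from the benchmark.

```python
import random

CACHE_PAGES = 1024   # assumed cache size, in pages (one emulated node)
WORKING_SET = 512    # assumed per-instance working set, in pages


def cache_slots(pages, cache_pages):
    """Direct-mapped: a page can only live in slot (page mod cache size)."""
    return {page % cache_pages for page in pages}


def conflicting_slots(pages_a, pages_b, cache_pages):
    """Number of cache slots that both instances compete for."""
    return len(cache_slots(pages_a, cache_pages) & cache_slots(pages_b, cache_pages))


def in_order_alloc(node_start, count):
    """Without randomization each node hands out its lowest pages first."""
    return list(range(node_start, node_start + count))


def randomized_alloc(node_start, node_pages, count, rng):
    """With randomization the pages come from anywhere in the node."""
    return rng.sample(range(node_start, node_start + node_pages), count)


if __name__ == "__main__":
    rng = random.Random(0)
    a = in_order_alloc(0, WORKING_SET)
    b = in_order_alloc(CACHE_PAGES, WORKING_SET)
    print("in-order conflicts:", conflicting_slots(a, b, CACHE_PAGES))
    ra = randomized_alloc(0, CACHE_PAGES, WORKING_SET, rng)
    rb = randomized_alloc(CACHE_PAGES, CACHE_PAGES, WORKING_SET, rng)
    print("randomized conflicts:", conflicting_slots(ra, rb, CACHE_PAGES))
```

In this model the in-order case makes both instances fight over the same half of the cache while the other half sits idle, whereas random placement spreads the two working sets over all slots and only some of them collide.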

While there is potentially significant improvement for applications that
depend on low latency access across a wide working-set, the performance
may be negligible to negative for other workloads. For this reason the
shuffle capability defaults to off unless a direct-mapped
memory-side-cache is detected. Even then, the page_alloc.shuffle=0
parameter can be specified to disable the randomization on those systems.

Outside of memory-side-cache utilization concerns there is a potential
security benefit from randomization. Some data exfiltration and
return-oriented-programming attacks rely on the ability to infer the
location of sensitive data objects. The kernel page allocator, especially
early in system boot, has predictable first-in-first-out behavior for
physical pages. Pages are freed in physical address order when first
onlined.

Quoting Kees:
"While we already have a base-address randomization
(CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
memory layouts would certainly be using the predictability of
allocation ordering (i.e. for attacks where the base address isn't
important: only the relative positions between allocated memory).
This is common in lots of heap-style attacks. They try to gain
control over ordering by spraying allocations, etc.

I'd really like to see this because it gives us something similar
to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."

While SLAB_FREELIST_RANDOM reduces the predictability of some local slab
caches, it leaves the vast bulk of memory to be allocated predictably in order.
However, it should be noted, the concrete security benefits are hard to
quantify, and no known CVE is mitigated by this randomization.

Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform
a Fisher-Yates shuffle of the page allocator 'free_area' lists when they
are initially populated with free memory at boot and at hotplug time. Do
this based on either the presence of a page_alloc.shuffle=Y command line
parameter, or autodetection of a memory-side-cache (to be added in a
follow-on patch).
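
For readers unfamiliar with the technique named above, this is a minimal user-space sketch of a textbook Fisher-Yates shuffle over a list standing in for a free list. It is not the kernel's shuffle_zone() code; the function name and the use of Python's random module are assumptions for illustration.

```python
import random


def fisher_yates_shuffle(items, rng):
    """Shuffle the list in place and return it.

    Walk from the last slot down to the second; swap each slot with a
    uniformly chosen slot at or below it. Every permutation is equally
    likely when the random source is uniform.
    """
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        items[i], items[j] = items[j], items[i]
    return items


if __name__ == "__main__":
    free_list = list(range(16))  # stand-in for pages in physical order
    print(fisher_yates_shuffle(free_list, random.Random(42)))
```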

The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free
pages, where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1, i.e. 10
(4MB). This trades off randomization granularity for time spent shuffling.
MAX_ORDER-1 was chosen to be minimally invasive to the page allocator
while still showing memory-side cache behavior improvements, and with the
expectation that the security implications of finer granularity
randomization are mitigated by CONFIG_SLAB_FREELIST_RANDOM. The
performance impact of the shuffling appears to be in the noise compared to
other memory initialization work.
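
The granularity figure above follows from simple arithmetic: an order-N block is 2^N contiguous pages. The sketch below works it through, assuming a 4 KiB base page size (the usual x86 value, but an assumption here); the function names are hypothetical.

```python
def shuffle_unit_bytes(order, page_size=4096):
    """Size of one order-N block: 2**order pages of page_size bytes each."""
    return (1 << order) * page_size


def shuffle_units(total_bytes, order, page_size=4096):
    """How many whole order-N blocks fit in a given amount of memory."""
    return total_bytes // shuffle_unit_bytes(order, page_size)


if __name__ == "__main__":
    mib = 1024 * 1024
    gib = 1024 * mib
    print(shuffle_unit_bytes(10) // mib, "MiB per shuffle unit at order 10")
    print(shuffle_units(16 * gib, 10), "units to shuffle in 16 GiB")
```

A lower order would give finer randomization but proportionally more units to shuffle, which is the trade-off the paragraph describes.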

This initial randomization can be undone over time so a follow-on patch is
introduced to inject entropy on page free decisions. It is reasonable to
ask if the page free entropy is sufficient, but it is not enough due to
the in-order initial freeing of pages. At the start of that process
putting page1 in front or behind page0 still keeps them close together,
page2 is still near page1 and has a high chance of being adjacent. As
more pages are added ordering diversity improves, but there is still high
page locality for the low address pages and this leads to no significant
impact to the cache conflict rate.
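
The locality argument above can be checked with a small simulation. The sketch below frees pages in address order and places each one at the head or tail of the list by coin flip, then compares the average distance between list neighbors with that of a fully shuffled list. The model, the names, and the page count are assumptions for illustration only.

```python
import random
from collections import deque


def free_in_order_with_coin_flip(num_pages, rng):
    """Free pages 0..n-1 in order, adding each to the head or tail at random."""
    free_list = deque()
    for page in range(num_pages):
        if rng.random() < 0.5:
            free_list.appendleft(page)
        else:
            free_list.append(page)
    return list(free_list)


def mean_neighbor_distance(free_list):
    """Average absolute page-number gap between adjacent list entries."""
    gaps = [abs(a - b) for a, b in zip(free_list, free_list[1:])]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    rng = random.Random(1)
    coin_flip = free_in_order_with_coin_flip(1024, rng)
    shuffled = list(range(1024))
    rng.shuffle(shuffled)
    print("head/tail entropy only:", mean_neighbor_distance(coin_flip))
    print("fully shuffled:        ", mean_neighbor_distance(shuffled))
```

With head/tail insertion alone, neighbors in the list stay only a few pages apart, while a full shuffle puts hundreds of pages between them on average, which is why the initial shuffle is still needed.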

[1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
[2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM
[3]: https://lkml.org/lkml/2018/10/12/309

[dan.j.williams@intel.com: fix shuffle enable]
Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com
[cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts]
Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw
Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams
Signed-off-by: Qian Cai
Reviewed-by: Kees Cook
Acked-by: Michal Hocko
Cc: Dave Hansen
Cc: Keith Busch
Cc: Robert Elliott
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Williams
2019-05-15 10:52:48 +0800
f40399992 init: free_initmem: poison freed init memory

Various architectures including x86 poison the freed init memory. Do the
same in the generic free_initmem implementation and switch the sparc32
architecture, which is identical to the generic code, over to it now.
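
To show what poisoning buys, here is a user-space model that overwrites a freed byte range with a fixed pattern so that stale reads stand out. It is a sketch only: the value 0xCC for POISON_FREE_INITMEM and the rule that a poison value outside 0..0xFF (such as -1) means "do not poison" are stated from memory of mainline and should be treated as assumptions; free_region is a hypothetical name.

```python
POISON_FREE_INITMEM = 0xCC  # assumed value, see lead-in


def free_region(memory: bytearray, start: int, end: int, poison: int) -> int:
    """Model freeing [start, end): optionally fill it with the poison byte.

    Returns the number of bytes released. A poison outside 0..0xFF
    (for example -1) leaves the old contents in place.
    """
    if 0 <= poison <= 0xFF:
        memory[start:end] = bytes([poison]) * (end - start)
    return end - start


if __name__ == "__main__":
    mem = bytearray(b"initcode" + b"keepdata")
    free_region(mem, 0, 8, POISON_FREE_INITMEM)
    print(mem)
```

After the call, any code that still reads the freed range sees the recognizable pattern instead of plausible-looking leftover data.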

Link: http://lkml.kernel.org/r/1550515285-17446-4-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport
Reviewed-by: Andrew Morton
Cc: Christoph Hellwig
Cc: Palmer Dabbelt
Cc: Richard Kuo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Rapoport
2019-05-15 00:47:47 +0800
997aef68a init: provide a generic free_initmem implementation

Patch series "provide a generic free_initmem implementation", v2.

Many architectures implement free_initmem() in exactly the same or a very
similar way: they wrap the call to free_initmem_default(), sometimes with
a different 'poison' parameter.

These patches switch those architectures to use a generic implementation
that does free_initmem_default(POISON_FREE_INITMEM).

This was inspired by Christoph's patches for free_initrd_mem [1] and I
shamelessly copied changelog entries from his patches :)

[1] https://lore.kernel.org/lkml/20190213174621.29297-1-hch@lst.de/

This patch (of 2):

For most architectures free_initmem is just a wrapper for the same
free_initmem_default(-1) call. Provide that as a generic implementation
marked __weak.

Link: http://lkml.kernel.org/r/1550515285-17446-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport
Reviewed-by: Andrew Morton
Cc: Christoph Hellwig
Cc: Palmer Dabbelt
Cc: Richard Kuo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Rapoport
2019-05-15 00:47:47 +0800
f94f7434c initramfs: poison freed initrd memory

Various architectures including x86 poison the freed initrd memory. Do
the same in the generic free_initrd_mem implementation and switch a few
more architectures that are identical to the generic code over to it now.

Link: http://lkml.kernel.org/r/20190213174621.29297-9-hch@lst.de
Signed-off-by: Christoph Hellwig
Acked-by: Mike Rapoport
Cc: Catalin Marinas [arm64]
Cc: Geert Uytterhoeven [m68k]
Cc: Steven Price
Cc: Alexander Viro
Cc: Guan Xuetao
Cc: Russell King
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2019-05-15 00:47:47 +0800
4afd58e14 initramfs: provide a generic free_initrd_mem implementation

For most architectures free_initrd_mem just expands to the same
free_reserved_area call. Provide that as a generic implementation marked
__weak.

Link: http://lkml.kernel.org/r/20190213174621.29297-8-hch@lst.de
Signed-off-by: Christoph Hellwig
Acked-by: Geert Uytterhoeven [m68k]
Acked-by: Mike Rapoport
Cc: Catalin Marinas [arm64]
Cc: Steven Price
Cc: Alexander Viro
Cc: Guan Xuetao
Cc: Russell King
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2019-05-15 00:47:47 +0800
d8ae8a376 initramfs: move the legacy keepinitrd parameter to core code

No need to handle the freeing disable in arch code when we already have a
core hook (and a different name for the option) for it.

Link: http://lkml.kernel.org/r/20190213174621.29297-7-hch@lst.de
Signed-off-by: Christoph Hellwig
Acked-by: Catalin Marinas [arm64]
Acked-by: Mike Rapoport
Cc: Geert Uytterhoeven [m68k]
Cc: Steven Price
Cc: Alexander Viro
Cc: Guan Xuetao
Cc: Russell King
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2019-05-15 00:47:47 +0800
afef7889c initramfs: cleanup populate_rootfs

The code for kernels that support ramdisks or not is mostly the same.
Unify it by using an IS_ENABLED for the info message, and moving the error
message into a stub for populate_initrd_image.

[cai@lca.pw: fix a compilation error]
Link: http://lkml.kernel.org/r/20190328014806.36375-1-cai@lca.pw
Link: http://lkml.kernel.org/r/20190213174621.29297-6-hch@lst.de
Signed-off-by: Christoph Hellwig
Signed-off-by: Qian Cai
Acked-by: Mike Rapoport
Cc: Catalin Marinas [arm64]
Cc: Geert Uytterhoeven [m68k]
Cc: Steven Price
Cc: Alexander Viro
Cc: Guan Xuetao
Cc: Russell King
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2019-05-15 00:47:47 +0800
7c184ecd2 initramfs: factor out a helper to populate the initrd image

This will allow for cleaner code sharing in the caller.

Link: http://lkml.kernel.org/r/20190213174621.29297-5-hch@lst.de
Signed-off-by: Christoph Hellwig
Acked-by: Mike Rapoport
Cc: Catalin Marinas [arm64]
Cc: Geert Uytterhoeven [m68k]
Cc: Steven Price
Cc: Alexander Viro
Cc: Guan Xuetao
Cc: Russell King
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2019-05-15 00:47:47 +0800
23091e287 initramfs: cleanup initrd freeing

Factor the kexec logic into a separate helper, and then inline the rest of
free_initrd into the only caller.

Link: http://lkml.kernel.org/r/20190213174621.29297-4-hch@lst.de
Signed-off-by: Christoph Hellwig
Acked-by: Mike Rapoport
Cc: Catalin Marinas [arm64]
Cc: Geert Uytterhoeven [m68k]
Cc: Steven Price
Cc: Alexander Viro
Cc: Guan Xuetao
Cc: Russell King
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2019-05-15 00:47:47 +0800
54c7a8916 initramfs: free initrd memory if opening /initrd.image fails

Patch series "initramfs tidyups".

I've spent some time chasing down behavior in initramfs and found
plenty of opportunity to improve the code. A first stab at that is
contained in this series.

This patch (of 7):

We free the initrd memory for all successful or error cases except for the
case where opening /initrd.image fails, which looks like an oversight.

Steven said:

: This also changes the behaviour when CONFIG_INITRAMFS_FORCE is enabled
: - specifically it means that the initrd is freed (previously it was
: ignored and never freed). But that seems like reasonable behaviour and
: the previous behaviour looks like another oversight.

Link: http://lkml.kernel.org/r/20190213174621.29297-3-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Steven Price
Acked-by: Mike Rapoport
Cc: Catalin Marinas [arm64]
Cc: Geert Uytterhoeven [m68k]
Cc: Alexander Viro
Cc: Russell King
Cc: Will Deacon
Cc: Guan Xuetao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2019-05-15 00:47:47 +0800

08 May, 2019

4 commits

dd5001e21 Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull randomness updates from Ted Ts'o:

- initialize the random driver earlier

- fix CRNG initialization when we trust the CPU's RNG on NUMA systems

- other miscellaneous cleanups and fixes.

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: add a spinlock_t to struct batched_entropy
random: document get_random_int() family
random: fix CRNG initialization when random.trust_cpu=1
random: move rand_initialize() earlier
random: only read from /dev/random after its pool has received 128 bits
drivers/char/random.c: make primary_crng static
drivers/char/random.c: remove unused stuct poolinfo::poolbits
drivers/char/random.c: constify poolinfo_table

Linus Torvalds
2019-05-08 12:42:23 +0800
cf482a49a Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core/kobject updates from Greg KH:
"Here is the "big" set of driver core patches for 5.2-rc1

There are a number of ACPI patches in here as well, as Rafael said
they should go through this tree due to the driver core changes they
required. They have all been acked by the ACPI developers.

There are also a number of small subsystem-specific changes in here,
due to some changes to the kobject core code. Those too have all been
acked by the various subsystem maintainers.

As for content, it's pretty boring outside of the ACPI changes:
- spdx cleanups
- kobject documentation updates
- default attribute groups for kobjects
- other minor kobject/driver core fixes

All have been in linux-next for a while with no reported issues"

* tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits)
kobject: clean up the kobject add documentation a bit more
kobject: Fix kernel-doc comment first line
kobject: Remove docstring reference to kset
firmware_loader: Fix a typo ("syfs" -> "sysfs")
kobject: fix dereference before null check on kobj
Revert "driver core: platform: Fix the usage of platform device name(pdev->name)"
init/config: Do not select BUILD_BIN2C for IKCONFIG
Provide in-kernel headers to make extending kernel easier
kobject: Improve doc clarity kobject_init_and_add()
kobject: Improve docs for kobject_add/del
driver core: platform: Fix the usage of platform device name(pdev->name)
livepatch: Replace klp_ktype_patch's default_attrs with groups
cpufreq: schedutil: Replace default_attrs field with groups
padata: Replace padata_attr_type default_attrs field with groups
irqdesc: Replace irq_kobj_type's default_attrs field with groups
net-sysfs: Replace ktype default_attrs field with groups
block: Replace all ktype default_attrs with groups
samples/kobject: Replace foo_ktype's default_attrs field with groups
kobject: Add support for default attribute groups to kobj_type
driver core: Postpone DMA tear-down until after devres release for probe failure
...

Linus Torvalds
2019-05-08 04:01:40 +0800
eac7078a0 Merge tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull pidfd updates from Christian Brauner:
"This patchset makes it possible to retrieve pidfds at process creation
time by introducing the new flag CLONE_PIDFD to the clone() system
call. Linus originally suggested to implement this as a new flag to
clone() instead of making it a separate system call.

After a thorough review from Oleg CLONE_PIDFD returns pidfds in the
parent_tidptr argument. This means we can give back the associated pid
and the pidfd at the same time. Access to process metadata information
thus becomes rather trivial.

As has been agreed, CLONE_PIDFD creates file descriptors based on
anonymous inodes similar to the new mount api. They are made
unconditional by this patchset as they are now needed by core kernel
code (vfs, pidfd) even more than they already were before (timerfd,
signalfd, io_uring, epoll etc.). The core patchset is rather small.
The bulky looking changelist is caused by David's very simple changes
to Kconfig to make anon inodes unconditional.

A pidfd comes with additional information in fdinfo if the kernel
supports procfs. The fdinfo file contains the pid of the process in
the caller's pid namespace in the same format as the procfs status
file, i.e. "Pid:\t%d".

To remove worries about missing metadata access this patchset comes
with a sample/test program that illustrates how a combination of
CLONE_PIDFD and pidfd_send_signal() can be used to gain race-free
access to process metadata through /proc/.

Further work based on this patchset has been done by Joel. His work
makes pidfds pollable. It finished too late for this merge window. I
would prefer to have it sitting in linux-next for a while and send it
for inclusion during the 5.3 merge window"
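
The fdinfo format mentioned in the pull message ("Pid:\t%d") is easy to consume from user space. The following is a minimal sketch of a parser; the function names are hypothetical, and the other fields in the sample input of the usage example (pos, flags, mnt_id) are assumed to be the usual fdinfo lines rather than something stated in this message.

```python
def pid_from_fdinfo(fdinfo_text: str) -> int:
    """Extract the pid from the text of /proc/<pid>/fdinfo/<fd> for a pidfd."""
    prefix = "Pid:\t"
    for line in fdinfo_text.splitlines():
        if line.startswith(prefix):
            return int(line[len(prefix):])
    raise ValueError("no 'Pid:' line found; descriptor is probably not a pidfd")


def read_pidfd_pid(pidfd: int) -> int:
    """Read and parse fdinfo for a pidfd held by the current process."""
    with open(f"/proc/self/fdinfo/{pidfd}") as fdinfo:
        return pid_from_fdinfo(fdinfo.read())


if __name__ == "__main__":
    sample = "pos:\t0\nflags:\t02000002\nmnt_id:\t13\nPid:\t1234\n"
    print(pid_from_fdinfo(sample))
```

read_pidfd_pid() only works on a kernel with procfs and pidfd support; pid_from_fdinfo() is kept separate so the parsing can be exercised without one.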

* tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
samples: show race-free pidfd metadata access
signal: support CLONE_PIDFD with pidfd_send_signal
clone: add CLONE_PIDFD
Make anon_inodes unconditional

Linus Torvalds
2019-05-08 03:30:24 +0800
096862191 Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk

Pull printk updates from Petr Mladek:

- Allow state reset of printk_once() calls.

- Prevent crashes when dereferencing invalid pointers in vsprintf().
Only the first byte is checked for simplicity.

- Make vsprintf warnings consistent and inlined.

- Treewide conversion of obsolete %pf, %pF to %ps, %pS printf
modifiers.

- Some clean up of vsprintf and test_printf code.

* tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
lib/vsprintf: Make function pointer_string static
vsprintf: Limit the length of inlined error messages
vsprintf: Avoid confusion between invalid address and value
vsprintf: Prevent crash when dereferencing invalid pointers
vsprintf: Consolidate handling of unknown pointer specifiers
vsprintf: Factor out %pO handler as kobject_string()
vsprintf: Factor out %pV handler as va_format()
vsprintf: Factor out %p[iI] handler as ip_addr_string()
vsprintf: Do not check address of well-known strings
vsprintf: Consistent %pK handling for kptr_restrict == 0
vsprintf: Shuffle restricted_pointer()
printk: Tie printk_once / printk_deferred_once into .data.once for reset
treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
lib/test_printf: Switch to bitmap_zalloc()

Linus Torvalds
2019-05-08 00:18:12 +0800

06 May, 2019

1 commit

caa841360 x86/mm: Initialize PGD cache during mm initialization

Poking-mm initialization might require duplicating the PGD at an early
stage. Initialize the PGD cache earlier to prevent boot failures.

Reported-by: kernel test robot
Signed-off-by: Nadav Amit
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Rick Edgecombe
Cc: Rik van Riel
Cc: Stephen Rothwell
Cc: Thomas Gleixner
Fixes: 4fc19708b165 ("x86/alternatives: Initialize temporary mm for patching")
Link: http://lkml.kernel.org/r/20190505011124.39692-1-namit@vmware.com
Signed-off-by: Ingo Molnar

Nadav Amit
2019-05-06 02:32:46 +0800

30 Apr, 2019

1 commit

4fc19708b x86/alternatives: Initialize temporary mm for patching

To prevent improper use of the PTEs that are used for text patching, the
next patches will use a temporary mm struct. Initialize it by copying
the init mm.

The address that will be used for patching is taken from the lower area
that is usually used for the task memory. Doing so prevents the need to
frequently synchronize the temporary-mm (e.g., when BPF programs are
installed), since different PGDs are used for the task memory.

Finally, randomize the address of the PTEs to harden against exploits
that use these PTEs.

Suggested-by: Andy Lutomirski
Tested-by: Masami Hiramatsu
Signed-off-by: Nadav Amit
Signed-off-by: Rick Edgecombe
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Masami Hiramatsu
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Rik van Riel
Cc: Thomas Gleixner
Cc: akpm@linux-foundation.org
Cc: ard.biesheuvel@linaro.org
Cc: deneen.t.dock@intel.com
Cc: kernel-hardening@lists.openwall.com
Cc: kristen@linux.intel.com
Cc: linux_dti@icloud.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190426232303.28381-8-nadav.amit@gmail.com
Signed-off-by: Ingo Molnar

Nadav Amit
2019-04-30 18:37:52 +0800

29 Apr, 2019

2 commits

bc0c60457 init/config: Do not select BUILD_BIN2C for IKCONFIG

Since commit 13610aa908dc ("kernel/configs: use .incbin directive to
embed config_data.gz"), IKCONFIG no longer uses BUILD_BIN2C so prevent
it from being selected in Kconfig.

Reviewed-by: Masahiro Yamada
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Greg Kroah-Hartman

Joel Fernandes (Google)
2019-04-29 22:48:04 +0800
43d8ce9d6 Provide in-kernel headers to make extending kernel easier

Introduce in-kernel headers which are made available as an archive
through proc (/proc/kheaders.tar.xz file). This archive makes it
possible to run eBPF and other tracing programs that need to extend the
kernel for tracing purposes without any dependency on the file system
having headers.

A github PR is sent for the corresponding BCC patch at:
https://github.com/iovisor/bcc/pull/2312

On Android and embedded systems, it is common to switch kernels but not
have kernel headers available on the file system. Further once a
different kernel is booted, any headers stored on the file system will
no longer be useful. This issue is well known even to distros.
By storing the headers as a compressed archive within the kernel, we can
avoid these issues that have been a hindrance for a long time.

The best way to use this feature is by building it in. Several users
need this: when they switch debug kernels, they do not want to
update the filesystem or worry about where to store the headers on
it. However, the feature is also buildable as a module in case the user
desires it not being part of the kernel image. This makes it possible to
load and unload the headers from memory on demand. A tracing program can
load the module, do its operations, and then unload the module to save
kernel memory. The total memory needed is 3.3MB.

By having the archive available at a fixed location independent of
filesystem dependencies and conventions, all debugging tools can
directly refer to the fixed location for the archive, without being
concerned with where the headers live on a typical filesystem, which
significantly simplifies tooling that needs kernel headers.
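
As an illustration of how a tool might consume the archive from its fixed location, here is a minimal sketch that lists the first few header files in it. The function name and the limit parameter are hypothetical; the default path is the one given in this commit message, and it is kept as a parameter because a given kernel may not provide the file there.

```python
import tarfile

KHEADERS_PATH = "/proc/kheaders.tar.xz"  # location named in the commit message


def list_kernel_headers(path: str = KHEADERS_PATH, limit: int = 10):
    """Return the names of up to `limit` regular files in the xz tarball."""
    names = []
    with tarfile.open(path, mode="r:xz") as archive:
        for member in archive:
            if member.isfile():
                names.append(member.name)
                if len(names) >= limit:
                    break
    return names


if __name__ == "__main__":
    try:
        for name in list_kernel_headers():
            print(name)
    except OSError as err:
        print("kernel headers archive not available:", err)
```

A tracing tool would more likely extract the archive into a temporary directory and point its compiler at it; listing is used here only to keep the sketch short.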

The code to read the headers is based on /proc/config.gz code and uses
the same technique to embed the headers.

Other approaches were discussed such as having an in-memory mountable
filesystem, but that has drawbacks such as requiring an in-kernel xz
decompressor which we don't have today, and requiring usage of 42 MB of
kernel memory to host the decompressed headers at any time. Also this
approach is simpler than such approaches.

Reviewed-by: Masahiro Yamada
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Greg Kroah-Hartman

Joel Fernandes (Google)
2019-04-29 22:48:03 +0800

20 Apr, 2019

2 commits

d55535232 random: move rand_initialize() earlier

Right now rand_initialize() is run as an early_initcall(), but it only
depends on timekeeping_init() (for mixing ktime_get_real() into the
pools). However, the call to boot_init_stack_canary() for stack canary
initialization runs earlier, which triggers a warning at boot:

random: get_random_bytes called from start_kernel+0x357/0x548 with crng_init=0

Instead, this moves rand_initialize() to after timekeeping_init(), and moves
canary initialization here as well.

Note that this warning may still remain for machines that do not have
UEFI RNG support (which initializes the RNG pools during setup_arch()),
or for x86 machines without RDRAND (or booting without "random.trust_cpu=on"
or CONFIG_RANDOM_TRUST_CPU=y).

Signed-off-by: Kees Cook
Signed-off-by: Theodore Ts'o

Kees Cook
2019-04-20 11:27:05 +0800
6041186a3 init: initialize jump labels before command line option parsing

When a module option, or core kernel argument, toggles a static-key it
requires jump labels to be initialized early. While x86, PowerPC, and
ARM64 arrange for jump_label_init() to be called before parse_args(),
ARM does not.

Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
page_alloc_shuffle+0x12c/0x1ac
static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
before call to jump_label_init()
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted
5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
Hardware name: ARM Integrator/CP (Device Tree)
[] (unwind_backtrace) from [] (show_stack+0x10/0x18)
[] (show_stack) from [] (dump_stack+0x18/0x24)
[] (dump_stack) from [] (__warn+0xe0/0x108)
[] (__warn) from [] (warn_slowpath_fmt+0x44/0x6c)
[] (warn_slowpath_fmt) from []
(page_alloc_shuffle+0x12c/0x1ac)
[] (page_alloc_shuffle) from [] (shuffle_store+0x28/0x48)
[] (shuffle_store) from [] (parse_args+0x1f4/0x350)
[] (parse_args) from [] (start_kernel+0x1c0/0x488)

Move the fallback call to jump_label_init() to occur before
parse_args().

The redundant calls to jump_label_init() in other archs are left intact
in case they have static key toggling use cases that are even earlier
than option parsing.

Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams
Reported-by: Guenter Roeck
Reviewed-by: Kees Cook
Cc: Mathieu Desnoyers
Cc: Thomas Gleixner
Cc: Mike Rapoport
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Williams
2019-04-20 00:46:05 +0800

19 Apr, 2019

1 commit

5dd50aaeb Make anon_inodes unconditional

Make the anon_inodes facility unconditional so that it can be used by core
VFS code and pidfd code.

Signed-off-by: David Howells
Signed-off-by: Al Viro
[christian@brauner.io: adapt commit message to mention pidfds]
Signed-off-by: Christian Brauner

David Howells
2019-04-19 20:03:11 +0800

09 Apr, 2019

1 commit

d75f773c8 treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively

%pF and %pf are functionally equivalent to %pS and %ps conversion
specifiers. The former are deprecated, therefore switch the current users
to use the preferred variant.

The changes have been produced by the following command:

git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

And verifying the result.
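
For readers who do not use perl, the two substitutions in the command above can be sketched as follows. This mirrors only the text replacement, not the file selection done by git grep; the function name is hypothetical.

```python
import re


def convert_specifiers(source: str) -> str:
    """Apply s/%pf/%ps/g and then s/%pF/%pS/g, as the perl one-liner does."""
    source = re.sub(r"%pf", "%ps", source)
    return re.sub(r"%pF", "%pS", source)


if __name__ == "__main__":
    print(convert_specifiers('pr_info("%pF called from %pf", fn, caller);'))
```

The matches are case-sensitive, so the two rules never interfere with each other and already converted %ps and %pS specifiers are left alone.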

Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
Cc: Andy Shevchenko
Cc: linux-arm-kernel@lists.infradead.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: linux-pci@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-mm@kvack.org
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sakari Ailus
Acked-by: David Sterba (for btrfs)
Acked-by: Mike Rapoport (for mm/memblock.c)
Acked-by: Bjorn Helgaas (for drivers/pci)
Acked-by: Rafael J. Wysocki
Signed-off-by: Petr Mladek

Sakari Ailus
2019-04-09 20:19:06 +0800

13 Mar, 2019

1 commit

f5c7310ac init/main: add checks for the return value of memblock_alloc*()

Add panic() calls if memblock_alloc() returns NULL.

The panic() format duplicates the one used by memblock itself. To keep
the parameter lists from growing too long, replace the open-coded
allocation size calculations with a local variable.
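
The pattern described here (compute the size once in a local variable, allocate, and fail loudly on NULL) can be sketched in userspace C. This is only an analogy: calloc() stands in for memblock_alloc(), the toy_panic() macro stands in for the kernel's panic(), and alloc_table() is a hypothetical function, not code from init/main.c. The message format is an approximation as well.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's panic(): report and abort. */
#define toy_panic(...) \
	do { fprintf(stderr, __VA_ARGS__); abort(); } while (0)

/*
 * Hypothetical example: allocate a zeroed table of n ints and abort if
 * the allocation fails. Overflow of n * sizeof(int) is not checked here.
 */
static int *alloc_table(size_t n)
{
	size_t size;
	int *table;

	assert(n > 0);
	size = n * sizeof(int);	/* computed once, reused in the message */
	table = calloc(1, size);
	if (!table)
		toy_panic("%s: Failed to allocate %zu bytes\n", __func__, size);
	return table;
}
```

Keeping the size in a local variable means the allocation call and the error message cannot drift apart.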

Link: http://lkml.kernel.org/r/1548057848-15136-18-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport
Cc: Catalin Marinas
Cc: Christophe Leroy
Cc: Christoph Hellwig
Cc: "David S. Miller"
Cc: Dennis Zhou
Cc: Geert Uytterhoeven
Cc: Greentime Hu
Cc: Greg Kroah-Hartman
Cc: Guan Xuetao
Cc: Guo Ren
Cc: Guo Ren [c-sky]
Cc: Heiko Carstens
Cc: Juergen Gross [Xen]
Cc: Mark Salter
Cc: Matt Turner
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Paul Burton
Cc: Petr Mladek
Cc: Richard Weinberger
Cc: Rich Felker
Cc: Rob Herring
Cc: Russell King
Cc: Stafford Horne
Cc: Tony Luck
Cc: Vineet Gupta
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Rapoport
2019-03-13 01:04:02 +0800

11 Mar, 2019

2 commits

ffd602eb4 Merge tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

- do not generate unneeded top-level built-in.a

- let git ignore O= directory entirely

- optimize scripts/kallsyms slightly

- exclude DWARF info from *.s regardless of config options

- fix GCC toolchain search path for Clang to prepare ld.lld support

- do not generate modules.order when CONFIG_MODULES is disabled

- simplify single target rules and remove VPATH for external module
build

- allow adding optional flags to dpkg-buildpackage when building
deb-pkg

- move some compiler option tests from Makefile to Kconfig

- various Makefile cleanups

* tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
kbuild: remove scripts/basic/% build target
kbuild: use -Werror=implicit-... instead of -Werror-implicit-...
kbuild: clean up scripts/gcc-version.sh
kbuild: remove cc-version macro
kbuild: update comment block of scripts/clang-version.sh
kbuild: remove commented-out INITRD_COMPRESS
kbuild: move -gsplit-dwarf, -gdwarf-4 option tests to Kconfig
kbuild: [bin]deb-pkg: add DPKG_FLAGS variable
kbuild: move ".config not found!" message from Kconfig to Makefile
kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
kbuild: simplify single target rules
kbuild: remove empty rules for makefiles
kbuild: make -r/-R effective in top Makefile for old Make versions
kbuild: move tools_silent to a more relevant place
kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
kbuild: refactor cc-cross-prefix implementation
kbuild: hardcode genksyms path and remove GENKSYMS variable
scripts/gdb: refactor rules for symlink creation
kbuild: create symlink to vmlinux-gdb.py in scripts_gdb target
scripts/gdb: do not descend into scripts/gdb from scripts
...

Linus Torvalds
2019-03-11 08:48:21 +0800
a15f6b923 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
"A single fix to prevent an unmet dependencies warning in Kconfig"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Make VIRT_CPU_ACCOUNTING_GEN depend on GENERIC_CLOCKEVENTS

Linus Torvalds
2019-03-11 04:58:33 +0800

09 Mar, 2019

1 commit

38e7571c0 Merge tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block

Pull io_uring IO interface from Jens Axboe:
"Second attempt at adding the io_uring interface.

Since the first one, we've added basic unit testing of the three
system calls, that resides in liburing like the other unit tests that
we have so far. It'll take a while to get full coverage of it, but
we're working towards it. I've also added two basic test programs to
tools/io_uring. One uses the raw interface and has support for all the
various features that io_uring supports outside of standard IO, like
fixed files, fixed IO buffers, and polled IO. The other uses the
liburing API, and is a simplified version of cp(1).

This adds support for a new IO interface, io_uring.

io_uring allows an application to communicate with the kernel through
two rings, the submission queue (SQ) and completion queue (CQ) ring.
This allows for very efficient handling of IOs, see the v5 posting for
some basic numbers:

https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/

Outside of just efficiency, the interface is also flexible and
extendable, and allows for future use cases like the upcoming NVMe
key-value store API, networked IO, and so on. It also supports async
buffered IO, something that we've always failed to support in the
kernel.

Outside of basic IO features, it supports async polled IO as well.
This particular feature has already been tested at Facebook months ago
for flash storage boxes, with 25-33% improvements. It makes polled IO
actually useful for real world use cases, where even basic flash sees
a nice win in terms of efficiency, latency, and performance. These
boxes were IOPS bound before, now they are not.

This series adds three new system calls. One for setting up an
io_uring instance (io_uring_setup(2)), one for submitting/completing
IO (io_uring_enter(2)), and one for aux functions like registering
file sets, buffers, etc (io_uring_register(2)). Through the help of
Arnd, I've coordinated the syscall numbers so merge on that front
should be painless.

Jon did a writeup of the interface a while back, which (except for
minor details that have been tweaked) is still accurate. Find that
here:

https://lwn.net/Articles/776703/

Huge thanks to Al Viro for helping get the reference cycle code
correct, and to Jann Horn for his extensive reviews focused on both
security and bugs in general.

There's a userspace library that provides basic functionality for
applications that don't need or want to care about how to fiddle with
the rings directly. It has helpers to allow applications to easily set
up an io_uring instance, and submit/complete IO through it without
knowing about the intricacies of the rings. It also includes man pages
(thanks to Jeff Moyer), and will continue to grow helper
functions and features as time progresses. Find it here:

git://git.kernel.dk/liburing

Fio has full support for the raw interface, both in the form of an IO
engine (io_uring), but also with a small test application (t/io_uring)
that can exercise and benchmark the interface"

* tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block:
io_uring: add a few test tools
io_uring: allow workqueue item to handle multiple buffered requests
io_uring: add support for IORING_OP_POLL
io_uring: add io_kiocb ref count
io_uring: add submission polling
io_uring: add file set registration
net: split out functions related to registering inflight socket files
io_uring: add support for pre-mapped user IO buffers
block: implement bio helper to add iter bvec pages to bio
io_uring: batch io_kiocb allocation
io_uring: use fget/fput_many() for file references
fs: add fget_many() and fput_many()
io_uring: support for IO polling
io_uring: add fsync support
Add io_uring IO interface

Linus Torvalds
2019-03-09 06:48:40 +0800

08 Mar, 2019

3 commits

b5dd0c658 Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

- some of the rest of MM

- various misc things

- dynamic-debug updates

- checkpatch

- some epoll speedups

- autofs

- rapidio

- lib/, lib/lzo/ updates

* emailed patches from Andrew Morton: (83 commits)
samples/mic/mpssd/mpssd.h: remove duplicate header
kernel/fork.c: remove duplicated include
include/linux/relay.h: fix percpu annotation in struct rchan
arch/nios2/mm/fault.c: remove duplicate include
unicore32: stop printing the virtual memory layout
MAINTAINERS: fix GTA02 entry and mark as orphan
mm: create the new vm_fault_t type
arm, s390, unicore32: remove oneliner wrappers for memblock_alloc()
arch: simplify several early memory allocations
openrisc: simplify pte_alloc_one_kernel()
sh: prefer memblock APIs returning virtual address
microblaze: prefer memblock API returning virtual address
powerpc: prefer memblock APIs returning virtual address
lib/lzo: separate lzo-rle from lzo
lib/lzo: implement run-length encoding
lib/lzo: fast 8-byte copy on arm64
lib/lzo: 64-bit CTZ on arm64
lib/lzo: tidy-up ifdefs
ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size
ipc: annotate implicit fall through
...

Linus Torvalds
2019-03-08 11:25:37 +0800
e5eed351f init/initramfs.c: provide more details in error messages

Use distinct error messages when archive decompression fails.

Link: http://lkml.kernel.org/r/20190212075635.7373-1-david.engraf@sysgo.com
Signed-off-by: David Engraf
Reviewed-by: Andrew Morton
Tested-by: Andy Shevchenko
Cc: Dominik Brodowski
Cc: Greg Kroah-Hartman
Cc: Philippe Ombredanne
Cc: Arnd Bergmann
Cc: Luc Van Oostenryck
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Engraf
2019-03-08 10:32:02 +0800
be37f21a0 Merge tag 'audit-pr-20190305' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
"A lucky 13 audit patches for v5.1.

Despite the rather large diffstat, most of the changes are from two
bug fix patches that move code from one Kconfig option to another.

Beyond that bit of churn, the remaining changes are largely cleanups
and bug-fixes as we slowly march towards container auditing. It isn't
all boring though, we do have a couple of new things: file
capabilities v3 support, and expanded support for filtering on
filesystems to solve problems with remote filesystems.

All changes pass the audit-testsuite. Please merge for v5.1"

* tag 'audit-pr-20190305' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: mark expected switch fall-through
audit: hide auditsc_get_stamp and audit_serial prototypes
audit: join tty records to their syscall
audit: remove audit_context when CONFIG_ AUDIT and not AUDITSYSCALL
audit: remove unused actx param from audit_rule_match
audit: ignore fcaps on umount
audit: clean up AUDITSYSCALL prototypes and stubs
audit: more filter PATH records keyed on filesystem magic
audit: add support for fcaps v3
audit: move loginuid and sessionid from CONFIG_AUDITSYSCALL to CONFIG_AUDIT
audit: add syscall information to CONFIG_CHANGE records
audit: hand taken context to audit_kill_trees for syscall logging
audit: give a clue what CONFIG_CHANGE op was involved

Linus Torvalds
2019-03-08 04:20:11 +0800

07 Mar, 2019

3 commits

041a15744 time: Make VIRT_CPU_ACCOUNTING_GEN depend on GENERIC_CLOCKEVENTS

Moving the CONTEXT_TRACKING Kconfig option into kernel/time/Kconfig added
an implicit dependency on the surrounding GENERIC_CLOCKEVENTS option, but
this is not always enabled when it is possible to select
VIRT_CPU_ACCOUNTING_GEN:

WARNING: unmet direct dependencies detected for CONTEXT_TRACKING
Depends on [n]: GENERIC_CLOCKEVENTS [=n]
Selected by [y]:
- VIRT_CPU_ACCOUNTING_GEN [=y] && <choice> && HAVE_CONTEXT_TRACKING [=y] && HAVE_VIRT_CPU_ACCOUNTING_GEN [=y]

Platforms without GENERIC_CLOCKEVENTS are rare enough that this corner
case can simply be ignored. Make GENERIC_CLOCKEVENTS a dependency of
VIRT_CPU_ACCOUNTING_GEN to simplify the configuration.

Fixes: a4cffdad7314 ("time: Move CONTEXT_TRACKING to kernel/time/Kconfig")
Signed-off-by: Arnd Bergmann
Signed-off-by: Thomas Gleixner
Cc: "Paul E . McKenney"
Cc: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190304200202.1163250-1-arnd@arndb.de

Arnd Bergmann
2019-03-07 03:43:08 +0800
8dcd175bc Merge branch 'akpm' (patches from Andrew)

Merge misc updates from Andrew Morton:

- a few misc things

- ocfs2 updates

- most of MM

* emailed patches from Andrew Morton: (159 commits)
tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include
proc: more robust bulk read test
proc: test /proc/*/maps, smaps, smaps_rollup, statm
proc: use seq_puts() everywhere
proc: read kernel cpu stat pointer once
proc: remove unused argument in proc_pid_lookup()
fs/proc/thread_self.c: code cleanup for proc_setup_thread_self()
fs/proc/self.c: code cleanup for proc_setup_self()
proc: return exit code 4 for skipped tests
mm,mremap: bail out earlier in mremap_to under map pressure
mm/sparse: fix a bad comparison
mm/memory.c: do_fault: avoid usage of stale vm_area_struct
writeback: fix inode cgroup switching comment
mm/huge_memory.c: fix "orig_pud" set but not used
mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
mm/memcontrol.c: fix bad line in comment
mm/cma.c: cma_declare_contiguous: correct err handling
mm/page_ext.c: fix an imbalance with kmemleak
mm/compaction: pass pgdat to too_many_isolated() instead of zone
mm: remove zone_lru_lock() function, access ->lru_lock directly
...

Linus Torvalds
2019-03-07 02:31:36 +0800
45802da05 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:

- refcount conversions

- Solve the rq->leaf_cfs_rq_list can of worms for real.

- improve power-aware scheduling

- add sysctl knob for Energy Aware Scheduling

- documentation updates

- misc other changes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
kthread: Do not use TIMER_IRQSAFE
kthread: Convert worker lock to raw spinlock
sched/fair: Use non-atomic cpumask_{set,clear}_cpu()
sched/fair: Remove unused 'sd' parameter from select_idle_smt()
sched/wait: Use freezable_schedule() when possible
sched/fair: Prune, fix and simplify the nohz_balancer_kick() comment block
sched/fair: Explain LLC nohz kick condition
sched/fair: Simplify nohz_balancer_kick()
sched/topology: Fix percpu data types in struct sd_data & struct s_data
sched/fair: Simplify post_init_entity_util_avg() by calling it with a task_struct pointer argument
sched/fair: Fix O(nr_cgroups) in the load balancing path
sched/fair: Optimize update_blocked_averages()
sched/fair: Fix insertion in rq->leaf_cfs_rq_list
sched/fair: Add tmp_alone_branch assertion
sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()
sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
sched/pelt: Skip updating util_est when utilization is higher than CPU's capacity
sched/fair: Update scale invariance of PELT
sched/fair: Move the rq_of() helper function
sched/core: Convert task_struct.stack_refcount to refcount_t
...

Linus Torvalds
2019-03-07 00:14:05 +0800

06 Mar, 2019

1 commit

98fa15f34 mm: replace all open encodings for NUMA_NO_NODE

Patch series "Replace all open encodings for NUMA_NO_NODE", v3.

All these places for replacement were found by running the following
grep patterns on the entire kernel code. Please let me know if this
might have missed some instances. This might also have replaced some
false positives. I will appreciate suggestions, inputs and review.

1. git grep "nid == -1"
2. git grep "node == -1"
3. git grep "nid = -1"
4. git grep "node = -1"

This patch (of 2):

At present there are multiple places where an invalid node number is
encoded as -1. Even though this is implicitly understood, it is better to
use a macro. Replace these open encodings of an invalid node number with
the global macro NUMA_NO_NODE. This removes NUMA-related assumptions such
as 'invalid node' from various places and redirects them to a common
definition.
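
A small sketch of the kind of replacement this describes, written as standalone C. The NUMA_NO_NODE definition as (-1) matches the kernel's macro; pick_node() is a hypothetical function invented for illustration and does not come from the patch.

```c
#include <assert.h>

#define NUMA_NO_NODE	(-1)	/* same value the open-coded checks used */

/*
 * Hypothetical example: fall back to the local node when the caller did
 * not request a specific one. Before the conversion the test would have
 * been written as "requested == -1".
 */
static int pick_node(int requested, int local)
{
	assert(local != NUMA_NO_NODE);
	return requested == NUMA_NO_NODE ? local : requested;
}
```

The behavior is unchanged; the macro only names the sentinel so that every call site refers to one definition.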

Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual
Reviewed-by: David Hildenbrand
Acked-by: Jeff Kirsher [ixgbe]
Acked-by: Jens Axboe [mtip32xx]
Acked-by: Vinod Koul [dmaengine.c]
Acked-by: Michael Ellerman [powerpc]
Acked-by: Doug Ledford [drivers/infiniband]
Cc: Joseph Qi
Cc: Hans Verkuil
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Anshuman Khandual
2019-03-06 13:07:14 +0800

04 Mar, 2019

1 commit

fa7295ab6 kbuild: clean up scripts/gcc-version.sh

Now that the Kconfig is the only user of this script, we can drop
unneeded code.

Remove the -p option, and stop prepending the output with zero,
so that Kconfig can directly use the output from this script.

Signed-off-by: Masahiro Yamada

Masahiro Yamada
2019-03-04 21:35:04 +0800

28 Feb, 2019

1 commit

2b188cc1b Add io_uring IO interface

The submission queue (SQ) and completion queue (CQ) rings are shared
between the application and the kernel. This eliminates the need to
copy data back and forth to submit and complete IO.

IO submissions use the io_uring_sqe data structure, and completions
are generated in the form of io_uring_cqe data structures. The SQ
ring is an index into the io_uring_sqe array, which makes it possible
to submit a batch of IOs without them being contiguous in the ring.
The CQ ring is always contiguous, as completion events are inherently
unordered, and hence any io_uring_cqe entry can point back to an
arbitrary submission.
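
The indirection described above, where the SQ ring holds indices into a separate io_uring_sqe array, can be modeled with a toy single-threaded ring in C. This is an illustrative sketch only: the toy_* names and the fixed size are assumptions, the real structures live in shared memory mapped from the kernel, and the memory barriers that the real producer and consumer need are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENTRIES	8u			/* power of two so masking works */
#define TOY_MASK	(TOY_ENTRIES - 1u)

struct toy_sqe {
	uint64_t user_data;			/* identifies the request */
};

struct toy_sq {
	struct toy_sqe sqes[TOY_ENTRIES];	/* request descriptors */
	uint32_t array[TOY_ENTRIES];		/* ring of indices into sqes[] */
	uint32_t head;				/* advanced by the consumer */
	uint32_t tail;				/* advanced by the producer */
};

/* Producer side: publish sqes[idx]. Returns 0, or -1 if the ring is full. */
static int toy_sq_push(struct toy_sq *sq, uint32_t idx)
{
	assert(idx < TOY_ENTRIES);
	if (sq->tail - sq->head == TOY_ENTRIES)
		return -1;
	sq->array[sq->tail & TOY_MASK] = idx;
	sq->tail++;
	return 0;
}

/* Consumer side: fetch the next published sqe, or NULL if none is pending. */
static struct toy_sqe *toy_sq_pop(struct toy_sq *sq)
{
	uint32_t idx;

	if (sq->head == sq->tail)
		return NULL;
	idx = sq->array[sq->head & TOY_MASK];
	sq->head++;
	return &sq->sqes[idx];
}
```

Because the ring stores indices rather than the entries themselves, a batch can reference slots 5 and then 2 of the sqe array without those slots being adjacent.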

Two new system calls are added for this:

io_uring_setup(entries, params)
Sets up an io_uring instance for doing async IO. On success,
returns a file descriptor that the application can mmap to
gain access to the SQ ring, CQ ring, and io_uring_sqes.

io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
Initiates IO against the rings mapped to this fd, or waits for
them to complete, or both. The behavior is controlled by the
parameters passed in. If 'to_submit' is non-zero, then we'll
try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
kernel will wait for 'min_complete' events, if they aren't
already available. It's valid to set IORING_ENTER_GETEVENTS
and 'min_complete' == 0 at the same time; this allows the
kernel to return already completed events without waiting
for them. This is useful only for polling, as for IRQ
driven IO, the application can just check the CQ ring
without entering the kernel.

With this setup, it's possible to do async IO with a single system
call. Future developments will enable polled IO with this interface,
and polled submission as well. The latter will enable an application
to do IO without doing ANY system calls at all.

For IRQ driven IO, an application only needs to enter the kernel for
completions if it wants to wait for them to occur.

Each io_uring is backed by a workqueue, to support buffered async IO
as well. We will only punt to an async context if the command would
need to wait for IO on the device side. Any data that can be accessed
directly in the page cache is done inline. This avoids the slowness
issue of usual threadpools, since cached data is accessed as quickly
as a sync interface.

Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c

Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Jens Axboe
2019-02-28 23:24:23 +0800

27 Feb, 2019

1 commit

b303c6df8 kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig

Since -Wmaybe-uninitialized was introduced by GCC 4.7, we have patched
various false positives:

- commit e74fc973b6e5 ("Turn off -Wmaybe-uninitialized when building
with -Os") turned off this option for -Os.

- commit 815eb71e7149 ("Kbuild: disable 'maybe-uninitialized' warning
for CONFIG_PROFILE_ALL_BRANCHES") turned off this option for
CONFIG_PROFILE_ALL_BRANCHES

- commit a76bcf557ef4 ("Kbuild: enable -Wmaybe-uninitialized warning
for "make W=1"") turned off this option for GCC < 4.9
Arnd provided more explanation in https://lkml.org/lkml/2017/3/14/903

I think this looks better by shifting the logic from Makefile to Kconfig.
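
For readers unfamiliar with the warning, the following C sketch shows the general shape of code that can produce a -Wmaybe-uninitialized false positive. The function lookup() is hypothetical, and whether a particular GCC version and optimization level actually warns on it is not guaranteed; the point is only that the variable is provably assigned on every path that reads it, yet the compiler has to track the correlated conditions to see that.

```c
#include <assert.h>

/*
 * Hypothetical example: 'value' is written and read under the same
 * condition, so it is never read uninitialized. A compiler that fails
 * to correlate the two 'have_key' tests may still warn about it.
 */
static int lookup(int have_key, int key)
{
	int value;	/* deliberately not initialized here */

	if (have_key)
		value = key * 2;

	if (have_key)
		return value;
	return -1;
}
```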

Link: https://github.com/ClangBuiltLinux/linux/issues/350
Signed-off-by: Masahiro Yamada
Reviewed-by: Nathan Chancellor
Tested-by: Nick Desaulniers

Masahiro Yamada
2019-02-27 20:43:20 +0800

22 Feb, 2019

1 commit

a841c673f revert "initramfs: cleanup incomplete rootfs"

Revert ff1522bb7d9845 ("initramfs: cleanup incomplete rootfs").

Andy reports

: This breaks my setup where I have U-boot provided more size of initramfs
: than needed. This allows a bit of flexibility to increase or decrease
: initramfs compressed image without taking care of bootloader. The proper
: solution is to do this if we sure that we didn't get enough memory,
: otherwise I can't consider the error fatal to clean up rootfs.

Fixes: ff1522bb7d9845 ("initramfs: cleanup incomplete rootfs")
Reported-by: Andy Shevchenko
Tested-by: Andy Shevchenko
Cc: David Engraf
Cc: Dominik Brodowski
Cc: Greg Kroah-Hartman
Cc: Philippe Ombredanne
Cc: Arnd Bergmann
Cc: Luc Van Oostenryck
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2019-02-22 01:00:59 +0800