06 May, 2019

1 commit

  • Poking-mm initialization might require duplicating the PGD at an early
    stage. Initialize the PGD cache earlier to prevent boot failures.

    Reported-by: kernel test robot
    Signed-off-by: Nadav Amit
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rick Edgecombe
    Cc: Rik van Riel
    Cc: Stephen Rothwell
    Cc: Thomas Gleixner
    Fixes: 4fc19708b165 ("x86/alternatives: Initialize temporary mm for patching")
    Link: http://lkml.kernel.org/r/20190505011124.39692-1-namit@vmware.com
    Signed-off-by: Ingo Molnar

    Nadav Amit
     

30 Apr, 2019

1 commit

  • To prevent improper use of the PTEs that are used for text patching, the
    next patches will use a temporary mm struct. Initialize it by copying
    the init mm.

    The address that will be used for patching is taken from the lower
    address range that is usually used for task memory. Doing so avoids the
    need to frequently synchronize the temporary mm (e.g., when BPF programs
    are installed), since different PGDs are used for task memory.

    Finally, randomize the address of the PTEs to harden against exploits
    that use these PTEs.
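
    A minimal sketch of the resulting initialization (condensed from the
    patch's poking_init() in arch/x86/kernel/alternative.c; treat it as
    illustrative rather than a literal excerpt):

    void __init poking_init(void)
    {
            /* The temporary mm is a copy of init_mm. */
            poking_mm = copy_init_mm();
            BUG_ON(!poking_mm);

            /*
             * Randomize the poking address, keeping the PTEs used for
             * patching within the lower (task) address range.
             */
            poking_addr = TASK_UNMAPPED_BASE;
            if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
                    poking_addr += (kaslr_get_random_long("Poking") & PAGE_MASK) %
                            (TASK_SIZE - TASK_UNMAPPED_BASE - 3 * PAGE_SIZE);
    }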

    Suggested-by: Andy Lutomirski
    Tested-by: Masami Hiramatsu
    Signed-off-by: Nadav Amit
    Signed-off-by: Rick Edgecombe
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Masami Hiramatsu
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: ard.biesheuvel@linaro.org
    Cc: deneen.t.dock@intel.com
    Cc: kernel-hardening@lists.openwall.com
    Cc: kristen@linux.intel.com
    Cc: linux_dti@icloud.com
    Cc: will.deacon@arm.com
    Link: https://lkml.kernel.org/r/20190426232303.28381-8-nadav.amit@gmail.com
    Signed-off-by: Ingo Molnar

    Nadav Amit
     

20 Apr, 2019

1 commit

  • When a module option, or core kernel argument, toggles a static-key it
    requires jump labels to be initialized early. While x86, PowerPC, and
    ARM64 arrange for jump_label_init() to be called before parse_args(),
    ARM does not.

    Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
    page_alloc_shuffle+0x12c/0x1ac
    static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
    before call to jump_label_init()
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted
    5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
    Hardware name: ARM Integrator/CP (Device Tree)
    [] (unwind_backtrace) from [] (show_stack+0x10/0x18)
    [] (show_stack) from [] (dump_stack+0x18/0x24)
    [] (dump_stack) from [] (__warn+0xe0/0x108)
    [] (__warn) from [] (warn_slowpath_fmt+0x44/0x6c)
    [] (warn_slowpath_fmt) from []
    (page_alloc_shuffle+0x12c/0x1ac)
    [] (page_alloc_shuffle) from [] (shuffle_store+0x28/0x48)
    [] (shuffle_store) from [] (parse_args+0x1f4/0x350)
    [] (parse_args) from [] (start_kernel+0x1c0/0x488)

    Move the fallback call to jump_label_init() to occur before
    parse_args().

    The redundant calls to jump_label_init() in other archs are left intact
    in case they have static key toggling use cases that are even earlier
    than option parsing.
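
    Condensed, the resulting ordering in start_kernel() looks roughly like
    this (simplified; not a literal excerpt):

    pr_notice("Kernel command line: %s\n", boot_command_line);
    /* parameters may set static keys */
    jump_label_init();
    parse_early_param();
    after_dashes = parse_args("Booting kernel",
                              static_command_line, __start___param,
                              __stop___param - __start___param,
                              -1, -1, NULL, &unknown_bootoption);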

    Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Reported-by: Guenter Roeck
    Reviewed-by: Kees Cook
    Cc: Mathieu Desnoyers
    Cc: Thomas Gleixner
    Cc: Mike Rapoport
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

13 Mar, 2019

1 commit

  • Add panic() calls if memblock_alloc() returns NULL.

    The panic() format duplicates the one used by memblock itself, and to
    avoid an explosion of long parameter lists, open-coded allocation size
    calculations are replaced with a local variable.
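
    The resulting call-site pattern looks roughly like this (the variable
    names here are hypothetical):

    size_t size = sizeof(*table) * nr_entries;  /* local variable keeps the
                                                   panic() argument list short */

    table = memblock_alloc(size, SMP_CACHE_BYTES);
    if (!table)
            panic("%s: Failed to allocate %zu bytes\n", __func__, size);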

    Link: http://lkml.kernel.org/r/1548057848-15136-18-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Guo Ren [c-sky]
    Cc: Heiko Carstens
    Cc: Juergen Gross [Xen]
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Paul Burton
    Cc: Petr Mladek
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

13 Feb, 2019

1 commit

  • This reverts commit fe53ca54270a ("mm: use early_pfn_to_nid in
    page_ext_init").

    When booting a system with "page_owner=on",

    start_kernel
    page_ext_init
    invoke_init_callbacks
    init_section_page_ext
    init_page_owner
    init_early_allocated_pages
    init_zones_in_node
    init_pages_in_zone
    lookup_page_ext
    page_to_nid

    The issue here is that page_to_nid() will not work since some page flags
    have no node information until later in page_alloc_init_late() due to
    DEFERRED_STRUCT_PAGE_INIT. Hence, it could trigger an out-of-bounds
    access with an invalid nid.

    UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
    index 7 is out of range for type 'zone [5]'

    Also, kernel will panic since flags were poisoned earlier with,

    CONFIG_DEBUG_VM_PGFLAGS=y
    CONFIG_NODE_NOT_IN_PAGE_FLAGS=n

    start_kernel
    setup_arch
    pagetable_init
    paging_init
    sparse_init
    sparse_init_nid
    memblock_alloc_try_nid_raw

    The poisoned flags are not handled well in init_pages_in_zone(), which
    ends up calling page_to_nid().

    page:ffffea0004200000 is uninitialized and poisoned
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
    page_owner info is not active (free page?)
    kernel BUG at include/linux/mm.h:990!
    RIP: 0010:init_page_owner+0x486/0x520

    This means that assumptions behind commit fe53ca54270a ("mm: use
    early_pfn_to_nid in page_ext_init") are incomplete. Therefore, revert
    the commit for now. A proper way to move the page_owner initialization
    to sooner is to hook into memmap initialization.

    Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pw
    Signed-off-by: Qian Cai
    Acked-by: Michal Hocko
    Cc: Pasha Tatashin
    Cc: Mel Gorman
    Cc: Yang Shi
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

05 Jan, 2019

2 commits

  • We get a warning when building the kernel with W=1:

    kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes]
    kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes]

    Add the missing declaration in a header file to fix this.

    Also, remove arch_release_thread_stack() completely because no arch
    seems to implement it since bb9d81264 (arch: remove tile port).
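
    The declaration side amounts to a one-liner in a shared header (sketch;
    the header path shown is an assumption, not a quote from the patch):

    /* include/linux/sched/task.h (assumed location) */
    extern void fork_init(void);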

    Link: http://lkml.kernel.org/r/1542170087-23645-1-git-send-email-wang.yi59@zte.com.cn
    Signed-off-by: Yi Wang
    Acked-by: Michal Hocko
    Acked-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yi Wang
     
  • Initcall names should not be changed.

    Link: http://lkml.kernel.org/r/20181124091829.GD10969@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

29 Dec, 2018

3 commits

  • Merge misc updates from Andrew Morton:

    - large KASAN update to use arm's "software tag-based mode"

    - a few misc things

    - sh updates

    - ocfs2 updates

    - just about all of MM

    * emailed patches from Andrew Morton : (167 commits)
    kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
    memcg, oom: notify on oom killer invocation from the charge path
    mm, swap: fix swapoff with KSM pages
    include/linux/gfp.h: fix typo
    mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
    hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
    hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
    memory_hotplug: add missing newlines to debugging output
    mm: remove __hugepage_set_anon_rmap()
    include/linux/vmstat.h: remove unused page state adjustment macro
    mm/page_alloc.c: allow error injection
    mm: migrate: drop unused argument of migrate_page_move_mapping()
    blkdev: avoid migration stalls for blkdev pages
    mm: migrate: provide buffer_migrate_page_norefs()
    mm: migrate: move migrate_page_lock_buffers()
    mm: migrate: lock buffers before migrate_page_move_mapping()
    mm: migration: factor out code to compute expected number of page references
    mm, page_alloc: enable pcpu_drain with zone capability
    kmemleak: add config to select auto scan
    mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:
    "This is the main pull request for block/storage for 4.21.

    Larger than usual, it was a busy round with lots of goodies queued up.
    Most notable is the removal of the old IO stack, which has been a long
    time coming. No new features for a while, everything coming in this
    week has all been fixes for things that were previously merged.

    This contains:

    - Use atomic counters instead of semaphores for mtip32xx (Arnd)

    - Cleanup of the mtip32xx request setup (Christoph)

    - Fix for circular locking dependency in loop (Jan, Tetsuo)

    - bcache (Coly, Guoju, Shenghui)
    * Optimizations for writeback caching
    * Various fixes and improvements

    - nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith)
    * host and target support for NVMe over TCP
    * Error log page support
    * Support for separate read/write/poll queues
    * Much improved polling
    * discard OOM fallback
    * Tracepoint improvements

    - lightnvm (Hans, Hua, Igor, Matias, Javier)
    * Igor added packed metadata to pblk. Now drives without metadata
    per LBA can be used as well.
    * Fix from Geert on uninitialized value on chunk metadata reads.
    * Fixes from Hans and Javier to pblk recovery and write path.
    * Fix from Hua Su to fix a race condition in the pblk recovery
    code.
    * Scan optimization added to pblk recovery from Zhoujie.
    * Small geometry cleanup from me.

    - Conversion of the last few drivers that used the legacy path to
    blk-mq (me)

    - Removal of legacy IO path in SCSI (me, Christoph)

    - Removal of legacy IO stack and schedulers (me)

    - Support for much better polling, now without interrupts at all.
    blk-mq adds support for multiple queue maps, which enables us to
    have a map per type. This in turn enables nvme to have separate
    completion queues for polling, which can then be interrupt-less.
    Also means we're ready for async polled IO, which is hopefully
    coming in the next release.

    - Killing of (now) unused block exports (Christoph)

    - Unification of the blk-rq-qos and blk-wbt wait handling (Josef)

    - Support for zoned testing with null_blk (Masato)

    - sx8 conversion to per-host tag sets (Christoph)

    - IO priority improvements (Damien)

    - mq-deadline zoned fix (Damien)

    - Ref count blkcg series (Dennis)

    - Lots of blk-mq improvements and speedups (me)

    - sbitmap scalability improvements (me)

    - Make core inflight IO accounting per-cpu (Mikulas)

    - Export timeout setting in sysfs (Weiping)

    - Cleanup the direct issue path (Jianchao)

    - Export blk-wbt internals in block debugfs for easier debugging
    (Ming)

    - Lots of other fixes and improvements"

    * tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits)
    kyber: use sbitmap add_wait_queue/list_del wait helpers
    sbitmap: add helpers for add/del wait queue handling
    block: save irq state in blkg_lookup_create()
    dm: don't reuse bio for flushes
    nvme-pci: trace SQ status on completions
    nvme-rdma: implement polling queue map
    nvme-fabrics: allow user to pass in nr_poll_queues
    nvme-fabrics: allow nvmf_connect_io_queue to poll
    nvme-core: optionally poll sync commands
    block: make request_to_qc_t public
    nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"
    nvme-tcp: fix endianess annotations
    nvmet-tcp: fix endianess annotations
    nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
    nvme-pci: only set nr_maps to 2 if poll queues are supported
    nvmet: use a macro for default error location
    nvmet: fix comparison of a u16 with -1
    blk-mq: enable IO poll if .nr_queues of type poll > 0
    blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
    blk-mq: skip zero-queue maps in blk_mq_map_swqueue
    ...

    Linus Torvalds
     
  • The current value of the early boot static pool size, 1024, is not big
    enough for systems with a large number of CPUs when timer and/or
    workqueue objects are selected. As a result, systems with 60+ CPUs and
    both timer and workqueue objects enabled could trigger "ODEBUG: Out of
    memory. ODEBUG disabled".

    Some debug objects are allocated during early boot. Enabling options
    like timer or workqueue objects may significantly increase the size
    required with a large number of CPUs. For example,

    CONFIG_DEBUG_OBJECTS_TIMERS:
    No. CPUs x 2 (worker pool) objects:
    start_kernel
    workqueue_init_early
    init_worker_pool
    init_timer_key
    debug_object_init

    plus No. CPUs objects (CONFIG_HIGH_RES_TIMERS):
    sched_init
    hrtick_rq_init
    hrtimer_init

    CONFIG_DEBUG_OBJECTS_WORK:
    No. CPUs objects:
    vmalloc_init
    __init_work

    plus No. CPUs x 6 (workqueue) objects:
    workqueue_init_early
    alloc_workqueue
    __alloc_workqueue_key
    alloc_and_link_pwqs
    init_pwq

    Also, plus No. CPUs objects:
    perf_event_init
    __init_srcu_struct
    init_srcu_struct_fields
    init_srcu_struct_nodes
    __init_work

    However, none of these objects are actually used or required before
    debug_objects_mem_init() is invoked, so just move the call to right
    before vmalloc_init().
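
    In code terms the move is small; a simplified sketch of the resulting
    ordering (the surrounding mm_init() context is an assumption based on
    the description above):

    static void __init mm_init(void)
    {
            ...
            kmem_cache_init();
            pgtable_init();
            /*
             * Moved here: the slab allocator is now up, and vmalloc_init()
             * below allocates the first work debug objects.
             */
            debug_objects_mem_init();
            vmalloc_init();
            ...
    }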

    According to tglx, "the reason why the call is at this place in
    start_kernel() is historical. It's because back in the days when
    debugobjects were added the memory allocator was enabled way later than
    today."

    Link: http://lkml.kernel.org/r/20181126102407.1836-1-cai@gmx.us
    Signed-off-by: Qian Cai
    Suggested-by: Thomas Gleixner
    Cc: Waiman Long
    Cc: Yang Shi
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

27 Dec, 2018

1 commit

  • Pull EFI updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Allocate the E820 buffer before doing the
    GetMemoryMap/ExitBootServices dance so we don't run out of space

    - Clear EFI boot services mappings when freeing the memory

    - Harden efivars against callers that invoke it on non-EFI boots

    - Reduce the number of memblock reservations resulting from extensive
    use of the new efi_mem_reserve_persistent() API

    - Other assorted fixes and cleanups"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE
    efi: Reduce the amount of memblock reservations for persistent allocations
    efi: Permit multiple entries in persistent memreserve data structure
    efi/libstub: Disable some warnings for x86{,_64}
    x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86
    x86/efi: Unmap EFI boot services code/data regions from efi_pgd
    x86/mm/pageattr: Introduce helper function to unmap EFI boot services
    efi/fdt: Simplify the get_fdt() flow
    efi/fdt: Indentation fix
    firmware/efi: Add NULL pointer checks in efivars API functions

    Linus Torvalds
     

30 Nov, 2018

1 commit

  • efi_<reserve/free>_boot_services() are x86-specific quirks and as such
    should be in asm/efi.h, so move them from linux/efi.h. Also, call
    efi_free_boot_services() from __efi_enter_virtual_mode(), as it is an
    x86-specific call and ideally shouldn't be part of init/main.c.

    Signed-off-by: Sai Praneeth Prakhya
    Signed-off-by: Ard Biesheuvel
    Acked-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Arend van Spriel
    Cc: Bhupesh Sharma
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Eric Snowberg
    Cc: Hans de Goede
    Cc: Joe Perches
    Cc: Jon Hunter
    Cc: Julien Thierry
    Cc: Linus Torvalds
    Cc: Marc Zyngier
    Cc: Matt Fleming
    Cc: Nathan Chancellor
    Cc: Peter Zijlstra
    Cc: Sedat Dilek
    Cc: YiFei Zhu
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/20181129171230.18699-7-ard.biesheuvel@linaro.org
    Signed-off-by: Ingo Molnar

    Sai Praneeth Prakhya
     

08 Nov, 2018

1 commit

  • This removes a bunch of core and elevator related code. On the core
    front, we remove anything related to queue running, draining,
    initialization, plugging, and congestions. We also kill anything
    related to request allocation, merging, retrieval, and completion.

    Remove any checking for single queue IO schedulers, as they no
    longer exist. This means we can also delete a bunch of code related
    to request issue, adding, completion, etc - and all the SQ related
    ops and helpers.

    Also kill the load_default_modules(), as all that did was provide
    for a way to load the default single queue elevator.

    Tested-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     

31 Oct, 2018

4 commits

  • When memblock allocation APIs are called with align = 0, the alignment
    is implicitly set to SMP_CACHE_BYTES.

    Implicit alignment is done deep in the memblock allocator and it can
    come as a surprise. Not that such an alignment would be wrong even
    when used incorrectly, but it is better to be explicit for the sake of
    clarity and the principle of least surprise.

    Replace all such uses of memblock APIs with the 'align' parameter
    explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
    in the memblock internal allocation functions.

    For the case when memblock APIs are used via helper functions, e.g. like
    iommu_arena_new_node() in Alpha, the helper functions were detected with
    Coccinelle's help and then manually examined and updated where
    appropriate.

    The direct memblock APIs users were updated using the semantic patch below:

    @@
    expression size, min_addr, max_addr, nid;
    @@
    (
    |
    - memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
    |
    - memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
    |
    - memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
    |
    - memblock_alloc(size, 0)
    + memblock_alloc(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_raw(size, 0)
    + memblock_alloc_raw(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_from(size, 0, min_addr)
    + memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
    |
    - memblock_alloc_nopanic(size, 0)
    + memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_low(size, 0)
    + memblock_alloc_low(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_low_nopanic(size, 0)
    + memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_from_nopanic(size, 0, min_addr)
    + memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
    |
    - memblock_alloc_node(size, 0, nid)
    + memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
    )

    [mhocko@suse.com: changelog update]
    [akpm@linux-foundation.org: coding-style fixes]
    [rppt@linux.ibm.com: fix missed uses of implicit alignment]
    Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
    Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Suggested-by: Michal Hocko
    Acked-by: Paul Burton [MIPS]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: Matt Turner
    Cc: Michal Simek
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below and then
    semi-automated removal of duplicated '#include <linux/memblock.h>':

    @@
    @@
    - #include <linux/bootmem.h>
    + #include <linux/memblock.h>

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • alloc_bootmem(size) is a shortcut for allocating SMP_CACHE_BYTES-aligned
    memory. When the align parameter of memblock_alloc() is 0, the alignment
    is implicitly set to SMP_CACHE_BYTES, and thus alloc_bootmem(size) and
    memblock_alloc(size, 0) are equivalent.

    The conversion is done using the following semantic patch:

    @@
    expression size;
    @@
    - alloc_bootmem(size)
    + memblock_alloc(size, 0)

    Link: http://lkml.kernel.org/r/1536927045-23536-22-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The conversion is done using

    sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
    $(git grep -l memblock_virt_alloc)

    Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

23 Oct, 2018

1 commit

  • Pull locking and misc x86 updates from Ingo Molnar:
    "Lots of changes in this cycle - in part because locking/core attracted
    a number of related x86 low level work which was easier to handle in a
    single tree:

    - Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E.
    McKenney, Andrea Parri)

    - lockdep scalability improvements and micro-optimizations (Waiman
    Long)

    - rwsem improvements (Waiman Long)

    - spinlock micro-optimization (Matthew Wilcox)

    - qspinlocks: Provide a liveness guarantee (more fairness) on x86.
    (Peter Zijlstra)

    - Add support for relative references in jump tables on arm64, x86
    and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens)

    - Be a lot less permissive on weird (kernel address) uaccess faults
    on x86: BUG() when uaccess helpers fault on kernel addresses (Jann
    Horn)

    - macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav
    Amit)

    - ... and a handful of other smaller changes as well"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
    locking/lockdep: Make global debug_locks* variables read-mostly
    locking/lockdep: Fix debug_locks off performance problem
    locking/pvqspinlock: Extend node size when pvqspinlock is configured
    locking/qspinlock_stat: Count instances of nested lock slowpaths
    locking/qspinlock, x86: Provide liveness guarantee
    x86/asm: 'Simplify' GEN_*_RMWcc() macros
    locking/qspinlock: Rework some comments
    locking/qspinlock: Re-order code
    locking/lockdep: Remove duplicated 'lock_class_ops' percpu array
    x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y
    futex: Replace spin_is_locked() with lockdep
    locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
    x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs
    x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs
    x86/extable: Macrofy inline assembly code to work around GCC inlining bugs
    x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops
    x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs
    x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs
    x86/refcount: Work around GCC inlining bug
    x86/objtool: Use asm macros to work around GCC inlining bugs
    ...

    Linus Torvalds
     

09 Oct, 2018

1 commit

  • With CONFIG_VMAP_STACK=y the kernel stack of all tasks should be
    allocated in the vmalloc space. The initial stack used for all
    the early init code is in the init_thread_union. To be able to
    switch from this early stack to a properly allocated stack
    from vmalloc the architecture needs a switch-over point.

    Introduce the arch_call_rest_init() function with a weak definition
    in init/main.c whose only purpose is to call rest_init() from the
    end of start_kernel(). The architecture override can then do the
    necessary magic to switch to the new vmalloc'ed stack.
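
    The weak default is trivial (simplified from the description above):

    void __init __weak arch_call_rest_init(void)
    {
            rest_init();
    }

    start_kernel() then ends with arch_call_rest_init() instead of calling
    rest_init() directly, giving architectures a hook to switch stacks.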

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

27 Sep, 2018

1 commit

  • Jump table entries are mostly read-only, with the exception of the
    init and module loader code that defuses entries that point into init
    code when the code being referred to is freed.

    For robustness, it would be better to move these entries into the
    ro_after_init section, but clearing the 'code' member of each jump
    table entry referring to init code at module load time races with the
    module_enable_ro() call that remaps the ro_after_init section read
    only, so we'd like to do it earlier.

    So given that whether such an entry refers to init code can be decided
    much earlier, we can pull this check forward. Since we may still need
    the code entry at this point, let's switch to setting a low bit in the
    'key' member just like we do to annotate the default state of a jump
    table entry.
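
    A sketch of the annotation (the accessor names follow the jump_entry
    helpers introduced around this series; treat them as illustrative):

    static inline void jump_entry_set_init(struct jump_entry *entry)
    {
            entry->key |= 2;        /* low bit 1: target is in __init code */
    }

    static inline bool jump_entry_is_init(const struct jump_entry *entry)
    {
            return entry->key & 2;
    }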

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kees Cook
    Acked-by: Peter Zijlstra (Intel)
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-s390@vger.kernel.org
    Cc: Arnd Bergmann
    Cc: Heiko Carstens
    Cc: Will Deacon
    Cc: Catalin Marinas
    Cc: Steven Rostedt
    Cc: Martin Schwidefsky
    Cc: Jessica Yu
    Link: https://lkml.kernel.org/r/20180919065144.25010-8-ard.biesheuvel@linaro.org

    Ard Biesheuvel
     

23 Aug, 2018

2 commits

  • Add a log message to `run_init_process()`.

    This log message serves two purposes.

    1. If the init process is not specified on the Linux kernel command
    line, the user sees what file was chosen.

    2. The time stamp shows exactly when the Linux kernel handed over
    control to the init process.
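
    The message itself is a one-liner of roughly this shape (illustrative
    sketch of run_init_process() in init/main.c):

    static int run_init_process(const char *init_filename)
    {
            argv_init[0] = init_filename;
            pr_info("Run %s as init process\n", init_filename);
            ...
    }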

    Link: http://lkml.kernel.org/r/b1fc97fa-4aa9-1904-ddb5-859e78995c41@molgen.mpg.de
    Signed-off-by: Paul Menzel
    Reviewed-by: Andrew Morton
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menzel
     
  • Allow the initcall tables to be emitted using relative references that
    are only half the size on 64-bit architectures and don't require fixups
    at runtime on relocatable kernels.
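
    On architectures that select CONFIG_HAVE_ARCH_PREL32_RELOCATIONS, the
    crux is to emit each initcall entry as a 32-bit place-relative offset
    rather than an absolute pointer (condensed sketch; details may differ
    from the final patch):

    #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
    #define ___define_initcall(fn, id, __sec)                   \
            asm(".section   \"" #__sec ".init\", \"a\"  \n"     \
                "__initcall_" #fn #id ":                \n"     \
                ".long      " #fn " - .                 \n"     \
                ".previous                              \n");
    #else
    #define ___define_initcall(fn, id, __sec)                   \
            static initcall_t __initcall_##fn##id __used        \
                    __attribute__((__section__(#__sec ".init"))) = fn;
    #endif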

    Link: http://lkml.kernel.org/r/20180704083651.24360-5-ard.biesheuvel@linaro.org
    Acked-by: James Morris
    Acked-by: Sergey Senozhatsky
    Acked-by: Petr Mladek
    Acked-by: Michael Ellerman
    Acked-by: Ingo Molnar
    Signed-off-by: Ard Biesheuvel
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Bjorn Helgaas
    Cc: Catalin Marinas
    Cc: James Morris
    Cc: Jessica Yu
    Cc: Josh Poimboeuf
    Cc: Kees Cook
    Cc: Nicolas Pitre
    Cc: Paul Mackerras
    Cc: Russell King
    Cc: "Serge E. Hallyn"
    Cc: Steven Rostedt
    Cc: Thomas Garnier
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ard Biesheuvel
     

21 Aug, 2018

1 commit

  • Pull tracing updates from Steven Rostedt:

    - Restructure of lockdep and latency tracers

    This is the biggest change. Joel Fernandes restructured the hooks
    from irqs and preemption disabling and enabling. He got rid of a lot
    of the preprocessor #ifdef mess that they caused.

    He turned both lockdep and the latency tracers to use trace events
    inserted in the preempt/irqs disabling paths. But unfortunately,
    these started to cause issues in corner cases. Thus, parts of the
    code was reverted back to where lockdep and the latency tracers just
    get called directly (without using the trace events). But because the
    original change cleaned up the code very nicely we kept that, as well
    as the trace events for preempt and irqs disabling, but they are
    limited to not being called in NMIs.

    - Have trace events use SRCU for "rcu idle" calls. This was required
    for the preempt/irqs off trace events. But it also had to not allow
    them to be called in NMI context. Waiting till Paul makes an NMI safe
    SRCU API.

    - New notrace SRCU API to allow trace events to use SRCU.

    - Addition of mcount-nop option support

    - SPDX headers replacing GPL templates.

    - Various other fixes and clean ups.

    - Some fixes are marked for stable, but were not fully tested before
    the merge window opened.

    * tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
    tracing: Fix SPDX format headers to use C++ style comments
    tracing: Add SPDX License format tags to tracing files
    tracing: Add SPDX License format to bpf_trace.c
    blktrace: Add SPDX License format header
    s390/ftrace: Add -mfentry and -mnop-mcount support
    tracing: Add -mcount-nop option support
    tracing: Avoid calling cc-option -mrecord-mcount for every Makefile
    tracing: Handle CC_FLAGS_FTRACE more accurately
    Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
    Uprobes: Simplify uprobe_register() body
    tracepoints: Free early tracepoints after RCU is initialized
    uprobes: Use synchronize_rcu() not synchronize_sched()
    tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister()
    ftrace: Remove unused pointer ftrace_swapper_pid
    tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage"
    tracing/irqsoff: Handle preempt_count for different configs
    tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage"
    tracing: irqsoff: Account for additional preempt_disable
    trace: Use rcu_dereference_raw for hooks from trace-event subsystem
    tracing/kprobes: Fix within_notrace_func() to check only notrace functions
    ...

    Linus Torvalds
     

14 Aug, 2018

2 commits

  • Pull x86 timer updates from Thomas Gleixner:
    "Early TSC based time stamping to allow better boot time analysis.

    This comes with a general cleanup of the TSC calibration code which
    grew warts and duct taping over the years and removes 250 lines of
    code. Initiated and mostly implemented by Pavel with help from various
    folks"

    * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
    x86/kvmclock: Mark kvm_get_preset_lpj() as __init
    x86/tsc: Consolidate init code
    sched/clock: Disable interrupts when calling generic_sched_clock_init()
    timekeeping: Prevent false warning when persistent clock is not available
    sched/clock: Close a hole in sched_clock_init()
    x86/tsc: Make use of tsc_calibrate_cpu_early()
    x86/tsc: Split native_calibrate_cpu() into early and late parts
    sched/clock: Use static key for sched_clock_running
    sched/clock: Enable sched clock early
    sched/clock: Move sched clock initialization and merge with generic clock
    x86/tsc: Use TSC as sched clock early
    x86/tsc: Initialize cyc2ns when tsc frequency is determined
    x86/tsc: Calibrate tsc only once
    ARM/time: Remove read_boot_clock64()
    s390/time: Remove read_boot_clock64()
    timekeeping: Default boot time offset to local_clock()
    timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()
    s390/time: Add read_persistent_wall_and_boot_offset()
    x86/xen/time: Output xen sched_clock time from 0
    x86/xen/time: Initialize pv xen time in init_hypervisor_platform()
    ...

    Linus Torvalds
     
  • Pull x86 PTI updates from Thomas Gleixner:
    "The Speck brigade sadly provides yet another large set of patches
    destroying the performance which we carefully built and preserved

    - PTI support for 32bit PAE. The missing counter part to the 64bit
    PTI code implemented by Joerg.

    - A set of fixes for the Global Bit mechanics for non PCID CPUs which
    were setting the Global Bit too widely and therefore possibly
    exposing interesting memory needlessly.

    - Protection against userspace-userspace SpectreRSB

    - Support for the upcoming Enhanced IBRS mode, which is preferred
    over IBRS. Unfortunately we don't know the performance impact of
    this, but it's expected to be less horrible than the IBRS
    hammering.

    - Cleanups and simplifications"

    * 'x86/pti' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
    x86/mm/pti: Move user W+X check into pti_finalize()
    x86/relocs: Add __end_rodata_aligned to S_REL
    x86/mm/pti: Clone kernel-image on PTE level for 32 bit
    x86/mm/pti: Don't clear permissions in pti_clone_pmd()
    x86/mm/pti: Fix 32 bit PCID check
    x86/mm/init: Remove freed kernel image areas from alias mapping
    x86/mm/init: Add helper for freeing kernel image pages
    x86/mm/init: Pass unconverted symbol addresses to free_init_pages()
    mm: Allow non-direct-map arguments to free_reserved_area()
    x86/mm/pti: Clear Global bit more aggressively
    x86/speculation: Support Enhanced IBRS on future CPUs
    x86/speculation: Protect against userspace-userspace spectreRSB
    x86/kexec: Allocate 8k PGDs for PTI
    Revert "perf/core: Make sure the ring-buffer is mapped in all page-tables"
    x86/mm: Remove in_nmi() warning from vmalloc_fault()
    x86/entry/32: Check for VM86 mode in slow-path check
    perf/core: Make sure the ring-buffer is mapped in all page-tables
    x86/pti: Check the return value of pti_user_pagetable_walk_pmd()
    x86/pti: Check the return value of pti_user_pagetable_walk_p4d()
    x86/entry/32: Add debug code to check entry/exit CR3
    ...

    Linus Torvalds
     

13 Aug, 2018

1 commit

  • This is purely a preparatory patch for upcoming changes during the 4.19
    merge window.

    We have a function called "boot_cpu_state_init()" that isn't really
    about the bootup cpu state: that is done much earlier by the similarly
    named "boot_cpu_init()" (note lack of "state" in name).

    This function initializes some hotplug CPU state, and needs to run after
    the percpu data has been properly initialized. It even has a comment to
    that effect.

    Except it _doesn't_ actually run after the percpu data has been properly
    initialized. On x86 it happens to do that, but on at least arm and
    arm64, the percpu base pointers are initialized by the arch-specific
    'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().

    This had some unexpected results, and in particular we have a patch
    pending for the merge window that did the obvious cleanup of using
    'this_cpu_write()' in the cpu hotplug init code:

    - per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
    + this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);

    which is obviously the right thing to do. Except because of the
    ordering issue, it actually failed miserably and unexpectedly on arm64.

    So this just fixes the ordering, and changes the name of the function to
    be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
    hotplug state, because the core CPU state was supposed to have already
    been done earlier.

    Marked for stable, since the (not yet merged) patch that will show this
    problem is marked for stable.

    Reported-by: Vlastimil Babka
    Reported-by: Mian Yousaf Kaukab
    Suggested-by: Catalin Marinas
    Acked-by: Thomas Gleixner
    Cc: Will Deacon
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

11 Aug, 2018

1 commit

  • Joel Fernandes created a nice patch that cleaned up the duplicate hooks used
    by lockdep and irqsoff latency tracer. It made both use tracepoints. But it
    caused lockdep to trigger several false positives. We have not figured out
    why yet, but removing lockdep from using the trace event hooks and just
    calling its helper functions directly (like it used to) makes the
    problem go away.

    This is a partial revert of the clean up patch c3bc8fd637a9 ("tracing:
    Centralize preemptirq tracepoints and unify their usage") that adds direct
    calls for lockdep, but also keeps most of the clean up done to get rid of
    the horrible preprocessor if statements.

    Link: http://lkml.kernel.org/r/20180806155058.5ee875f4@gandalf.local.home

    Cc: Peter Zijlstra
    Reviewed-by: Joel Fernandes (Google)
    Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

31 Jul, 2018

1 commit

  • This patch detaches the preemptirq tracepoints from the tracers and
    keeps it separate.

    Advantages:
    * Lockdep and irqsoff event can now run in parallel since they no longer
    have their own calls.

    * This unifies the use case of adding hooks to an irqsoff and irqson
    event, and a preemptoff and preempton event.
    3 users of the events exist:
    - Lockdep
    - irqsoff and preemptoff tracers
    - irqs and preempt trace events

    The unification cleans up several ifdefs and makes the code in preempt
    tracer and irqsoff tracers simpler. It gets rid of all the horrific
    ifdeferry around PROVE_LOCKING and makes configuration of the different
    users of the tracepoints more easy and understandable. It also gets rid
    of the time_* function calls from the lockdep hooks used to call into
    the preemptirq tracer which is not needed anymore. The negative delta in
    lines of code in this patch is quite large too.

    In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
    as a single point for registering probes onto the tracepoints. With
    this,
    the web of config options for preempt/irq toggle tracepoints and its
    users becomes:

    PREEMPT_TRACER   PREEMPTIRQ_EVENTS   IRQSOFF_TRACER   PROVE_LOCKING
          |                |       \            |               |
          \   (selects)    /        \           \   (selects)   /
           TRACE_PREEMPT_TOGGLE ----> TRACE_IRQFLAGS
                        \                    /
                         \   (depends on)  /
                          PREEMPTIRQ_TRACEPOINTS

    Other than the performance tests mentioned in the previous patch, I also
    ran the locking API test suite. I verified that all tests cases are
    passing.

    I also injected issues by not registering lockdep probes onto the
    tracepoints and I see failures to confirm that the probes are indeed
    working.

    This series + lockdep probes not registered (just to inject errors):
    [ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
    [ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
    [ 0.000000] sirq-safe-A => hirqs-on/12:FAILED|FAILED| ok |
    [ 0.000000] sirq-safe-A => hirqs-on/21:FAILED|FAILED| ok |
    [ 0.000000] hard-safe-A + irqs-on/12:FAILED|FAILED| ok |
    [ 0.000000] soft-safe-A + irqs-on/12:FAILED|FAILED| ok |
    [ 0.000000] hard-safe-A + irqs-on/21:FAILED|FAILED| ok |
    [ 0.000000] soft-safe-A + irqs-on/21:FAILED|FAILED| ok |
    [ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
    [ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |

    With this series + lockdep probes registered, all locking tests pass:

    [ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
    [ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
    [ 0.000000] sirq-safe-A => hirqs-on/12: ok | ok | ok |
    [ 0.000000] sirq-safe-A => hirqs-on/21: ok | ok | ok |
    [ 0.000000] hard-safe-A + irqs-on/12: ok | ok | ok |
    [ 0.000000] soft-safe-A + irqs-on/12: ok | ok | ok |
    [ 0.000000] hard-safe-A + irqs-on/21: ok | ok | ok |
    [ 0.000000] soft-safe-A + irqs-on/21: ok | ok | ok |
    [ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
    [ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |

    Link: http://lkml.kernel.org/r/20180730222423.196630-4-joel@joelfernandes.org

    Acked-by: Peter Zijlstra (Intel)
    Reviewed-by: Namhyung Kim
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Joel Fernandes (Google)
     

20 Jul, 2018

3 commits

  • Introduce a new function to finalize the kernel-image mappings in the
    userspace page-table after all ro/nx protections have been applied to
    the kernel mappings.

    Also move the call to pti_clone_kernel_text() to that function so that it
    will run on 32 bit kernels too.
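
    The hook itself is small (simplified sketch based on the description
    above):

    void pti_finalize(void)
    {
            /*
             * (Re)clone everything that maps parts of the kernel image;
             * this runs after the ro/nx protections are in place.
             */
            pti_clone_entry_text();
            pti_clone_kernel_text();
    }

    start_kernel() then invokes pti_finalize() once the kernel mappings are
    final, right after mark_readonly().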

    Signed-off-by: Joerg Roedel
    Signed-off-by: Thomas Gleixner
    Tested-by: Pavel Machek
    Cc: "H . Peter Anvin"
    Cc: linux-mm@kvack.org
    Cc: Linus Torvalds
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Jiri Kosina
    Cc: Boris Ostrovsky
    Cc: Brian Gerst
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: Andrea Arcangeli
    Cc: Waiman Long
    Cc: "David H . Gutteridge"
    Cc: joro@8bytes.org
    Link: https://lkml.kernel.org/r/1531906876-13451-30-git-send-email-joro@8bytes.org

    Joerg Roedel
     
  • Allow sched_clock() to be used before sched_clock_init() is called. This
    provides a way to get early boot timestamps on machines with unstable
    clocks.

    Signed-off-by: Pavel Tatashin
    Signed-off-by: Thomas Gleixner
    Cc: steven.sistare@oracle.com
    Cc: daniel.m.jordan@oracle.com
    Cc: linux@armlinux.org.uk
    Cc: schwidefsky@de.ibm.com
    Cc: heiko.carstens@de.ibm.com
    Cc: john.stultz@linaro.org
    Cc: sboyd@codeaurora.org
    Cc: hpa@zytor.com
    Cc: douly.fnst@cn.fujitsu.com
    Cc: peterz@infradead.org
    Cc: prarit@redhat.com
    Cc: feng.tang@intel.com
    Cc: pmladek@suse.com
    Cc: gnomes@lxorguk.ukuu.org.uk
    Cc: linux-s390@vger.kernel.org
    Cc: boris.ostrovsky@oracle.com
    Cc: jgross@suse.com
    Cc: pbonzini@redhat.com
    Link: https://lkml.kernel.org/r/20180719205545.16512-24-pasha.tatashin@oracle.com

    Pavel Tatashin
     
  • sched_clock_postinit() initializes a generic clock on systems where no
    other clock is provided. This function may be called only after
    timekeeping_init().

    Rename sched_clock_postinit() to generic_sched_clock_init() and call it
    from sched_clock_init(). Move the call to sched_clock_init() until
    after time_init().

    Suggested-by: Peter Zijlstra
    Signed-off-by: Pavel Tatashin
    Signed-off-by: Thomas Gleixner
    Cc: steven.sistare@oracle.com
    Cc: daniel.m.jordan@oracle.com
    Cc: linux@armlinux.org.uk
    Cc: schwidefsky@de.ibm.com
    Cc: heiko.carstens@de.ibm.com
    Cc: john.stultz@linaro.org
    Cc: sboyd@codeaurora.org
    Cc: hpa@zytor.com
    Cc: douly.fnst@cn.fujitsu.com
    Cc: prarit@redhat.com
    Cc: feng.tang@intel.com
    Cc: pmladek@suse.com
    Cc: gnomes@lxorguk.ukuu.org.uk
    Cc: linux-s390@vger.kernel.org
    Cc: boris.ostrovsky@oracle.com
    Cc: jgross@suse.com
    Cc: pbonzini@redhat.com
    Link: https://lkml.kernel.org/r/20180719205545.16512-23-pasha.tatashin@oracle.com

    Pavel Tatashin
     

26 May, 2018

1 commit

  • In commit c7753208a94c ("x86, swiotlb: Add memory encryption support") a
    call to the function `mem_encrypt_init' was added. Include the prototype
    defined in the header <linux/mem_encrypt.h> to prevent a warning
    reported during compilation with W=1:

    init/main.c:494:20: warning: no previous prototype for `mem_encrypt_init' [-Wmissing-prototypes]
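
    The fix is just the include (as described above):

    #include <linux/mem_encrypt.h>  /* declares mem_encrypt_init() */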

    Link: http://lkml.kernel.org/r/20180522195533.31415-1-malat@debian.org
    Signed-off-by: Mathieu Malaterre
    Reviewed-by: Andrew Morton
    Acked-by: Steven Rostedt (VMware)
    Cc: Tom Lendacky
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Kees Cook
    Cc: Laura Abbott
    Cc: Dominik Brodowski
    Cc: Gargi Sharma
    Cc: Josh Poimboeuf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Malaterre
     

12 May, 2018

1 commit

  • load_module() creates W+X mappings via __vmalloc_node_range() (from
    layout_and_allocate()->move_module()->module_alloc()) by using
    PAGE_KERNEL_EXEC. These mappings are later cleaned up via
    "call_rcu_sched(&freeinit->rcu, do_free_init)" from do_init_module().

    This is a problem because call_rcu_sched() queues work, which can be run
    after debug_checkwx() is run, resulting in a race condition. If hit,
    the race results in a nasty splat about insecure W+X mappings, which
    results in a poor user experience as these are not the mappings that
    debug_checkwx() is intended to catch.

    This issue is observed on multiple arm64 platforms, and has been
    artificially triggered on an x86 platform.

    Address the race by flushing the queued work before running the
    arch-defined mark_rodata_ro() which then calls debug_checkwx().
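
    Condensed, the fix in init/main.c's mark_readonly() looks like this
    (simplified):

    static void mark_readonly(void)
    {
            if (rodata_enabled) {
                    /*
                     * load_module() results in W+X mappings, which are
                     * cleaned up with call_rcu_sched(); flush that queued
                     * work before mark_rodata_ro() runs debug_checkwx().
                     */
                    rcu_barrier_sched();
                    mark_rodata_ro();
                    rodata_test();
            } else
                    pr_info("Kernel memory protection disabled.\n");
    }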

    Link: http://lkml.kernel.org/r/1525103946-29526-1-git-send-email-jhugo@codeaurora.org
    Fixes: e1a58320a38d ("x86/mm: Warn on W^X mappings")
    Signed-off-by: Jeffrey Hugo
    Reported-by: Timur Tabi
    Reported-by: Jan Glauber
    Acked-by: Kees Cook
    Acked-by: Ingo Molnar
    Acked-by: Will Deacon
    Acked-by: Laura Abbott
    Cc: Mark Rutland
    Cc: Ard Biesheuvel
    Cc: Catalin Marinas
    Cc: Stephen Smalley
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeffrey Hugo
     

12 Apr, 2018

2 commits

  • For fine-grained debugging and usercopy protection.

    Link: http://lkml.kernel.org/r/20180310085027.GA17121@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Cc: Al Viro
    Cc: Glauber Costa
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • So "struct uts_namespace" can enjoy fine-grained SLAB debugging and
    usercopy protection.

    I'd prefer shorter name "utsns" but there is "user_namespace" already.

    Link: http://lkml.kernel.org/r/20180228215158.GA23146@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

11 Apr, 2018

1 commit

  • Pull tracing updates from Steven Rostedt:
    "New features:

    - Tom Zanussi's extended histogram work.

    This adds the synthetic events to have histograms from multiple
    event data. Adds triggers "onmatch" and "onmax" to call the
    synthetic events. Several updates to the histogram code came from
    this.

    - Allow way to nest ring buffer calls in the same context

    - Allow absolute time stamps in ring buffer

    - Rewrite of filter code parsing based on Al Viro's suggestions

    - Setting of trace_clock to global if TSC is unstable (on boot)

    - Better OOM handling when allocating large ring buffers

    - Added initcall tracepoints (consolidated initcall_debug code with
    them)

    And other various fixes and clean ups"

    * tag 'trace-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
    init: Have initcall_debug still work without CONFIG_TRACEPOINTS
    init, tracing: Have printk come through the trace events for initcall_debug
    init, tracing: instrument security and console initcall trace events
    init, tracing: Add initcall trace events
    tracing: Add rcu dereference annotation for test func that touches filter->prog
    tracing: Add rcu dereference annotation for filter->prog
    tracing: Fixup logic inversion on setting trace_global_clock defaults
    tracing: Hide global trace clock from lockdep
    ring-buffer: Add set/clear_current_oom_origin() during allocations
    ring-buffer: Check if memory is available before allocation
    lockdep: Add print_irqtrace_events() to __warn
    vsprintf: Do not preprocess non-dereferenced pointers for bprintf (%px and %pK)
    tracing: Uninitialized variable in create_tracing_map_fields()
    tracing: Make sure variable string fields are NULL-terminated
    tracing: Add action comparisons when testing matching hist triggers
    tracing: Don't add flag strings when displaying variable references
    tracing: Fix display of hist trigger expressions containing timestamps
    ftrace: Drop a VLA in module_exists()
    tracing: Mention trace_clock=global when warning about unstable clocks
    tracing: Default to using trace_global_clock if sched_clock is unstable
    ...

    Linus Torvalds
     

06 Apr, 2018

1 commit

  • With trace events placed before and after the initcall function calls,
    instead of having a separate routine for printing out the initcalls when
    initcall_debug is specified on the kernel command line, have the code
    register callbacks on the tracepoints where the initcall trace events
    are.

    This removes the need for a separate function to do the initcall debug
    printing, as the tracepoint callbacks can handle the printk. It also
    covers initcalls that are not invoked via do_one_initcall(), such as
    the console and security initcalls.
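
    Mechanically, when initcall_debug is set, probes are registered on the
    new tracepoints and do the printing, roughly like this (condensed;
    details are illustrative):

    static ktime_t initcall_calltime;

    static void __init trace_initcall_start_cb(void *data, initcall_t fn)
    {
            ktime_t *calltime = data;

            printk(KERN_DEBUG "calling  %pF @ %i\n", fn, task_pid_nr(current));
            *calltime = ktime_get();
    }

    ...
    if (initcall_debug)
            register_trace_initcall_start(trace_initcall_start_cb,
                                          &initcall_calltime);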

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)