Eric Lee / smarc-fsl-linux-kernel

27 Nov, 2018

1 commit

4074ca7d8 x86/mm: Move LDT remap out of KASLR region on 5-level paging

commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream

On 5-level paging the LDT remap area is placed in the middle of the KASLR
randomization region and it can overlap with the direct mapping, the
vmalloc or the vmap area.

The LDT mapping is per mm, so it cannot be moved into the P4D page table
next to the CPU_ENTRY_AREA without complicating PGD table allocation for
5-level paging.

The 4 PGD slot gap just before the direct mapping is reserved for
hypervisors, so it cannot be used.

Move the direct mapping one slot deeper and use the resulting gap for the
LDT remap area. The resulting layout is the same for 4 and 5 level paging.
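
The slot arithmetic behind "one slot deeper" can be sketched as follows. This is a minimal illustration, not kernel code: the base addresses are assumptions taken from the kernel's x86_64 memory-map documentation for the post-change layout, not from this changelog, and pgd_index() here is a plain re-implementation for demonstration.

```python
PTRS_PER_PGD = 512  # entries in the top-level page table

def pgd_index(vaddr: int, pgdir_shift: int) -> int:
    """Return the top-level (PGD) slot that a virtual address falls into."""
    return (vaddr >> pgdir_shift) & (PTRS_PER_PGD - 1)

# Assumed base addresses (from the kernel memory-map documentation, not this commit text).
LAYOUTS = {
    # name: (PGD shift, LDT remap base, direct mapping base)
    "4-level": (39, 0xFFFF880000000000, 0xFFFF888000000000),
    "5-level": (48, 0xFF10000000000000, 0xFF11000000000000),
}

for name, (shift, ldt_base, direct_base) in LAYOUTS.items():
    # Both layouts put the LDT remap in the slot right before the direct mapping.
    print(name, pgd_index(ldt_base, shift), pgd_index(direct_base, shift))
# 4-level 272 273
# 5-level 272 273
```

Under these assumed addresses the LDT remap occupies exactly one PGD slot directly below the direct mapping in both paging modes, which is one way to read "the resulting layout is the same".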

[ tglx: Massaged changelog ]

Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on")
Signed-off-by: Kirill A. Shutemov
Signed-off-by: Thomas Gleixner
Reviewed-by: Andy Lutomirski
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: bhe@redhat.com
Cc: willy@infradead.org
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181026122856.66224-2-kirill.shutemov@linux.intel.com
Signed-off-by: Sasha Levin

Kirill A. Shutemov
2018-11-27 23:13:08 +0800

10 Sep, 2018

1 commit

07e846bac x86/doc: Fix Documentation/x86/earlyprintk.txt

Fix a few issues in Documentation/x86/earlyprintk.txt:

- correct typos, punctuation, missing word, wrong word
- change product name from Netchip to NetChip
- expand where to add "earlyprintk=dbg"

Signed-off-by: Randy Dunlap
Cc: Eric W. Biederman
Cc: Jason Wessel
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Yinghai Lu
Cc: linux-doc@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Link: http://lkml.kernel.org/r/d0c40ac3-7659-6374-dbda-23d3d2577f30@infradead.org
Signed-off-by: Ingo Molnar

Randy Dunlap
2018-09-10 21:09:30 +0800

14 Aug, 2018

2 commits

13e091b6d Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 timer updates from Thomas Gleixner:
"Early TSC based time stamping to allow better boot time analysis.

This comes with a general cleanup of the TSC calibration code which
grew warts and duct taping over the years and removes 250 lines of
code. Initiated and mostly implemented by Pavel with help from various
folks"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
x86/kvmclock: Mark kvm_get_preset_lpj() as __init
x86/tsc: Consolidate init code
sched/clock: Disable interrupts when calling generic_sched_clock_init()
timekeeping: Prevent false warning when persistent clock is not available
sched/clock: Close a hole in sched_clock_init()
x86/tsc: Make use of tsc_calibrate_cpu_early()
x86/tsc: Split native_calibrate_cpu() into early and late parts
sched/clock: Use static key for sched_clock_running
sched/clock: Enable sched clock early
sched/clock: Move sched clock initialization and merge with generic clock
x86/tsc: Use TSC as sched clock early
x86/tsc: Initialize cyc2ns when tsc frequency is determined
x86/tsc: Calibrate tsc only once
ARM/time: Remove read_boot_clock64()
s390/time: Remove read_boot_clock64()
timekeeping: Default boot time offset to local_clock()
timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()
s390/time: Add read_persistent_wall_and_boot_offset()
x86/xen/time: Output xen sched_clock time from 0
x86/xen/time: Initialize pv xen time in init_hypervisor_platform()
...

Linus Torvalds
2018-08-14 09:28:19 +0800
30de24c7d Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cache QoS (RDT/CAR) updates from Thomas Gleixner:
"Add support for pseudo-locked cache regions.

Cache Allocation Technology (CAT) makes it possible on certain CPUs to
isolate a region of cache and 'lock' it. Cache pseudo-locking builds on the fact
that a CPU can still read and write data pre-allocated outside its
current allocated area on cache hit. With cache pseudo-locking data
can be preloaded into a reserved portion of cache that no application
can fill, and from that point on will only serve cache hits. The cache
pseudo-locked memory is made accessible to user space where an
application can map it into its virtual address space and thus have a
region of memory with reduced average read latency.

The locking is not perfect and gets totally screwed by WBINVD and
similar mechanisms, but it provides a reasonable enhancement for
certain types of latency sensitive applications.

The implementation extends the current CAT mechanism and provides a
generally useful exclusive CAT mode on which it builds the extra
pseudo-locked regions"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
x86/intel_rdt: Disable PMU access
x86/intel_rdt: Fix possible circular lock dependency
x86/intel_rdt: Make CPU information accessible for pseudo-locked regions
x86/intel_rdt: Support restoration of subset of permissions
x86/intel_rdt: Fix cleanup of plr structure on error
x86/intel_rdt: Move pseudo_lock_region_clear()
x86/intel_rdt: Limit C-states dynamically when pseudo-locking active
x86/intel_rdt: Support L3 cache performance event of Broadwell
x86/intel_rdt: More precise L2 hit/miss measurements
x86/intel_rdt: Create character device exposing pseudo-locked region
x86/intel_rdt: Create debugfs files for pseudo-locking testing
x86/intel_rdt: Create resctrl debug area
x86/intel_rdt: Ensure RDT cleanup on exit
x86/intel_rdt: Resctrl files reflect pseudo-locked information
x86/intel_rdt: Support creation/removal of pseudo-locked region
x86/intel_rdt: Pseudo-lock region creation/removal core
x86/intel_rdt: Discover supported platforms via prefetch disable bits
x86/intel_rdt: Add utilities to test pseudo-locked region possibility
x86/intel_rdt: Split resource group removal in two
x86/intel_rdt: Enable entering of pseudo-locksetup mode
...

Linus Torvalds
2018-08-14 07:01:46 +0800

20 Jul, 2018

1 commit

fe9af81e5 x86/tsc: Redefine notsc to behave as tsc=unstable

Currently, the notsc kernel parameter disables the use of the TSC by
sched_clock(). However, this parameter does not prevent the kernel from
accessing the TSC in other places.

The only rationale to boot with notsc is to avoid timing discrepancies on
multi-socket systems where the TSCs are not properly synchronized, and thus
exclude TSC from being used for time keeping. But that prevents using TSC
as sched_clock() as well, which is not necessary as the core sched_clock()
implementation can handle non synchronized TSC based sched clocks just
fine.

However, there is another method to solve the above problem: booting with
tsc=unstable parameter. This parameter allows sched_clock() to use TSC and
just excludes it from timekeeping.

So there is no real reason to keep notsc, but for compatibility reasons the
parameter has to stay. Make it behave like 'tsc=unstable' instead.
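
The resulting behaviour can be modelled in a few lines. This is only a sketch of the policy described above: tsc_usage() and its return keys are hypothetical names, and the model ignores everything else that influences clock selection (for example, whether a TSC exists at all).

```python
def tsc_usage(cmdline: str) -> dict:
    """Model which consumers may use the TSC for a given kernel command line.

    Hypothetical helper: after this change 'notsc' is treated exactly like
    'tsc=unstable', so the TSC stays available to sched_clock() and is only
    excluded from timekeeping.
    """
    params = cmdline.split()
    marked_unstable = "tsc=unstable" in params or "notsc" in params
    return {
        "sched_clock": True,                 # never disabled by either option
        "timekeeping": not marked_unstable,  # excluded when marked unstable
    }

print(tsc_usage("root=/dev/sda1 notsc"))
print(tsc_usage("root=/dev/sda1 tsc=unstable"))
```

Both command lines yield the same result, which is the compatibility behaviour the changelog asks for.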

[ tglx: Massaged changelog ]

Signed-off-by: Pavel Tatashin
Signed-off-by: Thomas Gleixner
Reviewed-by: Dou Liyang
Reviewed-by: Thomas Gleixner
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com

Pavel Tatashin
2018-07-20 06:02:39 +0800

07 Jul, 2018

1 commit

cc9aec03e x86/numa_emulation: Introduce uniform split capability

The current NUMA emulation capabilities for splitting System RAM by a
fixed size or by a set number of nodes may result in some nodes being
larger than others. The implementation prioritizes establishing a
minimum usable memory size over satisfying the requested number of NUMA
nodes.

Introduce a uniform split capability that evenly partitions each
physical NUMA node into N emulated nodes. For example numa=fake=3U
creates 6 emulated nodes total on a system that has 2 physical nodes.

This capability is useful for debugging and evaluating platform
memory-side-cache capabilities as described by the ACPI HMAT (see
5.2.27.5 Memory Side Cache Information Structure in ACPI 6.2a).

Compare numa=fake=6 that results in only 5 nodes being created against
numa=fake=3U which takes the 2 physical nodes and evenly divides them.

numa=fake=6
available: 5 nodes (0-4)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
node 0 size: 2648 MB
node 0 free: 2443 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
node 1 size: 2672 MB
node 1 free: 2442 MB
node 2 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
node 2 size: 5291 MB
node 2 free: 5278 MB
node 3 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
node 3 size: 2677 MB
node 3 free: 2665 MB
node 4 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
node 4 size: 2676 MB
node 4 free: 2663 MB
node distances:
node 0 1 2 3 4
0: 10 20 10 20 20
1: 20 10 20 10 10
2: 10 20 10 20 20
3: 20 10 20 10 10
4: 20 10 20 10 10

numa=fake=3U
available: 6 nodes (0-5)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
node 0 size: 2900 MB
node 0 free: 2637 MB
node 1 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
node 1 size: 3023 MB
node 1 free: 3012 MB
node 2 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
node 2 size: 2015 MB
node 2 free: 2004 MB
node 3 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
node 3 size: 2704 MB
node 3 free: 2522 MB
node 4 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
node 4 size: 2709 MB
node 4 free: 2698 MB
node 5 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
node 5 size: 2612 MB
node 5 free: 2601 MB
node distances:
node 0 1 2 3 4 5
0: 10 10 10 20 20 20
1: 10 10 10 20 20 20
2: 10 10 10 20 20 20
3: 20 20 20 10 10 10
4: 20 20 20 10 10 10
5: 20 20 20 10 10 10
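
The node count produced by a uniform split can be sketched as below. This is an idealized model under stated assumptions: uniform_split() is a hypothetical helper, sizes are plain megabyte counts, and no attempt is made to reproduce the slightly unequal sizes seen in the listing above.

```python
def uniform_split(phys_nodes_mb, per_node):
    """Split every physical node into `per_node` emulated nodes.

    Returns a list of (physical_node_id, size_mb) tuples, one per emulated node.
    Any remainder is spread over the first emulated nodes of each physical node.
    """
    emulated = []
    for phys_id, size_mb in enumerate(phys_nodes_mb):
        share, remainder = divmod(size_mb, per_node)
        for i in range(per_node):
            emulated.append((phys_id, share + (1 if i < remainder else 0)))
    return emulated

# Two physical nodes with hypothetical sizes, split as with numa=fake=3U.
nodes = uniform_split([8000, 8000], 3)
print(len(nodes))  # 6
```

Unlike a request for a fixed total number of nodes, the count here is always the number of physical nodes multiplied by N.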

Signed-off-by: Dan Williams
Cc: David Rientjes
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Wei Yang
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/153089328617.27680.14930758266174305832.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar

Dan Williams
2018-07-07 00:48:58 +0800

03 Jul, 2018

1 commit

33dc3e410 x86/intel_rdt: Make CPU information accessible for pseudo-locked regions

When a resource group enters pseudo-locksetup mode it reflects that the
platform supports cache pseudo-locking and the resource group is unused,
ready to be used for a pseudo-locked region. Until it is set up as a
pseudo-locked region the resource group is "locked down" such that no new
tasks or cpus can be assigned to it. This is accomplished in a user visible
way by making the cpus, cpus_list, and tasks resctrl files inaccessible
(user cannot read from or write to these files).

When the resource group changes to pseudo-locked mode it represents a cache
pseudo-locked region. While not appropriate to make any changes to the cpus
assigned to this region it is useful to make it easy for the user to see
which cpus are associated with the pseudo-locked region.

Modify the permissions of the cpus/cpus_list file when the resource group
changes to pseudo-locked mode to support reading (not writing). The
information presented to the user when reading the file is the set of cpus
associated with the pseudo-locked region.
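
The permission changes described above can be summarized as a small lookup table. This is a sketch only: the concrete permission bits (0444 for read-only) are an assumption, and the "tasks" file is omitted for pseudo-locked mode because the text above does not say how it is treated there.

```python
import stat

READ_ONLY = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH  # 0o444, assumed value
NO_ACCESS = 0

# File permissions per resource group mode, as described in the changelog.
FILE_MODES = {
    "pseudo-locksetup": {"cpus": NO_ACCESS, "cpus_list": NO_ACCESS, "tasks": NO_ACCESS},
    "pseudo-locked": {"cpus": READ_ONLY, "cpus_list": READ_ONLY},
}

def user_can_read(group_mode: str, filename: str) -> bool:
    return bool(FILE_MODES[group_mode][filename] & stat.S_IRUSR)

def user_can_write(group_mode: str, filename: str) -> bool:
    return bool(FILE_MODES[group_mode][filename] & stat.S_IWUSR)

print(user_can_read("pseudo-locked", "cpus_list"))   # True
print(user_can_write("pseudo-locked", "cpus_list"))  # False
```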

Signed-off-by: Reinette Chatre
Signed-off-by: Thomas Gleixner
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/12756b7963b6abc1bffe8fb560b87b75da827bd1.1530421961.git.reinette.chatre@intel.com

Reinette Chatre
2018-07-03 14:38:40 +0800

24 Jun, 2018

1 commit

6fc0de37f x86/intel_rdt: Limit C-states dynamically when pseudo-locking active

Deeper C-states impact cache content through shrinking of the cache or
flushing the entire cache to memory before reducing power to the cache.
Deeper C-states will thus negatively impact the pseudo-locked regions.

To avoid impacting pseudo-locked regions C-states are limited on
pseudo-locked region creation so that cores associated with the
pseudo-locked region are prevented from entering deeper C-states.
This is accomplished by requesting a CPU latency target which will
prevent the core from entering C6 across all supported platforms.
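
How a latency target keeps a core out of deep idle states can be illustrated generically. The sketch below is not the kernel's PM QoS code: the state names and exit latencies are made-up illustrative values, and allowed_states() is a hypothetical helper showing only the filtering idea.

```python
from typing import List, NamedTuple

class CState(NamedTuple):
    name: str
    exit_latency_us: int  # time needed to leave the state, in microseconds

def allowed_states(states: List[CState], latency_limit_us: int) -> List[CState]:
    """Keep only idle states whose exit latency fits the requested limit."""
    return [s for s in states if s.exit_latency_us <= latency_limit_us]

# Illustrative values only; real latencies differ per platform.
STATES = [CState("C1", 2), CState("C1E", 10), CState("C3", 40), CState("C6", 133)]

print([s.name for s in allowed_states(STATES, 30)])  # ['C1', 'C1E']
```

Choosing a limit below the exit latency of C6 on every supported platform is what makes a single latency target sufficient, as the changelog describes.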

Signed-off-by: Reinette Chatre
Signed-off-by: Thomas Gleixner
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1ef4f99dd6ba12fa6fb44c5a1141e75f952b9cd9.1529706536.git.reinette.chatre@intel.com

Reinette Chatre
2018-06-24 21:35:48 +0800

23 Jun, 2018

2 commits

e17e73307 x86/intel_rdt: Documentation for Cache Pseudo-Locking

Add description of Cache Pseudo-Locking feature, its interface, as well as
an example of its usage.

Signed-off-by: Reinette Chatre
Signed-off-by: Thomas Gleixner
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/6e118c15d2c254a27b8891783505cd1bb94a2b10.1529706536.git.reinette.chatre@intel.com

Reinette Chatre
2018-06-23 19:03:44 +0800
cba1aab84 x86/intel_rdt: Document new mode, size, and bit_usage

By default resource groups allow sharing of their cache allocations. There
is nothing that prevents a resource group from configuring a cache
allocation that overlaps with that of an existing resource group.

To enable resource groups to specify that their cache allocations cannot be
shared a resource group "mode" is introduced to support two possible modes:
"shareable" and "exclusive". A "shareable" resource group allows sharing of
its cache allocations, an "exclusive" resource group does not. A new
resctrl file "mode" associated with each resource group is used to
communicate its (the associated resource group's) mode setting and allow
the mode to be changed. The new "mode" file as well as two other resctrl
files, "bit_usage" and "size", are introduced in this series.

Add documentation for the three new resctrl files as well as one example
demonstrating their use.
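
The difference between the two modes comes down to an overlap test on capacity bitmasks. The following is a minimal sketch under that reading: allocation_allowed() and the tuple layout are hypothetical, and the real resctrl checks involve more than this single rule.

```python
def cbm_overlap(a: int, b: int) -> bool:
    """Two capacity bitmasks overlap if they share at least one set bit."""
    return (a & b) != 0

def allocation_allowed(new_cbm: int, new_mode: str, existing) -> bool:
    """Check a new allocation against existing (cbm, mode) pairs.

    An overlap is acceptable only when both groups involved are "shareable".
    """
    for cbm, mode in existing:
        if cbm_overlap(new_cbm, cbm) and "exclusive" in (new_mode, mode):
            return False
    return True

groups = [(0b00001111, "shareable"), (0b11110000, "exclusive")]
print(allocation_allowed(0b00000011, "shareable", groups))  # True
print(allocation_allowed(0b00110000, "shareable", groups))  # False
```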

Signed-off-by: Reinette Chatre
Signed-off-by: Thomas Gleixner
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/f03a3059ec40ae719be6f3fba9f446bb055e0064.1529706536.git.reinette.chatre@intel.com

Reinette Chatre
2018-06-23 19:03:40 +0800

05 Jun, 2018

1 commit

ab20fd001 Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cache resource controller updates from Thomas Gleixner:
"An update for the Intel Resource Director Technology (RDT) which adds a
feedback driven software controller to runtime adjust the bandwidth
allocation MSRs.

This makes the allocations more accurate and allows bandwidth values
to be given in understandable units (MB/s) instead of the percentage
based allocations of the original, still available, interface.

The software controller can be enabled with a new mount option for the
resctrl filesystem"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth
x86/intel_rdt/mba_sc: Prepare for feedback loop
x86/intel_rdt/mba_sc: Add schemata support
x86/intel_rdt/mba_sc: Add initialization support
x86/intel_rdt/mba_sc: Enable/disable MBA software controller
x86/intel_rdt/mba_sc: Documentation for MBA software controller(mba_sc)
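
The idea of a feedback-driven controller can be sketched as a single adjustment step. This is a deliberately simplified illustration, not the mba_sc implementation: adjust_throttle(), the step size, and the percentage bounds are all assumptions chosen for the example.

```python
def adjust_throttle(percent: int, measured_mbps: float, target_mbps: float,
                    step: int = 10, min_pct: int = 10, max_pct: int = 100) -> int:
    """Return the next percentage setting given measured vs. target bandwidth.

    Lower the setting when the group exceeds its MB/s target, raise it when
    there is headroom, and clamp to the assumed [min_pct, max_pct] range.
    """
    if measured_mbps > target_mbps:
        return max(min_pct, percent - step)
    if measured_mbps < target_mbps:
        return min(max_pct, percent + step)
    return percent

print(adjust_throttle(50, measured_mbps=1200, target_mbps=1000))  # 40
print(adjust_throttle(50, measured_mbps=800, target_mbps=1000))   # 60
```

Run periodically, a step like this lets the user state a target in MB/s while the percentage-based hardware setting is adjusted underneath.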

Linus Torvalds
2018-06-05 12:34:39 +0800

28 May, 2018

3 commits

098afd981 x86/pci-dma: remove the explicit nodac and allowdac option

This is something drivers should decide (modulo chipset quirks like
for VIA), which as far as I can tell is how things have been handled
for the last 15 years.

Note that we keep the usedac option for now, as it is used in the wild
to override the too generic VIA quirk.

Signed-off-by: Christoph Hellwig
Reviewed-by: Thomas Gleixner

Christoph Hellwig
2018-05-28 18:48:21 +0800
06e9552f5 x86/pci-dma: remove the experimental forcesac boot option

Limiting the dma mask to avoid PCI (pre-PCIe) DAC cycles while paying
the huge overhead of an IOMMU is rather pointless, and this seriously
gets in the way of dma mapping work.

Signed-off-by: Christoph Hellwig
Reviewed-by: Thomas Gleixner

Christoph Hellwig
2018-05-28 18:48:16 +0800
84564d1c7 Documentation/x86: remove a stray reference to pci-nommu.c

This is just the minimal workaround. The file is mostly either stale
and/or duplicative of Documentation/admin-guide/kernel-parameters.txt,
but that is much more work than I'm willing to do right now.

Signed-off-by: Christoph Hellwig
Reviewed-by: Thomas Gleixner

Christoph Hellwig
2018-05-28 18:48:12 +0800

19 May, 2018

1 commit

d6c64a4f4 x86/intel_rdt/mba_sc: Documentation for MBA software controller(mba_sc)

Add documentation about the feedback loop mechanism (MBA software
controller) which lets the user specify the memory bandwidth allocation
in MBps. This includes some changes to the "schemata" format with
examples.

Signed-off-by: Vikas Shivappa
Signed-off-by: Thomas Gleixner
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-2-git-send-email-vikas.shivappa@linux.intel.com

Vikas Shivappa
2018-05-19 19:16:42 +0800

16 Apr, 2018

1 commit

9fb71c2f2 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"A set of fixes and updates for x86:

- Address a swiotlb regression which was caused by the recent DMA
rework and made drivers fail because dma_direct_supported() returned
false

- Fix a signedness bug in the APIC ID validation which caused invalid
APIC IDs to be detected as valid thereby bloating the CPU possible
space.

- Fix inconsistent config dependency/select magic for the MFD_CS5535
driver.

- Fix a corruption of the physical address space bits when encryption
has reduced the address space and late cpuinfo updates overwrite
the reduced bit information with the original value.

- Dominik's syscall rework which consolidates the architecture
specific syscall functions so all syscalls can be wrapped with the
same macros. This allows switching x86/64 to struct pt_regs based
syscalls. Extend the clearing of user space controlled registers in
the entry path to the lower registers"
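
The class of signedness bug mentioned in the APIC ID item can be demonstrated generically. The sketch below is not the kernel's validity check: the limit of 0xFF and both helper names are assumptions, and ctypes is used only to emulate 32-bit signed and unsigned interpretation of the same raw value.

```python
import ctypes

ID_LIMIT = 0xFF  # hypothetical upper bound for a valid ID

def valid_when_signed(raw: int) -> bool:
    """Buggy variant: the raw 32-bit value is interpreted as a signed int."""
    return ctypes.c_int32(raw).value < ID_LIMIT

def valid_when_unsigned(raw: int) -> bool:
    """Fixed variant: the same bits are interpreted as an unsigned int."""
    return ctypes.c_uint32(raw).value < ID_LIMIT

bogus_id = 0xFFFFFFFF
print(valid_when_signed(bogus_id))    # True: -1 compares below the limit
print(valid_when_unsigned(bogus_id))  # False
```

Any raw value with the top bit set turns negative under the signed interpretation and slips past an upper-bound check, which is how invalid IDs can end up counted as valid.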

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Fix signedness bug in APIC ID validity checks
x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
x86/olpc: Fix inconsistent MFD_CS5535 configuration
swiotlb: Use dma_direct_supported() for swiotlb_ops
syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
syscalls/core, syscalls/x86: Clean up syscall stub naming convention
syscalls/x86: Extend register clearing on syscall entry to lower registers
syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64
syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls
syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
x86/syscalls: Don't pointlessly reload the system call number
x86/mm: Fix documentation of module mapping range with 4-level paging
x86/cpuid: Switch to 'static const' specifier

Linus Torvalds
2018-04-16 07:12:35 +0800

12 Apr, 2018

1 commit

ef389b734 Merge branch 'WIP.x86/asm' into x86/urgent, because the topic is ready

Signed-off-by: Ingo Molnar

Ingo Molnar
2018-04-12 15:42:34 +0800

06 Apr, 2018

1 commit

672a9c106 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial

Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
kfifo: fix inaccurate comment
tools/thermal: tmon: fix for segfault
net: Spelling s/stucture/structure/
edd: don't spam log if no EDD information is present
Documentation: Fix early-microcode.txt references after file rename
tracing: Block comments should align the * on each line
treewide: Fix typos in printk
GenWQE: Fix a typo in two comments
treewide: Align function definition open/close braces

Linus Torvalds
2018-04-06 02:56:35 +0800

03 Apr, 2018

1 commit

9a3b7e5e6 x86/mm: Fix documentation of module mapping range with 4-level paging

Commit:

f5a40711fa58 ("x86/mm: Set MODULES_END to 0xffffffffff000000")

changed MODULES_END back to a fixed value, but didn't update the documentation
of memory layout for 4-level paging.

Signed-off-by: Kirill A. Shutemov
Acked-by: Andrey Ryabinin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Fixes: f5a40711fa58 ("x86/mm: Set MODULES_END to 0xffffffffff000000")
Link: http://lkml.kernel.org/r/20180402121025.10244-1-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar

Kirill A. Shutemov
2018-04-03 18:59:02 +0800

27 Mar, 2018

1 commit

1897a9691 Documentation: Fix early-microcode.txt references after file rename

The file Documentation/x86/early-microcode.txt was renamed to
Documentation/x86/microcode.txt in 0e3258753f81, but it was still
referenced by its old name in three places:

* Documentation/x86/00-INDEX
* arch/x86/Kconfig
* arch/x86/kernel/cpu/microcode/amd.c

This commit updates these references accordingly.

Fixes: 0e3258753f81 ("x86/microcode: Document the three loading methods")
Signed-off-by: Jaak Ristioja
Signed-off-by: Jiri Kosina

Jaak Ristioja
2018-03-27 15:51:23 +0800

15 Mar, 2018

1 commit

745dd37f9 Merge branch 'x86/urgent' into x86/mm to pick up dependencies

Thomas Gleixner
2018-03-15 03:23:25 +0800

01 Mar, 2018

1 commit

300097461 Documentation, x86, resctrl: Make text and sample command match

The text says "Move the cpus 4-7 over to p1", but the sample command writes
to p0/cpus.

Signed-off-by: Li RongQing
Signed-off-by: Thomas Gleixner
Cc: fenghua.yu@intel.com
Cc: linux-doc@vger.kernel.org
Link: https://lkml.kernel.org/r/1519712271-8802-1-git-send-email-lirongqing@baidu.com

Li RongQing
2018-03-01 02:59:05 +0800

26 Feb, 2018

1 commit

3f7df3efe Merge tag 'v4.16-rc3' into x86/mm, to pick up fixes

Signed-off-by: Ingo Molnar

Ingo Molnar
2018-02-26 15:41:15 +0800

23 Feb, 2018

1 commit

0c52f7c54 x86/topology: Fix function name in documentation

topology_sibling_cpumask() is the correct thread-related topology
function in the kernel:

s/topology_sibling_mask/topology_sibling_cpumask

Signed-off-by: Dou Liyang
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20180222084812.14497-1-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar

Dou Liyang
2018-02-23 15:40:12 +0800

16 Feb, 2018

1 commit

6657fca06 x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y

All pieces of the puzzle are in place and we can now allow booting with
CONFIG_X86_5LEVEL=y on a machine without LA57 support.

Kernel will detect that LA57 is missing and fold p4d at runtime.

Update the documentation and the Kconfig option description to reflect the
change.
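
The runtime decision and its effect on the virtual address width can be sketched as follows. This is an illustrative model only: paging_levels() and va_bits() are hypothetical helpers, and the constants (4 KiB pages, 9 index bits per table level) are the standard x86-64 values rather than something stated in this changelog.

```python
PAGE_SHIFT = 12      # 4 KiB pages
BITS_PER_LEVEL = 9   # 512 entries per page table

def paging_levels(config_5level: bool, cpu_has_la57: bool) -> int:
    """Use 5 levels only when both the kernel config and the CPU support it."""
    return 5 if config_5level and cpu_has_la57 else 4

def va_bits(levels: int) -> int:
    """Virtual address width covered by the given number of table levels."""
    return PAGE_SHIFT + BITS_PER_LEVEL * levels

for la57 in (False, True):
    levels = paging_levels(config_5level=True, cpu_has_la57=la57)
    print(la57, levels, va_bits(levels))
# False 4 48
# True 5 57
```

A kernel built with CONFIG_X86_5LEVEL=y therefore behaves like a 4-level kernel on hardware without LA57, with the extra level folded away.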

Signed-off-by: Kirill A. Shutemov
Cc: Andy Lutomirski
Cc: Arjan van de Ven
Cc: Borislav Petkov
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Woodhouse
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-10-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar

Kirill A. Shutemov
2018-02-16 17:48:49 +0800

02 Feb, 2018

1 commit

47fcc0360 Merge tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
"Here is the set of "big" driver core patches for 4.16-rc1.

The majority of the work here is in the firmware subsystem, with
reworks to try to make the code easier to handle in the
long run, but no functional change. There's also some tree-wide sysfs
attribute fixups with lots of acks from the various subsystem
maintainers, as well as a handful of other normal fixes and changes.

And finally, some license cleanups for the driver core and sysfs code.

All have been in linux-next for a while with no reported issues"

* tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits)
device property: Define type of PROPERTY_ENRTY_*() macros
device property: Reuse property_entry_free_data()
device property: Move property_entry_free_data() upper
firmware: Fix up docs referring to FIRMWARE_IN_KERNEL
firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
USB: serial: keyspan: Drop firmware Kconfig options
sysfs: remove DEBUG defines
sysfs: use SPDX identifiers
drivers: base: add coredump driver ops
sysfs: add attribute specification for /sysfs/devices/.../coredump
test_firmware: fix missing unlock on error in config_num_requests_store()
test_firmware: make local symbol test_fw_config static
sysfs: turn WARN() into pr_warn()
firmware: Fix a typo in fallback-mechanisms.rst
treewide: Use DEVICE_ATTR_WO
treewide: Use DEVICE_ATTR_RO
treewide: Use DEVICE_ATTR_RW
sysfs.h: Use octal permissions
component: add debugfs support
bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate
...

Linus Torvalds
2018-02-02 02:00:28 +0800

30 Jan, 2018

1 commit

f0b13428c Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86/cache updates from Thomas Gleixner:
"A set of patches which add support for L2 cache partitioning to the
Intel RDT facility"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Add command line parameter to control L2_CDP
x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG
x86/intel_rdt: Add two new resources for L2 Code and Data Prioritization (CDP)
x86/intel_rdt: Enumerate L2 Code and Data Prioritization (CDP) feature
x86/intel_rdt: Add L2CDP support in documentation
x86/intel_rdt: Update documentation

Linus Torvalds
2018-01-30 09:48:22 +0800

25 Jan, 2018

1 commit

c508c46e6 firmware: Fix up docs referring to FIRMWARE_IN_KERNEL

We've removed the option, so stop talking about it.

Signed-off-by: Benjamin Gilbert
Acked-by: Ingo Molnar
Cc: Borislav Petkov
Cc: Thomas Gleixner
Cc: H. Peter Anvin
Signed-off-by: Greg Kroah-Hartman

Benjamin Gilbert
2018-01-25 19:46:30 +0800

22 Jan, 2018

1 commit

551511421 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 pti fixes from Thomas Gleixner:
"A small set of fixes for the meltdown/spectre mitigations:

- Make kprobes aware of retpolines to prevent probes in the retpoline
thunks.

- Make the machine check exception speculation protected. MCE used to
issue an indirect call directly from the ASM entry code. Convert
that to a direct call into a C-function and issue the indirect call
from there so the compiler can add the retpoline protection,

- Make the vmexit_fill_RSB() assembly less stupid

- Fix a typo in the PTI documentation"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
x86/pti: Document fix wrong index
kprobes/x86: Disable optimizing on the function jumps to indirect thunk
kprobes/x86: Blacklist indirect thunk functions for kprobes
retpoline: Introduce start/end markers of indirect thunk
x86/mce: Make machine check speculation protected

Linus Torvalds
2018-01-22 02:48:35 +0800

19 Jan, 2018

1 commit

98f0fceec x86/pti: Document fix wrong index

In section , fix wrong index.

Signed-off-by: zhenwei.pi
Signed-off-by: Thomas Gleixner
Cc: dave.hansen@linux.intel.com
Link: https://lkml.kernel.org/r/1516237492-27739-1-git-send-email-zhenwei.pi@youruncloud.com

zhenwei.pi
2018-01-19 23:31:29 +0800

18 Jan, 2018

2 commits

aa55d5a4b x86/intel_rdt: Add L2CDP support in documentation

L2 and L3 Code and Data Prioritization (CDP) can be enabled separately.
The existing mount parameter "cdp" is only for enabling L3 CDP and will be
kept for backwards compatibility.

Add a new mount parameter 'cdpl2' for L2 CDP.
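
Because the two options are independent, a mount command can carry either, both, or neither. The helper below is a hypothetical sketch that only builds the argument list; the mount point /sys/fs/resctrl is assumed as the conventional location and nothing is actually mounted.

```python
def resctrl_mount_cmd(l3_cdp: bool = False, l2_cdp: bool = False,
                      mountpoint: str = "/sys/fs/resctrl") -> list:
    """Build (but do not run) a mount command for the resctrl filesystem."""
    options = []
    if l3_cdp:
        options.append("cdp")    # existing option, enables L3 CDP only
    if l2_cdp:
        options.append("cdpl2")  # new option, enables L2 CDP
    cmd = ["mount", "-t", "resctrl"]
    if options:
        cmd += ["-o", ",".join(options)]
    return cmd + ["resctrl", mountpoint]

print(" ".join(resctrl_mount_cmd(l3_cdp=True, l2_cdp=True)))
# mount -t resctrl -o cdp,cdpl2 resctrl /sys/fs/resctrl
```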

[ tglx: Made changelog readable ]

Signed-off-by: Fenghua Yu
Signed-off-by: Thomas Gleixner
Cc: "Ravi V Shankar"
Cc: "Tony Luck"
Cc: Vikas
Cc: Sai Praneeth
Cc: Reinette
Link: https://lkml.kernel.org/r/1513810644-78015-3-git-send-email-fenghua.yu@intel.com

Fenghua Yu
2018-01-18 16:33:30 +0800
0ff8e080b x86/intel_rdt: Update documentation

With more flag bits in /proc/cpuinfo for RDT, it's better to classify the
bits for readability.

Some previously missing bits are added as well.

Signed-off-by: Fenghua Yu
Signed-off-by: Thomas Gleixner
Cc: "Ravi V Shankar"
Cc: "Tony Luck"
Cc: Vikas
Cc: Sai Praneeth
Cc: Reinette
Link: https://lkml.kernel.org/r/1513810644-78015-2-git-send-email-fenghua.yu@intel.com

Fenghua Yu
2018-01-18 16:33:30 +0800

15 Jan, 2018

1 commit

40548c6b6 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 pti updates from Thomas Gleixner:
"This contains:

- a PTI bugfix to avoid setting reserved CR3 bits when PCID is
disabled. This seems to cause issues on a virtual machine at least
and is incorrect according to the AMD manual.

- a PTI bugfix which disables the perf BTS facility if PTI is
enabled. The BTS AUX buffer is not globally visible and causes the
CPU to fault when the mapping disappears on switching CR3 to user
space. A full fix which restores BTS on PTI is non trivial and will
be worked on.

- PTI bugfixes for EFI and trusted boot which make sure that the user
space visible page table entries have the NX bit cleared

- removal of dead code in the PTI pagetable setup functions

- add PTI documentation

- add a selftest for vsyscall to verify that the kernel actually
implements what it advertises.

- a sysfs interface to expose vulnerability and mitigation
information so there is a coherent way for users to retrieve the
status.

- the initial spectre_v2 mitigations, aka retpoline:

+ The necessary ASM thunk and compiler support

+ The ASM variants of retpoline and the conversion of affected ASM
code

+ Make LFENCE serializing on AMD so it can be used as speculation
trap

+ The RSB fill after vmexit

- initial objtool support for retpoline

As I said in the status mail this is most of the set of patches
which should go into 4.15 except two straight forward patches still on
hold:

- the retpoline add on of LFENCE which waits for ACKs

- the RSB fill after context switch

Both should be ready to go early next week and with that we'll have
covered the major holes of spectre_v2 and go back to normality"
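
The sysfs interface mentioned in the list above exposes one small text file per vulnerability. As a minimal sketch of how a user space tool might read such a file, the helper below returns the first line of a file. The helper name `read_status_line` is hypothetical and not part of the kernel or of any library.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of a status file into buf and
 * strip the trailing newline. Returns 0 on success, -1 on any failure.
 */
static int read_status_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;	/* file missing, e.g. kernel without the interface */

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

On a kernel that carries this interface, a caller would pass a path such as /sys/devices/system/cpu/vulnerabilities/meltdown and print the returned line. The exact wording of the status text depends on the kernel version and configuration.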

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
x86,perf: Disable intel_bts when PTI
security/Kconfig: Correct the Documentation reference for PTI
x86/pti: Fix !PCID and sanitize defines
selftests/x86: Add test_vsyscall
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline: Add initial retpoline support
objtool: Allow alternatives to be ignored
objtool: Detect jumps to retpoline thunks
x86/pti: Make unpoison of pgd for trusted boot work for real
x86/alternatives: Fix optimize_nops() checking
sysfs/cpu: Fix typos in vulnerability documentation
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
...

Linus Torvalds
2018-01-15 01:51:25 +0800

07 Jan, 2018

1 commit

01c9b17bf x86/Documentation: Add PTI description

Add some details about how PTI works, what some of the downsides
are, and how to debug it when things go wrong.

Also document the kernel parameter: 'pti/nopti'.

Signed-off-by: Dave Hansen
Signed-off-by: Thomas Gleixner
Reviewed-by: Randy Dunlap
Reviewed-by: Kees Cook
Cc: Moritz Lipp
Cc: Daniel Gruss
Cc: Michael Schwarz
Cc: Richard Fellner
Cc: Andy Lutomirski
Cc: Linus Torvalds
Cc: Hugh Dickins
Cc: Andi Lutomirsky
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180105174436.1BC6FA2B@viggo.jf.intel.com

Dave Hansen
2018-01-07 04:39:10 +0800

06 Jan, 2018

1 commit

abb7099db Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more x86 pti fixes from Thomas Gleixner:
"Another small stash of fixes for fallout from the PTI work:

- Fix the modules vs. KASAN breakage which was caused by making
MODULES_END depend on the fixmap size. That was done when the cpu
entry area moved into the fixmap, but now that we have a separate
map space for that this is causing more issues than it solves.

- Use the proper cache flush methods for the debugstore buffers as
they are mapped/unmapped during runtime and not statically mapped
at boot time like the rest of the cpu entry area.

- Make the map layout of the cpu_entry_area consistent for 4 and 5
level paging and fix the KASLR vaddr_end wreckage.

- Use PER_CPU_EXPORT for per cpu variable and while at it unbreak
nvidia gfx drivers by dropping the GPL export. The subject line of
the commit tells it the other way around, but I noticed that too
late.

- Fix the ASM alternative macros so they can be used in the middle of
an inline asm block.

- Rename the BUG_CPU_INSECURE flag to BUG_CPU_MELTDOWN so the attack
vector is properly identified. The Spectre mitigations will come
with their own bug bits later"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
x86/tlb: Drop the _GPL from the cpu_tlbstate export
x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
x86/kaslr: Fix the vaddr_end mess
x86/mm: Map cpu_entry_area at the same place on 4/5 level
x86/mm: Set MODULES_END to 0xffffffffff000000

Linus Torvalds
2018-01-06 04:23:57 +0800

05 Jan, 2018

3 commits

1dddd2512 x86/kaslr: Fix the vaddr_end mess

vaddr_end for KASLR is only documented in the KASLR code itself and is
adjusted depending on config options. So it's not surprising that a change
of the memory layout causes KASLR to have the wrong vaddr_end. This can map
arbitrary stuff into other areas causing hard to understand problems.

Remove the whole ifdef magic and define the start of the cpu_entry_area to
be the end of the KASLR vaddr range.

Add documentation to that effect.
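
As a rough sketch of the idea, the end of the randomization range can be tied to a single constant rather than computed through config-dependent ifdefs. The value of `CPU_ENTRY_AREA_BASE` below is an assumption chosen for illustration (PGD slot -4 under 4-level paging), and `region_fits` is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed start of the cpu_entry_area mapping, for illustration only. */
#define CPU_ENTRY_AREA_BASE 0xfffffe0000000000ULL

/* The KASLR range ends exactly where the cpu_entry_area begins. */
static const uint64_t vaddr_end = CPU_ENTRY_AREA_BASE;

/*
 * Hypothetical check: does a region of 'size' bytes starting at 'base'
 * end at or before vaddr_end? Written to avoid overflow in base + size.
 */
static bool region_fits(uint64_t base, uint64_t size)
{
	return size <= vaddr_end && base <= vaddr_end - size;
}
```

With a single fixed boundary, a later change to the memory layout only has to update one definition for the check to stay correct.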

Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
Reported-by: Benjamin Gilbert
Signed-off-by: Thomas Gleixner
Tested-by: Benjamin Gilbert
Cc: Andy Lutomirski
Cc: Greg Kroah-Hartman
Cc: stable
Cc: Dave Hansen
Cc: Peter Zijlstra
Cc: Thomas Garnier
Cc: Alexander Kuleshov
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos

Thomas Gleixner
2018-01-05 07:39:57 +0800

f20789048 x86/mm: Map cpu_entry_area at the same place on 4/5 level

There is no reason for 4 and 5 level pagetables to have a different
layout. It just makes determining vaddr_end for KASLR harder than
necessary.

Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Benjamin Gilbert
Cc: Greg Kroah-Hartman
Cc: stable
Cc: Dave Hansen
Cc: Peter Zijlstra
Cc: Thomas Garnier
Cc: Alexander Kuleshov
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos

Thomas Gleixner
2018-01-05 06:04:57 +0800

f5a40711f x86/mm: Set MODULES_END to 0xffffffffff000000

Since f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size")
kasan_mem_to_shadow(MODULES_END) might not be aligned to a page boundary.

Passing a page-unaligned address to kasan_populate_zero_shadow() has two
possible effects:

1) It may leave a one-page hole in the area that is supposed to be populated.
After commit 21506525fb8d ("x86/kasan/64: Teach KASAN about the cpu_entry_area")
that hole happens to be in the shadow covering the fixmap area and leads to a crash:

BUG: unable to handle kernel paging request at fffffbffffe8ee04
RIP: 0010:check_memory_region+0x5c/0x190

Call Trace:

memcpy+0x1f/0x50
ghes_copy_tofrom_phys+0xab/0x180
ghes_read_estatus+0xfb/0x280
ghes_notify_nmi+0x2b2/0x410
nmi_handle+0x115/0x2c0
default_do_nmi+0x57/0x110
do_nmi+0xf8/0x150
end_repeat_nmi+0x1a/0x1e

Note that the crash likely disappeared after commit 92a0f81d8957, which
changed the kasan_populate_zero_shadow() call back to the way it was before
commit 21506525fb8d.

2) An attempt to load a module near MODULES_END will fail, because
__vmalloc_node_range() called from kasan_module_alloc() will hit the
WARN_ON(!pte_none(*pte)) in vmap_pte_range() and bail out with an error.

To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned
which means that MODULES_END should be 8*PAGE_SIZE aligned.

The whole point of commit f06bdd4001c2 was to move MODULES_END down if
NR_CPUS is big, so the cpu_entry_area takes a lot of space.
But since 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
the cpu_entry_area is no longer in fixmap, so we could just set
MODULES_END to a fixed 8*PAGE_SIZE aligned address.
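
The alignment requirement can be checked with a little arithmetic: KASAN uses one shadow byte per 8 bytes of memory, so the shadow address is page aligned only when the original address is aligned to 8*PAGE_SIZE. The sketch below assumes the usual x86-64 values for the shadow scale shift and shadow offset; these constants are hard-coded for illustration, and `mem_to_shadow` is a user space stand-in for kasan_mem_to_shadow().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 values, hard-coded for illustration only. */
#define PAGE_SIZE			4096ULL
#define KASAN_SHADOW_SCALE_SHIFT	3
#define KASAN_SHADOW_OFFSET		0xdffffc0000000000ULL

/* One shadow byte covers 8 bytes of memory. */
static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* True when the shadow address of 'addr' sits on a page boundary. */
static bool shadow_is_page_aligned(uint64_t addr)
{
	return (mem_to_shadow(addr) & (PAGE_SIZE - 1)) == 0;
}
```

Under these assumptions, 0xffffffffff000000 maps to the page-aligned shadow address 0xfffffbffffe00000, while an address one page lower (page aligned, but not 8*PAGE_SIZE aligned) maps to a shadow address in the middle of a page.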

Fixes: f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size")
Reported-by: Jakub Kicinski
Signed-off-by: Andrey Ryabinin
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski
Cc: Thomas Garnier
Link: https://lkml.kernel.org/r/20171228160620.23818-1-aryabinin@virtuozzo.com

Andrey Ryabinin
2018-01-05 06:04:57 +0800

30 Dec, 2017

1 commit

5aa90a845 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 page table isolation updates from Thomas Gleixner:
"This is the final set of enabling page table isolation on x86:

- Infrastructure patches for handling the extra page tables.

- Patches which map the various bits and pieces which are required to
get in and out of user space into the user space visible page
tables.

- The required changes to have CR3 switching in the entry/exit code.

- Optimizations for the CR3 switching along with documentation of how
the ASID/PCID mechanism works.

- Updates to dump pagetables to cover the user space page tables for
W+X scans and extra debugfs files to analyze both the kernel and
the user space visible page tables

The whole functionality is compile time controlled via a config switch
and can be turned on/off on the command line as well"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
x86/ldt: Make the LDT mapping RO
x86/mm/dump_pagetables: Allow dumping current pagetables
x86/mm/dump_pagetables: Check user space page table for WX pages
x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy
x86/mm/pti: Add Kconfig
x86/dumpstack: Indicate in Oops whether PTI is configured and enabled
x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming
x86/mm: Use INVPCID for __native_flush_tlb_single()
x86/mm: Optimize RESTORE_CR3
x86/mm: Use/Fix PCID to optimize user/kernel switches
x86/mm: Abstract switching CR3
x86/mm: Allow flushing for future ASID switches
x86/pti: Map the vsyscall page if needed
x86/pti: Put the LDT in its own PGD if PTI is on
x86/mm/64: Make a full PGD-entry size hole in the memory map
x86/events/intel/ds: Map debug buffers in cpu_entry_area
x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
x86/mm/pti: Map ESPFIX into user space
x86/mm/pti: Share entry text PMD
x86/entry: Align entry text section to PMD boundary
...

Linus Torvalds
2017-12-30 09:02:49 +0800

24 Dec, 2017

1 commit

f55f0501c x86/pti: Put the LDT in its own PGD if PTI is on

With PTI enabled, the LDT must be mapped in the usermode tables somewhere.
The LDT is per process, i.e. per mm.

An earlier approach mapped the LDT on context switch into a fixmap area,
but that's a big overhead and exhausted the fixmap space when NR_CPUS got
big.

Take advantage of the fact that there is an address space hole which
provides a completely unused pgd. Use this pgd to manage per-mm LDT
mappings.
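
To illustrate what "a completely unused pgd" means in address terms, the sketch below computes which top-level (PGD) slot a virtual address falls into under 4-level paging, where each slot covers 512 GiB. The helper `pgd_slot_base` and the slot number used in the note that follows are hypothetical and chosen only as an example; they are not taken from this patch.

```c
#include <stdint.h>

#define PGDIR_SHIFT	39	/* 4-level paging: one PGD entry maps 512 GiB */
#define PTRS_PER_PGD	512

/* Index of the PGD entry that covers 'vaddr'. */
static unsigned int pgd_index(uint64_t vaddr)
{
	return (unsigned int)((vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/*
 * Hypothetical helper: base address of a PGD slot counted back from the
 * top of the address space (-1 is the last slot, -2 the one before it).
 */
static uint64_t pgd_slot_base(int slot_from_top)
{
	return (uint64_t)(int64_t)slot_from_top << PGDIR_SHIFT;
}
```

For example, slot -3 starts at 0xfffffe8000000000 and corresponds to PGD index 509. Because a whole slot is owned by one PGD entry, a per-mm mapping can be installed by pointing that single entry in each mm's page table at the mm's own lower-level tables.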

This has a down side: the LDT isn't (currently) randomized, and an attack
that can write the LDT is instant root due to call gates (thanks, AMD, for
leaving call gates in AMD64 but designing them wrong so they're only useful
for exploits). This can be mitigated by making the LDT read-only or
randomizing the mapping, either of which is straightforward on top of this
patch.

This will significantly slow down LDT users, but that shouldn't matter for
important workloads -- the LDT is only used by DOSEMU(2), Wine, and very
old libc implementations.

[ tglx: Cleaned it up. ]

Signed-off-by: Andy Lutomirski
Signed-off-by: Thomas Gleixner
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Kees Cook
Cc: Kirill A. Shutemov
Cc: Linus Torvalds
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar

Andy Lutomirski
2017-12-24 04:13:00 +0800