29 Oct, 2018
1 commit
-
Interactive governor has lived in Android sources for a very long time
and this commit is based on the code present in following branch:https://android.googlesource.com/kernel/common android-4.4
The Interactive governor is designed for latency-sensitive workloads,
such as interactive user interfaces like the mobile phones and tablets.
The interactive governor aims to be significantly more responsive to
ramp CPU quickly up when CPU-intensive activity begins.Existing governors sample CPU load at a particular rate, typically every
X ms and then update the frequency from a work-handler. This can lead
to under-powering UI threads for the period of time during which the
user begins interacting with a previously-idle system until the next
sample period happens.The 'interactive' governor uses a different approach.
A real-time thread is used for scaling up, giving the remaining tasks
the CPU performance benefit, unlike existing governors which are more
likely to schedule ramp-up work to occur after your performance starved
tasks have completed.The Android version of interactive governor also checks whether to scale
the CPU frequency up soon after coming out of idle. When the CPU comes
out of idle, the governor check if the CPU sampling is overdue or not.
If yes, it immediately starts the sampling. Otherwise, the utilization
hooks from the scheduler handle the sampling later. If the CPU is very
busy from exiting idle to when the evaluation happens, then it assumes
that the CPU is under-powered and ramps it to MAX speed.If the CPU was not sufficiently busy to immediately ramp to MAX speed,
then the governor evaluates the CPU load since the last speed
adjustment, choosing the highest value between that longer-term load or
the short-term load since idle exit to determine the CPU speed to ramp
to.Idle notifiers will be be handled later and are not included for now.
The core of this code is written and maintained (in Android
repositories) by Mike Chan and Todd Poyner over a long period of time.Vireshk has made changes to to the governor to align it with the current
practices followed with mainline governors, like using utilization hooks
from the scheduler and handling kobject (for governor's sysfs directory)
in a race free manner. And of course this included general cleanup of
the governor as well.Signed-off-by: Mike Chan
Signed-off-by: Todd Poynor
Signed-off-by: Viresh Kumar
16 Aug, 2018
9 commits
-
commit 5b76a3cff011df2dcb6186c965a2e4d809a05ad4 upstream
When nested virtualization is in use, VMENTER operations from the nested
hypervisor into the nested guest will always be processed by the bare metal
hypervisor, and KVM's "conditional cache flushes" mode in particular does a
flush on nested vmentry. Therefore, include the "skip L1D flush on
vmentry" bit in KVM's suggested ARCH_CAPABILITIES setting.Add the relevant Documentation.
Signed-off-by: Paolo Bonzini
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman -
commit 58331136136935c631c2b5f06daf4c3006416e91 upstream
Dave reported, that it's not confirmed that Yonah processors are
unaffected. Remove them from the list.Reported-by: ave Hansen
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman -
commit 1949f9f49792d65dba2090edddbe36a5f02e3ba3 upstream
Fix spelling and other typos
Signed-off-by: Tony Luck
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman -
commit 3ec8ce5d866ec6a08a9cfab82b62acf4a830b35f upstream
Add documentation for the L1TF vulnerability and the mitigation mechanisms:
- Explain the problem and risks
- Document the mitigation mechanisms
- Document the command line controls
- Document the sysfs filesSigned-off-by: Thomas Gleixner
Reviewed-by: Greg Kroah-Hartman
Reviewed-by: Josh Poimboeuf
Acked-by: Linus Torvalds
Link: https://lkml.kernel.org/r/20180713142323.287429944@linutronix.de
Signed-off-by: Greg Kroah-Hartman -
commit d90a7a0ec83fb86622cd7dae23255d3c50a99ec8 upstream
Introduce the 'l1tf=' kernel command line option to allow for boot-time
switching of mitigation that is used on processors affected by L1TF.The possible values are:
full
Provides all available mitigations for the L1TF vulnerability. Disables
SMT and enables all mitigations in the hypervisors. SMT control via
/sys/devices/system/cpu/smt/control is still possible after boot.
Hypervisors will issue a warning when the first VM is started in
a potentially insecure configuration, i.e. SMT enabled or L1D flush
disabled.full,force
Same as 'full', but disables SMT control. Implies the 'nosmt=force'
command line option. sysfs control of SMT and the hypervisor flush
control is disabled.flush
Leaves SMT enabled and enables the conditional hypervisor mitigation.
Hypervisors will issue a warning when the first VM is started in a
potentially insecure configuration, i.e. SMT enabled or L1D flush
disabled.flush,nosmt
Disables SMT and enables the conditional hypervisor mitigation. SMT
control via /sys/devices/system/cpu/smt/control is still possible
after boot. If SMT is reenabled or flushing disabled at runtime
hypervisors will issue a warning.flush,nowarn
Same as 'flush', but hypervisors will not warn when
a VM is started in a potentially insecure configuration.off
Disables hypervisor mitigations and doesn't emit any warnings.Default is 'flush'.
Let KVM adhere to these semantics, which means:
- 'lt1f=full,force' : Performe L1D flushes. No runtime control
possible.- 'l1tf=full'
- 'l1tf-flush'
- 'l1tf=flush,nosmt' : Perform L1D flushes and warn on VM start if
SMT has been runtime enabled or L1D flushing
has been run-time enabled- 'l1tf=flush,nowarn' : Perform L1D flushes and no warnings are emitted.
- 'l1tf=off' : L1D flushes are not performed and no warnings
are emitted.KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush'
module parameter except when lt1f=full,force is set.This makes KVM's private 'nosmt' option redundant, and as it is a bit
non-systematic anyway (this is something to control globally, not on
hypervisor level), remove that option.Add the missing Documentation entry for the l1tf vulnerability sysfs file
while at it.Signed-off-by: Jiri Kosina
Signed-off-by: Thomas Gleixner
Tested-by: Jiri Kosina
Reviewed-by: Greg Kroah-Hartman
Reviewed-by: Josh Poimboeuf
Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
Signed-off-by: Greg Kroah-Hartman -
commit a399477e52c17e148746d3ce9a483f681c2aa9a0 upstream
Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka
L1 terminal fault. The valid arguments are:- "always" L1D cache flush on every VMENTER.
- "cond" Conditional L1D cache flush, explained below
- "never" Disable the L1D cache flush mitigation"cond" is trying to avoid L1D cache flushes on VMENTER if the code executed
between VMEXIT and VMENTER is considered safe, i.e. is not bringing any
interesting information into L1D which might exploited.[ tglx: Split out from a larger patch ]
Signed-off-by: Konrad Rzeszutek Wilk
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman -
commit 26acfb666a473d960f0fd971fe68f3e3ad16c70b upstream
If the L1TF CPU bug is present we allow the KVM module to be loaded as the
major of users that use Linux and KVM have trusted guests and do not want a
broken setup.Cloud vendors are the ones that are uncomfortable with CVE 2018-3620 and as
such they are the ones that should set nosmt to one.Setting 'nosmt' means that the system administrator also needs to disable
SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line
parameter, or via the /sys/devices/system/cpu/smt/control. See commit
05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT").Other mitigations are to use task affinity, cpu sets, interrupt binding,
etc - anything to make sure that _only_ the same guests vCPUs are running
on sibling threads.Signed-off-by: Konrad Rzeszutek Wilk
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman -
commit 506a66f374891ff08e064a058c446b336c5ac760 upstream
Dave Hansen reported, that it's outright dangerous to keep SMT siblings
disabled completely so they are stuck in the BIOS and wait for SIPI.The reason is that Machine Check Exceptions are broadcasted to siblings and
the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
reboots the machine. The MCE chapter in the SDM contains the following
blurb:Because the logical processors within a physical package are tightly
coupled with respect to shared hardware resources, both logical
processors are notified of machine check errors that occur within a
given physical processor. If machine-check exceptions are enabled when
a fatal error is reported, all the logical processors within a physical
package are dispatched to the machine-check exception handler. If
machine-check exceptions are disabled, the logical processors enter the
shutdown state and assert the IERR# signal. When enabling machine-check
exceptions, the MCE flag in control register CR4 should be set for each
logical processor.Reverting the commit which ignores siblings at enumeration time solves only
half of the problem. The core cpuhotplug logic needs to be adjusted as
well.This thoughtful engineered mechanism also turns the boot process on all
Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU
before the secondary CPUs are brought up. Depending on the number of
physical cores the window in which this situation can happen is smaller or
larger. On a HSW-EX it's about 750ms:MCE is enabled on the boot CPU:
[ 0.244017] mce: CPU supports 22 MCE banks
The corresponding sibling #72 boots:
[ 1.008005] .... node #0, CPUs: #72
That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
between these two points the machine is going to shutdown. At least it's a
known safe state.It's obvious that the early boot can be hit by an MCE as well and then runs
into the same situation because MCEs are not yet enabled on the boot CPU.
But after enabling them on the boot CPU, it does not make any sense to
prevent the kernel from recovering.Adjust the nosmt kernel parameter documentation as well.
Reverts: 2207def700f9 ("x86/apic: Ignore secondary threads if nosmt=force")
Reported-by: Dave Hansen
Signed-off-by: Thomas Gleixner
Tested-by: Tony Luck
Signed-off-by: Greg Kroah-Hartman -
commit 05736e4ac13c08a4a9b1ef2de26dd31a32cbee57 upstream
Provide a command line and a sysfs knob to control SMT.
The command line options are:
'nosmt': Enumerate secondary threads, but do not online them
'nosmt=force': Ignore secondary threads completely during enumeration
via MP table and ACPI/MADT.The sysfs control file has the following states (read/write):
'on': SMT is enabled. Secondary threads can be freely onlined
'off': SMT is disabled. Secondary threads, even if enumerated
cannot be onlined
'forceoff': SMT is permanentely disabled. Writes to the control
file are rejected.
'notsupported': SMT is not supported by the CPUThe command line option 'nosmt' sets the sysfs control to 'off'. This
can be changed to 'on' to reenable SMT during runtime.The command line option 'nosmt=force' sets the sysfs control to
'forceoff'. This cannot be changed during runtime.When SMT is 'on' and the control file is changed to 'off' then all online
secondary threads are offlined and attempts to online a secondary thread
later on are rejected.When SMT is 'off' and the control file is changed to 'on' then secondary
threads can be onlined again. The 'off' -> 'on' transition does not
automatically online the secondary threads.When the control file is set to 'forceoff', the behaviour is the same as
setting it to 'off', but the operation is irreversible and later writes to
the control file are rejected.When the control status is 'notsupported' then writes to the control file
are rejected.Signed-off-by: Thomas Gleixner
Reviewed-by: Konrad Rzeszutek Wilk
Acked-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
22 Jul, 2018
1 commit
-
commit a43ae4dfe56a01f5b98ba0cb2f784b6a43bafcc6 upstream.
On a system where the firmware implements ARCH_WORKAROUND_2,
it may be useful to either permanently enable or disable the
workaround for cases where the user decides that they'd rather
not get a trap overhead, and keep the mitigation permanently
on or off instead of switching it on exception entry/exit.In any case, default to the mitigation being enabled.
Reviewed-by: Julien Grall
Reviewed-by: Mark Rutland
Acked-by: Will Deacon
Signed-off-by: Marc Zyngier
Signed-off-by: Catalin Marinas
Signed-off-by: Marc Zyngier
Signed-off-by: Greg Kroah-Hartman
23 May, 2018
3 commits
-
commit f21b53b20c754021935ea43364dbf53778eeba32 upstream
Unless explicitly opted out of, anything running under seccomp will have
SSB mitigations enabled. Choosing the "prctl" mode will disable this.[ tglx: Adjusted it to the new arch_seccomp_spec_mitigate() mechanism ]
Signed-off-by: Kees Cook
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman -
commit a73ec77ee17ec556fe7f165d00314cb7c047b1ac upstream
Add prctl based control for Speculative Store Bypass mitigation and make it
the default mitigation for Intel and AMD.Andi Kleen provided the following rationale (slightly redacted):
There are multiple levels of impact of Speculative Store Bypass:
1) JITed sandbox.
It cannot invoke system calls, but can do PRIME+PROBE and may have call
interfaces to other code2) Native code process.
No protection inside the process at this level.3) Kernel.
4) Between processes.
The prctl tries to protect against case (1) doing attacks.
If the untrusted code can do random system calls then control is already
lost in a much worse way. So there needs to be system call protection in
some way (using a JIT not allowing them or seccomp). Or rather if the
process can subvert its environment somehow to do the prctl it can already
execute arbitrary code, which is much worse than SSB.To put it differently, the point of the prctl is to not allow JITed code
to read data it shouldn't read from its JITed sandbox. If it already has
escaped its sandbox then it can already read everything it wants in its
address space, and do much worse.The ability to control Speculative Store Bypass allows to enable the
protection selectively without affecting overall system performance.Based on an initial patch from Tim Chen. Completely rewritten.
Signed-off-by: Thomas Gleixner
Reviewed-by: Konrad Rzeszutek Wilk
Signed-off-by: Greg Kroah-Hartman -
commit 24f7fc83b9204d20f878c57cb77d261ae825e033 upstream
Contemporary high performance processors use a common industry-wide
optimization known as "Speculative Store Bypass" in which loads from
addresses to which a recent store has occurred may (speculatively) see an
older value. Intel refers to this feature as "Memory Disambiguation" which
is part of their "Smart Memory Access" capability.Memory Disambiguation can expose a cache side-channel attack against such
speculatively read values. An attacker can create exploit code that allows
them to read memory outside of a sandbox environment (for example,
malicious JavaScript in a web page), or to perform more complex attacks
against code running within the same privilege level, e.g. via the stack.As a first step to mitigate against such attacks, provide two boot command
line control knobs:nospec_store_bypass_disable
spec_store_bypass_disable=[off,auto,on]By default affected x86 processors will power on with Speculative
Store Bypass enabled. Hence the provided kernel parameters are written
from the point of view of whether to enable a mitigation or not.
The parameters are as follows:- auto - Kernel detects whether your CPU model contains an implementation
of Speculative Store Bypass and picks the most appropriate
mitigation.- on - disable Speculative Store Bypass
- off - enable Speculative Store Bypass[ tglx: Reordered the checks so that the whole evaluation is not done
when the CPU does not support RDS ]Signed-off-by: Konrad Rzeszutek Wilk
Signed-off-by: Thomas Gleixner
Reviewed-by: Borislav Petkov
Reviewed-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
29 Apr, 2018
1 commit
-
[ Upstream commit 686140a1a9c41d85a4212a1c26d671139b76404b ]
Implement CPU alternatives, which allows to optionally patch newer
instructions at runtime, based on CPU facilities availability.A new kernel boot parameter "noaltinstr" disables patching.
Current implementation is derived from x86 alternatives. Although
ideal instructions padding (when altinstr is longer then oldinstr)
is added at compile time, and no oldinstr nops optimization has to be
done at runtime. Also couple of compile time sanity checks are done:
1. oldinstr and altinstr must be
Signed-off-by: Martin Schwidefsky
Signed-off-by: Greg Kroah-Hartman
22 Feb, 2018
1 commit
-
commit 4675ff05de2d76d167336b368bd07f3fef6ed5a6 upstream.
Fix up makefiles, remove references, and git rm kmemcheck.
Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com
Signed-off-by: Sasha Levin
Cc: Steven Rostedt
Cc: Vegard Nossum
Cc: Pekka Enberg
Cc: Michal Hocko
Cc: Eric W. Biederman
Cc: Alexander Potapenko
Cc: Tim Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
08 Feb, 2018
1 commit
-
commit 12c69f1e94c89d40696e83804dd2f0965b5250cd
The 'noreplace-paravirt' option disables paravirt patching, leaving the
original pv indirect calls in place.That's highly incompatible with retpolines, unless we want to uglify
paravirt even further and convert the paravirt calls to retpolines.As far as I can tell, the option doesn't seem to be useful for much
other than introducing surprising corner cases and making the kernel
vulnerable to Spectre v2. It was probably a debug option from the early
paravirt days. So just remove it.Signed-off-by: Josh Poimboeuf
Signed-off-by: Thomas Gleixner
Reviewed-by: Juergen Gross
Cc: Andrea Arcangeli
Cc: Peter Zijlstra
Cc: Andi Kleen
Cc: Ashok Raj
Cc: Greg KH
Cc: Jun Nakajima
Cc: Tim Chen
Cc: Rusty Russell
Cc: Dave Hansen
Cc: Asit Mallick
Cc: Andy Lutomirski
Cc: Linus Torvalds
Cc: Jason Baron
Cc: Paolo Bonzini
Cc: Alok Kataria
Cc: Arjan Van De Ven
Cc: David Woodhouse
Cc: Dan Williams
Link: https://lkml.kernel.org/r/20180131041333.2x6blhxirc2kclrq@treble
Signed-off-by: Greg Kroah-Hartman
17 Jan, 2018
2 commits
-
commit da285121560e769cc31797bba6422eea71d473e0 upstream.
Add a spectre_v2= option to select the mitigation used for the indirect
branch speculation vulnerability.Currently, the only option available is retpoline, in its various forms.
This will be expanded to cover the new IBRS/IBPB microcode features.The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation
control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a
serializing instruction, which is indicated by the LFENCE_RDTSC feature.[ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS
integration becomes simple ]Signed-off-by: David Woodhouse
Signed-off-by: Thomas Gleixner
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel
Cc: Andi Kleen
Cc: Josh Poimboeuf
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra
Cc: Linus Torvalds
Cc: Jiri Kosina
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Kees Cook
Cc: Tim Chen
Cc: Greg Kroah-Hartman
Cc: Paul Turner
Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Greg Kroah-Hartman -
commit 01c9b17bf673b05bb401b76ec763e9730ccf1376 upstream.
Add some details about how PTI works, what some of the downsides
are, and how to debug it when things go wrong.Also document the kernel parameter: 'pti/nopti'.
Signed-off-by: Dave Hansen
Signed-off-by: Thomas Gleixner
Reviewed-by: Randy Dunlap
Reviewed-by: Kees Cook
Cc: Moritz Lipp
Cc: Daniel Gruss
Cc: Michael Schwarz
Cc: Richard Fellner
Cc: Andy Lutomirski
Cc: Linus Torvalds
Cc: Hugh Dickins
Cc: Andi Lutomirsky
Link: https://lkml.kernel.org/r/20180105174436.1BC6FA2B@viggo.jf.intel.com
Signed-off-by: Greg Kroah-Hartman
03 Jan, 2018
2 commits
-
commit 41f4c20b57a4890ea7f56ff8717cc83fefb8d537 upstream.
Keep the "nopti" optional for traditional reasons.
[ tglx: Don't allow force on when running on XEN PV and made 'on'
printout conditional ]Requested-by: Linus Torvalds
Signed-off-by: Borislav Petkov
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Andy Lutomirsky
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171212133952.10177-1-bp@alien8.de
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit aa8c6248f8c75acfd610fe15d8cae23cf70d9d09 upstream.
Add the initial files for kernel page table isolation, with a minimal init
function and the boot time detection for this misfeature.Signed-off-by: Thomas Gleixner
Reviewed-by: Borislav Petkov
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
13 Sep, 2017
1 commit
-
Pull selinux updates from Paul Moore:
"A relatively quiet period for SELinux, 11 patches with only two/three
having any substantive changes.These noteworthy changes include another tweak to the NNP/nosuid
handling, per-file labeling for cgroups, and an object class fix for
AF_UNIX/SOCK_RAW sockets; the rest of the changes are minor tweaks or
administrative updates (Stephen's email update explains the file
explosion in the diffstat).Everything passes the selinux-testsuite"
[ Also a couple of small patches from the security tree from Tetsuo
Handa for Tomoyo and LSM cleanup. The separation of security policy
updates wasn't all that clean - Linus ]* tag 'selinux-pr-20170831' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: constify nf_hook_ops
selinux: allow per-file labeling for cgroupfs
lsm_audit: update my email address
selinux: update my email address
MAINTAINERS: update the NetLabel and Labeled Networking information
selinux: use GFP_NOWAIT in the AVC kmem_caches
selinux: Generalize support for NNP/nosuid SELinux domain transitions
selinux: genheaders should fail if too many permissions are defined
selinux: update the selinux info in MAINTAINERS
credits: update Paul Moore's info
selinux: Assign proper class to PF_UNIX/SOCK_RAW sockets
tomoyo: Update URLs in Documentation/admin-guide/LSM/tomoyo.rst
LSM: Remove security_task_create() hook.
09 Sep, 2017
1 commit
-
Pull ARC updates from Vineet Gupta:
- Support for HSDK board hosting a Quad core HS38x4 based SoC running
@1GHz (and some prerrquisite changes such as ability to scoot the
kernel code/data from start of memory map etc)- Quite a few updates for EZChip (Mellanox) platform
- Fixes to fault/exception printing
* tag 'arc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (26 commits)
ARC: Re-enable MMU upon Machine Check exception
ARC: Show fault information passed to show_kernel_fault_diag()
ARC: [plat-hsdk] initial port for HSDK board
ARC: mm: Decouple RAM base address from kernel link address
ARCv2: IOC: Tighten up the contraints (specifically base / size alignment)
ARC: [plat-axs103] refactor the DT fudging code
ARC: [plat-axs103] use clk driver #2: Add core pll node to DT to manage cpu clk
ARC: [plat-axs103] use clk driver #1: Get rid of platform specific cpu clk setting
ARCv2: SLC: provide a line based flush routine for debugging
ARC: Hardcode ARCH_DMA_MINALIGN to max line length we may have
ARC: [plat-eznps] handle extra aux regs #2: kernel/entry exit
ARC: [plat-eznps] handle extra aux regs #1: save/restore on context switch
ARC: [plat-eznps] avoid toggling of DPC register
ARC: [plat-eznps] Update the init sequence of aux regs per cpu.
ARC: [plat-eznps] new command line argument for HW scheduler at MTM
ARC: set boot print log level to PR_INFO
ARC: [plat-eznps] Handle user memory error same in simulation and silicon
ARC: [plat-eznps] use schd.wft instruction instead of sleep at idle task
ARC: create cpu specific version of arch_cpu_idle()
ARC: [plat-eznps] spinlock aware for MTM
...
07 Sep, 2017
1 commit
-
Patch series "cleanup zonelists initialization", v1.
This is aimed at cleaning up the zonelists initialization code we have
but the primary motivation was bug report [2] which got resolved but the
usage of stop_machine is just too ugly to live. Most patches are
straightforward but 3 of them need a special consideration.Patch 1 removes zone ordered zonelists completely. I am CCing linux-api
because this is a user visible change. As I argue in the patch
description I do not think we have a strong usecase for it these days.
I have kept sysctl in place and warn into the log if somebody tries to
configure zone lists ordering. If somebody has a real usecase for it we
can revert this patch but I do not expect anybody will actually notice
runtime differences. This patch is not strictly needed for the rest but
it made patch 6 easier to implement.Patch 7 removes stop_machine from build_all_zonelists without adding any
special synchronization between iterators and updater which I _believe_
is acceptable as explained in the changelog. I hope I am not missing
anything.Patch 8 then removes zonelists_mutex which is kind of ugly as well and
not really needed AFAICS but a care should be taken when double checking
my thinking.This patch (of 9):
Supporting zone ordered zonelists costs us just a lot of code while the
usefulness is arguable if existent at all. Mel has already made node
ordering default on 64b systems. 32b systems are still using
ZONELIST_ORDER_ZONE because it is considered better to fallback to a
different NUMA node rather than consume precious lowmem zones.This argument is, however, weaken by the fact that the memory reclaim
has been reworked to be node rather than zone oriented. This means that
lowmem requests have to skip over all highmem pages on LRUs already and
so zone ordering doesn't save the reclaim time much. So the only
advantage of the zone ordering is under a light memory pressure when
highmem requests do not ever hit into lowmem zones and the lowmem
pressure doesn't need to reclaim.Considering that 32b NUMA systems are rather suboptimal already and it
is generally advisable to use 64b kernel on such a HW I believe we
should rather care about the code maintainability and just get rid of
ZONELIST_ORDER_ZONE altogether. Keep systcl in place and warn if
somebody tries to set zone ordering either from kernel command line or
the sysctl.[mhocko@suse.com: reading vm.numa_zonelist_order will never terminate]
Link: http://lkml.kernel.org/r/20170721143915.14161-2-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Johannes Weiner
Cc: Joonsoo Kim
Cc: Shaohua Li
Cc: Toshi Kani
Cc: Abdul Haleem
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 Sep, 2017
3 commits
-
Pull power management updates from Rafael Wysocki:
"This time (again) cpufreq gets the majority of changes which mostly
are driver updates (including a major consolidation of intel_pstate),
some schedutil governor modifications and core cleanups.There also are some changes in the system suspend area, mostly related
to diagnostics and debug messages plus some renames of things related
to suspend-to-idle. One major change here is that suspend-to-idle is
now going to be preferred over S3 on systems where the ACPI tables
indicate to do so and provide requsite support (the Low Power Idle S0
_DSM in particular). The system sleep documentation and the tools
related to it are updated too.The rest is a few cpuidle changes (nothing major), devfreq updates,
generic power domains (genpd) framework updates and a few assorted
modifications elsewhere.Specifics:
- Drop the P-state selection algorithm based on a PID controller from
intel_pstate and make it use the same P-state selection method
(based on the CPU load) for all types of systems in the active mode
(Rafael Wysocki, Srinivas Pandruvada).- Rework the cpufreq core and governors to make it possible to take
cross-CPU utilization updates into account and modify the schedutil
governor to actually do so (Viresh Kumar).- Clean up the handling of transition latency information in the
cpufreq core and untangle it from the information on which drivers
cannot do dynamic frequency switching (Viresh Kumar).- Add support for new SoCs (MT2701/MT7623 and MT7622) to the mediatek
cpufreq driver and update its DT bindings (Sean Wang).- Modify the cpufreq dt-platdev driver to autimatically create
cpufreq devices for the new (v2) Operating Performance Points (OPP)
DT bindings and update its whitelist of supported systems (Viresh
Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen, Finley
Xiao).- Add support for Ux500 to the cpufreq-dt driver and drop the
obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).- Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem
Nguyen).- Fix and clean up assorted issues in the cpufreq drivers and core
(Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva,
Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).- Update the IO-wait boost handling in the schedutil governor to make
it less aggressive (Joel Fernandes).- Rework system suspend diagnostics to make it print fewer messages
to the kernel log by default, add a sysfs knob to allow more
suspend-related messages to be printed and add Low Power S0 Idle
constraints checks to the ACPI suspend-to-idle code (Rafael
Wysocki, Srinivas Pandruvada).- Prefer suspend-to-idle over S3 on ACPI-based systems with the
ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM
interface present in the ACPI tables (Rafael Wysocki).- Update documentation related to system sleep and rename a number of
items in the code to make it cleare that they are related to
suspend-to-idle (Rafael Wysocki).- Export a variable allowing device drivers to check the target
system sleep state from the core system suspend code (Florian
Fainelli).- Clean up the cpuidle subsystem to handle the polling state on x86
in a more straightforward way and to use %pOF instead of full_name
(Rafael Wysocki, Rob Herring).- Update the devfreq framework to fix and clean up a few minor issues
(Chanwoo Choi, Rob Herring).- Extend diagnostics in the generic power domains (genpd) framework
and clean it up slightly (Thara Gopinath, Rob Herring).- Fix and clean up a couple of issues in the operating performance
points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).- Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling
(AVS) driver (David Wu).- Fix the usage of notifiers in CPU power management on some
platforms (Alex Shi).- Update the pm-graph system suspend/hibernation and boot profiling
utility (Todd Brandt).- Make it possible to run the cpupower utility without CPU0 (Prarit
Bhargava)"* tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (87 commits)
cpuidle: Make drivers initialize polling state
cpuidle: Move polling state initialization code to separate file
cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol
cpufreq: imx6q: Fix imx6sx low frequency support
cpufreq: speedstep-lib: make several arrays static, makes code smaller
PM: docs: Delete the obsolete states.txt document
PM: docs: Describe high-level PM strategies and sleep states
PM / devfreq: Fix memory leak when fail to register device
PM / devfreq: Add dependency on PM_OPP
PM / devfreq: Move private devfreq_update_stats() into devfreq
PM / devfreq: Convert to using %pOF instead of full_name
PM / AVS: rockchip-io: add io selectors and supplies for RV1108
cpufreq: ti: Fix 'of_node_put' being called twice in error handling path
cpufreq: dt-platdev: Drop few entries from whitelist
cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2
ARM: ux500: don't select CPUFREQ_DT
cpuidle: Convert to using %pOF instead of full_name
cpufreq: Convert to using %pOF instead of full_name
PM / Domains: Convert to using %pOF instead of full_name
cpufreq: Cap the default transition delay value to 10 ms
... -
Pull char/misc driver updates from Greg KH:
"Here is the big char/misc driver update for 4.14-rc1.Lots of different stuff in here, it's been an active development cycle
for some reason. Highlights are:- updated binder driver, this brings binder up to date with what
shipped in the Android O release, plus some more changes that
happened since then that are in the Android development trees.- coresight updates and fixes
- mux driver file renames to be a bit "nicer"
- intel_th driver updates
- normal set of hyper-v updates and changes
- small fpga subsystem and driver updates
- lots of const code changes all over the driver trees
- extcon driver updates
- fmc driver subsystem upadates
- w1 subsystem minor reworks and new features and drivers added
- spmi driver updates
Plus a smattering of other minor driver updates and fixes.
All of these have been in linux-next with no reported issues for a
while"* tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (244 commits)
ANDROID: binder: don't queue async transactions to thread.
ANDROID: binder: don't enqueue death notifications to thread todo.
ANDROID: binder: Don't BUG_ON(!spin_is_locked()).
ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctl
ANDROID: binder: push new transactions to waiting threads.
ANDROID: binder: remove proc waitqueue
android: binder: Add page usage in binder stats
android: binder: fixup crash introduced by moving buffer hdr
drivers: w1: add hwmon temp support for w1_therm
drivers: w1: refactor w1_slave_show to make the temp reading functionality separate
drivers: w1: add hwmon support structures
eeprom: idt_89hpesx: Support both ACPI and OF probing
mcb: Fix an error handling path in 'chameleon_parse_cells()'
MCB: add support for SC31 to mcb-lpc
mux: make device_type const
char: virtio: constify attribute_group structures.
Documentation/ABI: document the nvmem sysfs files
lkdtm: fix spelling mistake: "incremeted" -> "incremented"
perf: cs-etm: Fix ETMv4 CONFIGR entry in perf.data file
nvmem: include linux/err.h from header
... -
Pull s390 updates from Martin Schwidefsky:
"The first part of the s390 updates for 4.14:- Add machine type 0x3906 for IBM z14
- Add IBM z14 TLB flushing improvements for KVM guests
- Exploit the TOD clock epoch extension to provide a continuous TOD
clock afer 2042/09/17- Add NIAI spinlock hints for IBM z14
- Rework the vmcp driver and use CMA for the respone buffer of z/VM
CP commands- Drop some s390 specific asm headers and use the generic version
- Add block discard for DASD-FBA devices under z/VM
- Add average request times to DASD statistics
- A few of those constify patches which seem to be in vogue right now
- Cleanup and bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (50 commits)
s390/mm: avoid empty zero pages for KVM guests to avoid postcopy hangs
s390/dasd: Add discard support for FBA devices
s390/zcrypt: make CPRBX const
s390/uaccess: avoid mvcos jump label
s390/mm: use generic mm_hooks
s390/facilities: fix typo
s390/vmcp: simplify vmcp_response_free()
s390/topology: Remove the unused parent_node() macro
s390/dasd: Change unsigned long long to unsigned long
s390/smp: convert cpuhp_setup_state() return code to zero on success
s390: fix 'novx' early parameter handling
s390/dasd: add average request times to dasd statistics
s390/scm: use common completion path
s390/pci: log changes to uid checking
s390/vmcp: simplify vmcp_ioctl()
s390/vmcp: return -ENOTTY for unknown ioctl commands
s390/vmcp: split vmcp header file and move to uapi
s390/vmcp: make use of contiguous memory allocator
s390/cpcmd,vmcp: avoid GFP_DMA allocations
s390/vmcp: fix uaccess check and avoid undefined behavior
...
05 Sep, 2017
2 commits
-
Pull x86 cache quality monitoring update from Thomas Gleixner:
"This update provides a complete rewrite of the Cache Quality
Monitoring (CQM) facility.The existing CQM support was duct taped into perf with a lot of issues
and the attempts to fix those turned out to be incomplete and
horrible.After lengthy discussions it was decided to integrate the CQM support
into the Resource Director Technology (RDT) facility, which is the
obvious choise as in hardware CQM is part of RDT. This allowed to add
Memory Bandwidth Monitoring support on top.As a result the mechanisms for allocating cache/memory bandwidth and
the corresponding monitoring mechanisms are integrated into a single
management facility with a consistent user interface"* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
x86/intel_rdt: Turn off most RDT features on Skylake
x86/intel_rdt: Add command line options for resource director technology
x86/intel_rdt: Move special case code for Haswell to a quirk function
x86/intel_rdt: Remove redundant ternary operator on return
x86/intel_rdt/cqm: Improve limbo list processing
x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug
x86/intel_rdt: Modify the intel_pqr_state for better performance
x86/intel_rdt/cqm: Clear the default RMID during hotcpu
x86/intel_rdt: Show bitmask of shareable resource with other executing units
x86/intel_rdt/mbm: Handle counter overflow
x86/intel_rdt/mbm: Add mbm counter initialization
x86/intel_rdt/mbm: Basic counting of MBM events (total and local)
x86/intel_rdt/cqm: Add CPU hotplug support
x86/intel_rdt/cqm: Add sched_in support
x86/intel_rdt: Introduce rdt_enable_key for scheduling
x86/intel_rdt/cqm: Add mount,umount support
x86/intel_rdt/cqm: Add rmdir support
x86/intel_rdt: Separate the ctrl bits from rmdir
x86/intel_rdt/cqm: Add mon_data
x86/intel_rdt: Prepare for RDT monitor data support
... -
Pull x86 mm changes from Ingo Molnar:
"PCID support, 5-level paging support, Secure Memory Encryption supportThe main changes in this cycle are support for three new, complex
hardware features of x86 CPUs:- Add 5-level paging support, which is a new hardware feature on
upcoming Intel CPUs allowing up to 128 PB of virtual address space
and 4 PB of physical RAM space - a 512-fold increase over the old
limits. (Supercomputers of the future forecasting hurricanes on an
ever warming planet can certainly make good use of more RAM.)Many of the necessary changes went upstream in previous cycles,
v4.14 is the first kernel that can enable 5-level paging.This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
default.(By Kirill A. Shutemov)
- Add 'encrypted memory' support, which is a new hardware feature on
upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
RAM to be encrypted and decrypted (mostly) transparently by the
CPU, with a little help from the kernel to transition to/from
encrypted RAM. Such RAM should be more secure against various
attacks like RAM access via the memory bus and should make the
radio signature of memory bus traffic harder to intercept (and
decrypt) as well.This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
by default.(By Tom Lendacky)
- Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
hardware feature that attaches an address space tag to TLB entries
and thus allows to skip TLB flushing in many cases, even if we
switch mm's.(By Andy Lutomirski)
All three of these features were in the works for a long time, and
it's coincidence of the three independent development paths that they
are all enabled in v4.14 at once"* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
x86/mm: Use pr_cont() in dump_pagetable()
x86/mm: Fix SME encryption stack ptr handling
kvm/x86: Avoid clearing the C-bit in rsvd_bits()
x86/CPU: Align CR3 defines
x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
acpi, x86/mm: Remove encryption mask from ACPI page protection type
x86/mm, kexec: Fix memory corruption with SME on successive kexecs
x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
x86/mm: Allow userspace have mappings above 47-bit
x86/mm: Prepare to expose larger address space to userspace
x86/mpx: Do not allow MPX if we have mappings above 47-bit
x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
x86/mm/dump_pagetables: Fix printout of p4d level
x86/mm/dump_pagetables: Generalize address normalization
x86/boot: Fix memremap() related build failure
...
04 Sep, 2017
3 commits
-
* pm-docs:
PM: docs: Delete the obsolete states.txt document
PM: docs: Describe high-level PM strategies and sleep states -
* intel_pstate:
cpufreq: intel_pstate: Shorten a couple of long names
cpufreq: intel_pstate: Simplify intel_pstate_adjust_pstate()
cpufreq: intel_pstate: Improve IO performance with per-core P-states
cpufreq: intel_pstate: Drop INTEL_PSTATE_HWP_SAMPLING_INTERVAL
cpufreq: intel_pstate: Drop ->update_util from pstate_funcs
cpufreq: intel_pstate: Do not use PID-based P-state selection -
* pm-cpufreq: (33 commits)
cpufreq: imx6q: Fix imx6sx low frequency support
cpufreq: speedstep-lib: make several arrays static, makes code smaller
cpufreq: ti: Fix 'of_node_put' being called twice in error handling path
cpufreq: dt-platdev: Drop few entries from whitelist
cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2
ARM: ux500: don't select CPUFREQ_DT
cpufreq: Convert to using %pOF instead of full_name
cpufreq: Cap the default transition delay value to 10 ms
cpufreq: dbx500: Delete obsolete driver
mfd: db8500-prcmu: Get rid of cpufreq dependency
cpufreq: enable the DT cpufreq driver on the Ux500
cpufreq: Loongson2: constify platform_device_id
cpufreq: dt: Add r8a7796 support to to use generic cpufreq driver
cpufreq: remove setting of policy->cpu in policy->cpus during init
cpufreq: mediatek: add support of cpufreq to MT7622 SoC
cpufreq: mediatek: add cleanups with the more generic naming
cpufreq: rcar: Add support for R8A7795 SoC
cpufreq: dt: Add rk3328 compatible to use generic cpufreq driver
cpufreq: s5pv210: add missing of_node_put()
cpufreq: Allow dynamic switching with CPUFREQ_ETERNAL latency
...
29 Aug, 2017
2 commits
-
We add ability for all cores at NPS SoC to control the number of cycles
HW thread can execute before it is replace with another eligible
HW thread within the same core. The replacement is done by the
HW scheduler.Signed-off-by: Noam Camus
Signed-off-by: Vineet Gupta
[vgupta: simplified handlign of out of range argument value] -
Reorganize the power management part of admin-guide by adding a
description of major power management strategies supported by the
kernel (system-wide and working-state power management) to it and
dividing the rest of the material into the system-wide PM and
working-state PM chapters.On top of that, add a description of system sleep states to the
system-wide PM chapter.Signed-off-by: Rafael J. Wysocki
Reviewed-by: Lukas Wunner
26 Aug, 2017
2 commits
-
Conflicts:
arch/x86/kernel/head64.c
arch/x86/mm/mmap.cSigned-off-by: Ingo Molnar
-
Command line options allow us to ignore features that we don't want.
Also we can re-enable options that have been disabled on a platform
(so long as the underlying h/w actually supports the option).[ tglx: Marked the option array __initdata and the helper function __init ]
Signed-off-by: Tony Luck
Signed-off-by: Thomas Gleixner
Cc: Fenghua"
Cc: Ravi V"
Cc: "Peter Zijlstra"
Cc: "Stephane Eranian"
Cc: "Andi Kleen"
Cc: "David Carrillo-Cisneros"
Cc: Vikas Shivappa
Link: http://lkml.kernel.org/r/0c37b0d4dbc30977a3c1cee08b66420f83662694.1503512900.git.tony.luck@intel.com
21 Aug, 2017
2 commits
-
…k/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:
- Removal of spin_unlock_wait()
- SRCU updates
- Torture-test updates
- Documentation updates
- Miscellaneous fixes
- CPU-hotplug fixes
- Miscellaneous non-RCU fixesSigned-off-by: Ingo Molnar <mingo@kernel.org>
15 Aug, 2017
1 commit
-
We want the firmware, and other changes, in here as well.
Signed-off-by: Greg Kroah-Hartman
09 Aug, 2017
1 commit
-
If memory is fragmented it is unlikely that large order memory
allocations succeed. This has been an issue with the vmcp device
driver since a long time, since it requires large physical contiguous
memory ares for large responses.To hopefully resolve this issue make use of the contiguous memory
allocator (cma). This patch adds a vmcp specific vmcp cma area with a
default size of 4MB. The size can be changed either via the
VMCP_CMA_SIZE config option at compile time or with the "vmcp_cma"
kernel parameter (e.g. "vmcp_cma=16m").For any vmcp response buffers larger than 16k memory from the cma area
will be allocated. If such an allocation fails, there is a fallback to
the buddy allocator.Signed-off-by: Heiko Carstens
Signed-off-by: Martin Schwidefsky