Eric Lee / smarc-fsl-linux-kernel

18 Apr, 2019

1 commit

71350eaff cpufreq: Add android's 'interactive' governor

Interactive governor has lived in Android sources for a very long time
and this commit is based on the code present in following branch:

https://android.googlesource.com/kernel/common android-4.4

The Interactive governor is designed for latency-sensitive workloads,
such as the interactive user interfaces of mobile phones and tablets.
It aims to be significantly more responsive than other governors,
ramping the CPU up quickly when CPU-intensive activity begins.

Existing governors sample CPU load at a particular rate, typically every
X ms and then update the frequency from a work-handler. This can lead
to under-powering UI threads for the period of time during which the
user begins interacting with a previously-idle system until the next
sample period happens.

The 'interactive' governor uses a different approach.

A real-time thread is used for scaling up, giving the remaining tasks
the CPU performance benefit, unlike existing governors which are more
likely to schedule ramp-up work to occur after your performance starved
tasks have completed.

The Android version of interactive governor also checks whether to scale
the CPU frequency up soon after coming out of idle. When the CPU comes
out of idle, the governor checks whether the CPU sampling is overdue or not.
If yes, it immediately starts the sampling. Otherwise, the utilization
hooks from the scheduler handle the sampling later. If the CPU is very
busy from exiting idle to when the evaluation happens, then it assumes
that the CPU is under-powered and ramps it to MAX speed.

If the CPU was not sufficiently busy to immediately ramp to MAX speed,
then the governor evaluates the CPU load since the last speed
adjustment, choosing the highest value between that longer-term load or
the short-term load since idle exit to determine the CPU speed to ramp
to.
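
The decision described above can be sketched roughly as follows. This is an illustrative model only, not the governor's code: the function name, the 99% "go to max" threshold, and the linear load-to-frequency mapping are all assumptions made for the example.

```python
def pick_target_freq(short_term_load, long_term_load, max_freq,
                     go_max_load=99):
    """Pick a target frequency from two load figures (percent, 0-100).

    short_term_load: load since idle exit
    long_term_load:  load since the last speed adjustment
    go_max_load:     hypothetical threshold above which we jump to max
    """
    # Very busy right after idle exit: assume the CPU is under-powered.
    if short_term_load >= go_max_load:
        return max_freq
    # Otherwise scale to the higher of the two load figures.
    load = max(short_term_load, long_term_load)
    return max_freq * load // 100
```

Taking the maximum of the two loads means a short burst after idle exit can raise the speed, while a sustained longer-term load keeps it from dropping too early.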

Idle notifiers will be handled later and are not included for now.

The core of this code was written and maintained (in Android
repositories) by Mike Chan and Todd Poynor over a long period of time.

Vireshk has made changes to the governor to align it with the current
practices followed with mainline governors, like using utilization hooks
from the scheduler and handling kobject (for governor's sysfs directory)
in a race free manner. And of course this included general cleanup of
the governor as well.

Signed-off-by: Mike Chan
Signed-off-by: Todd Poynor
Signed-off-by: Viresh Kumar
Signed-off-by: Vipul Kumar

Viresh Kumar
2019-04-18 07:51:34 +0800

10 Jan, 2019

1 commit

86ba6f66c x86/speculation/l1tf: Drop the swap storage limit restriction when l1tf=off

commit 5b5e4d623ec8a34689df98e42d038a3b594d2ff9 upstream.

Swap storage is restricted to max_swapfile_size (~16TB on x86_64) whenever
the system is deemed affected by L1TF vulnerability. Even though the limit
is quite high for most deployments it seems to be too restrictive for
deployments which are willing to live with the mitigation disabled.

We have a customer deploying 8x 6.4TB PCIe/NVMe SSD swap devices, which is
clearly over the limit.

Drop the swap restriction when l1tf=off is specified. It also doesn't make
much sense to warn about too much memory for the l1tf mitigation when it is
forcefully disabled by the administrator.
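
For scale, the arithmetic behind the customer example works out as below. The 16 TB figure is the approximate limit quoted above, and decimal terabytes are assumed throughout.

```python
devices = 8
size_tb = 6.4            # per-device capacity, as quoted above
limit_tb = 16            # approximate max_swapfile_size on x86_64

total_tb = devices * size_tb
print(f"total swap: {total_tb:.1f} TB, limit: ~{limit_tb} TB")
print("exceeds limit:", total_tb > limit_tb)
```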

[ tglx: Folded the documentation delta change ]

Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Signed-off-by: Michal Hocko
Signed-off-by: Thomas Gleixner
Reviewed-by: Pavel Tatashin
Reviewed-by: Andi Kleen
Acked-by: Jiri Kosina
Cc: Linus Torvalds
Cc: Dave Hansen
Cc: Andi Kleen
Cc: Borislav Petkov
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181113184910.26697-1-mhocko@kernel.org
Signed-off-by: Greg Kroah-Hartman

Michal Hocko
2019-01-10 00:38:41 +0800

06 Dec, 2018

4 commits

9f3baacee x86/speculation: Provide IBPB always command line options

commit 55a974021ec952ee460dc31ca08722158639de72 upstream

Provide the possibility to enable IBPB always in combination with 'prctl'
and 'seccomp'.

Add the extra command line options and rework the IBPB selection to
evaluate the command instead of the mode selected by the STIBP switch case.

Signed-off-by: Thomas Gleixner
Reviewed-by: Ingo Molnar
Cc: Peter Zijlstra
Cc: Andy Lutomirski
Cc: Linus Torvalds
Cc: Jiri Kosina
Cc: Tom Lendacky
Cc: Josh Poimboeuf
Cc: Andrea Arcangeli
Cc: David Woodhouse
Cc: Tim Chen
Cc: Andi Kleen
Cc: Dave Hansen
Cc: Casey Schaufler
Cc: Asit Mallick
Cc: Arjan van de Ven
Cc: Jon Masters
Cc: Waiman Long
Cc: Greg KH
Cc: Dave Stewart
Cc: Kees Cook
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185006.144047038@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-12-06 02:32:04 +0800
d1ec23547 x86/speculation: Add seccomp Spectre v2 user space protection mode

commit 6b3e64c237c072797a9ec918654a60e3a46488e2 upstream

If 'prctl' mode of user space protection from spectre v2 is selected
on the kernel command-line, STIBP and IBPB are applied on tasks which
restrict their indirect branch speculation via prctl.

SECCOMP enables the SSBD mitigation for sandboxed tasks already, so it
makes sense to prevent spectre v2 user space to user space attacks as
well.

The Intel mitigation guide documents how STIBP works:

Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
prevents the predicted targets of indirect branches on any logical
processor of that core from being controlled by software that executes
(or executed previously) on another logical processor of the same core.

Ergo setting STIBP protects the task itself from being attacked from a task
running on a different hyper-thread and protects the tasks running on
different hyper-threads from being attacked.

While the document suggests that the branch predictors are shielded between
the logical processors, the observed performance regressions suggest that
STIBP simply disables the branch predictor more or less completely. Of
course the document wording is vague, but the fact that there is also no
requirement for issuing IBPB when STIBP is used points clearly in that
direction. The kernel still issues IBPB even when STIBP is used until Intel
clarifies the whole mechanism.

IBPB is issued when the task switches out, so malicious sandbox code cannot
mistrain the branch predictor for the next user space task on the same
logical processor.

Signed-off-by: Jiri Kosina
Signed-off-by: Thomas Gleixner
Reviewed-by: Ingo Molnar
Cc: Peter Zijlstra
Cc: Andy Lutomirski
Cc: Linus Torvalds
Cc: Tom Lendacky
Cc: Josh Poimboeuf
Cc: Andrea Arcangeli
Cc: David Woodhouse
Cc: Tim Chen
Cc: Andi Kleen
Cc: Dave Hansen
Cc: Casey Schaufler
Cc: Asit Mallick
Cc: Arjan van de Ven
Cc: Jon Masters
Cc: Waiman Long
Cc: Greg KH
Cc: Dave Stewart
Cc: Kees Cook
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-12-06 02:32:04 +0800
7b62ef142 x86/speculation: Enable prctl mode for spectre_v2_user

commit 7cc765a67d8e04ef7d772425ca5a2a1e2b894c15 upstream

Now that all prerequisites are in place:

- Add the prctl command line option

- Default the 'auto' mode to 'prctl'

- When SMT state changes, update the static key which controls the
conditional STIBP evaluation on context switch.

- At init update the static key which controls the conditional IBPB
evaluation on context switch.

Signed-off-by: Thomas Gleixner
Reviewed-by: Ingo Molnar
Cc: Peter Zijlstra
Cc: Andy Lutomirski
Cc: Linus Torvalds
Cc: Jiri Kosina
Cc: Tom Lendacky
Cc: Josh Poimboeuf
Cc: Andrea Arcangeli
Cc: David Woodhouse
Cc: Tim Chen
Cc: Andi Kleen
Cc: Dave Hansen
Cc: Casey Schaufler
Cc: Asit Mallick
Cc: Arjan van de Ven
Cc: Jon Masters
Cc: Waiman Long
Cc: Greg KH
Cc: Dave Stewart
Cc: Kees Cook
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185005.958421388@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-12-06 02:32:04 +0800
711875432 x86/speculation: Add command line control for indirect branch speculation

commit fa1202ef224391b6f5b26cdd44cc50495e8fab54 upstream

Add command line control for user space indirect branch speculation
mitigations. The new option is: spectre_v2_user=

The initial options are:

- on: Unconditionally enabled
- off: Unconditionally disabled
- auto: Kernel selects mitigation (default off for now)

When the spectre_v2= command line argument is either 'on' or 'off' this
implies that the application to application control follows that state even
if a contradicting spectre_v2_user= argument is supplied.
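
The precedence rule above can be modelled in a few lines. This is a sketch of the described behaviour, not the kernel's parser; the function name and the use of plain strings are assumptions for illustration.

```python
def effective_user_mode(spectre_v2="auto", spectre_v2_user="auto"):
    """Return the effective user space mitigation setting.

    A forced spectre_v2=on/off overrides whatever spectre_v2_user= says.
    """
    if spectre_v2 in ("on", "off"):
        return spectre_v2
    if spectre_v2_user not in ("on", "off", "auto"):
        raise ValueError(f"unknown spectre_v2_user value: {spectre_v2_user}")
    return spectre_v2_user
```

For example, booting with `spectre_v2=off spectre_v2_user=on` would, under this model, leave the user space mitigation off.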

Originally-by: Tim Chen
Signed-off-by: Thomas Gleixner
Reviewed-by: Ingo Molnar
Cc: Peter Zijlstra
Cc: Andy Lutomirski
Cc: Linus Torvalds
Cc: Jiri Kosina
Cc: Tom Lendacky
Cc: Josh Poimboeuf
Cc: Andrea Arcangeli
Cc: David Woodhouse
Cc: Andi Kleen
Cc: Dave Hansen
Cc: Casey Schaufler
Cc: Asit Mallick
Cc: Arjan van de Ven
Cc: Jon Masters
Cc: Waiman Long
Cc: Greg KH
Cc: Dave Stewart
Cc: Kees Cook
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185005.082720373@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-12-06 02:32:03 +0800

01 Dec, 2018

2 commits

bcec3b858 Documentation/security-bugs: Postpone fix publication in exceptional cases

commit 544b03da39e2d7b4961d3163976ed4bfb1fac509 upstream.

At the request of the reporter, the Linux kernel security team offers to
postpone the publishing of a fix for up to 5 business days from the date
of a report.

While it is generally undesirable to keep a fix private after it has
been developed, this short window is intended to allow distributions to
package the fix into their kernel builds and permits early inclusion of
the security team in the case of a co-ordinated disclosure with other
parties. Unfortunately, discussions with major Linux distributions and
cloud providers have revealed that 5 business days is not sufficient to
achieve either of these two goals.

As an example, cloud providers need to roll out KVM security fixes to a
global fleet of hosts with sufficient early ramp-up and monitoring. An
end-to-end timeline of less than two weeks dramatically cuts into the
amount of early validation and increases the chance of guest-visible
regressions.

The consequence of this timeline mismatch is that security issues are
commonly fixed without the involvement of the Linux kernel security team
and are instead analysed and addressed by an ad-hoc group of developers
across companies contributing to Linux. In some cases, mainline (and
therefore the official stable kernels) can be left to languish for
extended periods of time. This undermines the Linux kernel security
process and puts upstream developers in a difficult position should they
find themselves involved with an undisclosed security problem that they
are unable to report due to restrictions from their employer.

To accommodate the needs of these users of the Linux kernel and
encourage them to engage with the Linux security team when security
issues are first uncovered, extend the maximum period for which fixes
may be delayed to 7 calendar days, or 14 calendar days in exceptional
cases, where the logistics of QA and large scale rollouts specifically
need to be accommodated. This brings parity with the linux-distros@
maximum embargo period of 14 calendar days.

Cc: Paolo Bonzini
Cc: David Woodhouse
Cc: Amit Shah
Cc: Laura Abbott
Acked-by: Kees Cook
Co-developed-by: Thomas Gleixner
Co-developed-by: David Woodhouse
Signed-off-by: Thomas Gleixner
Signed-off-by: David Woodhouse
Signed-off-by: Will Deacon
Reviewed-by: Tyler Hicks
Acked-by: Peter Zijlstra
Signed-off-by: Greg Kroah-Hartman

Will Deacon
2018-12-01 16:37:26 +0800
160a390a9 Documentation/security-bugs: Clarify treatment of embargoed information

commit 14fdc2c5318ae420e68496975f48dc1dbef52649 upstream.

The Linux kernel security team has been accused of rejecting the idea of
security embargoes. This is incorrect, and could dissuade people from
reporting security issues to us under the false assumption that the
issue would leak prematurely.

Clarify the handling of embargoed information in our process
documentation.

Co-developed-by: Ingo Molnar
Acked-by: Kees Cook
Acked-by: Peter Zijlstra
Acked-by: Laura Abbott
Signed-off-by: Will Deacon
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman

Will Deacon
2018-12-01 16:37:26 +0800

27 Nov, 2018

2 commits

ed8acd13e USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub

commit 781f0766cc41a9dd2e5d118ef4b1d5d89430257b upstream.

Devices connected under Terminus Technology Inc. Hub (1a40:0101) may
fail to work after the system resumes from suspend:
[ 206.063325] usb 3-2.4: reset full-speed USB device number 4 using xhci_hcd
[ 206.143691] usb 3-2.4: device descriptor read/64, error -32
[ 206.351671] usb 3-2.4: device descriptor read/64, error -32

Info for this hub:
T: Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 2 Spd=480 MxCh= 4
D: Ver= 2.00 Cls=09(hub ) Sub=00 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=1a40 ProdID=0101 Rev=01.11
S: Product=USB 2.0 Hub
C: #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub

Some experiments indicate that the USB devices connected to the hub are
innocent; it is the hub itself that is to blame. The hub needs extra delay
time after it resets its port.

Hence wait for extra delay, if the device is connected to this quirky
hub.
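
The idea of a per-hub quirk can be sketched as a lookup keyed on the hub's vendor and product IDs. Only the 1a40:0101 ID comes from the text above; the table name, the function name, and the delay value are hypothetical and do not reflect the actual driver code.

```python
# (vendor, product) IDs of hubs needing the extra delay; 1a40:0101 is the
# Terminus hub named above. The table itself is illustrative.
QUIRKY_HUBS = {(0x1a40, 0x0101)}
EXTRA_DELAY_S = 0.1  # hypothetical value, not taken from the commit


def extra_reset_delay(hub_vendor, hub_product):
    """Return the extra delay (seconds) to wait after a port reset."""
    if (hub_vendor, hub_product) in QUIRKY_HUBS:
        return EXTRA_DELAY_S
    return 0.0
```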

Signed-off-by: Kai-Heng Feng
Cc: stable
Acked-by: Alan Stern
Signed-off-by: Greg Kroah-Hartman

Kai-Heng Feng
2018-11-27 23:13:09 +0800
9f0e46bf5 x86/earlyprintk: Add a force option for pciserial device

[ Upstream commit d2266bbfa9e3e32e3b642965088ca461bd24a94f ]

The "pciserial" earlyprintk variant helps much on many modern x86
platforms, but unfortunately there are still some platforms with PCI
UART devices which have the wrong PCI class code. In that case, the
current class code check does not allow for them to be used for logging.

Add a sub-option "force" which overrides the class code check and thus
the use of such device can be enforced.

[ bp: massage formulations. ]

Suggested-by: Borislav Petkov
Signed-off-by: Feng Tang
Signed-off-by: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: "Stuart R . Anderson"
Cc: Bjorn Helgaas
Cc: David Rientjes
Cc: Feng Tang
Cc: Frederic Weisbecker
Cc: Greg Kroah-Hartman
Cc: H Peter Anvin
Cc: Ingo Molnar
Cc: Jiri Kosina
Cc: Jonathan Corbet
Cc: Kai-Heng Feng
Cc: Kate Stewart
Cc: Konrad Rzeszutek Wilk
Cc: Peter Zijlstra
Cc: Philippe Ombredanne
Cc: Thomas Gleixner
Cc: Thymo van Beers
Cc: alan@linux.intel.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20181002164921.25833-1-feng.tang@intel.com
Signed-off-by: Sasha Levin

Feng Tang
2018-11-27 23:13:00 +0800

14 Sep, 2018

1 commit

197ecb380 xen/balloon: add runtime control for scrubbing ballooned out pages

Scrubbing pages on initial balloon down can take some time, especially
in nested virtualization case (nested EPT is slow). When HVM/PVH guest is
started with memory= significantly lower than maxmem=, all the extra
pages will be scrubbed before returning to Xen. But since most of them
weren't used at all at that point, Xen needs to populate them first
(from populate-on-demand pool). In nested virt case (Xen inside KVM)
this slows down the guest boot by 15-30s with just 1.5GB needed to be
returned to Xen.

Add runtime parameter to enable/disable it, to allow initially disabling
scrubbing, then enable it back during boot (for example in initramfs).
Such usage relies on assumption that a) most pages ballooned out during
initial boot weren't used at all, and b) even if they were, very few
secrets are in the guest at that time (before any serious userspace
kicks in).
Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT (also
enabled by default), controlling default value for the new runtime
switch.

Signed-off-by: Marek Marczykowski-Górecki
Reviewed-by: Juergen Gross
Signed-off-by: Boris Ostrovsky

Marek Marczykowski-Górecki
2018-09-14 20:51:10 +0800

09 Sep, 2018

1 commit

3243a89dc Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull random driver fix from Ted Ts'o:
"Fix things so the choice of whether or not to trust RDRAND to
initialize the CRNG is configurable via the boot option
random.trust_cpu={on,off}"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: make CPU trust a boot parameter

Linus Torvalds
2018-09-09 20:54:05 +0800

02 Sep, 2018

1 commit

9b2543666 random: make CPU trust a boot parameter

Instead of forcing a distro or other system builder to choose
at build time whether the CPU is trusted for CRNG seeding via
CONFIG_RANDOM_TRUST_CPU, provide a boot-time parameter for end users to
control the choice. The CONFIG will set the default state instead.
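
A rough model of "the CONFIG sets the default, the boot parameter overrides it" is shown below. The parameter name random.trust_cpu is the one quoted in the related merge message; the function itself is a hypothetical sketch that only understands the values on and off, and it is not how the kernel parses its command line.

```python
def trust_cpu(cmdline, config_default):
    """Resolve random.trust_cpu from a kernel command line string.

    config_default models CONFIG_RANDOM_TRUST_CPU (True if enabled).
    """
    trusted = config_default
    for arg in cmdline.split():
        key, _, value = arg.partition("=")
        # Only "on"/"off" are handled in this sketch; anything else is ignored.
        if key == "random.trust_cpu" and value in ("on", "off"):
            trusted = (value == "on")
    return trusted
```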

Signed-off-by: Kees Cook
Signed-off-by: Theodore Ts'o

Kees Cook
2018-09-02 00:51:54 +0800

25 Aug, 2018

1 commit

18b8bfdfb Merge tag 'iommu-updates-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:

- PASID table handling updates for the Intel VT-d driver. It implements
a global PASID space now so that applications using multiple devices
will just have one PASID.

- A new config option to make iommu passthrough mode the default.

- New sysfs attribute for iommu groups to export the type of the
default domain.

- A debugfs interface (for debug only) usable by IOMMU drivers to
export internals to user-space.

- R-Car Gen3 SoCs support for the ipmmu-vmsa driver

- The ARM-SMMU now aborts transactions from unknown devices and devices
not attached to any domain.

- Various cleanups and smaller fixes all over the place.

* tag 'iommu-updates-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (42 commits)
iommu/omap: Fix cache flushes on L2 table entries
iommu: Remove the ->map_sg indirection
iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel
iommu/arm-smmu-v3: Prevent any devices access to memory without registration
iommu/ipmmu-vmsa: Don't register as BUS IOMMU if machine doesn't have IPMMU-VMSA
iommu/ipmmu-vmsa: Clarify supported platforms
iommu/ipmmu-vmsa: Fix allocation in atomic context
iommu: Add config option to set passthrough as default
iommu: Add sysfs attribyte for domain type
iommu/arm-smmu-v3: sync the OVACKFLG to PRIQ consumer register
iommu/arm-smmu: Error out only if not enough context interrupts
iommu/io-pgtable-arm-v7s: Abort allocation when table address overflows the PTE
iommu/io-pgtable-arm: Fix pgtable allocation in selftest
iommu/vt-d: Remove the obsolete per iommu pasid tables
iommu/vt-d: Apply per pci device pasid table in SVA
iommu/vt-d: Allocate and free pasid table
iommu/vt-d: Per PCI device pasid table interfaces
iommu/vt-d: Add for_each_device_domain() helper
iommu/vt-d: Move device_domain_info to header
iommu/vt-d: Apply global PASID in SVA
...

Linus Torvalds
2018-08-25 04:10:38 +0800

23 Aug, 2018

2 commits

3d8b38eb8 mm, oom: introduce memory.oom.group

For some workloads an intervention from the OOM killer can be painful.
Killing a random task can bring the workload into an inconsistent state.

Historically, there are two common solutions for this
problem:
1) enabling panic_on_oom,
2) using a userspace daemon to monitor OOMs and kill
all outstanding processes.

Both approaches have their downsides: rebooting on each OOM is an obvious
waste of capacity, and handling all in userspace is tricky and requires a
userspace agent, which will monitor all cgroups for OOMs.

In most cases an in-kernel after-OOM cleaning-up mechanism can eliminate
the necessity of enabling panic_on_oom. Also, it can simplify the cgroup
management for userspace applications.

This commit introduces a new knob for cgroup v2 memory controller:
memory.oom.group. The knob determines whether the cgroup should be
treated as an indivisible workload by the OOM killer. If set, all tasks
belonging to the cgroup or to its descendants (if the memory cgroup is not
a leaf cgroup) are killed together or not at all.

To determine which cgroup has to be killed, we traverse the cgroup
hierarchy from the victim task's cgroup up to the OOMing cgroup (or root),
looking for the highest-level cgroup with memory.oom.group set.

Tasks with the OOM protection (oom_score_adj set to -1000) are treated as
an exception and are never killed.
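
The hierarchy walk described above can be sketched as follows. The Cgroup class and function name are hypothetical stand-ins used only to illustrate the traversal; the sketch does not model the oom_score_adj exception or the kill itself.

```python
class Cgroup:
    def __init__(self, name, parent=None, oom_group=False):
        self.name = name
        self.parent = parent
        self.oom_group = oom_group  # models memory.oom.group


def find_oom_group(victim_cgroup, oom_domain=None):
    """Walk from the victim's cgroup up to the OOMing cgroup (or root)
    and return the highest-level cgroup with oom_group set, or None."""
    found = None
    cg = victim_cgroup
    while cg is not None:
        if cg.oom_group:
            found = cg          # keep overwriting: the last hit is the highest
        if cg is oom_domain:
            break               # do not look above the OOMing cgroup
        cg = cg.parent
    return found


root = Cgroup("root")
job = Cgroup("job", root, oom_group=True)
worker = Cgroup("worker", job, oom_group=True)
leaf = Cgroup("leaf", worker)
```

With this tree, an OOM scoped to root selects job for the group kill, while an OOM scoped to worker stops at worker and never considers job.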

This patch doesn't change the OOM victim selection algorithm.

Link: http://lkml.kernel.org/r/20180802003201.817-4-guro@fb.com
Signed-off-by: Roman Gushchin
Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Cc: David Rientjes
Cc: Tetsuo Handa
Cc: Tejun Heo
Cc: Vladimir Davydov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roman Gushchin
2018-08-23 01:52:45 +0800
8c9a134ca mm: clarify CONFIG_PAGE_POISONING and usage

The Kconfig text for CONFIG_PAGE_POISONING doesn't mention that it has to
be enabled explicitly. This updates the documentation for that and adds a
note about CONFIG_PAGE_POISONING to the "page_poison" command line docs.
While here, change description of CONFIG_PAGE_POISONING_ZERO too, as it's
not "random" data, but rather the fixed debugging value that would be used
when not zeroing. Additionally removes a stray "bool" in the Kconfig.

Link: http://lkml.kernel.org/r/20180725223832.GA43733@beast
Signed-off-by: Kees Cook
Reviewed-by: Andrew Morton
Cc: Jonathan Corbet
Cc: Laura Abbott
Cc: Naoya Horiguchi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2018-08-23 01:52:44 +0800

19 Aug, 2018

2 commits

a18d783fe Merge tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
"Here are all of the driver core and related patches for 4.19-rc1.

Nothing huge here, just a number of small cleanups and the ability to
now stop the deferred probing after init happens.

All of these have been in linux-next for a while with only a merge
issue reported"

* tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (21 commits)
base: core: Remove WARN_ON from link dependencies check
drivers/base: stop new probing during shutdown
drivers: core: Remove glue dirs from sysfs earlier
driver core: remove unnecessary function extern declare
sysfs.h: fix non-kernel-doc comment
PM / Domains: Stop deferring probe at the end of initcall
iommu: Remove IOMMU_OF_DECLARE
iommu: Stop deferring probe at end of initcalls
pinctrl: Support stopping deferred probe after initcalls
dt-bindings: pinctrl: add a 'pinctrl-use-default' property
driver core: allow stopping deferred probe after init
driver core: add a debugfs entry to show deferred devices
sysfs: Fix internal_create_group() for named group updates
base: fix order of OF initialization
linux/device.h: fix kernel-doc notation warning
Documentation: update firmware loader fallback reference
kobject: Replace strncpy with memcpy
drivers: base: cacheinfo: use OF property_read_u32 instead of get_property,read_number
kernfs: Replace strncpy with memcpy
device: Add #define dev_fmt similar to #define pr_fmt
...

Linus Torvalds
2018-08-19 02:44:53 +0800
336722eb9 Merge tag 'tty-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
"Here is the big tty and serial driver pull request for 4.19-rc1.

It's not all that big, just a number of small serial driver updates
and fixes, along with some better vt handling for unicode characters
for those using braille terminals.

All of these patches have been in linux-next for a long time with no
reported issues"

* tag 'tty-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (73 commits)
tty: serial: 8250: Revert NXP SC16C2552 workaround
serial: 8250_exar: Read INT0 from slave device, too
tty: rocket: Fix possible buffer overwrite on register_PCI
serial: 8250_dw: Add ACPI support for uart on Broadcom SoC
serial: 8250_dw: always set baud rate in dw8250_set_termios
dt-bindings: serial: Add binding for uartlite
tty: serial: uartlite: Add support for suspend and resume
tty: serial: uartlite: Add clock adaptation
tty: serial: uartlite: Add structure for private data
serial: sh-sci: Improve support for separate TEI and DRI interrupts
serial: sh-sci: Remove SCIx_RZ_SCIFA_REGTYPE
serial: sh-sci: Allow for compressed SCIF address
serial: sh-sci: Improve interrupts description
serial: 8250: Use cached port name directly in messages
serial: 8250_exar: Drop unused variable in pci_xr17v35x_setup()
vt: drop unused struct vt_struct
vt: avoid a VLA in the unicode screen scroll function
vt: add /dev/vcsu* to devices.txt
vt: coherence validation code for the unicode screen buffer
vt: selection: take screen contents from uniscr if available
...

Linus Torvalds
2018-08-19 01:50:41 +0800

18 Aug, 2018

4 commits

6ada4e282 Merge branch 'akpm' (patches from Andrew)

Merge updates from Andrew Morton:

- a few misc things

- a few Y2038 fixes

- ntfs fixes

- arch/sh tweaks

- ocfs2 updates

- most of MM

* emailed patches from Andrew Morton: (111 commits)
mm/hmm.c: remove unused variables align_start and align_end
fs/userfaultfd.c: remove redundant pointer uwq
mm, vmacache: hash addresses based on pmd
mm/list_lru: introduce list_lru_shrink_walk_irq()
mm/list_lru.c: pass struct list_lru_node* as an argument to __list_lru_walk_one()
mm/list_lru.c: move locking from __list_lru_walk_one() to its caller
mm/list_lru.c: use list_lru_walk_one() in list_lru_walk_node()
mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP
mm/sparse: delete old sparse_init and enable new one
mm/sparse: add new sparse_init_nid() and sparse_init()
mm/sparse: move buffer init/fini to the common place
mm/sparse: use the new sparse buffer functions in non-vmemmap
mm/sparse: abstract sparse buffer allocations
mm/hugetlb.c: don't zero 1GiB bootmem pages
mm, page_alloc: double zone's batchsize
mm/oom_kill.c: document oom_lock
mm/hugetlb: remove gigantic page support for HIGHMEM
mm, oom: remove sleep from under oom_lock
kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous()
mm/cma: remove unsupported gfp_mask parameter from cma_alloc()
...

Linus Torvalds
2018-08-18 07:49:31 +0800
59ae96ffc tools/vm/page-types.c: add support for idle page tracking

Add a flag which causes page-types to use the kernel's idle page
tracking to mark pages idle. As the tool already prints the idle flag
if set, subsequent runs will show which pages have been accessed since
last run.

[akpm@linux-foundation.org: simplify mark_page_idle()]
[chansen3@cisco.com: reorganize mark_page_idle() logic, add docs]
Link: http://lkml.kernel.org/r/20180706172237.21691-1-chansen3@cisco.com
Link: http://lkml.kernel.org/r/20180612153223.13174-1-chansen3@cisco.com
Signed-off-by: Christian Hansen
Reviewed-by: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christian Hansen
2018-08-18 07:20:28 +0800
7f1d23e60 tools/vm/page-types.c: include shared map counts

Add a new flag that will read kpagecount for each PFN and print out the
number of times the page is mapped along with the flags in the listing
view.

This information is useful in understanding and optimizing memory usage.
Identifying pages which are not shared allows us to focus on adjusting
the memory layout or access patterns for the sole owning process.
Knowing the number of processes that share a page tells us how many
other times we must make the same adjustments or how many processes to
potentially disable.
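
As a rough illustration of what reading kpagecount per PFN involves: the kernel's pagemap documentation describes /proc/kpagecount as one 64-bit map count per page frame, indexed by PFN. The helper below is a hypothetical sketch built on that layout, not the page-types implementation, and it is demonstrated on an in-memory buffer because the real file normally requires root to read.

```python
import io
import struct

ENTRY_SIZE = 8  # one 64-bit counter per PFN


def read_map_counts(kpagecount, first_pfn, count):
    """Read `count` map counts starting at `first_pfn` from an open
    binary file object laid out like /proc/kpagecount."""
    kpagecount.seek(first_pfn * ENTRY_SIZE)
    data = kpagecount.read(count * ENTRY_SIZE)
    n = len(data) // ENTRY_SIZE          # tolerate a short read at EOF
    return list(struct.unpack(f"={n}Q", data[:n * ENTRY_SIZE]))


# Demo on an in-memory stand-in holding counts for PFNs 0..3.
fake = io.BytesIO(struct.pack("=4Q", 0, 1, 14, 2))
counts = read_map_counts(fake, first_pfn=1, count=2)
```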

Truncated sample output:

voffset map-cnt offset len flags
561a3591e 1 15fe8 1 ___U_lA____Ma_b___________________________
561a3591f 1 2b103 1 ___U_lA____Ma_b___________________________
561a36ca4 1 2cc78 1 ___U_lA____Ma_b___________________________
7f588bb4e 14 2273c 1 __RU_lA____M______________________________

[akpm@linux-foundation.org: coding-style fixes]
[chansen3@cisco.com: add documentation, tweak whitespace]
Link: http://lkml.kernel.org/r/20180705181204.5529-1-chansen3@cisco.com
Link: http://lkml.kernel.org/r/20180612153205.12879-1-chansen3@cisco.com
Signed-off-by: Christian Hansen
Reviewed-by: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christian Hansen
2018-08-18 07:20:28 +0800
5e2d059b5 Merge tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
"Notable changes:

- A fix for a bug in our page table fragment allocator, where a page
table page could be freed and reallocated for something else while
still in use, leading to memory corruption etc. The fix reuses
pt_mm in struct page (x86 only) for a powerpc only refcount.

- Fixes to our pkey support. Several are user-visible changes, but
bring us in to line with x86 behaviour and/or fix outright bugs.
Thanks to Florian Weimer for reporting many of these.

- A series to improve the hvc driver & related OPAL console code,
which have been seen to cause hardlockups at times. The hvc driver
changes in particular have been in linux-next for ~month.

- Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y.

- Remove Power8 DD1 and Power9 DD1 support, neither chip should be in
use anywhere other than as a paper weight.

- An optimised memcmp implementation using Power7-or-later VMX
instructions

- Support for barrier_nospec on some NXP CPUs.

- Support for flushing the count cache on context switch on some IBM
CPUs (controlled by firmware), as a Spectre v2 mitigation.

- A series to enhance the information we print on unhandled signals
to bring it into line with other arches, including showing the
offending VMA and dumping the instructions around the fault.

Thanks to: Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey
Kardashevskiy, Alexey Spirkov, Alistair Popple, Andrew Donnellan,
Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Bartosz Golaszewski,
Benjamin Herrenschmidt, Bharat Bhushan, Bjoern Noetel, Boqun Feng,
Breno Leitao, Bryant G. Ly, Camelia Groza, Christophe Leroy, Christoph
Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt, Darren Stevens, Dave
Young, David Gibson, Diana Craciun, Finn Thain, Florian Weimer,
Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand,
Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel
Stanley, Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh
Salgaonkar, Markus Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues,
Michael Hanselmann, Michael Neuling, Michael Schmitz, Mukesh Ojha,
Murilo Opsfelder Araujo, Nicholas Piggin, Parth Y Shah, Paul
Mackerras, Paul Menzel, Ram Pai, Randy Dunlap, Rashmica Gupta, Reza
Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff, Scott Wood,
Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson, Thiago
Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat
Rao, zhong jiang"

* tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (234 commits)
powerpc/mm/book3s/radix: Add mapping statistics
powerpc/uaccess: Enable get_user(u64, *p) on 32-bit
powerpc/mm/hash: Remove unnecessary do { } while(0) loop
powerpc/64s: move machine check SLB flushing to mm/slb.c
powerpc/powernv/idle: Fix build error
powerpc/mm/tlbflush: update the mmu_gather page size while iterating address range
powerpc/mm: remove warning about ‘type’ being set
powerpc/32: Include setup.h header file to fix warnings
powerpc: Move `path` variable inside DEBUG_PROM
powerpc/powermac: Make some functions static
powerpc/powermac: Remove variable x that's never read
cxl: remove a dead branch
powerpc/powermac: Add missing include of header pmac.h
powerpc/kexec: Use common error handling code in setup_new_fdt()
powerpc/xmon: Add address lookup for percpu symbols
powerpc/mm: remove huge_pte_offset_and_shift() prototype
powerpc/lib: Use patch_site to patch copy_32 functions once cache is enabled
powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements
powerpc/fadump: handle crash memory ranges array index overflow
...

Linus Torvalds
2018-08-18 02:32:50 +0800

17 Aug, 2018

1 commit

4e31843f6 Merge tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:

- Decode AER errors with names similar to "lspci" (Tyler Baicar)

- Expose AER statistics in sysfs (Rajat Jain)

- Clear AER status bits selectively based on the type of recovery (Oza
Pawandeep)

- Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
Gagniuc)

- Don't clear AER status bits if we're using the "Firmware-First"
strategy where firmware owns the registers (Alexandru Gagniuc)

- Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
Shevchenko)

- Remove unnecessary includes of (Bjorn Helgaas)

- Defer DPC event handling to work queue (Keith Busch)

- Use threaded IRQ for DPC bottom half (Keith Busch)

- Print AER status while handling DPC events (Keith Busch)

- Work around IDT switch ACS Source Validation erratum (James
Puthukattukaran)

- Emit diagnostics for all cases of PCIe Link downtraining (Links
operating slower than they're capable of) (Alexandru Gagniuc)

- Skip VFs when configuring Max Payload Size (Myron Stowe)

- Reduce Root Port Max Payload Size if necessary when hot-adding a
device below it (Myron Stowe)

- Simplify SHPC existence/permission checks (Bjorn Helgaas)

- Remove hotplug sample skeleton driver (Lukas Wunner)

- Convert pciehp to threaded IRQ handling (Lukas Wunner)

- Improve pciehp tolerance of missed events and initially unstable
links (Lukas Wunner)

- Clear spurious pciehp events on resume (Lukas Wunner)

- Add pciehp runtime PM support, including for Thunderbolt controllers
(Lukas Wunner)

- Support interrupts from pciehp bridges in D3hot (Lukas Wunner)

- Mark fall-through switch cases before enabling -Wimplicit-fallthrough
(Gustavo A. R. Silva)

- Move DMA-debug PCI init from arch code to PCI core (Christoph
Hellwig)

- Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is
supplied (Heiner Kallweit)

- Unify PCI and DMA direction #defines (Shunyong Yang)

- Add PCI_DEVICE_DATA() macro (Andy Shevchenko)

- Check for VPD completion before checking for timeout (Bert Kenward)

- Limit Netronome NFP5000 config space size to work around erratum
(Jakub Kicinski)

- Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit)

- Document ACPI description of PCI host bridges (Bjorn Helgaas)

- Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
peer-to-peer DMA support (we don't have the peer-to-peer support yet;
this is just one piece) (Logan Gunthorpe)

- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)

- Fixup resizable BARs after suspend/resume (Christian König)

- Make "pci=earlydump" generic (Sinan Kaya)

- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)

- Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer)

- Expand documentation for pci_add_dma_alias() (Logan Gunthorpe)

- To avoid bus errors, enable PASID only if entire path supports
End-End TLP prefixes (Sinan Kaya)

- Unify slot and bus reset functions and remove hotplug knowledge from
callers (Sinan Kaya)

- Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
fix guest reboot issues (Alex Williamson)

- Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD
Controller (Bjorn Helgaas)

- Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt)

- Remove Aardvark outbound window configuration (Evan Wang)

- Fix Aardvark bridge window sizing issue (Zachary Zhang)

- Convert Aardvark to use pci_host_probe() to reduce code duplication
(Thomas Petazzoni)

- Correct the Cadence cdns_pcie_writel() signature (Alan Douglas)

- Add Cadence support for optional generic PHYs (Alan Douglas)

- Add Cadence power management ops (Alan Douglas)

- Remove redundant variable from Cadence driver (Colin Ian King)

- Add Kirin MSI support (Xiaowei Song)

- Drop unnecessary root_bus_nr setting from exynos, imx6, keystone,
armada8k, artpec6, designware-plat, histb, qcom, spear13xx (Shawn
Guo)

- Move link notification settings from DesignWare core to individual
drivers (Gustavo Pimentel)

- Add endpoint library MSI-X interfaces (Gustavo Pimentel)

- Correct signature of endpoint library IRQ interfaces (Gustavo
Pimentel)

- Add DesignWare endpoint library MSI-X callbacks (Gustavo Pimentel)

- Add endpoint library MSI-X test support (Gustavo Pimentel)

- Remove unnecessary GFP_ATOMIC from Hyper-V "new child" allocation
(Jia-Ju Bai)

- Add more devices to Broadcom PAXC quirk (Ray Jui)

- Work around corrupted Broadcom PAXC config space to enable SMMU and
GICv3 ITS (Ray Jui)

- Disable MSI parsing to work around broken Broadcom PAXC logic in some
devices (Ray Jui)

- Hide unconfigured functions to work around a Broadcom PAXC defect
(Ray Jui)

- Lower iproc log level to reduce console output during boot (Ray Jui)

- Fix mobiveil iomem/phys_addr_t type usage (Lorenzo Pieralisi)

- Fix mobiveil missing include file (Lorenzo Pieralisi)

- Add mobiveil Kconfig/Makefile support (Lorenzo Pieralisi)

- Fix mvebu I/O space remapping issues (Thomas Petazzoni)

- Use generic pci_host_bridge in mvebu instead of ARM-specific API
(Thomas Petazzoni)

- Whitelist VMD devices with fast interrupt handlers to avoid sharing
vectors with slow handlers (Keith Busch)

* tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (153 commits)
PCI/AER: Don't clear AER bits if error handling is Firmware-First
PCI: Limit config space size for Netronome NFP5000
PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips
PCI/VPD: Check for VPD access completion before checking for timeout
PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
PCI: Match Root Port's MPS to endpoint's MPSS as necessary
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
PCI: Check for PCIe Link downtraining
PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
PCI: Add device-specific ACS Redirect disable infrastructure
PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
PCI: Allow specifying devices using a base bus and path of devfns
PCI: Make specifying PCI devices in kernel parameters reusable
PCI: Hide ACS quirk declarations inside PCI core
PCI: Delay after FLR of Intel DC P3700 NVMe
PCI: Disable Samsung SM961/PM961 NVMe before FLR
PCI: Export pcie_has_flr()
PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers()
...

Linus Torvalds
2018-08-17 00:21:54 +0800

16 Aug, 2018

2 commits

99a2c789d Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull random updates from Ted Ts'o:
"Some changes to trust cpu-based hwrng (such as RDRAND) for
initializing hashed pointers and (optionally, controlled by a config
option) to initialize the CRNG to avoid boot hangs"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: Make crng state queryable
random: remove preempt disabled region
random: add a config option to trust the CPU's hwrng
vsprintf: Add command line option debug_boot_weak_hash
vsprintf: Use hw RNG for ptr_key
random: Return nbytes filled from hw RNG
random: Fix whitespace pre random-bytes work

Linus Torvalds
2018-08-16 12:16:02 +0800
5fc054a54 Merge branch 'pci/resource'

- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)

- Fixup resizable BARs after suspend/resume (Christian König)

- Make "pci=earlydump" generic (Sinan Kaya)

- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)

* pci/resource:
PCI: Make pci_get_rom_size() static
PCI: Add check code for last image indicator not set
PCI: Avoid accessing memory outside the ROM BAR
PCI: Make early dump functionality generic
PCI: Cleanup PCI_REBAR_CTRL_BAR_SHIFT handling
PCI: Restore resized BAR state on resume
PCI: Clean up resource allocation in devm_of_pci_get_host_bridge_resources()

# Conflicts:
# Documentation/admin-guide/kernel-parameters.txt

Bjorn Helgaas
2018-08-16 03:59:01 +0800

15 Aug, 2018

4 commits

8c479c2c0 Merge tag 'hardened-usercopy-v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardened usercopy updates from Kees Cook:
"This cleans up a minor Kconfig issue and adds a kernel boot option for
disabling hardened usercopy for distro users that may have corner-case
performance issues (e.g. high bandwidth small-packet UDP traffic).

Summary:

- drop unneeded Kconfig "select BUG" (Kamal Mostafa)

- add "hardened_usercopy=off" for rare performance needs (Chris von
Recklinghausen)"

* tag 'hardened-usercopy-v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
usercopy: Allow boot cmdline disabling of hardening
usercopy: Do not select BUG with HARDENED_USERCOPY

Linus Torvalds
2018-08-15 23:45:54 +0800
e6ecec342 Merge tag 'docs-4.19' of git://git.lwn.net/linux

Pull documentation update from Jonathan Corbet:
"This was a moderately busy cycle for docs, with the usual collection
of small fixes and updates.

We also have new ktime_get_*() docs from Arnd, some kernel-doc fixes,
a new set of Italian translations (I don't know if it's worth it, but
it doesn't hurt - let's hope for the best), and some extensive early
memory-management documentation improvements from Mike Rapoport"

* tag 'docs-4.19' of git://git.lwn.net/linux: (52 commits)
Documentation: corrections to console/console.txt
Documentation: add ioctl number entry for v4l2-subdev.h
Remove gendered language from management style documentation
scripts/kernel-doc: Escape all literal braces in regexes
docs/mm: add description of boot time memory management
docs/mm: memblock: add overview documentation
docs/mm: memblock: add kernel-doc description for memblock types
docs/mm: memblock: add kernel-doc comments for memblock_add[_node]
docs/mm: memblock: update kernel-doc comments
mm/memblock: add a name for memblock flags enumeration
docs/mm: bootmem: add overview documentation
docs/mm: bootmem: add kernel-doc description of 'struct bootmem_data'
docs/mm: bootmem: fix kernel-doc warnings
docs/mm: nobootmem: fixup kernel-doc comments
mm/bootmem: drop duplicated kernel-doc comments
Documentation: vm.txt: Adding 'nr_hugepages_mempolicy' parameter description.
doc:it_IT: translation for kernel-hacking
docs: Fix the reference labels in Locking.rst
doc: tracing: Fix a typo of trace_stat
mm: Introduce new type vm_fault_t
...

Linus Torvalds
2018-08-15 05:29:31 +0800
73ba2fb33 Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
"First pull request for this merge window, there will also be a
followup request with some stragglers.

This pull request contains:

- Fix for a thundering herd issue in the wbt block code (Anchal
Agarwal)

- A few NVMe pull requests:
* Improved tracepoints (Keith)
* Larger inline data support for RDMA (Steve Wise)
* RDMA setup/teardown fixes (Sagi)
* Effects log support for NVMe target (Chaitanya Kulkarni)
* Buffered IO support for NVMe target (Chaitanya Kulkarni)
* TP4004 (ANA) support (Christoph)
* Various NVMe fixes

- Block io-latency controller support. Much needed support for
properly containing block devices. (Josef)

- Series improving how we handle sense information on the stack
(Kees)

- Lightnvm fixes and updates/improvements (Matias/Javier et al)

- Zoned device support for null_blk (Matias)

- AIX partition fixes (Mauricio Faria de Oliveira)

- DIF checksum code made generic (Max Gurtovoy)

- Add support for discard in iostats (Michael Callahan / Tejun)

- Set of updates for BFQ (Paolo)

- Removal of async write support for bsg (Christoph)

- Bio page dirtying and clone fixups (Christoph)

- Set of bcache fix/changes (via Coly)

- Series improving blk-mq queue setup/teardown speed (Ming)

- Series improving merging performance on blk-mq (Ming)

- Lots of other fixes and cleanups from a slew of folks"

* tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits)
blkcg: Make blkg_root_lookup() work for queues in bypass mode
bcache: fix error setting writeback_rate through sysfs interface
null_blk: add lock drop/acquire annotation
Blk-throttle: reduce tail io latency when iops limit is enforced
block: paride: pd: mark expected switch fall-throughs
block: Ensure that a request queue is dissociated from the cgroup controller
block: Introduce blk_exit_queue()
blkcg: Introduce blkg_root_lookup()
block: Remove two superfluous #include directives
blk-mq: count the hctx as active before allocating tag
block: bvec_nr_vecs() returns value for wrong slab
bcache: trivial - remove tailing backslash in macro BTREE_FLAG
bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section
bcache: set max writeback rate when I/O request is idle
bcache: add code comments for bset.c
bcache: fix mistaken comments in request.c
bcache: fix mistaken code comments in bcache.h
bcache: add a comment in super.c
bcache: avoid unncessary cache prefetch bch_btree_node_get()
bcache: display rate debug parameters to 0 when writeback is not running
...

Linus Torvalds
2018-08-15 01:23:25 +0800
958f338e9 Merge branch 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Merge L1 Terminal Fault fixes from Thomas Gleixner:
"L1TF, aka L1 Terminal Fault, is yet another speculative hardware
engineering trainwreck. It's a hardware vulnerability which allows
unprivileged speculative access to data which is available in the
Level 1 Data Cache when the page table entry controlling the virtual
address, which is used for the access, has the Present bit cleared or
other reserved bits set.

If an instruction accesses a virtual address for which the relevant
page table entry (PTE) has the Present bit cleared or other reserved
bits set, then speculative execution ignores the invalid PTE and loads
the referenced data if it is present in the Level 1 Data Cache, as if
the page referenced by the address bits in the PTE was still present
and accessible.

While this is a purely speculative mechanism and the instruction will
raise a page fault when it is retired eventually, the pure act of
loading the data and making it available to other speculative
instructions opens up the opportunity for side channel attacks to
unprivileged malicious code, similar to the Meltdown attack.

While Meltdown breaks the user space to kernel space protection, L1TF
allows attacks on any physical memory address in the system, and the
attack works across all protection domains. It allows an attack on SGX
and also works from inside virtual machines because the speculation
bypasses the extended page table (EPT) protection mechanism.

The associated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646

The mitigations provided by this pull request include:

- Host side protection by inverting the upper address bits of a non
present page table entry so the entry points to uncacheable memory.

- Hypervisor protection by flushing L1 Data Cache on VMENTER.

- SMT (HyperThreading) control knobs, which allow SMT to be 'turned
off' by offlining the sibling CPU threads. The knobs are available on
the kernel command line and at runtime via sysfs

- Control knobs for the hypervisor mitigation, related to L1D flush
and SMT control. The knobs are available on the kernel command line
and at runtime via sysfs

- Extensive documentation about L1TF including various degrees of
mitigations.

Thanks to all people who have contributed to this in various ways -
patches, review, testing, backporting - and the fruitful, sometimes
heated, but in the end constructive discussions.

There is work in progress to provide other forms of mitigations, which
might be less horrible performance wise for particular kinds of
workloads, but these are not yet ready for consumption due to their
complexity and limitations"
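
The host-side mitigation listed in the pull message above (inverting the
upper address bits of a non-present page table entry) can be pictured with
a small user-space sketch. It assumes a 64-bit entry with the Present bit
at bit 0 and the physical address in bits 12..51; the constants and
function names are illustrative and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 style layout; illustrative values, not kernel headers. */
#define PTE_PRESENT   UINT64_C(0x1)
#define PTE_ADDR_MASK UINT64_C(0x000ffffffffff000)  /* bits 12..51 */

/* An all-zero entry maps nothing and is left alone; any other
 * not-present entry is stored with its address bits inverted. */
int pte_needs_invert(uint64_t pte)
{
    return pte != 0 && !(pte & PTE_PRESENT);
}

/* Flip only the address bits; flag bits stay untouched. */
uint64_t pte_store(uint64_t pte)
{
    if (pte_needs_invert(pte))
        pte ^= PTE_ADDR_MASK;
    return pte;
}

/* Recover the real physical address from a stored entry. */
uint64_t pte_phys_addr(uint64_t stored)
{
    if (pte_needs_invert(stored))
        stored ^= PTE_ADDR_MASK;
    return stored & PTE_ADDR_MASK;
}

/* Build a not-present entry (hypothetical helper for this sketch). */
uint64_t make_not_present_pte(uint64_t phys, uint64_t flags)
{
    assert((phys & ~PTE_ADDR_MASK) == 0);
    assert((flags & (PTE_PRESENT | PTE_ADDR_MASK)) == 0);
    return pte_store(phys | flags);
}
```

With this layout, a speculative load that ignores the cleared Present bit
would see the inverted address, which for a low physical address points
far above populated memory rather than at cacheable data.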

* 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
x86/microcode: Allow late microcode loading with SMT disabled
tools headers: Synchronise x86 cpufeatures.h for L1TF additions
x86/mm/kmmio: Make the tracer robust against L1TF
x86/mm/pat: Make set_memory_np() L1TF safe
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
x86/speculation/l1tf: Invert all not present mappings
cpu/hotplug: Fix SMT supported evaluation
KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
Documentation/l1tf: Remove Yonah processors from not vulnerable list
x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
x86: Don't include linux/irq.h from asm/hardirq.h
x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
cpu/hotplug: detect SMT disabled by BIOS
...

Linus Torvalds
2018-08-15 00:46:06 +0800

14 Aug, 2018

1 commit

13e091b6d Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 timer updates from Thomas Gleixner:
"Early TSC based time stamping to allow better boot time analysis.

This comes with a general cleanup of the TSC calibration code which
grew warts and duct taping over the years and removes 250 lines of
code. Initiated and mostly implemented by Pavel with help from various
folks"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
x86/kvmclock: Mark kvm_get_preset_lpj() as __init
x86/tsc: Consolidate init code
sched/clock: Disable interrupts when calling generic_sched_clock_init()
timekeeping: Prevent false warning when persistent clock is not available
sched/clock: Close a hole in sched_clock_init()
x86/tsc: Make use of tsc_calibrate_cpu_early()
x86/tsc: Split native_calibrate_cpu() into early and late parts
sched/clock: Use static key for sched_clock_running
sched/clock: Enable sched clock early
sched/clock: Move sched clock initialization and merge with generic clock
x86/tsc: Use TSC as sched clock early
x86/tsc: Initialize cyc2ns when tsc frequency is determined
x86/tsc: Calibrate tsc only once
ARM/time: Remove read_boot_clock64()
s390/time: Remove read_boot_clock64()
timekeeping: Default boot time offset to local_clock()
timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()
s390/time: Add read_persistent_wall_and_boot_offset()
x86/xen/time: Output xen sched_clock time from 0
x86/xen/time: Initialize pv xen time in init_hypervisor_platform()
...

Linus Torvalds
2018-08-14 09:28:19 +0800

10 Aug, 2018

3 commits

aaca43fda PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support

To support peer-to-peer traffic on a segment of the PCI hierarchy, we must
disable the ACS redirect bits for select PCI bridges. The bridges must be
selected before the devices are discovered by the kernel and the IOMMU
groups created. Therefore, add a kernel command line parameter to specify
devices which must have their ACS bits disabled.

The new parameter takes a list of devices separated by a semicolon. Each
device specified will have its ACS redirect bits disabled. This is
similar to the existing 'resource_alignment' parameter.

The ACS P2P Request Redirect, P2P Completion Redirect and P2P
Egress Control bits are disabled, which is sufficient to always allow
passing P2P traffic uninterrupted. The bits are disabled after the kernel
(optionally) enables the ACS bits itself. This is done regardless of
whether the kernel or platform firmware sets the bits.

If the user tries to disable the ACS redirect for a device without the ACS
capability, print a warning to dmesg.
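
As a rough illustration of the bit manipulation described above, the
following user-space sketch clears the three redirect-related bits in an
ACS Control register value. The bit values are believed to match the
PCI_ACS_RR, PCI_ACS_CR and PCI_ACS_EC definitions in the kernel's
pci_regs.h, but treat them as assumptions; the function name and the
message text are hypothetical and this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* ACS Control register bits (assumed values, see lead-in). */
#define ACS_RR 0x0004  /* P2P Request Redirect */
#define ACS_CR 0x0008  /* P2P Completion Redirect */
#define ACS_EC 0x0020  /* P2P Egress Control */
#define ACS_REDIR_MASK (ACS_RR | ACS_CR | ACS_EC)

/*
 * Return the ACS Control value with the redirect bits cleared.
 * If the device has no ACS capability, warn and leave the value alone.
 */
uint16_t acs_disable_redir(int has_acs_cap, uint16_t ctrl, const char *name)
{
    assert(name != NULL);
    if (!has_acs_cap) {
        fprintf(stderr, "%s: cannot disable ACS redirect: no ACS capability\n",
                name);
        return ctrl;
    }
    return (uint16_t)(ctrl & ~ACS_REDIR_MASK);
}
```

On the command line, the parameter would then carry a semicolon-separated
device list, for example pci=disable_acs_redir=0000:03:00.0;0000:04:00.0
(the addresses here are made up for illustration).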

Signed-off-by: Logan Gunthorpe
[bhelgaas: reorder to add the generic code first and move the
device-specific quirk to subsequent patches]
Signed-off-by: Bjorn Helgaas
Reviewed-by: Stephen Bates
Reviewed-by: Alex Williamson
Acked-by: Christian König

Logan Gunthorpe
2018-08-10 06:37:19 +0800
45db33709 PCI: Allow specifying devices using a base bus and path of devfns

When specifying PCI devices on the kernel command line using a
bus/device/function address, bus numbers can change when adding or
replacing a device, changing motherboard firmware, or applying kernel
parameters like "pci=assign-buses". When bus numbers change, it's likely
the command line tweak will be applied to the wrong device.

Therefore, it is useful to be able to specify devices with a base bus
number and the path of devfns needed to get to it, similar to the "device
scope" structure in the Intel VT-d spec, Section 8.3.1.

Thus, we add an option to specify devices in the following format:

[<domain>:]<bus>:<device>.<func>[/<device>.<func>]*

The path can be any segment within the PCI hierarchy, of any length, and
can be determined with 'lspci -t'. When a device is specified this way, a
renumbered bus is less likely to yield a valid device specification, so
the tweak won't be applied to the wrong device.

Signed-off-by: Logan Gunthorpe
[bhelgaas: use "device" instead of "slot" in documentation since that's the
usual language in the PCI specs]
Signed-off-by: Bjorn Helgaas
Reviewed-by: Stephen Bates
Reviewed-by: Alex Williamson
Acked-by: Christian König

Logan Gunthorpe
2018-08-10 05:24:39 +0800
07d8d7e57 PCI: Make specifying PCI devices in kernel parameters reusable

Separate out the code to match a PCI device with a string (typically
originating from a kernel parameter) from the
pci_specified_resource_alignment() function into its own helper function.

While we are at it, this change fixes the kernel style of the function
(fixing a number of long lines and extra parentheses).

Additionally, make the analogous change to the kernel parameter
documentation: Separate the description of how to specify a PCI device
into its own section at the head of the "pci=" parameter.

This patch should have no functional alterations.

Signed-off-by: Logan Gunthorpe
[bhelgaas: use "device" instead of "slot" in documentation since that's the
usual language in the PCI specs]
Signed-off-by: Bjorn Helgaas
Reviewed-by: Stephen Bates
Reviewed-by: Alex Williamson
Acked-by: Christian König

Logan Gunthorpe
2018-08-10 05:23:06 +0800

08 Aug, 2018

1 commit

6488a7f35 Merge branches 'arm/shmobile', 'arm/renesas', 'arm/msm', 'arm/smmu', 'arm/omap', 'x86/amd', 'x86/vt-d' and 'core' into next

Joerg Roedel
2018-08-08 18:02:27 +0800

07 Aug, 2018

1 commit

26cb1f36c Documentation: Add nospectre_v1 parameter

Currently only supported on powerpc.

Signed-off-by: Diana Craciun
Signed-off-by: Michael Ellerman

Diana Craciun
2018-08-07 22:32:25 +0800

06 Aug, 2018

1 commit

05b9ba4b5 Merge tag 'v4.18-rc6' into for-4.19/block2

Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a
merge conflict down the line.

Signed-off-by: Jens Axboe

Jens Axboe
2018-08-06 09:32:09 +0800

05 Aug, 2018

3 commits

5b76a3cff KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry

When nested virtualization is in use, VMENTER operations from the nested
hypervisor into the nested guest will always be processed by the bare metal
hypervisor, and KVM's "conditional cache flushes" mode in particular does a
flush on nested vmentry. Therefore, include the "skip L1D flush on
vmentry" bit in KVM's suggested ARCH_CAPABILITIES setting.

Add the relevant Documentation.

Signed-off-by: Paolo Bonzini
Signed-off-by: Thomas Gleixner

Paolo Bonzini
2018-08-05 23:10:20 +0800
583311361 Documentation/l1tf: Remove Yonah processors from not vulnerable list

Dave reported that it's not confirmed that Yonah processors are
unaffected. Remove them from the list.

Reported-by: Dave Hansen
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2018-08-05 23:10:18 +0800
f2701b77b Merge 4.18-rc7 into master to pick up the KVM dependcy

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2018-08-05 22:39:29 +0800

02 Aug, 2018

1 commit

c480bcf97 block: make iolatency avg_lat exponentially decay

Currently, avg_lat is calculated by accumulating the mean of every
window in a long running cumulative average. As time goes on, the metric
becomes less and less useful due to the accumulated history.

This patch reuses the same calculation done in load averages to make the
avg_lat metric more lively. Unlike load averages, the avg only advances
when a window elapses (due to an io). Idle periods extend the most
recent window. Bucketing is used to limit the history of avg_lat by
binding it to the window size. So, the window range for 1/exp (decay
rate) is [1 min, 2.5 min) when windows elapse immediately.

The current sample window size is exposed in the debug info to enable
calculation of the window range.
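
The load-average style calculation mentioned above can be sketched as a
fixed-point exponentially decaying average. This is only an illustration
modeled on the general load-average technique: the function name, the
11-bit fixed-point scale and the decay value used in the example are
assumptions, and the patch itself additionally selects the decay factor
by bucketing on the window size.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_SHIFT 11
#define FIXED_ONE   (UINT64_C(1) << FIXED_SHIFT)  /* 1.0 in fixed point */

/*
 * Blend one new window mean into the running average:
 *   avg' = avg * decay + sample * (1 - decay)
 * where decay is a fixed-point fraction in [0, FIXED_ONE].
 * Call once per elapsed window; idle time does not advance the average.
 */
uint64_t decay_avg(uint64_t avg, uint64_t decay, uint64_t sample)
{
    uint64_t scaled;

    assert(decay <= FIXED_ONE);
    scaled = avg * decay + sample * (FIXED_ONE - decay);
    /* Round up while rising so the average can actually reach the sample. */
    if (sample >= avg)
        scaled += FIXED_ONE - 1;
    return scaled >> FIXED_SHIFT;
}
```

For example, with decay = 1884 (roughly 0.92 at this scale), a running
average of 2048 followed by a window whose mean is 0 drops to 1884, so old
history fades geometrically instead of weighing on the metric forever.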

Signed-off-by: Dennis Zhou
Acked-by: Tejun Heo
Acked-by: Johannes Weiner
Acked-by: Josef Bacik
Signed-off-by: Jens Axboe

Dennis Zhou (Facebook)
2018-08-02 23:58:14 +0800