28 Oct, 2016
1 commit
-
The per-zone waitqueues exist because of a scalability issue with the
page waitqueues on some NUMA machines, but it turns out that they hurt
normal loads, and now with the vmalloced stacks they also end up
breaking gfs2 that uses a bit_wait on a stack object:wait_on_bit(&gh->gh_iflags, HIF_WAIT, TASK_UNINTERRUPTIBLE)
where 'gh' can be a reference to the local variable 'mount_gh' on the
stack of fill_super().The reason the per-zone hash table breaks for this case is that there is
no "zone" for virtual allocations, and trying to look up the physical
page to get at it will fail (with a BUG_ON()).It turns out that I actually complained to the mm people about the
per-zone hash table for another reason just a month ago: the zone lookup
also hurts the regular use of "unlock_page()" a lot, because the zone
lookup ends up forcing several unnecessary cache misses and generates
horrible code.As part of that earlier discussion, we had a much better solution for
the NUMA scalability issue - by just making the page lock have a
separate contention bit, the waitqueue doesn't even have to be looked at
for the normal case.Peter Zijlstra already has a patch for that, but let's see if anybody
even notices. In the meantime, let's fix the actual gfs2 breakage by
simplifying the bitlock waitqueues and removing the per-zone issue.Reported-by: Andreas Gruenbacher
Tested-by: Bob Peterson
Acked-by: Mel Gorman
Cc: Peter Zijlstra
Cc: Andy Lutomirski
Cc: Steven Whitehouse
Signed-off-by: Linus Torvalds
25 Oct, 2016
2 commits
-
Pull clk fixes from Stephen Boyd:
"This is the first batch of clk driver fixes for this release.We have a handful of fixes for the uniphier clk driver that was
introduced recently, as well as Kconfig option hiding, module
autoloading markings, and a few fixes for clk_hw based registration
patches that went in this merge window"* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: at91: Fix a return value in case of error
clk: uniphier: rename MIO clock to SD clock for Pro5, PXs2, LD20 SoCs
clk: uniphier: fix memory overrun bug
clk: hi6220: use CLK_OF_DECLARE_DRIVER for sysctrl and mediactrl clock init
clk: mvebu: armada-37xx-periph: Fix the clock gate flag
clk: bcm2835: Clamp the PLL's requested rate to the hardware limits.
clk: max77686: fix number of clocks setup for clk_hw based registration
clk: mvebu: armada-37xx-periph: Fix the clock provider registration
clk: core: add __init decoration for CLK_OF_DECLARE_DRIVER function
clk: mediatek: Add hardware dependency
clk: samsung: clk-exynos-audss: Fix module autoload
clk: uniphier: fix type of variable passed to regmap_read()
clk: uniphier: add system clock support for sLD3 SoC -
This patch unexports the low-level __get_user_pages() function.
Recent refactoring of the get_user_pages* functions allow flags to be
passed through get_user_pages() which eliminates the need for access to
this function from its one user, kvm.We can see that the two calls to get_user_pages() which replace
__get_user_pages() in kvm_main.c are equivalent by examining their call
stacks:get_user_page_nowait():
get_user_pages(start, 1, flags, page, NULL)
__get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL,
false, flags | FOLL_TOUCH)
__get_user_pages(current, current->mm, start, 1,
flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)check_user_page_hwpoison():
get_user_pages(addr, 1, flags, NULL, NULL)
__get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL,
false, flags | FOLL_TOUCH)
__get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL,
NULL, NULL)Signed-off-by: Lorenzo Stoakes
Acked-by: Paolo Bonzini
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds
24 Oct, 2016
2 commits
-
Pull SCSI target fixes from Nicholas Bellinger:
"Here are the outstanding target-pending fixes for v4.9-rc2.This includes:
- Fix v4.1.y+ reference leak regression with concurrent TMR
ABORT_TASK + session shutdown. (Vaibhav Tandon)- Enable tcm_fc w/ SCF_USE_CPUID to avoid host exchange timeouts
(Hannes)- target/user error sense handling fixes. (Andy + MNC + HCH)
- Fix iscsi-target NOP_OUT error path iscsi_cmd descriptor leak
(Varun)- Two EXTENDED_COPY SCSI status fixes for ESX VAAI (Dinesh Israni +
Nixon Vincent)- Revert a v4.8 residual overflow change, that breaks sg_inq with
small allocation lengths.There are a number of folks stress testing the v4.1.y regression fix
in their environments, and more folks doing iser-target I/O stress
testing atop recent v4.x.y code.There is also one v4.2.y+ RCU conversion regression related to
explicit NodeACL configfs changes, that is still being tracked down"* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target/tcm_fc: use CPU affinity for responses
target/tcm_fc: Update debugging statements to match libfc usage
target/tcm_fc: return detailed error in ft_sess_create()
target/tcm_fc: print command pointer in debug message
target: fix potential race window in target_sess_cmd_list_waiting()
Revert "target: Fix residual overflow handling in target_complete_cmd_with_length"
target: Don't override EXTENDED_COPY xcopy_pt_cmd SCSI status code
target: Make EXTENDED_COPY 0xe4 failure return COPY TARGET DEVICE NOT REACHABLE
target: Re-add missing SCF_ACK_KREF assignment in v4.1.y
iscsi-target: fix iscsi cmd leak
iscsi-target: fix spelling mistake "Unsolicitied" -> "Unsolicited"
target/user: Fix comments to not refer to data ring
target/user: Return an error if cmd data size is too large
target/user: Use sense_reason_t in tcmu_queue_cmd_ring -
Pull IPMI updates from Corey Minyard:
"A small bug fix and a new driver for acting as an IPMI device.I was on vacation during the merge window (a long vacation) but this
is a bug fix that should go in and a new driver that shouldn't hurt
anything.This has been in linux-next for a month or so"
* tag 'for-linus-4.9-2' of git://git.code.sf.net/p/openipmi/linux-ipmi:
ipmi: fix crash on reading version from proc after unregisted bmc
ipmi/bt-bmc: remove redundant return value check of platform_get_resource()
ipmi/bt-bmc: add a dependency on ARCH_ASPEED
ipmi: Fix ioremap error handling in bt-bmc
ipmi: add an Aspeed BT IPMI BMC driver
23 Oct, 2016
4 commits
-
Pull timer updates from Thomas Gleixner:
"This updates contains:- A revert which addresses a boot failure on ARM Sun5i platforms
- A new clocksource driver, which has been delayed beyond rc1 due to
an interrupt driver issue which was unearthed by this driver. The
debugging of that issue and the discussion about the proper
solution made this driver miss the merge window. There is no point
in delaying it for a full cycle as it completes the basic mainline
support for the new JCore platform and does not create any risk
outside of that platform"* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "clocksource/drivers/timer_sun5i: Replace code by clocksource_mmio_init"
clocksource: Add J-Core timer/clocksource driver
of: Add J-Core timer bindings -
Pull x86 fixes from Ingo Molnar:
"Three fixes, a hw-enablement and a cross-arch fix/enablement change:- SGI/UV fix for older platforms
- x32 signal handling fix
- older x86 platform bootup APIC fix
- AVX512-4VNNIW (Neural Network Instructions) and AVX512-4FMAPS
(Multiply Accumulation Single precision instructions) enablement.- move thread_info back into x86 specific code, to make life easier
for other architectures trying to make use of
CONFIG_THREAD_INFO_IN_TASK_STRUCT=y"* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/smp: Don't try to poke disabled/non-existent APIC
sched/core, x86: Make struct thread_info arch specific again
x86/signal: Remove bogus user_64bit_mode() check from sigaction_compat_abi()
x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates
x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features
x86/vmware: Skip timer_irq_works() check on VMware -
Pull vmap stack fixes from Ingo Molnar:
"This is fallout from CONFIG_HAVE_ARCH_VMAP_STACK=y on x86: stack
accesses that used to be just somewhat questionable are now totally
buggy.These changes try to do it without breaking the ABI: the fields are
left there, they are just reporting zero, or reporting narrower
information (the maps file change)"* 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mm: Change vm_is_stack_for_task() to vm_is_stack_for_current()
fs/proc: Stop trying to report thread stacks
fs/proc: Stop reporting eip and esp in /proc/PID/stat
mm/numa: Remove duplicated include from mprotect.c -
Pull irq fixes from Ingo Molnar:
"Mostly irqchip driver fixes, plus a symbol export"* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kernel/irq: Export irq_set_parent()
irqchip/gic: Add missing \n to CPU IF adjustment message
irqchip/jcore: Don't show Kconfig menu item for driver
irqchip/eznps: Drop pointless static qualifier in nps400_of_init()
irqchip/gic-v3-its: Fix entry size mask for GITS_BASER
irqchip/gic-v3-its: Fix 64bit GIC{R,ITS}_TYPER accesses
22 Oct, 2016
4 commits
-
Pull ACPI fixes from Rafael Wysocki:
"These fix an issue related to system resume in the new WDAT-based
watchdog driver and a return value of a stub function in the ACPI CPPC
framework.Specifics:
- Update the ACPI WDAT-based watchdog driver to ping the hardware
during system resume to prevent a reset from occurring after the
resume is complete (Mika Westerberg).- Fix the return value of the pcc_mbox_request_channel() stub for
CONFIG_PCC unset (Hoan Tran)"* tag 'acpi-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
watchdog: wdat_wdt: Ping the watchdog on resume
mailbox: PCC: Fix return value of pcc_mbox_request_channel() -
* acpi-wdat:
watchdog: wdat_wdt: Ping the watchdog on resume* acpi-cppc:
mailbox: PCC: Fix return value of pcc_mbox_request_channel() -
…it/maz/arm-platforms into irq/urgent
Pull GIC updates from Marc Zyngier:
- Fix for 32bit accesses that should be 64bit on 64bit machines
- Fix for a field decoding macro
- Beautify a warning message -
Pull block fixes from Jens Axboe:
"A set of fixes that missed the merge window, mostly due to me being
away around that time.Nothing major here, a mix of nvme cleanups and fixes, and one fix for
the badblocks handling"* 'for-linus' of git://git.kernel.dk/linux-block:
nvmet: use symbolic constants for CNS values
nvme: use symbolic constants for CNS values
nvme.h: add an enum for cns values
nvme.h: don't use uuid_be
nvme.h: resync with nvme-cli
nvme: Add tertiary number to NVME_VS
nvme : Add sysfs entry for NVMe CMBs when appropriate
nvme: don't schedule multiple resets
nvme: Delete created IO queues on reset
nvme: Stop probing a removed device
badblocks: fix overlapping check for clearing
21 Oct, 2016
3 commits
-
Pull power management fix from Rafael Wysocki:
"This fixes the pointer arithmetics mess-up in the cpufreq core
introduced by one of recent commits and leading to all kinds of
breakage from kernel crashes to incorrect governor decisions (Sergey
Senozhatsky)"* tag 'pm-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: fix overflow in cpufreq_table_find_index_dl() -
* pm-cpufreq:
cpufreq: fix overflow in cpufreq_table_find_index_dl() -
At the hardware level, the J-Core PIT is integrated with the interrupt
controller, but it is represented as its own device and has an
independent programming interface. It provides a 12-bit countdown
timer, which is not presently used, and a periodic timer. The interval
length for the latter is programmable via a 32-bit throttle register
whose units are determined by a bus-period register. The periodic
timer is used to implement both periodic and oneshot clock event
modes; in oneshot mode the interrupt handler simply disables the timer
as soon as it fires.Despite its device tree node representing an interrupt for the PIT,
the actual irq generated is programmable, not hard-wired. The driver
is responsible for programming the PIT to generate the hardware irq
number that the DT assigns to it.On SMP configurations, J-Core provides cpu-local instances of the PIT;
no broadcast timer is needed. This driver supports the creation of the
necessary per-cpu clock_event_device instances.A nanosecond-resolution clocksource is provided using the J-Core "RTC"
registers, which give a 64-bit seconds count and 32-bit nanoseconds
that wrap every second. The driver converts these to a full-range
32-bit nanoseconds count.Signed-off-by: Rich Felker
Cc: Mark Rutland
Cc: devicetree@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: Daniel Lezcano
Cc: Rob Herring
Link: http://lkml.kernel.org/r/b591ff12cc5ebf63d1edc98da26046f95a233814.1476393790.git.dalias@libc.org
Signed-off-by: Thomas Gleixner
20 Oct, 2016
8 commits
-
'best' is always less or equals to 'pos', so `best - pos' returns
a negative value which is then getting casted to `unsigned int'
and passed to __cpufreq_driver_target()->acpi_cpufreq_target()
for policy->freq_table selection. This results inBUG: unable to handle kernel paging request at ffff881019b469f8
IP: [] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
PGD 267f067
PUD 0Oops: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 70 Comm: kworker/6:1 Not tainted 4.9.0-rc1-next-20161017-dbg-dirty
Workqueue: events dbs_work_handler
task: ffff88041b808000 task.stack: ffff88041b810000
RIP: 0010:[] [] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
RSP: 0018:ffff88041b813c60 EFLAGS: 00010282
RAX: ffff880419b46a00 RBX: ffff88041b848400 RCX: ffff880419b20f80
RDX: 00000000001dff38 RSI: 00000000ffffffff RDI: ffff88041b848400
RBP: ffff88041b813cb0 R08: 0000000000000006 R09: 0000000000000040
R10: ffffffff8207f9e0 R11: ffffffff8173595b R12: 0000000000000000
R13: ffff88041f1dff38 R14: 0000000000262900 R15: 0000000bfffffff4
FS: 0000000000000000(0000) GS:ffff88041f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff881019b469f8 CR3: 000000041a2d3000 CR4: 00000000001406e0
Stack:
ffff88041b813cb0 ffffffff813347f9 ffff88041b813ca0 ffffffff81334663
ffff88041f1d4bc0 ffff88041b848400 0000000000000000 0000000000000000
0000000000262900 0000000000000000 ffff88041b813d00 ffffffff813355dc
Call Trace:
[] ? cpufreq_freq_transition_begin+0xf1/0xfc
[] ? get_cpu_idle_time+0x97/0xa6
[] __cpufreq_driver_target+0x3b6/0x44e
[] cs_dbs_timer+0x11a/0x135
[] dbs_work_handler+0x39/0x62
[] process_one_work+0x280/0x4a5
[] worker_thread+0x24f/0x397
[] ? rescuer_thread+0x30b/0x30b
[] ? nl80211_get_key+0x29/0x36a
[] kthread+0xfc/0x104
[] ? put_lock_stats.isra.9+0xe/0x20
[] ? kthread_create_on_node+0x3f/0x3f
[] ret_from_fork+0x22/0x30
Code: 56 4d 6b ff 0c 41 55 41 54 53 48 83 ec 28 48 8b 15 ad 1e 00 00 44 8b 41
08 48 8b 87 c8 00 00 00 49 89 d5 4e 03 2c c5 80 b2 78 81 8b 74 38 04 45
3b 75 00 75 11 31 c0 83 39 00 0f 84 1c 01 00
RIP [] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
RSP
CR2: ffff881019b469f8
---[ end trace 16d9fc7a17897d37 ]---[ rjw: In some cases this bug may also cause incorrect frequencies to
be selected by cpufreq governors. ]Fixes: 899bb6642f2a (cpufreq: skip invalid entries when searching the frequency)
Link: http://marc.info/?l=linux-kernel&m=147672030714331&w=2
Reported-and-tested-by: Sedat Dilek
Reported-and-tested-by: Jörg Otte
Signed-off-by: Sergey Senozhatsky
Acked-by: Viresh Kumar
Cc: 4.8+ # 4.8+
Signed-off-by: Rafael J. Wysocki -
The following commit:
c65eacbe290b ("sched/core: Allow putting thread_info into task_struct")
... made 'struct thread_info' a generic struct with only a
single ::flags member, if CONFIG_THREAD_INFO_IN_TASK_STRUCT=y is
selected.This change however seems to be quite x86 centric, since at least the
generic preemption code (asm-generic/preempt.h) assumes that struct
thread_info also has a preempt_count member, which apparently was not
true for x86.We could add a bit more #ifdefs to solve this problem too, but it seems
to be much simpler to make struct thread_info arch specific
again. This also makes the conversion to THREAD_INFO_IN_TASK_STRUCT a
bit easier for architectures that have a couple of arch specific stuff
in their thread_info definition.The arch specific stuff _could_ be moved to thread_struct. However
keeping them in thread_info makes it easier: accessing thread_info
members is simple, since it is at the beginning of the task_struct,
while the thread_struct is at the end. At least on s390 the offsets
needed to access members of the thread_struct (with task_struct as
base) are too large for various asm instructions. This is not a
problem when keeping these members within thread_info.Signed-off-by: Heiko Carstens
Signed-off-by: Mark Rutland
Acked-by: Thomas Gleixner
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: keescook@chromium.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1476901693-8492-2-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar -
Asking for a non-current task's stack can't be done without races
unless the task is frozen in kernel mode. As far as I know,
vm_is_stack_for_task() never had a safe non-current use case.The __unused annotation is because some KSTK_ESP implementations
ignore their parameter, which IMO is further justification for this
patch.Signed-off-by: Andy Lutomirski
Acked-by: Thomas Gleixner
Cc: Al Viro
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Jann Horn
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Linux API
Cc: Peter Zijlstra
Cc: Tycho Andersen
Link: http://lkml.kernel.org/r/4c3f68f426e6c061ca98b4fc7ef85ffbb0a25b0c.1475257877.git.luto@kernel.org
Signed-off-by: Ingo Molnar -
This patch addresses a bug where EXTENDED_COPY across multiple LUNs
results in a CHECK_CONDITION when the source + destination are not
located on the same physical node.ESX Host environments expect sense COPY_ABORTED w/ COPY TARGET DEVICE
NOT REACHABLE to be returned when this occurs, in order to signal
fallback to local copy method.As described in section 6.3.3 of spc4r22:
"If it is not possible to complete processing of a segment because the
copy manager is unable to establish communications with a copy target
device, because the copy target device does not respond to INQUIRY,
or because the data returned in response to INQUIRY indicates
an unsupported logical unit, then the EXTENDED COPY command shall be
terminated with CHECK CONDITION status, with the sense key set to
COPY ABORTED, and the additional sense code set to COPY TARGET DEVICE
NOT REACHABLE."Tested on v4.1.y with ESX v5.5u2+ with BlockCopy across multiple nodes.
Reported-by: Nixon Vincent
Tested-by: Nixon Vincent
Cc: Nixon Vincent
Tested-by: Dinesh Israni
Signed-off-by: Dinesh Israni
Cc: Dinesh Israni
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Nicholas Bellinger -
Ported over from nvme-cli.
Signed-off-by: Christoph Hellwig
Reviewed-by: Gabriel Krisman Bertazi
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe -
This makes life easier for nvme-cli and we don't really need the uuid
type anyway to start with.Signed-off-by: Christoph Hellwig
Reviewed-by: Gabriel Krisman Bertazi
Reviewed-by: Jay Freyensee
Signed-off-by: Jens Axboe -
Import a few updates to nvme.h from nvme-cli. This mostly includes a few
new fields and error codes, but also a few renames that so far are only
used in user space. Also one field is moved from an array of two le64
values to one of 16 u8 values so that we can more easily access it.Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Reviewed-by: Gabriel Krisman Bertazi
Signed-off-by: Jens Axboe -
NVMe 1.2.1 specification adds a tertiary element to the version number.
This updates the macro and its callers to include the final number and
fixup a single place in nvmet where the version was generated manually.Signed-off-by: Gabriel Krisman Bertazi
Reviewed-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
19 Oct, 2016
11 commits
-
Merge the gup_flags cleanups from Lorenzo Stoakes:
"This patch series adjusts functions in the get_user_pages* family such
that desired FOLL_* flags are passed as an argument rather than
implied by flags.The purpose of this change is to make the use of FOLL_FORCE explicit
so it is easier to grep for and clearer to callers that this flag is
being used. The use of FOLL_FORCE is an issue as it overrides missing
VM_READ/VM_WRITE flags for the VMA whose pages we are reading
from/writing to, which can result in surprising behaviour.The patch series came out of the discussion around commit 38e088546522
("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"),
which addressed a BUG_ON() being triggered when a page was faulted in
with PROT_NONE set but having been overridden by FOLL_FORCE.
do_numa_page() was run on the assumption the page _must_ be one marked
for NUMA node migration as an actual PROT_NONE page would have been
dealt with prior to this code path, however FOLL_FORCE introduced a
situation where this assumption did not hold.See
https://marc.info/?l=linux-mm&m=147585445805166
for the patch proposal"
Additionally, there's a fix for an ancient bug related to FOLL_FORCE and
FOLL_WRITE by me.[ This branch was rebased recently to add a few more acked-by's and
reviewed-by's ]* gup_flag-cleanups:
mm: replace access_process_vm() write parameter with gup_flags
mm: replace access_remote_vm() write parameter with gup_flags
mm: replace __access_remote_vm() write parameter with gup_flags
mm: replace get_user_pages_remote() write/force parameters with gup_flags
mm: replace get_user_pages() write/force parameters with gup_flags
mm: replace get_vaddr_frames() write/force parameters with gup_flags
mm: replace get_user_pages_locked() write/force parameters with gup_flags
mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
mm: remove write/force parameters from __get_user_pages_unlocked()
mm: remove write/force parameters from __get_user_pages_locked()
mm: remove gup_flags FOLL_WRITE games from __get_user_pages() -
This removes the 'write' argument from access_process_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.Signed-off-by: Lorenzo Stoakes
Acked-by: Jesper Nilsson
Acked-by: Michal Hocko
Acked-by: Michael Ellerman
Signed-off-by: Linus Torvalds -
This removes the 'write' argument from access_remote_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.Signed-off-by: Lorenzo Stoakes
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds -
This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.Signed-off-by: Lorenzo Stoakes
Acked-by: Michal Hocko
Reviewed-by: Jan Kara
Signed-off-by: Linus Torvalds -
This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.Signed-off-by: Lorenzo Stoakes
Acked-by: Christian König
Acked-by: Jesper Nilsson
Acked-by: Michal Hocko
Reviewed-by: Jan Kara
Signed-off-by: Linus Torvalds -
This removes the 'write' and 'force' from get_vaddr_frames() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.Signed-off-by: Lorenzo Stoakes
Acked-by: Michal Hocko
Reviewed-by: Jan Kara
Signed-off-by: Linus Torvalds -
This removes the 'write' and 'force' use from get_user_pages_locked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.Signed-off-by: Lorenzo Stoakes
Acked-by: Michal Hocko
Reviewed-by: Jan Kara
Signed-off-by: Linus Torvalds -
This removes the 'write' and 'force' use from get_user_pages_unlocked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.Signed-off-by: Lorenzo Stoakes
Reviewed-by: Jan Kara
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds -
This removes the redundant 'write' and 'force' parameters from
__get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.Signed-off-by: Lorenzo Stoakes
Acked-by: Paolo Bonzini
Reviewed-by: Jan Kara
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds -
This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better). The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9. Earlier kernels will
have to look at the page state itself.Also, the VM has become more scalable, and what used a purely
theoretical race back then has become easier to trigger.To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.Reported-and-tested-by: Phil "not Paul" Oester
Acked-by: Hugh Dickins
Reviewed-by: Michal Hocko
Cc: Andy Lutomirski
Cc: Kees Cook
Cc: Oleg Nesterov
Cc: Willy Tarreau
Cc: Nick Piggin
Cc: Greg Thelen
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds -
Pull perf fixes from Ingo Molnar:
"Four tooling fixes, two kprobes KASAN related fixes and an x86 PMU
driver fix/cleanup"* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf jit: Fix build issue on Ubuntu
perf jevents: Handle events including .c and .o
perf/x86/intel: Remove an inconsistent NULL check
kprobes: Unpoison stack in jprobe_return() for KASAN
kprobes: Avoid false KASAN reports during stack copy
perf header: Set nr_numa_nodes only when we parsed all the data
perf top: Fix refreshing hierarchy entries on TUI
18 Oct, 2016
2 commits
-
The new introduced macro CLK_OF_DECLARE_DRIVER is usually used to
declare clock driver init functions, which are mostly decorated with
__init. Add __init decoration for CLK_OF_DECLARE_DRIVER function to
avoid causing section mismatch warnings on client clock drivers.Signed-off-by: Shawn Guo
Fixes: c7296c51ce5d ("clk: core: New macro CLK_OF_DECLARE_DRIVER")
Signed-off-by: Stephen Boyd -
pkey_set() and pkey_get() were syscalls present in older versions
of the protection keys patches. They were fully excised from the
x86 code, but some cruft was left in the generic syscall code. The
C++ comments were intended to help to make it more glaring to me to
fix them before actually submitting them. That technique worked,
but later than I would have liked.I test-compiled this for arm64.
Fixes: a60f7b69d92c0 ("generic syscalls: Wire up memory protection keys syscalls")
Signed-off-by: Dave Hansen
Acked-by: Arnd Bergmann
Cc: Thomas Gleixner
Cc: x86@kernel.org
Cc: linux-arch@vger.kernel.org
Cc: mgorman@techsingularity.net
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Signed-off-by: Linus Torvalds
17 Oct, 2016
2 commits
-
Entry Size in GITS_BASER occupies 5 bits [52:48], but we mask out 8
bits.Fixes: cc2d3216f53c ("irqchip: GICv3: ITS command queue")
Cc: stable@vger.kernel.org
Signed-off-by: Vladimir Murzin
Signed-off-by: Marc Zyngier -
When CONFIG_PCC is disabled, pcc_mbox_request_channel() needs to
return ERR_PTR(-ENODEV), not a NULL pointer, as the callers of
this function use IS_ERR() to check for error code.Signed-off-by: Duc Dang
Signed-off-by: Hoan Tran
Signed-off-by: Rafael J. Wysocki
16 Oct, 2016
1 commit
-
I observed false KSAN positives in the sctp code, when
sctp uses jprobe_return() in jsctp_sf_eat_sack().The stray 0xf4 in shadow memory are stack redzones:
[ ] ==================================================================
[ ] BUG: KASAN: stack-out-of-bounds in memcmp+0xe9/0x150 at addr ffff88005e48f480
[ ] Read of size 1 by task syz-executor/18535
[ ] page:ffffea00017923c0 count:0 mapcount:0 mapping: (null) index:0x0
[ ] flags: 0x1fffc0000000000()
[ ] page dumped because: kasan: bad access detected
[ ] CPU: 1 PID: 18535 Comm: syz-executor Not tainted 4.8.0+ #28
[ ] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ ] ffff88005e48f2d0 ffffffff82d2b849 ffffffff0bc91e90 fffffbfff10971e8
[ ] ffffed000bc91e90 ffffed000bc91e90 0000000000000001 0000000000000000
[ ] ffff88005e48f480 ffff88005e48f350 ffffffff817d3169 ffff88005e48f370
[ ] Call Trace:
[ ] [] dump_stack+0x12e/0x185
[ ] [] kasan_report+0x489/0x4b0
[ ] [] __asan_report_load1_noabort+0x19/0x20
[ ] [] memcmp+0xe9/0x150
[ ] [] depot_save_stack+0x176/0x5c0
[ ] [] save_stack+0xb1/0xd0
[ ] [] kasan_slab_free+0x72/0xc0
[ ] [] kfree+0xc8/0x2a0
[ ] [] skb_free_head+0x79/0xb0
[ ] [] skb_release_data+0x37a/0x420
[ ] [] skb_release_all+0x4f/0x60
[ ] [] consume_skb+0x138/0x370
[ ] [] sctp_chunk_put+0xcb/0x180
[ ] [] sctp_chunk_free+0x58/0x70
[ ] [] sctp_inq_pop+0x68f/0xef0
[ ] [] sctp_assoc_bh_rcv+0xd6/0x4b0
[ ] [] sctp_inq_push+0x131/0x190
[ ] [] sctp_backlog_rcv+0xe9/0xa20
[ ... ]
[ ] Memory state around the buggy address:
[ ] ffff88005e48f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ ] ffff88005e48f400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ ] >ffff88005e48f480: f4 f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ ] ^
[ ] ffff88005e48f500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ ] ffff88005e48f580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ ] ==================================================================KASAN stack instrumentation poisons stack redzones on function entry
and unpoisons them on function exit. If a function exits abnormally
(e.g. with a longjmp like jprobe_return()), stack redzones are left
poisoned. Later this leads to random KASAN false reports.Unpoison stack redzones in the frames we are going to jump over
before doing actual longjmp in jprobe_return().Signed-off-by: Dmitry Vyukov
Acked-by: Masami Hiramatsu
Reviewed-by: Mark Rutland
Cc: Mark Rutland
Cc: Catalin Marinas
Cc: Andrey Ryabinin
Cc: Lorenzo Pieralisi
Cc: Alexander Potapenko
Cc: Will Deacon
Cc: Andrew Morton
Cc: Ananth N Mavinakayanahalli
Cc: Anil S Keshavamurthy
Cc: "David S. Miller"
Cc: Masami Hiramatsu
Cc: kasan-dev@googlegroups.com
Cc: surovegin@google.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1476454043-101898-1-git-send-email-dvyukov@google.com
Signed-off-by: Ingo Molnar